
Soumya S
Lead Data Engineer
Work experience
Lead Data Engineer (Lancesoft)
McKinsey & Company • Full-time
May 2025 - Present • 1 yr
- Designed and deployed a low-latency real-time ingestion pipeline (AWS API Gateway → Kinesis Firehose → S3 → Snowflake) for multi-source event data (Zoom, Cvent, Salesforce, Mulesoft, Splashthat), enabling a unified Event360 platform tracking 500+ global events annually.
- Engineered data normalization, deduplication, and cleansing workflows, improving accuracy, reducing duplicates by 30%, and ensuring seamless downstream analytics.
- Developed a dynamic event capture mechanism to ingest 100% of incoming event streams, increasing coverage by 25%.
- Resolved data quality issues by analyzing upstream sources, correcting pipeline logic, and implementing automated validation and reconciliation in Snowflake.
- Migrated Snowflake authentication for Spark connectors in Glue from legacy username/password to secure OAuth, enhancing compliance and security.
- Enhanced Clientlink datasets with new business-critical columns, updating Glue scripts, Athena schemas, and Snowflake procedures/tables, improving reporting and decision-making.
Senior Data Engineer
NAB • Full-time
Mar 2023 - May 2025 • 2 yrs 2 mos
- Engineered and maintained scalable real-time and batch data pipelines using AWS Glue, Spark, Airflow, and Kinesis, contributing to customer behavior analytics and personalization initiatives.
- Engineered an end-to-end data pipeline to collect, store, and process real-time service outage data from CloudWatch and IT teams, using AWS Glue for ETL transformation and Athena queries for actionable insights. Automated monthly reporting, reducing manual tasks by 50% and enabling faster decision-making.
- Designed distributed data workflows for identity resolution and fraud detection by integrating relational and NoSQL databases, enhancing customer targeting and lifetime value strategies.
- Revamped ETL processes and optimized SQL queries, reducing execution time by 15%.
Data Engineer
Amazon • Full-time
Feb 2020 - Feb 2023 • 3 yrs
- Implemented AWS-based data solutions (Glue, Athena, Redshift, S3), reducing infrastructure costs by 20% using Parquet format and optimized query strategies.
- Optimized Redshift data models by using sort & distribution keys, reducing query execution time by 30% and improving storage efficiency for analytical workloads.
- Developed the Dwell Time Metric by leveraging clickstream data and automated ETL workflows, enabling deeper insights into customer engagement.
- Supported new marketplace launches by integrating and validating large datasets via Python and SQL ETL pipelines, ensuring accurate and timely data delivery.
- Improved system scalability and cost-efficiency by implementing serverless architectures, auto-scaling, and right-sizing AWS resources.