
Soumya S

@soumya_saxena_

Lead Data Engineer

India
English, Hindi
About me
As a Senior Data Engineer at NAB, I apply my expertise in cloud computing and data engineering to design, develop, and maintain scalable data pipelines. I optimize complex queries and dashboards using AWS, PySpark, SQL, and Python, ensuring timely and accurate data delivery.

Skills

Services

Data Warehouse
I will build ETL data pipelines using Python, PySpark, SQL, and AWS

Portfolio

Work experience

Lead Data Engineer (Lancesoft)

McKinsey & Company • Full-time

May 2025 - Present • 1 yr

- Designed and deployed a low-latency real-time ingestion pipeline (AWS API Gateway → Kinesis Firehose → S3 → Snowflake) for multi-source event data (Zoom, Cvent, Salesforce, MuleSoft, Splashthat), enabling a unified Event360 platform tracking 500+ global events annually.
- Engineered data normalization, deduplication, and cleansing workflows, improving accuracy, reducing duplicates by 30%, and ensuring seamless downstream analytics.
- Developed a dynamic event capture mechanism to ingest 100% of incoming event streams, increasing coverage by 25%.
- Resolved data quality issues by analyzing upstream sources, correcting pipeline logic, and implementing automated validation and reconciliation in Snowflake.
- Migrated Snowflake authentication for Spark connectors in Glue from legacy username/password to secure OAuth, enhancing compliance and security.
- Enhanced Clientlink datasets with new business-critical columns, updating Glue scripts, Athena schemas, and Snowflake procedures/tables, improving reporting and decision-making.

Senior Data Engineer

NAB • Full-time

Mar 2023 - May 2025 • 2 yrs 2 mos

- Engineered and maintained scalable real-time and batch data pipelines using AWS Glue, Spark, Airflow, and Kinesis, contributing to customer behavior analytics and personalization initiatives.
- Engineered an end-to-end data pipeline to collect, store, and process real-time service outage data from CloudWatch and IT teams, using AWS Glue for ETL transformation and Athena queries for actionable insights. Automated monthly reporting, reducing manual tasks by 50% and enabling faster decision-making.
- Designed distributed data workflows for identity resolution and fraud detection by integrating relational and NoSQL databases, enhancing customer targeting and lifetime value strategies.
- Revamped ETL processes and optimized SQL queries, reducing execution time by 15%.

Data Engineer

Amazon • Full-time

Feb 2020 - Feb 2023 • 3 yrs

- Implemented AWS-based data solutions (Glue, Athena, Redshift, S3), reducing infrastructure costs by 20% using Parquet format and optimized query strategies.
- Optimized Redshift data models by using sort and distribution keys, reducing query execution time by 30% and improving storage efficiency for analytical workloads.
- Developed the Dwell Time Metric by leveraging clickstream data and automated ETL workflows, enabling deeper insights into customer engagement.
- Supported new marketplace launches by integrating and validating large datasets via Python and SQL ETL pipelines, ensuring accurate and timely data delivery.
- Improved system scalability and cost-efficiency by implementing serverless architectures, auto-scaling, and right-sizing AWS resources.