archu18

Archana S

@archu18

Data Engineer

India
English, Hindi, Malayalam
About me
I am a Data Engineer with expert proficiency in building scalable Lakehouse architectures using Databricks, PySpark, and Delta Lake. I have a proven track record of orchestrating end-to-end ETL/ELT pipelines processing hundreds of GBs of data daily. I am deeply experienced in cloud infrastructure, data modeling, and automating data quality frameworks to drive business intelligence.

Skills


See my services

Resume Writing
I will review and improve your data engineering resume
Programming & Tech
I will optimize SQL queries and fix database issues

Work experience

Deloitte

Data Engineer

Deloitte • Full-time

Mar 2026 - Present • 3 mos

Core Responsibility: Design, develop, and optimize scalable Databricks-based data engineering solutions for enterprise-wide analytics platforms, enabling real-time data integration, transformation, and consumption across multiple business functions.

Project: EDH Core Data Platform
• Developed and maintained scalable data ingestion and transformation pipelines using Databricks, PySpark, and SQL for enterprise-wide analytics initiatives.
• Implemented data transformation and validation processes across Bronze, Silver, and Gold layers to ensure data quality and consistency.
• Optimized PySpark and SQL workloads, improving pipeline performance and supporting growing data volumes.
• Collaborated with business stakeholders to translate requirements into scalable data engineering solutions.
• Supported data governance, metadata management, and monitoring processes to enhance platform reliability and operational efficiency.
• Contributed to cloud-based data modernization initiatives, enabling self-service analytics and data-driven decision-making across the organization.

Technologies

Data Engineer

Technologies • Full-time

Jul 2022 - Feb 2026 • 3 yrs 7 mos

Core Responsibility: Lead the architecture and development of cloud-native data platforms, migrating legacy systems to Databricks and optimizing pipelines for high-volume data ingestion.

Project 1: Telecom Data Lakehouse & Revenue Analytics (Databricks/PySpark)
• Architected a centralized Data Lakehouse solution on Databricks, ingesting semi-structured Call Detail Records (CDRs) and telemetry data to support critical revenue reporting.
• Engineered highly optimized PySpark pipelines to process over 250 GB of streaming and batch data daily, reducing data latency from 24 hours to near real-time.
• Implemented Delta Lake architecture (Bronze/Silver/Gold layers) to standardize raw telecom data, ensuring ACID transactions and schema enforcement across the data lifecycle.
• Optimized complex SQL transformation logic within the Silver layer, reducing query execution time.
• Developed automated Python scripts to validate data integrity between source systems and the Data Lake, identifying and resolving anomalies in a 5 TB historical dataset.

Project 2: Intelligent Data Agents & Quality Framework (Python/Cloud)
• Designed the backend data infrastructure for AI-driven Data Agents, creating robust pipelines that aggregate unstructured data from diverse sources into a structured Knowledge Base.
• Built a comprehensive Data Quality Framework using Python, automating schema validation and anomaly detection, which increased the reliability of analytical outputs by 20%.
• Modeled complex business entities to support context-aware querying, ensuring scalability for the agents to handle increasing request loads.
• Collaborated with data scientists to translate business logic into efficient data access layers, accelerating the deployment of intelligent agents.

Project 3: Campaign Management System & Migration
• Led the end-to-end migration of a campaign management database to the cloud, writing custom PySpark scripts to transfer and transform 300 GB of data with zero data loss.