
Henry
Software Developer
Skills



Work experience
Data Engineer / Data Scientist
Reddit • Full-time
Jan 2021 - Present • 5 yrs 4 mos
Designed and implemented large-scale data pipelines using Python, Spark, and Airflow that process 10TB+ of daily user interaction data from Reddit’s platform, improving data freshness by 65%.
Built machine learning models in Python (scikit-learn, XGBoost, PyTorch) to detect toxic content, spam, and bot activity, reducing harmful content exposure by 42% and increasing user retention.
Developed real-time analytics dashboards and recommendation features using Golang microservices and Kafka, enabling subreddit moderators and the product team to make data-driven decisions at scale.
Optimized query performance on Reddit’s Presto/Trino + Hive data lake, reducing average dashboard load time from 18s to under 3s.
Collaborated cross-functionally with Product, Engineering, and Trust & Safety teams to define key metrics (DAU, engagement, retention) and A/B testing frameworks that drove measurable product improvements.
Mentored 4 junior data engineers on best practices in code optimization, testing, and production deployment, including the use of Rust for performance-critical components.
Conducted deep-dive analyses of user behavior using SQL and Python, uncovering insights that influenced major platform changes (e.g., feed ranking improvements and community health initiatives).