Data extraction from PDFs, government portals and scanned documents
Location: India
Languages: English, Hindi
About me
I turn inaccessible data into structured datasets. My specialty: scanned PDFs, image-based documents, and government portals with CAPTCHAs.
Recent: I led an AltNews project digitising 12.8 lakh (1.28 million) voter records from scanned ECI electoral roll PDFs. These were image-only files behind CAPTCHA-protected downloads. The pipeline I built powers published investigations.
What I deliver: PDF to Excel/CSV, web scraping, data cleaning, statistical analysis, public-records extraction.
Send me a sample, and within a day I'll tell you what I can deliver and how fast.