I will help you make the right business decisions by scraping and analyzing Reddit posts and comments. Here is an example workflow based on my last project, which looked for the fastest way for freelancers to find their first client:
- Decide on what information to extract: field, outreach method, time to first client.
- Decide which subreddits to use: r/freelance, r/webdev... (here, for example, I removed r/smallbusiness because its stories were about business owners, not freelancers).
- Decide on the time range to include: I included only posts after 2024, since methods for finding freelance work evolve rapidly.
- Download the relevant archives for those subreddits.
- Set up a wide first filter that removes some of the irrelevant data.
- Set up a second filter that uses Gemini 3.0 Pro to keep only the relevant data and extract the appropriate fields.
- Normalize keywords (people sometimes write "web dev" and sometimes "web development").
- Extract statistics.
One conclusion from this project, for example, was that in-person cold outreach was a very successful method for new freelancers, with a median time to first client of 1.5 days.
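
To give a feel for the cheaper steps in the workflow (the wide filter, keyword normalization, and statistics), here is a minimal Python sketch. It is not the actual project code: the keyword pattern, the synonym map, the record field names, and the sample records are all hypothetical and made up for illustration, and the Gemini filtering step is left out entirely.

```python
import re
from collections import defaultdict
from statistics import median

# Hypothetical keyword pattern for the wide first-pass filter.
WIDE_FILTER = re.compile(r"\b(first client|first customer|first gig)\b", re.IGNORECASE)

# Hypothetical synonym map for normalizing the extracted "field" values.
FIELD_SYNONYMS = {
    "web dev": "web development",
    "webdev": "web development",
    "web development": "web development",
}


def passes_wide_filter(text: str) -> bool:
    """Cheap first pass: keep any text that mentions a first client."""
    return WIDE_FILTER.search(text) is not None


def normalize_field(raw: str) -> str:
    """Map spelling variants of a field to one canonical label."""
    key = raw.strip().lower().rstrip(",.")
    return FIELD_SYNONYMS.get(key, key)


def median_days_by_method(records: list[dict]) -> dict[str, float]:
    """Group records by outreach method and take the median time to first client."""
    days = defaultdict(list)
    for rec in records:
        days[rec["outreach_method"]].append(rec["days_to_first_client"])
    return {method: median(values) for method, values in days.items()}


# Made-up records for illustration only, not data from the project.
SAMPLE_RECORDS = [
    {"field": "web dev,", "outreach_method": "in-person cold outreach", "days_to_first_client": 3},
    {"field": "Web Development", "outreach_method": "in-person cold outreach", "days_to_first_client": 5},
    {"field": "webdev", "outreach_method": "job board", "days_to_first_client": 30},
]

if __name__ == "__main__":
    for rec in SAMPLE_RECORDS:
        rec["field"] = normalize_field(rec["field"])
    print(median_days_by_method(SAMPLE_RECORDS))
```

The median is used rather than the mean because a few very slow cases would otherwise dominate the result for a given outreach method.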