The pain of data projects
The worst feeling is when there is no API for the data your new project needs. Sometimes you get a glimmer of hope: you find a company with the data you need, only to discover that it costs a lot of money and goes stale quickly. I know these feelings because that’s exactly what I faced.
I once started a project to build a search engine to find more software engineering jobs. The MVP had ~10 data sources. But I needed a lot more data. I didn’t have a lot of money to purchase jobs data. And I couldn’t find a reliable API that fit my needs without excessive rate limits.
No API. No Problem!
That’s when I dove deep into the world of web scraping. I wanted at least 2,000 jobs for the project. For 6 months I had a singular focus on learning how to scrape websites. At first it was difficult. Every website was unique. There was not a single technique that always worked.
However, with every website I scraped, I wrote down the technique that worked. Over time, instead of jumping straight into coding and banging my head against the wall trying to make it work, I would run a series of quick tests to determine whether it was even possible to get the data. I saved so much time, and my success rate went up tenfold.
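To give a flavor of what such quick tests look like, here is a minimal sketch in Python. This is my own illustrative example, not the course's actual checklist: the function names and the heuristics (checking whether a known value appears in the raw HTML, and looking for common embedded-JSON markers) are assumptions about the kind of pre-flight checks described above.

```python
# Quick pre-flight checks on a page's raw HTML, run before writing a scraper.
# Illustrative sketch only -- heuristics and names are assumptions, not the
# course's actual method.

def data_in_static_html(html: str, sample_text: str) -> bool:
    """If a value you can see in the browser appears in the raw HTML,
    a plain HTTP client plus an HTML parser is usually enough."""
    return sample_text.lower() in html.lower()

def embedded_json_present(html: str) -> bool:
    """Many sites ship their data as JSON inside a <script> tag;
    extracting that payload is often easier than parsing the HTML."""
    markers = ("__NEXT_DATA__", "application/ld+json", "window.__INITIAL_STATE__")
    return any(marker in html for marker in markers)

def scan(html: str, sample_text: str) -> str:
    """Classify a page so you know which scraping approach to try first."""
    if data_in_static_html(html, sample_text):
        return "static: use an HTTP client + an HTML parser"
    if embedded_json_present(html):
        return "embedded JSON: extract the script payload and json.loads it"
    return "dynamic: data likely loads via XHR -- inspect the network tab or use a browser"
```

A few checks like these take minutes to run against a saved copy of the page, and they tell you up front whether a simple scraper will work or whether you need browser automation.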
Sharing the techniques with you
At the end of my deep dive, I counted up the number of jobs I had acquired in my database. To my surprise, I had exceeded my expectations by a long shot: I scraped over 70,000 job descriptions from more than 2,500 job boards!
With these techniques, I went on to scrape 50,000 digital marketing companies from Clutch. I even scraped 3,000 companies from Crunchbase, a top site for B2B data.
During that time I would regularly help people in the r/webscraping subreddit. I noticed they would have trouble scraping a website, but when I applied my system, I got the data. That’s when I realized I had a winning method I could share with other developers to level up their web scraping skills.
In an age where data is more precious than gold, web scraping knowledge is one of the most valuable skills you can own. Once you learn these techniques, you’ll save a lot of time and money, allowing you to deliver a high-quality project.
What you will learn
In this video course, I’ll share with you the same framework of steps I use to successfully scrape public website data, even from sites protected by anti-bot detection software.
- How to scan websites for the best web scraping techniques
- How to pick the right storage option for your data
- How to determine which option to use to scale up your operation
What is in this course
Here is the table of contents for what to expect.
- Intro: Who am I
- 4S Method: Scan, Scrape, Store, Scale
- Checklist for scanning a website
- Scanning an easy website: Wikipedia.com
- Scanning an easy website: Amazon.com
- Scanning an easy website: Nike.com
- Scanning a hard website: Airtable
- Scanning a hard website: Crunchbase.com
- Scanning a hard website: Linkedin.com
- Scanning a hard website: Shein.com
- Tools you can use to scrape
- Where you can store your data
- Options for scaling your operation based on project size
- Let’s Connect
Frequently Asked Questions
Who is this course for?
- Programmers who have some experience with web scraping
- Programmers who want to scrape data from public websites
- Programmers who want to learn advanced techniques in web scraping
- Programmers familiar with the Python programming language
- Programmers who want a framework for analyzing the best way to scrape data
What is NOT included in the course?
- Code snippets
- How to scrape websites behind a login page
Why should I listen to you? Who are you?
My name is Steven Natera. I am a software engineer with a decade of experience. I’ve used these techniques to scrape over 50,000 job postings for my job board search engine product at TechStackLeads.com.
In my career, I worked for both small and large companies. In my previous role as an SRE at Twitter, I was on a team that managed over 500k servers, processing more than 5 million requests per second.
What is your refund policy?
If you're not 100% satisfied with the purchase, or it's not what you were expecting, just reply to the download email, or contact me within 30 days, and you'll get a full refund. No questions asked.