Web Scraping for Beginners

The pain of data projects

The worst feeling is when there is no API to get the data for your new project. Sometimes you get a glimmer of hope: you find a company with the data you need. Then you find out the data costs a lot of money and can get outdated very quickly. I know these feelings because that’s what I faced.

I once started a project to build a search engine to find more software engineering jobs. The MVP had ~10 data sources. But I needed a lot more data. I didn’t have a lot of money to purchase jobs data. And I couldn’t find a reliable API that fit my needs without excessive rate limits.

No API. No Problem!

That’s when I dove deep into the world of web scraping. I wanted at least 2,000 jobs for the project. For 6 months I had a singular focus on learning how to scrape websites. At first it was difficult. Every website was unique. There was not a single technique that always worked.

However, with every website I scraped, I wrote down the working technique. Over time, instead of jumping straight into coding and banging my head against the wall trying to get it to work, I would run a series of quick tests to determine whether it was possible to get the data. I saved so much time, and my success rate went up tenfold.

Sharing the techniques with you

At the end of my deep dive, I counted up the number of jobs I had acquired in my database. To my surprise, I had exceeded my expectations by a long shot. I scraped over 70,000 job descriptions from more than 2,500 job boards!

With these techniques, I went on to scrape 50,000 digital marketing companies from Clutch. I even scraped 3,000 companies from Crunchbase, a top site for B2B data.

During that time I regularly helped people in the r/webscraping subreddit. I noticed that people would have trouble scraping a website, but when I applied my system, I got the data. That’s when I realized I had a winning method I could share with other developers to level up their web scraping skills.

Web scraping knowledge is one of the most valuable assets you could own, in an age where data is more precious than gold. Once you learn these techniques you’ll save a bunch of time and money, allowing you to deliver a high quality project.

What you will learn

In this video course, I’ll share with you the same framework of steps I use to successfully scrape public website data, even when it is protected by anti-bot detection software.

You’ll learn:

How to scan websites for the best web scraping techniques
How to pick the right storage option for your data
How to determine which option to use to scale up your operation

What is in this course

Here is the table of contents for what to expect.

Intro: Who am I
4S Method: Scan, Scrape, Store, Scale
Checklist for scanning a website
Scanning easy website: Wikipedia.com
Scanning easy website: Amazon.com
Scanning easy website: Nike.com
Scanning hard website: Airtable
Scanning hard website: Crunchbase.com
Scanning hard website: Linkedin.com
Scanning hard website: Shein.com
Tools you can use to scrape
Where you can store your data
Options for scaling your operation based on project size
Recap
Let’s Connect

Frequently Asked Questions

Who is this course for?

Programmers who have some experience with web scraping
Programmers who want to scrape data from public websites
Programmers who want to learn advanced techniques in web scraping
Programmers familiar with the Python programming language
Programmers who want a framework for analyzing the best way to scrape data

What is NOT included in the course?

Code snippets
How to scrape websites behind a login page

Why should I listen to you? Who are you?

My name is Steven Natera. I am a software engineer with a decade of experience. I’ve used these techniques to scrape over 50,000 job postings for my job board search engine product at TechStackLeads.com.

In my career, I have worked for both small and large companies. In my previous role as an SRE at Twitter, I was on a team that managed over 500k servers, processing more than 5 million requests per second.

What is your refund policy?

If you're not 100% satisfied with the purchase, or it's not what you were expecting, just reply to the download email, or contact me within 30 days, and you'll get a full refund. No questions asked.

Watch link provided after purchase

Size: 1.21 GB
Duration: 56 minutes
Resolution: 1080p

Ratings: 4.7 (33 ratings)
5 stars: 91%