M2: Gathering Data - Data Formats and Web Scraping
This second module provides an introduction to different data formats you may encounter and ways to gather data from the web.
Pre-recorded Lectures
The pre-recorded lectures are available here. You can also find the videos under the “Panopto” tab on the CAPP 30122 Canvas site.
The lectures are a series of videos, each roughly 5 to 20 minutes long, divided into the following sections:
2.1 - Data Format
2.2 - Gathering Data from the Web
2.3 - Basics of HTML
2.4 - Web Scraping with Beautiful Soup
Resources
Installing Beautiful Soup
To install Beautiful Soup and the html5lib parser on your class VM, run:
sudo pip3 install --upgrade beautifulsoup4
sudo pip3 install --upgrade html5lib
The password is the usual student account password.
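After running the commands above, a quick check along the following lines can confirm that both packages import and that Beautiful Soup can parse with html5lib. This is a minimal sketch, not part of the course materials: the helper names `check_install` and `parse_sample` and the `SAMPLE_HTML` snippet are hypothetical and exist only for this illustration.

```python
import importlib

# Hypothetical HTML snippet used only for this installation check.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Hello, CAPP 30122</h1>
    <a href="https://example.com">An example link</a>
  </body>
</html>
"""


def check_install():
    """Map each package's module name to its version, or None if it is missing."""
    status = {}
    for module_name in ("bs4", "html5lib"):
        try:
            module = importlib.import_module(module_name)
            status[module_name] = getattr(module, "__version__", "unknown")
        except ImportError:
            status[module_name] = None
    return status


def parse_sample(html):
    """Parse html with the html5lib parser; return the <h1> text and all link URLs."""
    from bs4 import BeautifulSoup  # imported here so check_install works without bs4

    soup = BeautifulSoup(html, "html5lib")
    heading = soup.find("h1").get_text()
    links = [a["href"] for a in soup.find_all("a")]
    return heading, links


if __name__ == "__main__":
    status = check_install()
    print(status)
    if all(status.values()):
        print(parse_sample(SAMPLE_HTML))
    else:
        print("At least one package is missing; rerun the install commands.")
```

If either entry in the printed dictionary is `None`, that package did not install correctly and the corresponding `pip3` command should be rerun.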
Zoom Sessions
You will find the links to the Zoom sessions on Canvas.
- Week 2
Wednesday, January 20th: Review Beautiful Soup and Lab
Friday, January 22nd: Beautiful Soup & Selenium (extended example, Q&A)
Programming Assignment
Programming Assignment #2, due Saturday, January 30th at 9pm CST