M2: Gathering Data - Data Formats and Web Scraping

This second module provides an introduction to different data formats you may encounter and ways to gather data from the web.

Pre-recorded Lectures

The pre-recorded lectures are available here. You can also find the videos under the “Panopto” tab on the CAPP 30122 Canvas site.

The lectures are a series of videos, each roughly 5 to 20 minutes long, divided into the following sections:

  • 2.1 - Data Format

  • 2.2 - Gathering Data from the Web

  • 2.3 - Basics of HTML

  • 2.4 - Web Scraping with Beautiful Soup

Installing Beautiful Soup

To install Beautiful Soup and the html5lib parser on your class VM, run:

sudo pip3 install --upgrade beautifulsoup4
sudo pip3 install --upgrade html5lib

The password is the usual student account password.
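To confirm that the installation worked, a quick check along these lines may help. This is only a minimal sketch: the helper name missing_packages is made up for illustration, and it assumes the two packages are importable under the module names bs4 and html5lib, which is how beautifulsoup4 and html5lib are normally imported.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this Python environment.

    Hypothetical helper for illustration; it only looks the modules up
    and does not import them.
    """
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# beautifulsoup4 is imported as "bs4"; html5lib is imported as "html5lib".
missing = missing_packages(["bs4", "html5lib"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("Beautiful Soup and html5lib are both available.")
```

If either name is reported as not installed, re-running the corresponding pip3 command above is a reasonable first step.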

Zoom Sessions

You will find the links to the Zoom sessions on Canvas.

  • Week 2
    • Wednesday, January 20th: Review Beautiful Soup and Lab

    • Friday, January 22nd: Beautiful Soup & Selenium (extended example, Q&A)

Programming Assignment

Programming Assignment #2, due Saturday, January 30th at 9pm CST