CMSC 12300 and CAPP 30123: Computer Science with Applications III

The University of Chicago, Spring 2017

Syllabus

This syllabus, last updated March 28, shows my plans for the course. I have posted it on the web to facilitate updates over the span of the quarter. I reserve the right to make changes; for instance, our access to cloud computing resources is currently uncertain, which may affect our ability to conduct the projects.

Course Staff

Instructor: Matthew Wachs
Email: mwachs
Office: E 127
Office Hours: Monday 3-5pm, Tuesday 4-5pm, Wednesday 3-5pm, Friday 3-5pm. (Please don't hesitate to email if you'd like to meet at another time.)

TA: Lang Yu

TA: Yuanwei Fang

Course Components

The course consists of:

Lectures
Labs: give you the opportunity to practice with real Big Data environments and get immediate feedback and assistance from course staff (not graded)
Programming assignments: give you more in-depth practice on selected material than would be possible in the labs; cumulatively, they contribute 50% towards your final grade
Project: an extended, open-ended team project, similar to the project last quarter; you will submit a proposal, check in with me, and give a final presentation alongside submitting your code. The theme of the project is testing hypotheses on large data sets. The project counts for the balance of your final grade (the remaining 50%).

Topics

This course is about Big Data: the challenges of working with it, and the solutions that have been developed to overcome them. Topics include:

Algorithms:
- considerations and changes needed when moving from smaller data sets to large ones
- analysis of computational time and memory requirements and how they scale with data size
- methods and conceptual frameworks for dividing up the work of an algorithm into separate tasks that can be run in parallel on multiple computing resources
C: an expansion of your programming repertoire to include arguably the most widely used language in the world: a lingua franca for computer scientists and a low-level language that offers higher performance than interpreted languages such as Python
Big Data and cloud computing environments and programming paradigms:
- MapReduce
- Hadoop
- Multi-process and multi-threaded programming
- Concurrency and synchronization primitives (mutexes, condition variables)
- MPI
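
As a rough illustration of the MapReduce idea listed above, here is a minimal single-machine sketch in Python that counts words. It is not the Hadoop API and not course-provided code; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example. In a real cluster, the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, word_count(["big data", "Big ideas"]) returns {"big": 2, "data": 1, "ideas": 1}. Because each map call looks at only one document and each reduce call at only one key, the work can be divided among many workers without them needing to coordinate.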

Academic Honesty

The University's rules on academic honesty apply to this course just as they did in the prior courses in the sequence, and they will be rigorously and rigidly enforced. If you have any doubts, questions, or concerns, please ask, preferably in advance.

Textbook

This course does not have a textbook. However, this book is highly relevant to the course and is available online at no cost; you may find it of value.