CMSC 12300 and CAPP 30123: Computer Science with Applications III
The University of Chicago, Spring 2017
Syllabus
This syllabus, last updated March 28, shows my plans for the course. I have
posted it on the web to facilitate updates over the span of the quarter.
I reserve the right to make changes; for instance, our access to cloud
computing resources is currently uncertain, and with it our ability to
conduct the projects.
Course Staff
Instructor: Matthew Wachs
Email: mwachs
Office: E 127
Office Hours: Monday 3-5pm, Tuesday 4-5pm, Wednesday 3-5pm, Friday 3-5pm. (Please don't hesitate to email if you'd like to meet at another time.)
TA: Lang Yu
TA: Yuanwei Fang
Course Components
The course consists of:
- Lectures
- Labs: give you the opportunity to practice with real Big Data environments and get immediate feedback and assistance from course staff (not graded)
- Programming assignments: give you more in-depth practice on selected material than would be possible in the labs; cumulatively, they contribute 50% towards your final grade
- Project: an extended, open-ended team project, similar to the project last quarter; you will submit a proposal, check in with me, and give a final presentation alongside submitting your code. The theme of the project is testing hypotheses on large data sets. The project counts for the balance of your final grade
Topics
This course is about Big Data: the challenges of working with it, and the solutions that have been developed to overcome them. Topics include:
- Algorithms:
  - considerations and changes needed when moving from smaller data sets to large ones
  - analysis of computational time and memory requirements and how they scale with data size
  - methods and conceptual frameworks for dividing up the work of an algorithm into separate tasks that can be run in parallel on multiple computing resources
- C: an expansion of your programming skills to include arguably the most
widely used language in the world, a lingua franca for computer scientists,
and a low-level language that offers higher performance than interpreted
languages such as Python
- Big Data and cloud computing environments and programming paradigms:
  - MapReduce
  - Hadoop
  - Multi-process and multi-threaded programming
  - Concurrency and synchronization primitives (mutexes, condition variables)
  - MPI
Academic Honesty
The University's rules on academic honesty apply to this course just as they did in the prior courses in the sequence, and they will be rigorously and rigidly enforced. If you have any doubts, questions, or concerns, please ask, ideally in advance.
Textbook
This course does not have a textbook. However, this book is highly relevant to the course and is available online at no cost; you may find it of value.