CMSC 12300 Computer Science with Applications-3
Lecturer: Borja Sotomayor
E-mail: borja AT cs DOT uchicago DOT edu
Office: Ryerson 151
Office hours: By appointment.
TA: Gustav Larsson
E-mail: larsson AT uchicago DOT edu
Office: Ryerson 177
Office hours: TBD
Lectures: TuTh 4:00-5:20 in Ryerson 276
- Course Syllabus
- Programming Assignments
- CMSC 12300 repository on GitHub
- CMSC 12300 wiki on GitHub (additional references, project resources, etc. are here)
- Piazza discussion group
This course is the third in a three-quarter sequence that teaches computational thinking and skills to students in the sciences, mathematics, economics, etc. The course revolves around the core ideas behind managing and processing large volumes of data ("Big Data"). Topics include (1) statistical methods for large data analysis; (2) parallelism and concurrency, including models of parallelism and synchronization primitives; and (3) distributed computing, including distributed architectures and the algorithms and techniques that enable these architectures to be fault-tolerant, reliable, and scalable.
Students will continue to use R, and will also learn C++ and distributed computing tools and platforms, including Amazon AWS and Hadoop. This course includes a project in which students will formulate hypotheses about a large dataset, develop statistical models to test those hypotheses, implement a prototype that performs an initial exploration of the data, and then build a final system that processes the entire dataset.
CMSC 12200, or the instructor's consent, is a prerequisite for taking this course.
This course has no required textbooks.