Course Schedule Spring 2018

This schedule is subject to change. Please check back frequently.

Week	Date	Topic	Readings
Week 1	Mar 27	Introduction, Architectures, and Tradeoffs
Week 1	Mar 29	Shared Nothing vs. Shared Memory vs. Shared Whatever
Week 2	Apr 3	Cloud-based Query Processing	Assigned: See slides: Data Warehousing in the Cloud (the end of Shared Nothing). David DeWitt and Willis Lang
Week 2	Apr 5	Distributed Query Processing	Assigned: The State of the Art in Distributed Query Processing. Donald Kossmann - ACM Computing Surveys (CSUR) 2000
Week 3	Apr 10	Distributed Query Processing (Continued)	Assigned: Kossmann Survey
Week 3	Apr 12	Geographic Distribution	Assigned: Global Analytics in the Face of Bandwidth and Regulatory Constraints. Ashish Vulimiri, Carlo Curino, Brighten Godfrey, Thomas Jungblut, Jitu Padhye, and George Varghese - NSDI 2015. Transparency In its Place: The case against transparent access to geographically distributed data. Jim Gray, Tandem TR 89.1, Feb 1989
Week 4	Apr 17	Dealing with Heterogeneity	Assigned: Just-In-Time Data Virtualization: Lightweight Data Management with ViDa. Manos Karpathiotakis, Ioannis Alagiannis, Thomas Heinis, Miguel Branco, Anastasia Ailamaki - CIDR 2015 (note: added late - no write-ups expected). Also: Look at SparkSQL Remote Access Operators and JSON Schema Inference: Spark SQL: Relational Data Processing in Spark. Michael Armbrust, Reynold Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph Bradley, Xiangrui Meng, Tomer Kaftan, Michael Franklin, Ali Ghodsi, Matei Zaharia - SIGMOD 2015 (see mostly sections 4.4 and 5)
Week 4	Apr 19	Heterogeneity II	Assigned: Weld: A Common Runtime for High Performance Data Analytics. Palkar et al. - CIDR 2017. Read Also (no summary needed): Adaptive Query Processing on Raw Data. Manos Karpathiotakis, Miguel Branco, Ioannis Alagiannis, Anastasia Ailamaki - VLDB 2014.
Week 5	Apr 24	Stream Processing I	Assigned: Discretized Streams: Fault-Tolerant Streaming Computation at Scale. Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica - SOSP 2013. Read Also (no summary needed): Structured Streaming In Apache Spark: A new high-level API for streaming. Matei Zaharia, Tathagata Das, Michael Armbrust and Reynold Xin - Databricks Blog Post - July 28, 2016. Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3. Joseph Torres, Michael Armbrust, Tathagata Das and Shixiong Zhu - Databricks Blog Post - March 20, 2018.
Week 5	Apr 26	Stream Processing II	Assigned: The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. Akidau et al. VLDB 2016. Read Also (no summary needed): MillWheel: Fault-Tolerant Stream Processing at Internet Scale. Akidau et al. VLDB 2013.
Week 6	May 1	Distributed ML - Model Serving	Assigned: Clipper: A Low-Latency Online Prediction Serving System. Dan Crankshaw, Giulio Zhou, Michael Franklin, Joseph Gonzalez, Ion Stoica - NSDI 2017.
Week 6	May 3	Learned Indexes (Why not?)	Assigned: The Case for Learned Index Structures. Tim Kraska, Alex Beutel, Ed Chi, Jeff Dean, Neoklis Polyzotis - SIGMOD 2018 (to appear). Read Also (Various responses - no summary needed): The Case for B-Tree Index Structures. Thomas Neumann. Don't Throw Out Your Algorithms Book Just Yet: Classical Data Structures That Can Outperform Learned Indexes. Peter Bailis, Kai Sheng Tai, Pratiksha Thaker, and Matei Zaharia.
Week 7	May 8	Parameter Servers	Assigned: Scaling Distributed Machine Learning with the Parameter Server. Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su. OSDI 2014.
Week 7	May 10	Distributed Deep Learning (Why not?)	Assigned: GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. Henggang Cui, Hao Zhang, Gregory R. Ganger, Phillip B. Gibbons, Eric P. Xing. EuroSys 2016. Also take a look at (Two other systems - no summary needed): SPARKNET: Training Deep Networks in Spark. Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan. ICLR 2016. BigDL: A Distributed Deep Learning Framework for Big Data. Jason Dai et al. (Intel, Tencent, Alibaba). Arxiv Paper 2018.
Week 8	May 15	Graph Systems	Assigned: GraphX: Graph Processing in a Distributed Dataflow Framework. Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, Ion Stoica. OSDI 2014. Also take a look at (Two other systems - no summary needed): Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. Yucheng Low et al., VLDB 2012. Pregel: A System for Large-Scale Graph Processing. Grzegorz Malewicz et al. SIGMOD 2010.
Week 8	May 17	Towards Data Markets	Assigned (2 short papers): Data Markets in the Cloud: An Opportunity for the Database Community. Magda Balazinska, Bill Howe, and Dan Suciu, VLDB 2011. Why Data Citation is a Computational Problem. Peter Buneman, Susan Davidson, James Frew, CACM Sept. 2016.
Week 9	May 22	Provenance and Lineage	Assigned: Smoke: Fine-grained Lineage at Interactive Speed. Fotis Psallidas and Eugene Wu, VLDB 2018. Also take a look at (no summary needed): Diagnosing Machine Learning Pipelines with Fine-grained Lineage. Zhao Zhang, Evan R. Sparks, Michael J. Franklin. HPDC 2017. Provenance in Databases: Why, How, and Where. James Cheney, Laura Chiticariu, Wang-Chiew Tan. Now Publishers, 2009. (This is a long, comprehensive survey focused on semantic issues - check out the first section for an overview.)
Week 9	May 24	Systems for ML and Advanced Analytics (brainstorming)	Assigned: A Berkeley View of Systems Challenges for AI. Ion Stoica et al., UC Berkeley Technical Report No. UCB/EECS-2017-159, October 2017. Also take a look at (no summary needed): Proceedings of the First SysML Conference. Stanford, CA. February 2018. Skim the posters for interesting topics; some of the videos are interesting too if you have time.
Week 10	May 29	NO MEETING	Prof. Franklin Away - No Meeting Today
Week 10	May 31	PROJECT REPORTS	Assigned: PROJECT REPORTS (15 min each)

Unfortunately, some articles require a paid subscription to a journal or digital library. These articles are linked via the UChicago library proxy, and you must authenticate with your CNetID to view them.