CMSC 33501 - Course Schedule

Course Schedule Spring 2018

This schedule is subject to change. Please check back frequently.


Week Date Topic Readings
Week 1 Mar 27
Introduction, Architectures, and Tradeoffs
Mar 29
Shared Nothing vs. Shared Memory vs. Shared Whatever
Week 2 Apr 3
Cloud-based Query Processing

Assigned:

Apr 5
Distributed Query Processing

Assigned:

Week 3 Apr 10
Distributed Query Processing (Continued)

Assigned:

  • Kossmann Survey

    Apr 12
    Geographic Distribution

    Assigned:

  • Global Analytics in the Face of Bandwidth and Regulatory Constraints. Ashish Vulimiri, Carlo Curino, Brighten Godfrey, Thomas Jungblut, Jitu Padhye, and George Varghese - NSDI 2015.
  • Transparency In its Place: The case against transparent access to geographically distributed data. Jim Gray, Tandem TR 89.1, Feb 1989

    Week 4 Apr 17
    Dealing with Heterogeneity

    Assigned:

  • Just-In-Time Data Virtualization: Lightweight Data Management with ViDa. Manos Karpathiotakis, Ioannis Alagiannis, Thomas Heinis, Miguel Branco, Anastasia Ailamaki - CIDR 2015 (note: added late - no write-ups expected)
  • Also:

  • Look at Spark SQL Remote Access Operators and JSON Schema Inference: Spark SQL: Relational Data Processing in Spark. Michael Armbrust, Reynold Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph Bradley, Xiangrui Meng, Tomer Kaftan, Michael Franklin, Ali Ghodsi, Matei Zaharia - SIGMOD 2015 (see mostly sections 4.4 and 5)

    Apr 19
    Heterogeneity II

    Assigned:

  • Weld: A Common Runtime for High Performance Data Analytics. Palkar et al. - CIDR 2017
  • Read Also (no summary needed):

  • Adaptive Query Processing on Raw Data. Manos Karpathiotakis, Miguel Branco, Ioannis Alagiannis, Anastasia Ailamaki - VLDB 2014.

    Week 5 Apr 24
    Stream Processing I

    Assigned:

  • Discretized Streams: Fault-Tolerant Streaming Computation at Scale. Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica - SOSP 2013.
  • Read Also (no summary needed):

  • Structured Streaming In Apache Spark: A new high-level API for streaming. Matei Zaharia, Tathagata Das, Michael Armbrust and Reynold Xin - Databricks Blog Post - July 28, 2016.
  • Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3. Joseph Torres, Michael Armbrust, Tathagata Das and Shixiong Zhu - Databricks Blog Post - March 20, 2018.

    Apr 26
    Stream Processing II

    Assigned:

  • The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. Akidau et al. VLDB 2016.
  • Read Also (no summary needed):

  • MillWheel: Fault-Tolerant Stream Processing at Internet Scale. Akidau et al. VLDB 2013.

    Week 6 May 1
    Distributed ML - Model Serving

    Assigned:

  • Clipper: A Low-Latency Online Prediction Serving System. Dan Crankshaw, Giulio Zhou, Michael Franklin, Joseph Gonzalez, Ion Stoica - NSDI 2017.

    May 3
    Learned Indexes (Why not?)

    Assigned:

  • The Case for Learned Index Structures. Tim Kraska, Alex Beutel, Ed Chi, Jeff Dean, Neoklis Polyzotis - SIGMOD 2018 (to appear).
  • Read Also (Various responses - no summary needed):

  • The Case for B-Tree Index Structures. Thomas Neumann.
  • Don't Throw Out Your Algorithms Book Just Yet: Classical Data Structures That Can Outperform Learned Indexes. Peter Bailis, Kai Sheng Tai, Pratiksha Thaker, and Matei Zaharia.

    Week 7 May 8
    Parameter Servers

    Assigned:

  • Scaling Distributed Machine Learning with the Parameter Server. Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su. OSDI 2014.

    May 10
    Distributed Deep Learning (Why not?)

    Assigned:

  • GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. Henggang Cui, Hao Zhang, Gregory R. Ganger, Phillip B. Gibbons, Eric P. Xing. EuroSys 2016.
  • Also take a look at (Two other systems - no summary needed):

  • SparkNet: Training Deep Networks in Spark. Philipp Moritz, Robert Nishihara, Ion Stoica, Michael I. Jordan. ICLR 2016.
  • BigDL: A Distributed Deep Learning Framework for Big Data. Jason Dai et al. (Intel, Tencent, Alibaba). arXiv paper, 2018.

    Week 8 May 15
    Graph Systems

    Assigned:

  • GraphX: Graph Processing in a Distributed Dataflow Framework. Joseph E. Gonzalez, Reynold S. Xin, Ankur Dave, Daniel Crankshaw, Michael J. Franklin, Ion Stoica. OSDI 2014.
  • Also take a look at (Two other systems - no summary needed):

  • Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. Yucheng Low et al., VLDB 2012
  • Pregel: A System for Large-Scale Graph Processing. Grzegorz Malewicz et al. SIGMOD 2010.

    May 17
    Towards Data Markets

    Assigned (2 short papers):

  • Data Markets in the Cloud: An Opportunity for the Database Community. Magda Balazinska, Bill Howe, and Dan Suciu, VLDB 2011.
  • Why Data Citation is a Computational Problem. Peter Buneman, Susan Davidson, James Frew, CACM Sept. 2016.

    Week 9 May 22
    Provenance and Lineage

    Assigned:

  • Smoke: Fine-grained Lineage at Interactive Speed. Fotis Psallidas and Eugene Wu, VLDB 2018.
  • Also take a look at (no summary needed):

  • Diagnosing Machine Learning Pipelines with Fine-grained Lineage. Zhao Zhang, Evan R. Sparks, Michael J. Franklin. HPDC 2017.
  • Provenance in Databases: Why, How, and Where. James Cheney, Laura Chiticariu, Wang-Chiew Tan. Now Publishers, 2009.
    (This is a long, comprehensive survey focused on semantic issues - check out the first section for an overview.)

    May 24
    Systems for ML and Advanced Analytics (brainstorming)

    Assigned:

  • A Berkeley View of Systems Challenges for AI. Ion Stoica et al., UC Berkeley Technical Report No. UCB/EECS-2017-159, October 2017.
  • Also take a look at (no summary needed):

  • Proceedings of the First SysML Conference. Stanford, CA. February 2018.
    Skim the posters for interesting topics; some of the videos are interesting too if you have time.

    Week 10 May 29
    NO MEETING

    Prof. Franklin Away - No Meeting Today

    May 31
    PROJECT REPORTS

    Assigned:

  • PROJECT REPORTS (15 min each)

Unfortunately, some articles require a paid subscription to a journal or digital library. These articles are linked via the UChicago library proxy, and you must authenticate with your CNetID to view them.