M3: CUDA Memory Model and Performance

In this third module, we begin exploring the CUDA memory hierarchy and its role in the performance of CUDA programs.

Pre-recorded Lectures

Note: The pre-recorded videos for M3 will be posted after Wednesday’s lecture.

The pre-recorded lectures are available here: M3 Videos. You can also find the videos under the "Panopto" tab on the MPCS 52072 Canvas site.

The lectures are a series of videos, each approximately 20-30 minutes long, divided into the following sections:

  • 3.1 - Multidimensional Grids and Data

  • 3.2 - Shared Memory Demo: Stencil 1D

  • 3.3 - Performance Metrics & Data Locality

  • 3.4 - Matrix-Matrix Multiplication & Tiling

  • 3.5 - DRAM Architecture & Memory Coalescing

  • 3.6 - Shared Memory Optimization

  • 3.7 - Sharing Data within a Warp

Resources/Readings

  • Programming Massively Parallel Processors: A Hands-on Approach
    • Chapter 4

    • Chapter 5

    • Chapter 6

All slide and code materials will be accessible via the course repository.

Synchronous Session (Remote Lecture)

As a reminder, here are the dates and times of the synchronous sessions for this module:

Week 3

  • Dates/Times
    • Wednesday June 26th @ 5:30pm-7:20pm

  • Session Outline
    • Introduction to CUDA Memory Model

    • Demo: Stencil 1D

Week 4

  • Dates/Times
    • Wednesday July 3rd @ 5:30pm-7:20pm

  • Session Outline
    • CUDA Type Qualifiers (Review)

    • CUDA Memory Management API calls

    • Data Locality (Global vs Shared Memory)

Week 5

  • Dates/Times
    • Wednesday July 10th @ 5:30pm-7:20pm

  • Session Outline
    • Deeper Dive into Shared Memory

Assignment

Assignments are usually due on Friday evenings; Project #1 is an exception and is due on a Wednesday:

  • Project #1, due Wednesday July 24th at 11:59pm CDT