M3: CUDA Memory Model and Performance

In this third module, we begin exploring the CUDA memory hierarchy and its impact on the performance of CUDA programs.

Pre-recorded Lectures

Note: The pre-recorded videos for M3 will be posted after Tuesday’s lecture.

The pre-recorded lectures are available here: M3 Videos. You can also find the videos under the “Panopto” tab on the MPCS 52072 Canvas site.

The lectures are a series of videos, each approximately 20 to 30 minutes long, divided into the following sections:

3.1 - Shared Memory Demo: Stencil 1D
3.2 - Multidimensional Grids and Data
3.3 - Performance Metrics & Data Locality
3.4 - Matrix-Matrix Multiplication & Tiling
3.5 - DRAM Architecture & Memory Coalescing
3.6 - Shared Memory Optimization

Resources/Readings

Programming Massively Parallel Processors: A Hands-on Approach
- Chapter 4 (Week 3)
- Chapter 5 (Week 4)
- Chapter 6 (Week 4)

All slide and code materials will be accessible via the course repository.

Synchronous Session (Remote Lecture)

As a reminder, here are the dates and times of the synchronous sessions for this module:

Week 3

Dates/Times
- Wednesday April 8th @ 5:30pm-7:20pm
Session Outline
- Introduction to CUDA Memory Model
- Shared Memory Demo: Stencil 1D
- Multidimensional Grids and Data

Week 4

Dates/Times
- Wednesday April 15th @ 5:30pm-7:20pm
Session Outline
- CUDA Memory Management API calls
- Performance Considerations: Data Locality (Global vs. Shared Memory)

Assignments

Assignments are always due on Friday evenings.

Homework #3, due Friday April 18th at 11:59pm CDT
Project #1, due Sunday May 4th at 11:59pm CDT