M3: CUDA Memory Model and Performance
In this third module, we begin exploring the CUDA memory hierarchy and its impact on the performance of CUDA programs.
Pre-recorded Lectures
Note: The pre-recorded videos for M3 will be posted after Wednesday’s lecture.
The pre-recorded lectures are available here: M3 Videos. You can also find the videos under the “Panopto” tab on the MPCS 52072 Canvas site.
The lectures are a series of videos, each approximately 20-30 minutes long, divided into the following sections:
3.1 - Multidimensional Grids and Data
3.2 - Shared Memory Demo: Stencil 1D
3.3 - Performance Metrics & Data Locality
3.4 - Matrix-Matrix Multiplication & Tiling
3.5 - DRAM Architecture & Memory Coalescing
3.6 - Shared Memory Optimization
3.7 - Sharing Data within a Warp
Resources/Readings
- Programming Massively Parallel Processors: A Hands-on Approach
Chapter 4
Chapter 5
Chapter 6
All slide and code materials will be accessible via the course repository.
Synchronous Sessions (Remote Lecture)
As a reminder, here are the dates and times of the synchronous sessions for this module:
Week 3
- Dates/Times
Wednesday June 26th @ 5:30pm-7:20pm
- Session Outline
Introduction to CUDA Memory Model
Demo: Stencil 1D
Week 4
- Dates/Times
Wednesday July 3rd @ 5:30pm-7:20pm
- Session Outline
CUDA Type Qualifiers (Review)
CUDA Memory Management API calls
Data Locality (Global vs Shared Memory)
Week 5
- Dates/Times
Wednesday July 10th @ 5:30pm-7:20pm
- Session Outline
Deeper Dive into Shared Memory
Assignments
Homework assignments are due on Friday evenings; project due dates are listed individually below.
Homework #3, due Friday July 5th at 11:59pm CDT
Project #1, due Wednesday July 24th at 11:59pm CDT