M4: GPU Patterns and Structures

In this fourth module, we focus on parallel patterns and structures that benefit from GPU acceleration. These patterns are building blocks that many different algorithms can use to run efficiently on a GPU. For each pattern, we will present a naive implementation and then show how it can be improved using the optimizations discussed in previous modules. The module covers the following:

  • Histograms

  • Reductions

  • Map

  • Tiling (Stencil)

  • Scan

  • Gather/Scatter

  • Stream compaction

Pre-recorded Lectures

Note: The pre-recorded videos for M4 will be posted after Tuesday’s lecture.

The pre-recorded lectures are available here: M4 Videos. You can also find the videos under the “Panopto” tab on the MPCS 52072 Canvas site.

The lectures are a series of videos, each approximately 20-30 minutes long, divided into the following sections:

  • 4.1 Parallel Prefix Scan (Part 1)

  • 4.2 Parallel Prefix Scan (Part 2)

  • 4.3 Parallel Reduction (Part 1)

  • 4.4 Parallel Reduction (Part 2)

  • 4.5 Parallel Reduction (Part 3)

Resources/Readings

  • Programming Massively Parallel Processors: A Hands-on Approach
    • Chapter 9

    • Chapter 10

All slide and code materials will be accessible via the course repository.

Synchronous Session (In-Person Lecture)

As a reminder, here are the dates and times for this module's synchronous sessions:

Week 5

  • Dates/Times
    • Tuesday April 22nd @ 5:30pm-7:20pm

  • Session Outline
    • Histograms

    • Parallel Scan

Week 6

  • Dates/Times
    • Tuesday April 29th @ 5:30pm-7:20pm

  • Session Outline
    • Parallel Scan (Revisited)

    • Map

    • Gather/Scatter

    • Reductions [time permitting]

Assignment