Homework #3¶

Due: Friday, July 5th at 11:59pm

In this homework, we will look at defining our grid and block dimensions for two dimensions since we will be working with multidimensonal data. We will also explore looking at shared memory.

Peanut Cluster and GPU Partitions¶

If you are not familiar with submitting jobs to a cluster (i.e., the Peanut cluster) please make sure to watch the following video and read over the SLURM documentation provided by our tech-staff before beginning the assignment:

We will grade all assignments using the Peanut cluster’s GPUs and all programming assignments must work correctly on one of these machines. If you have a NVIDIA GPU installed on your local machine then you can work; however, your program must still work on one of the cluster’s GPU nodes, no exceptions.

You have the option batching your programs to the following cluster nodes

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
gpu-all up 4:00:00 2 idle gpu[2-3]
titan up 4:00:00 1 idle gpu3
pascal up 4:00:00 1 idle gpu2
quadro up 4:00:00 1 idle gpu1

Please be cautious of using titan since it’s been known to have stalling problems so we advise not using it until it’s been reliably fixed.

Creating Your Private Repository¶

To actually get your private repository, you will need this invitation URL:

HW3 invitation (Please check the Post “HW 3 is ready” Ed)

When you click on an invitation URL, you will have to complete the following steps:

You will need to select your CNetID from a list. This will allow us to know what student is associated with each GitHub account. This step is only done for the very first invitation you accept.

Note

If you are on the waiting list for this course you will not have a repository made for you until you are admitted into the course. I will post the starter code on Ed so you can work on the assignment until you are admitted into the course.

You must click “Accept this assignment” or your repository will not actually be created.
After accepting the assignment, Github will take a few minutes to create your repository. You should receive an email from Github when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.
You now need to clone your repository (i.e., download it to your machine).
- Make sure you’ve set up SSH access on your GitHub account.
- For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs52072-sum24/hw3-GITHUB-USERNAME.git.
- If you do not know how to use git clone to clone your repository then follow this guide that Github provides: Cloning a Repository

If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.

Programming Problem: Median Filtering Algorithm¶

Median filtering is a non-linear digital filtering technique used to remove noise from an image or signal. Noise reduction is typical helpful as pre-processing step to improve the results of later digital/signal processing (e.g., edge detection on an image). The algorithm works by sliding a window (or stencil) of a fixed size over each pixel in the image, sorting the pixel values within the stencil, and replacing the center pixel with the median value of the sorted list.

Here are the steps to perform median filtering:

Grayscale the image: You will use the code from hw2 to first grayscale the image.
Choose a window/stencil size (e.g., 3x3, 5x5): The stencil size determines the neighborhood around each pixel that will be considered. Larger windows may remove more noise but can also blur the image. The program provides you with a RADIUS variable that represents the 2D stencil size. The stencil matrix will always be square (i.e., RADIUSxRADIUS).
Slide the stencil over the image: The stencil moves pixel by pixel across the entire image. For each position of the stencil:
- Extract the pixel values within the stencil.
- Sort these pixel values.
- Select the median value (the middle value in the sorted list).
Replace the center pixel: The center pixel of the stencil is replaced with the median value obtained from the previous step.
Handle image boundaries: Special care is needed at the edges of the image where the stencil might extend beyond the boundary. We will use the naive/simple approach of padding the outside boundaries to zero for neighbors falling outside the image boundary.

Consider a radius=3 (i.e., 3x3 stencil) applied to a part of a grayscale image where the algorithm is currently at pixel value equal to 70:

Original 3x3 stencil:     Sorted Values:                               Median Value:
[ 30,  20,  10 ]          [ 10, 20, 30, 40, 50, 60, 70, 80, 90 ]       [ 50 ]
[ 80,  70,  60 ]
[ 90,  40,  50 ]

Center Pixel Replacement:
70 (center pixel) -> 50 (median value)

For this homework, we will continue working with PNG image data but now defining your CUDA program in multidimensional manner. Make sure you fulling understand how to use the provided starter code files

png_flatten.h and png_flatten.c - The same PNG library file from the previous assignment that you will use to load and save the 1D array of pixel data.
median.cu- the main CUDA file that will implement the median filtering on the image. The implementation is of this file is discussed in the upcoming sections.

Compiling and Running¶

We have provided a Makefile to easily compile and generate your CUDA program named median. Update the sbatch script file named hw3-job.sh with information about your directory structure and CNet credentials (similar to the hw1) and sbatch the script.

$ sbatch hw3-job.sh

The script builds and executes the median program on a GPU partition. The median program uses the provided test file test_img.png to produce a median filtered version that is saved to the test_img_out.png file. If you logged into the server using Visual Studio Code you should be able to code test_img_out.png to see the median filtered image.

GPU Version¶

By default the sequential version of the application is ran. To run the GPU version, you must specify a -p flag along with the grid and block dimensions. Please look at the global USAGE variable in median.cu for details.

For the remainder of assignment, you will augment the median.cu and write code to perform timing measurements on your program.

Task 1: Implement a Sequential Version¶

Inside median.cu, implement a sequential version of the median filter algorithm by implementing the following function

void execute_sequential(char *in_file, char *out_file, int radius){
  /** TODO: IMPLEMENT ME! **/
}

You are not allowed to modify its function signature. This function must do the following

Load the in_file image by using the png_flatten library functions.
Grayscale the image. You can use the provided image_to_grayscale function.
Implement code that peforms the median filtering algorithm on the image as described above. You will need to use a sorting algorithm for sorting the neighboring values. You can choose to implement any sorting algorithm.
Save the result to out_file.

Function decomposition is important so please implement helper functions where necessary.

Task 2: Implement a GPU Version¶

Inside median.cu, implement a GPU version of the median filter algorithm by implementing the following function

void execute_GPU(char *in_file, char *out_file,
               int grid_dim_x, int grid_dim_y, int block_dim_x, int block_dim_y,
               int radius) {

   /** TODO: IMPLEMENT ME! */
}

You are not allowed to modify its function signature. This function must do the following

Load the in_file image by using the png_flatten library functions.
Use your grayscale kernel from hw2 to grayscale the image. Use the variables grid_dim_x, grid_dim_y, and block_dim_x, and block_dim_y to launch your kernel. This means you will need to update the grayscale kernel from hw2 to work in a two-dimensional way.
Implement code that performs the median filtering algorithm on the image as described above. This must be implemented in a separate kernel function. The implementation must use shared memory for the stencil operations such that it minimizes global memory access. You must share the sorting algorithm code implemented in Task 1* to help sort a threads neighboring values. Use the variables grid_dim_x, grid_dim_y, and block_dim_x, and block_dim_y to launch your kernel.
Save the result to out_file.

Function decomposition is important so please implement helper functions where necessary.

Task 3: Verifying Correctness¶

To test your implementation, implement a function verify_gpu_results to verify that the GPU’s output matches the expected output produced by your sequential code. You can determine the function arguments and return type for this function. The main goal of is to make sure your GPU code is working as expected.

We will use the same test images from homework #2:

HW2 Image Data : There should be a download arrow icon on the left side to download the zip folder.

Only look at the images inside the in directory. You can ignore every other directory and files as it does not pertain to this assignment. DO NOT COMMIT THIS DIRECTORY TO YOUR REPOSITORY. These are very large files and committing this directory will result in a penalty!

Task 4: Timing Tests¶

For this assignment, we will only look at the computational speedup and not the speedup of the overall program. We will time the execution of the sequential code and GPU code. You will produce speedup graphs only for two files: data/in/big/IMG_2029.png and data/in/big/IMG_4069.png. The x-axis will represent different configurations of grid and block dimensions. You will choose these configurations yourself. First start with small sizes and then work way to larger grid and block sizes. You need to choose 6 different configurations to plot. The y-axis represents the speedup for that specific image. The below graph provides a visual of what your graph should resemble.

Note

This graph uses fake data! These are not actual timings for the files. We are only showing you the graph for visual purposes. The x ticks at the bottom read “Grid(x,y) Block(x,y)”. You will fill in the x and y values with your choosen dimensions.

You must generate the speedup graphs using a sbatch script named generate_graphs.sh. This means you cannot hand generate the graphs. We should be able to run sbatch generate-graphs.sh with us just changing the SBATCH configuration settings and the graphs should be produced. You can generate the graphs using gnu-plot, python script, java, etc.

Here a few additional requirements:

You will observe that timings vary a little bit each time you run your program. Please run every experiment at least 10 times, and use the average time from those runs.
Make make sure to title the graph, and label each axis. Make sure to adjust your y-axis range so that we can accurately see the values. That is, if most of your values fall between a range of [0,1] then don’t make your range [0,14].
The names for each graph file will be the name of the data directory versions (i.e., speedup-IMG_2029.png and speedup-IMG_4069.png).
The script that generates teh graphs must be a file called generate-graphs.sh
the generate-graphs.sh script so that it contains all commands (or calls another script that does) to reproduce your timings and your plot, i.e. the experiment should be fully automated with just calling the script as:
```
sbatch generate-graphs.sh
```

`README.md` file¶

Inside the hw3/README.md file, provide explanation on your results. Focus on answering the following:

Where are you getting speedups in your graphs and why?
What affect does the grid and block sizes have on the performance your GPU implementation?

One-two paragraphs is sufficient for answering these questions.

Grading¶

Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our Assignment Rubric page.)

The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:

Task 1: 20%
Task 2: 50%
Task 3: 10%
Task 4: 20%

Submission¶

Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Homework #3” assignment page via two ways,

Uploading from Github directly (recommended way): You can link your Github account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop window will appear. Click on “Github” and then “Connect to Github” to connect your Github account to Gradescope. Once you connect (you will only need to do this once), then you can select the repository you wish to upload and the branch (which should always be “main” or “master”) for this course.
Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.

Note

For either option, you must upload the entire directory structure; otherwise, your automated test grade will not run correctly and you will be penalized if we have to manually run the tests. Going with the first option will do this automatically for you. You can always add additional directories and files (and even files/directories inside the stater directories) but the default directory/file structure must not change.

Depending on the assignment, once you submit your work, an “autograder” will run. This autograder should produce the same test results as when you run the code yourself; if it doesn’t, please let us know so we can look into it. A few other notes:

You are allowed to make as many submissions as you want before the deadline.
Please make sure you have read and understood our Late Submission Policy.
Your completeness score is determined solely based on the automated tests, but we may adjust your score if you attempt to pass tests by rote (e.g., by writing code that hard-codes the expected output for each possible test input).
Gradescope will report the test score it obtains when running your code. If there is a discrepancy between the score you get when running our grader script, and the score reported by Gradescope, please let us know so we can take a look at it.

Homework #3¶

Peanut Cluster and GPU Partitions¶

Creating Your Private Repository¶

Programming Problem: Median Filtering Algorithm¶

Compiling and Running¶

GPU Version¶

Task 1: Implement a Sequential Version¶

Task 2: Implement a GPU Version¶

Task 3: Verifying Correctness¶

Task 4: Timing Tests¶

README.md file¶

Grading¶

Submission¶

`README.md` file¶