cs154-2019 Lab 4

Introduction

By the end of this lab, you should have a feel for creating and debugging C programs that use pthreads. Debugging is assisted by another tool inside Valgrind: Helgrind (recall that in Lab 3 you learned about the Memcheck tool for finding memory errors).

Do this lab on a CSIL Linux machine. Although we are not grading the lab, we can tell from your repository whether you have completed it, and we may ask you to complete it if you are having basic problems with pthreads that doing this lab could have resolved.

Intro

Starting from the top of your cs154-spr-19-CNETID checkout, create the readme.txt file that will hold your answers to the exercises:
$ svn update
$ cd lab4
$ whoami > readme.txt
$ echo >> readme.txt
$ svn add readme.txt
$ svn commit -m "for lab4 answers" readme.txt
Note: make your answers in readme.txt as terse as you'd like.

In your lab4/intro directory is a main.c that demonstrates basic pthread behavior (the program is a modified version of the first example program from this pthread tutorial, which looks good). Compile and run the program with:

$ gcc -Wall -g -o main main.c -lpthread
$ ./main
$ ./main
$ ./main
The output should look something like:
main: thr[0] is pthread 18798336
main: thr[1] is pthread 10401536
main: thr[2] is pthread 2008832
main: thr[3] is pthread 4288583424
thr_func(18798336): hello from thread[0]
thr_func(4288583424): hello from thread[3]
main: thr[4] is pthread 3743319808
thr_func(2008832): hello from thread[2]
thr_func(10401536): hello from thread[1]
thr_func(3743319808): hello from thread[4]
main: thread 0 started running after 0.000128 seconds
main: thread 1 started running after 0.000295 seconds
main: thread 2 started running after 0.000484 seconds
main: thread 3 started running after 0.000516 seconds
main: thread 4 started running after 0.001164 seconds
but with every run things will be a little different. The pthread functions used here have already been covered in lecture, but this may be the first time that you have seen a self-contained working example of pthread code. For this part, carefully inspect the code and answer the following in your readme.txt:
  1. Why does the ordering of the output lines vary with each run (based on the lecture and the textbook)?
  2. Running threads execute the "start routine" passed as the third argument to pthread_create, in this case thr_func. How is data passed to the start routine, and how does the start routine receive data passed to it?
  3. How do thread functions return data to pthread_join?
  4. Use valgrind (the default memcheck tool) to demonstrate the memory leak in this program. Fix the memory leak and svn commit the fixed version of main.c.

Exercise 1

In the ex1 subdirectory of your lab4 directory, there is a simple program with a data race on a global variable. First, compile and run the program to see the results of the data race:
$ cd ex1
$ gcc -Wall -g -o main main.c -lpthread
$ ./main
$ ./main
$ ./main
$ ./main
The output of this program will be between 10001 and 20000 but will vary from run to run. Bugs like this are rarely so easy to find by inspecting the code, and setting breakpoints is not usually productive. However, the Valgrind tool Helgrind detects threading errors and quickly helps find data races like the one in this code. To run Helgrind against this code, use the following command:
$ valgrind --tool=helgrind ./main
Helgrind will run for a minute or so, performing an analysis of the program. It will print out a set of errors about the data race. Helgrind numbers different threads of execution and then reports each of the read or write data races within those threads.

In your readme.txt, describe what the data race in this program is. Then, describe how Helgrind identifies it: if you didn't already know about the data race, what are the important lines of Helgrind output to understand, to see where in the source code the problem lies, and what variable was involved? Keep in mind that the "printf("&var = %p\n", &var);" line was added to simplify this exercise.

Exercise 2

A large program often has many different objects, each of which has its own locking requirements. When writing parallel code that uses multiple objects, each with its own associated lock, it's important to watch the order in which those objects are accessed.

Compile and then run the program in the ex2 subdirectory of lab4. There is a potential deadlock.

$ gcc -Wall -g -o main main.c -lpthread
$ ./main
$ ./main
$ ./main
$ ./main
It may print out 200 each of odds and evens, but with enough runs, it will lock up. If it does, use CTRL+C to cancel execution.

Try using Helgrind as described in the previous exercise. It will print out information on lock order violations. Once Helgrind has printed information about Thread #3, you may press CTRL+C to stop execution of the program.

In your readme.txt file, copy the information about the lock order violation and explain in your own words what is happening in the code. Use names of variables in your description. If you have trouble interpreting the output, see the helgrind documentation. You do not need to fix the code.

(This was a mix of the lab4 originally written for cs154 by Lars Bergstrom, new content by Gordon Kindlmann for cs154-2015, and minor update for cs154-2019)