Project 2: Smart Traffic Monitoring

Due: Wednesday, May 21st at 11:59pm

In this assignment, you will build a GPU-accelerated backend for a smart city traffic monitoring system. Your goal is to process large volumes of traffic event data from a network of street sensors and compute per-sensor hazard scores in real time.

Rather than being told exactly what primitives to use, you must determine where and how to apply CUDA operations such as filtering, aggregation, scan, and scatter. Your solution should be modular, performant, and capable of overlapping CPU input and GPU execution using CUDA streams.

Using a GPU Resource

You are permitted to complete this assignment using either:

  • The Midway3 cluster (via RCC), or

  • The Peanut GPU cluster on the CS Linux servers

Please choose the environment that works best for you.

Creating Your Private Repository

To get your private repository, you will need the invitation URL:

  • Project 2 invitation (see the “Project 2 is ready” post on Ed)

When you click on an invitation URL, you will have to complete the following steps:

  1. You will need to select your CNetID from a list. This lets us know which student is associated with each GitHub account. This step is only done for the very first invitation you accept.

Note

If you are on the waiting list for this course, a repository will not be made for you until you are admitted. I will post the starter code on Ed so you can work on the assignment in the meantime.

  2. You must click “Accept this assignment” or your repository will not actually be created.

  3. After accepting the assignment, GitHub will take a few minutes to create your repository. You should receive an email from GitHub when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.

  4. You now need to clone your repository (i.e., download it to your machine).
    • Make sure you’ve set up SSH access on your GitHub account.

    • For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs52072-spr25/proj2-GITHUB-USERNAME.git.

    • If you do not know how to use git clone to clone your repository then follow this guide that Github provides: Cloning a Repository

If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.

Project 2: Smart Traffic Monitoring Overview

Imagine the City of Chicago has deployed a network of smart sensors at major intersections to detect and report real-time traffic activity. Each sensor emits events that describe detected vehicles, including speed, type, and a confidence score for the detection. These events are collected in batches and sent to a GPU processing application.

Your task is to process each batch of events as they arrive, compute meaningful metrics per sensor, and determine which areas of the city may be experiencing high traffic risk.

To handle many simultaneous event batches efficiently, your program will manage a pool of CUDA streams. Each stream is used to launch GPU work for a given batch. Your system should be able to keep multiple GPU streams in flight, reusing streams as batches finish.
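A minimal sketch of one way to manage such a pool (the names create_streams and pool are illustrative, not part of the starter code):

// Sketch: a fixed-size stream pool; num_streams comes from the -s flag.
#include <stdlib.h>
#include <cuda_runtime.h>

cudaStream_t *create_streams(int num_streams) {
    cudaStream_t *pool = (cudaStream_t *)malloc(num_streams * sizeof(cudaStream_t));
    for (int i = 0; i < num_streams; i++)
        cudaStreamCreate(&pool[i]);
    return pool;
}

// One simple reuse policy: launch batch b on pool[b % num_streams].
// Work queued on the same stream runs in order, so a reused stream
// implicitly waits for the previous batch assigned to it.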

Task 1: Understanding the Program Usage

The following provides an explanation of the program usage for the monitoring system:

./traffic_monitor [-s N] <log_filename>

Arg/Flag          Description
<log_filename>    The path to a CSV log file where your program will write structured output.
-s <int>          An optional flag specifying the number of concurrent CUDA streams (default: 4).

Input File Format

Traffic events are streamed via standard input (stdin) as space-separated values. Each batch is delimited by:

BEGIN <batch_id> <num_events> <unique_ids>
sensor_id speed vehicle_type timestamp confidence
...
END <batch_id>

The input stream ends with:

DONE

  • Each BEGIN <batch_id> and END <batch_id> pair matches exactly and defines one batch. <batch_id> is an integer that uniquely identifies a batch. <num_events> is an integer giving the total number of events in the batch. <unique_ids> is the total number of unique sensor_ids in the batch.

  • DONE signals that no more batches will be sent. You can assume that an EOF follows DONE, indicating that stdin has been closed.

  • Batch IDs are integers starting at 0 or higher and do not need to be sequential.

Each event line includes the following fields:

sensor_id speed vehicle_type timestamp confidence

Where:

  • sensor_id — Integer between 0 and NUM_SENSORS - 1. Represents the traffic sensor that recorded the event. Assume NUM_SENSORS=1000.

  • speed — A floating-point value representing the vehicle’s speed in miles per hour (mph). Example: 32.4, 55.0, 72.3

  • vehicle_type — An integer encoding the vehicle category. For this assignment, only these three categories will appear:
    • 0: car

    • 1: truck

    • 2: motorcycle

  • timestamp — A floating-point value (in seconds) representing when the event was detected relative to a global clock (e.g., 1035.25 means 1035.25 seconds since monitoring began). This allows for fine-grained timing resolution between events within a batch.

  • confidence — A float between 0.0 and 1.0 representing how confident the sensor is in this event’s accuracy.
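As a hedged illustration of how the host side might consume this format (the Event struct and read_batch are illustrative names; error handling is omitted):

// Sketch: reading one batch from stdin; returns 0 once DONE is seen.
#include <stdio.h>
#include <string.h>

typedef struct {
    int   sensor_id;
    float speed;
    int   vehicle_type;
    float timestamp;
    float confidence;
} Event;

int read_batch(int *batch_id, int *num_events, int *unique_ids, Event *events) {
    char tag[8];
    if (scanf("%7s", tag) != 1 || strcmp(tag, "DONE") == 0)
        return 0;                                  // end of input
    scanf("%d %d %d", batch_id, num_events, unique_ids);   // BEGIN header fields
    for (int i = 0; i < *num_events; i++)
        scanf("%d %f %d %f %f", &events[i].sensor_id, &events[i].speed,
              &events[i].vehicle_type, &events[i].timestamp,
              &events[i].confidence);
    scanf("%7s %*d", tag);                         // consume "END <batch_id>"
    return 1;
}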

Output

Your program must not print anything to standard output or standard error.

Instead, all results must be written to the log file provided as a command-line argument. This includes:

  • The number of valid events per sensor in each batch

  • The hazard score per sensor

  • The severity level for each sensor

This log must be structured as a CSV file.

See the following section for format requirements.

Task 2: Implementing the Traffic Monitoring System

Your program must process batches of traffic events as they arrive. Each batch should be processed asynchronously using CUDA streams and GPU kernels.

You are responsible for designing the data-parallel pipeline that computes the final per-sensor results. Think carefully about what operations are needed to produce the expected output. Refer to the log format and example outputs in this specification (see sections Log Output Format and Example Input and Output) to guide your implementation strategy.

Your solution must:

  • Find valid events (i.e., filter and clean raw event data)

    Remove any events where the confidence score is below 0.75. Only high-confidence events should be included in further processing.

    Additional filtering constraints:

    • Events must also have vehicle_speed > 5.0 to be considered valid (this ignores slow/misread detections).

    • Trucks (vehicle_type = 1) are allowed with a lower confidence threshold of 0.65 if their speed is above 40 mph. A predicate sketch combining these rules follows this list.

  • Organize and group relevant data (e.g., by sensor).

  • Compute meaningful metrics for each sensor such as:

    • Number of valid events

    • Total hazard score

    • Risk severity classification
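Combining the filtering constraints above, a device-side validity predicate could be as small as this (a sketch; is_valid_event is an illustrative name):

// Sketch: event-validity test implementing the filtering rules above.
__device__ int is_valid_event(float speed, int vehicle_type, float confidence) {
    if (speed <= 5.0f)
        return 0;                         // drop slow/misread detections
    if (vehicle_type == 1 && speed > 40.0f)
        return confidence >= 0.65f;       // relaxed threshold for fast trucks
    return confidence >= 0.75f;           // default confidence threshold
}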

The specific operations you use (e.g., histogram, scan, reduction, compaction, scatter) are not prescribed. You must implement these operations yourself rather than use libraries, with one exception: you may use the Thrust or CUB library’s sort function. You must determine the most effective way to structure the pipeline to support these goals. Using CUDA streams to overlap execution across batches is required.
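For example, since Thrust’s sort is explicitly permitted, grouping valid events by sensor might look like the following (a sketch; the device arrays d_sensor_ids and d_event_idx are illustrative):

// Sketch: making each sensor's events contiguous with the permitted sort.
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

void group_by_sensor(int *d_sensor_ids, int *d_event_idx, int n) {
    thrust::device_ptr<int> keys(d_sensor_ids);
    thrust::device_ptr<int> vals(d_event_idx);
    // Sort event indices by sensor_id; equal keys end up adjacent.
    thrust::sort_by_key(keys, keys + n, vals);
}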

Hazard Score Calculation

For each valid event, you must compute a risk value that contributes to the total hazard score for the event’s sensor. Use the following formula:

decay        = expf(-0.01f * (current_time - timestamp))
speed_factor = sinf(vehicle_speed * 0.1609f)
type_factor  = {1.0, 1.5, 2.0}[vehicle_type]
risk         = 100.0f * decay * speed_factor * type_factor

Where:

  • current_time is the maximum timestamp among all events in the batch. This ensures that the decay term emphasizes more recent events.

  • vehicle_speed is in miles per hour.

  • speed_factor is a weighting function based on the speed of the vehicle. It’s designed to scale the contribution of an event to the overall hazard score based on how fast the vehicle is moving.

  • type_factor is a fixed multiplier based on vehicle_type: 0 = car → 1.0, 1 = truck → 1.5, 2 = motorcycle → 2.0

  • decay weights recent events more heavily

The final hazard score per sensor is the sum of all risk values for that sensor’s valid events.
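Written as a device function, the formula translates directly (a sketch; compute_risk is an illustrative name):

// Sketch: per-event risk, matching the formula above.
__device__ float compute_risk(float speed, int vehicle_type,
                              float timestamp, float current_time) {
    const float type_factors[3] = {1.0f, 1.5f, 2.0f};   // car, truck, motorcycle
    float decay        = expf(-0.01f * (current_time - timestamp));
    float speed_factor = sinf(speed * 0.1609f);
    return 100.0f * decay * speed_factor * type_factors[vehicle_type];
}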

Severity Classification

In addition to computing the final risk per sensor, your implementation must classify sensors by severity:

  • LOW: score < 100

  • MEDIUM: 100 ≤ score < 250

  • HIGH: score ≥ 250

This classification must appear in the CSV log output for each sensor.
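A direct host-side mapping from score to label might look like this (a sketch; severity_label is an illustrative name, and the thresholds follow the list above):

// Sketch: classify a sensor's total hazard score.
const char *severity_label(float score) {
    if (score < 100.0f) return "LOW";
    if (score < 250.0f) return "MEDIUM";
    return "HIGH";
}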

Task 3: Output Logging

All output and final processing must be performed inside a registered CUDA stream callback function.

That is:

  • GPU kernels should run asynchronously in CUDA streams

  • After a stream finishes, a callback must:
    • Aggregate and process per-sensor results

    • Format and write the output to the log file

Use cudaStreamAddCallback(...). No host-side output is allowed outside the callback.
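The registration pattern looks roughly like this (a sketch; BatchResult and write_batch_rows are illustrative names, and note that the CUDA runtime forbids making CUDA API calls from inside a stream callback):

// Sketch: register per-batch output handling as a stream callback.
#include <cuda_runtime.h>

typedef struct BatchResult BatchResult;    // per-batch results (illustrative)
void write_batch_rows(BatchResult *res);   // formats and writes CSV rows (illustrative)

void CUDART_CB on_batch_done(cudaStream_t stream, cudaError_t status, void *userData) {
    BatchResult *res = (BatchResult *)userData;
    if (status == cudaSuccess)
        write_batch_rows(res);    // aggregate, format, and append CSV rows
    // Note: CUDA API calls are not permitted inside a stream callback.
}

// After queuing a batch's kernels and async copies on `stream`:
// cudaStreamAddCallback(stream, on_batch_done, (void *)result, 0);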

Log Output Format

Your program must write all output to a CSV log file whose path is provided as the first positional argument on the command line:

./traffic_monitor [-s N] <log_filename>

If the specified <log_filename> already exists, your program must overwrite it rather than append to it.

The first line of the file must be a CSV header:

timestamp,batch_id,sensor_id,valid_event_count,hazard_score,severity
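In C stdio, opening the log with mode "w" truncates any existing file, which satisfies the overwrite requirement; the variable log_filename below is illustrative:

#include <stdio.h>

FILE *log_file = fopen(log_filename, "w");   // "w" truncates if the file exists
fprintf(log_file, "timestamp,batch_id,sensor_id,valid_event_count,hazard_score,severity\n");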

Each following line must contain exactly one row for a sensor in a processed batch. Example:

2025-05-16 13:21:54,0,2,2,203.93,MEDIUM
2025-05-16 13:21:54,0,3,3,468.18,HIGH

Where:

  • timestamp is the local system time when the row (i.e., log entry) is written

  • batch_id is the batch index (from the input)

  • sensor_id is the numeric ID of the sensor

  • valid_event_count is how many valid events that sensor had in this batch

  • hazard_score is the total risk value

  • severity is one of: LOW, MEDIUM, or HIGH

Generating Timestamps in C

You must include a timestamp in each log row using the format YYYY-MM-DD HH:MM:SS.

To generate this timestamp in C, use the following pattern:

#include <stdio.h>
#include <time.h>

void get_timestamp(char* buffer, size_t buffer_size) {
    time_t now = time(NULL);
    struct tm* tm_info = localtime(&now);
    strftime(buffer, buffer_size, "%Y-%m-%d %H:%M:%S", tm_info);
}

// Usage example:
char timestamp[32];
get_timestamp(timestamp, sizeof(timestamp));
fprintf(log_file, "%s,...", timestamp);

Example Input and Output

Inside the data/sample.in file, we have the following data:

BEGIN 0 6 3
1 25.0 0 1000.0 0.95
2 45.0 1 1000.0 0.85
2 55.0 1 1000.1 0.88
3 45.0 2 1000.0 0.9
3 50.0 2 1000.1 0.9
3 55.0 2 1000.2 0.92
END 0
DONE

You can use file redirection to redirect stdin to come from this file

$ bin/traffic_monitor sample_log.csv < data/sample.in

then the expected CSV output inside sample_log.csv would be:

timestamp,batch_id,sensor_id,valid_event_count,hazard_score,severity
2025-05-16 13:21:54,0,1,1,-76.98,LOW
2025-05-16 13:21:54,0,2,2,203.93,MEDIUM
2025-05-16 13:21:54,0,3,3,468.18,HIGH

Testing and Validation

A comprehensive test suite is provided in the tests/ directory. Each test is self-contained and structured as follows:

tests/
├── test001/
│   ├── test001.in    # Input events (BEGIN...END...DONE format)
│   └── test001.csv   # Expected log output (CSV)
├── test002/
├── ...
└── run_tests.sh      # Shell script to run all or individual tests

Running All Tests

To run the entire test suite and check your implementation, use:

./tests/run_tests.sh

This will:

  • Feed each test’s .in file into your program

  • Generate a log file named log_output.csv

  • Compare it to the expected .csv log using compare_logs.py

  • Print whether each test passed or failed (based on hazard score and severity)

Each test is numbered (e.g., test004), and outputs will clearly show:

TEST 004 PASSED
TEST 005 FAILED
...

A summary will follow at the end of execution.

Running an Individual Test

You may also run a single test by name:

./tests/run_tests.sh test007

This is useful for debugging one specific case at a time.

Handling Delays

Some tests simulate asynchronous streaming by including #DELAY n lines in the input. These cause the runner to pause for n seconds between batches to mimic real-time input behavior. Your program does not need to interpret these lines — the test runner handles it automatically.

Understanding the Log Comparison

Tests are validated using scripts/compare_logs.py. It checks your log output against the expected output with these rules:

  • The timestamp column is ignored during comparison.

  • Rows are sorted by (batch_id, sensor_id).

  • hazard_score is compared using a small epsilon tolerance to allow for floating-point rounding.

If any differences are found (e.g., wrong severity or score mismatch), the log comparison will display both your output and the expected result.

Log Format Reminder

All expected log files (and your output) must follow this header format:

timestamp,batch_id,sensor_id,valid_event_count,hazard_score,severity

If a batch produces no valid events, your program should still write the header — even if no rows follow.

You are responsible for testing your code further. However, we wanted to provide a few test cases to help you get started.

Task 4: Design and Report Section

This assignment emphasizes hands-on experience with CUDA programming patterns—particularly CUDA streams, concurrency, and data-parallel pipelines—rather than maximizing raw performance.

You are expected to complete the implementation to correctly process the event batches and produce the required log output. Your solution does not need to achieve the best possible occupancy, throughput, or memory access patterns.

However, you must submit a short report (report.txt or report.md) that includes the following:

Design Reflection

  • Briefly describe the structure of your pipeline and how you approached each major component (e.g., filtering, scoring, streaming).

  • Discuss your use of CUDA streams, memory layout decisions, and how you handled concurrency.

Improvement Opportunities

  • Identify areas in your implementation that could be improved or optimized. These could include:
    • Warp efficiency

    • Shared memory usage

    • Overlap of compute and data transfer

  • For each area, explain:
    • Why you think it could be improved

    • How you would approach improving it (even if you didn’t implement it)

Optional: Profiling Metrics

You may include profiling metrics (e.g., nvprof or nsys output) to support your claims. While not required, these can help justify your proposed improvements and demonstrate deeper understanding.

This report is your opportunity to show that you’ve thought critically about your implementation—even if your final solution favors simplicity over peak performance.

Grading

Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our Assignment Rubric page.)

The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:

  • Completeness: 70%

  • Correctness: 10%

  • Design/Style/Report: 25%

Submission

Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Project #2” assignment page in one of two ways:

  1. Uploading from GitHub directly (recommended): You can link your GitHub account to your Gradescope account and upload the correct repository for the assignment. When you submit, a pop-up window will appear. Click on “GitHub” and then “Connect to GitHub” to connect your GitHub account to Gradescope. Once you connect (you will only need to do this once), you can select the repository you wish to upload and the branch (which should always be “main” or “master”) for this course.

  2. Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.

Note

For either option, you must upload the entire directory structure; otherwise, your automated tests will not run correctly and you will be penalized if we have to run them manually. Going with the first option does this automatically for you. You can always add additional directories and files (even inside the starter directories), but the default directory/file structure must not change.

Depending on the assignment, once you submit your work, an “autograder” will run. This autograder should produce the same test results as when you run the code yourself; if it doesn’t, please let us know so we can look into it. A few other notes:

  • You are allowed to make as many submissions as you want before the deadline.

  • Please make sure you have read and understood our Late Submission Policy.

  • Your completeness score is determined solely based on the automated tests, but we may adjust your score if you attempt to pass tests by rote (e.g., by writing code that hard-codes the expected output for each possible test input).

  • Gradescope will report the test score it obtains when running your code. If there is a discrepancy between the score you get when running our grader script, and the score reported by Gradescope, please let us know so we can take a look at it.