Homework #5¶

Due: Thursday, May 12th at 11:59pm

In this homework assignment, you will gain practice implementing a concurrent hashset.

Getting started¶

For each assignment, a Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.

To actually get your private repository, you will need this invitation URL:

`HW5 invitation is posted on ED`__

When you click on an invitation URL, you will have to complete the following steps:

You will need to select your CNetID from a list. This will allow us to know what student is associated with each GitHub account. This step is only done for the very first invitation you accept.
You must click “Accept this assignment” or your repository will not actually be created.
After accepting the assignment, Github will take a few minutes to create your repository. You should receive an email from Github when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.
You now need to clone your repository (i.e., download it to your machine).
- Make sure you’ve set up SSH access on your GitHub account.
- For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs52060-spr22/hw5-GITHUB-USERNAME.git.
- If you do not know how to use git clone to clone your repository then follow this guide that Github provides: Cloning a Repository

Short Answer Questions¶

You have no short answer questions for this homework assignment.

Programming Questions¶

For this assignment you are only allowed to use the following Go concurrent constructs:

go statement
sync.Mutex and its associated methods.
sync/atomic package. You may use any of the atomic operations.

If you are unsure about whether you are able to use a language feature then please ask on Ed before using it.

Solo Assignment¶

The challenging part of this assignment is that you are working alone without the help of Professor Samuels or the course staff. You have enough practice and knowledge to be able to complete this specific assignment by yourselves. Professor Samuels and the course staff will not help debug or provide advice about your potential solution or fixes to it.

However, we will clarify anything about the specification, tests cases, or conceptually about the algorithm that you will implement for the assignment. You may post those types of questions on Ed and/or come to office hours to ask questions. Again we will not be looking at code or providing advice on possible solutions. You’ll need to do this on your own.

Concurrent Set¶

For this assignment, you will implement a concurrent hash set using linear probing. The implementation will implement the following Set interface:

type Set interface {
   Add(item string) bool
   Remove(item string) bool
   Contains(item string) bool
   Size() int
}

To keep the implementation simple, the set will only contain string values. The Add method adds values to the set. A set cannot contain duplicates so do not add an already existing item to the set. Add returns true if it was successful in adding the item; otherwise it returns false. Remove removes a string from the set if it exists. If Remove was able to remove the given string then it returns true; otherwise it returns false. The Contains method returns true if the given string is in the set; otherwise it returns false. The Size method returns how many string values are inside the set.

You must use the following hash function that is defined inside the hw5/set/set.go:

//hash is an implementation of the djb2 algorithm by Dan Bernstein
func hash(str string) int {
   hash := 5381
   for _, char := range str {
      hash = ((hash << 5) + hash) + int(char) /* hash * 33 + c */
   }
   return hash
}

To convert each string into its hash value. Make sure to still mod the hash value by the current table size.

Inside hw5/set/set.go file, there is a constructor function for creating a set. The only argument is the initial capacity for the set:

func NewSet(capacity int) Set {
   //Return a pointer to a set value.
   return nil
}

A hash set built on linear probing does not have unlimited capacity, since each bucket can only contain a single value and there are a fixed number of buckets. But, we do not want to have to anticipate how many buckets might be used for a given hash set in advance and hard-code that number in for the capacity of the hash set. Therefore, we will take the following approach: the bucket capacity passed into the constructor will represent the initial size of the table. If the fraction of occupied bucket grows beyond 75% (i.e., num_occupied_buckets/capacity > 0.75) after an update, then we will perform an operation called rehashing: We will expand the size of our hash set, and migrate all the data into their proper locations in the newly-expanded table (i.e., each item is hashed again, with the hash function now considering the new size of the table). We will grow the size of the table by 2; for instance, if the initial capacity is 2, then the size of the hash set will double to a new capacity of 4.

Recall from lecture, that each bucket has an h value that represents the range of items that were displaced from their original bucket location. For example, assume "a", "b", "c", "d" all collide at bucket index 18 (i.e., they all hash to index 18) then the implementation must place 3 of them in adjacent cells and h represents the furthest item away from the original hash location of 18. One thing to know is that linear probing is cyclic. This means that if the table has a capacity of 20 then once probing gets to the end of the table it wraps around to beginning of the table (i.e., index 0) and continues to linear probe from that location. Keep this in mind when thinking about your linear probing implementation.

Deleting an item in a hash set that uses linear probing is tricky. One way to handle this is to use a lazy deletion technique. We will “logically” delete the items. This means that you should have a marker for each bucket that indicates whether or not it’s item has been deleted. If the marker is true then the item is still inside the table; otherwise a false marker says that the items was deleted at some point. You must only “physically” remove (i.e., actually delete from the data structure) the items that have been “logically” removed during rehashing. This is not required implementation and you may choose how to handle deletion in your implementation. However, it may make concurrent implementations easier to logically implement.

I recommend that you define your own bucket type:

type bucket struct {
   // Define your fields
}

for your implementation and you can define the table using a slice (e.g., []bucket).

The next section explains how to implement the concurrent set.

Problem: Choose your own adventure¶

For this assignment, you can choose how to implement the concurrent set using linear probing based on choosing between two algorithms:

Course-grain implementation - use a single lock that locks down the entire concurrent set to implement linear probing. This is the easiest implementation. The maximum grade you can receive if you choose this implementation is 85/100.
Fine-grain implementation - Implement a solution similar to the closed addressing fine-grain approached presented in lecture. You will have a separate structure for locks and the table. You’ll need to think very carefully about how to use that implementation and adapt it to performing linear probing. Remember the open addressing approach uses separate chaining; however, linear probing uses a single table. The maximum grade you can receive if you choose this implementation is 100/100. Note that this one is challenging. I would recommend that you first implement the course-grain approach first before attempting this approach. It may seem easy from how it was discussed in class but there’s areas where deadlock can happen in this implementation.

Implement your implementation inside hw5/set/set.go. You can test your implementation using the set_test.go file.

Grading¶

Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our Assignment Rubric page.)

The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:

Completeness: 60%
Correctness: 30%
Design & Style: 10%

Obtaining your test score¶

The completeness part of your score will be determined using automated tests. To get your score for the automated tests, simply run the following from the Terminal. (Remember to leave out the $ prompt when you type the command.)

$ cd grader
$ go run hw5/grader hw5

This should print total score after running all test cases inside the individual problems. This printout will not show the tests you failed. You must run the problem’s individual test file to see the failures.

Design, Style and Cleaning up¶

Before you submit your final solution, you should, remove

any Printf statements that you added for debugging purposes and
all in-line comments of the form: “YOUR CODE HERE” and “TODO …”
Think about your function decomposition. No code duplication. This homework assignment is relatively small so this shouldn’t be a major problem but could be in certain problems.

Go does not have a strict style guide. However, use your best judgment from prior programming experience about style. Did you use good variable names? Do you have any lines that are too long, etc.

As you clean up, you should periodically save your file and run your code through the tests to make sure that you have not broken it in the process.

Submission¶

Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Homework #5” assignment page via two ways,

Uploading from Github directly (recommended way): You can link your Github account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop window will appear. Click on “Github” and then “Connect to Github” to connect your Github account to Gradescope. Once you connect (you will only need to do this once), then you can select the repository you wish to upload and the branch (which should always be “main” or “master”) for this course.
Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.

Note

For either option, you must upload the entire directory structure; otherwise, your automated test grade will not run correctly and you will be penalized if we have to manually run the tests. Going with the first option will do this automatically for you. You can always add additional directories and files (and even files/directories inside the stater directories) but the default directory/file structure must not change.

Depending on the assignment, once you submit your work, an “autograder” will run. This autograder should produce the same test results as when you run the code yourself; if it doesn’t, please let us know so we can look into it. A few other notes:

You are allowed to make as many submissions as you want before the deadline.
Please make sure you have read and understood our Late Submission Policy.
Your completeness score is determined solely based on the automated tests, but we may adjust your score if you attempt to pass tests by rote (e.g., by writing code that hard-codes the expected output for each possible test input).
Gradescope will report the test score it obtains when running your code. If there is a discrepancy between the score you get when running our grader script, and the score reported by Gradescope, please let us know so we can take a look at it.