CMSC 11700-01 Spring 2002
Sharon Salveter

Homework #7
Due Tuesday 28 May at the beginning of class
No late assignments accepted.

An operating system must be able to locate the disk addresses of files. The file names and their associated addresses can be kept in a file directory. Operations that need to be performed are: locating the address of an existing file, inserting an entry into the directory for a newly created file, and deleting an entry for a recently purged file. These operations are often implemeted using hashing.

The file directory is a hash table where each entry contains three data items:

Commands can be of three types:

I <filename> <address> <length>
A new file has been created; add an entry to the directory.
The filename must be unique.
Assume the address and length are legal values.
D <filename>
A file has been purged; remove its entry from the directory.
L <filename>
Locate the address and length of file <filename>.

The commands are stored one per line in a data file, which will be available Friday 24 May on the website. The program you turn in must use this data.

You are to implement the hash table using random probing and (modified) mid-square hashing. In random probing, the search for an identifier X, in a hash table with b buckets is carried out by examining buckets f(X), f(X)+S(i), 1 <= i <= (b-1) where S(i) is a pseudo random number. The random number generator must satisfy the property that every number from 1 through b-1 must be generated exatly once. For a table with 2r buckets, the following sequence of computations will generate numbers with this property.

Insetion using random probing works similarly. The buckets f(X), f(X) + S(i), 1 <= i <= (b-1) are examined until one is found where there is space for the entry. If no buckets have space, there is overflow. Deletion works similarly.

The (modified) mid-square hash function is computed by squaring each character in the identifier, summing the squares, and then using an appropriate number of bits from the middle of the sum to obtain the bucket address. The sum of the squares is assumed to fit in one computer word. The number of bits to be used to obtain the bucket address depends on the number of buckets in the table. If r bits are used, the range of values is 2r. Thus the number of buckets in the hash table should be a power of 2. For this program you are to use r = 4 ( 16 buckets ), and bucket size = 3. Within a bucket, you are to search linearly.

For each command processed, print the following information:

Insert
filename, address, and length, along with
bucket number where inserted, and position within bucket (1,2,3)
or:
overflow message
or:
duplicate filename message (do not insert into hash table)
Delete
filename, address, and length, along with
bucket number deleted from, and position within bucket
or:
no find message
Locate
filename, address, and length, along with
bucket number where where found, and position within bucket
or:
no find message
Do not use STL containers. You will probably want to look at Carrano and Prichard, Chapter 12 on hashing, before starting this program.

As usual, you program should be elegantly documented and modularized and your output clearly presented.