Last modified: Thu Mar 4 21:49:10 CST 1999
Purpose
Discussion
Computer designers have used many techniques in their quest for faster CPU performance. One of them is instruction look-ahead: overlapping the execution of instruction i with the fetch of the next instruction, i + 1. This is essentially a two-stage instruction pipeline, with a potential speedup of two.
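A back-of-the-envelope sketch of where the factor of two comes from, assuming fetch and execute each take one clock cycle: n instructions take 2n cycles sequentially, but only n + 1 cycles when fetch and execute overlap (the first instruction fills the pipeline, and every later instruction completes one cycle after its predecessor):

```latex
\text{speedup} \;=\; \frac{2n}{n+1} \;\longrightarrow\; 2
\quad \text{as } n \to \infty
```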
In this homework, you will implement instruction look-ahead in three steps, described under "Project steps" below.
This homework requires you to think of algorithms executing in parallel, i.e., several actions occurring at the same time. Since this may be new to you, please ask if you need help.
Project steps
Modify the counter_CPU_sequencer. Use the fact that the memory read and the PC increment can be done in parallel: the idea of instruction look-ahead is to overlap the execution of instruction i with the fetch (and PC increment) of the next instruction, i + 1.
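For instance, the fetch stage might read memory and bump the PC on the same clock edge. The following is only a sketch: PC, IR, mem, and clock are placeholder names, not the identifiers from your counter_CPU_sequencer.

```verilog
// Hypothetical fetch stage: memory read and PC increment in parallel.
// All identifiers here are placeholders for whatever your design uses.
always @(posedge clock) begin
    IR <= mem[PC];   // fetch the next instruction ...
    PC <= PC + 1;    // ... while the PC is incremented in the same cycle
end
```

The nonblocking assignments (`<=`) matter: both right-hand sides read the old value of PC, so the fetch and the increment really do happen "at the same time."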
If you followed the style of the initial counter_machine program, then you already have formal parallelism between stages, using multiple "always" blocks. In the counter_machine program, however, that parallelism is sequentialized by signals (wire [cyc-1 : 0] step;), and you will need to change the signaling to make the two stages really run in parallel.
A hint: each stage of your pipeline can be simulated by an "always" block. Initialize the second stage of your pipeline to execute a NullOp (a 1-cycle stall) as its first instruction; after that, both stages should do their work at each posedge.
Second hint: you will probably need to introduce an interstage register to pass along the IR contents.
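Putting the two hints together, a skeleton might look like the sketch below. The register width, the `NullOp macro, and the execute logic are all assumptions; your instruction encoding will differ.

```verilog
// Sketch: two pipeline stages as two "always" blocks, with an
// interstage register (IR2) passing the fetched instruction along.
reg [7:0] IR2;              // interstage register (width is a placeholder)

initial IR2 = `NullOp;      // `NullOp: your no-op encoding -- stage 2
                            // starts with a 1-cycle stall

// Stage 1: fetch the next instruction and advance the PC.
always @(posedge clock) begin
    IR2 <= mem[PC];
    PC  <= PC + 1;
end

// Stage 2: execute whatever stage 1 latched on the previous edge.
always @(posedge clock) begin
    execute(IR2);           // placeholder for your decode/execute logic
end
```

Because stage 1 writes IR2 with a nonblocking assignment and stage 2 only reads it, the execute stage always sees the instruction fetched one cycle earlier.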
You must be careful that there is no interference between registers in the two stages. For example, you cannot access (read or write) the same register in both stages during the same (Verilog) clock period.
A word about memory accesses: a dual-ported memory has a double dose of addressing hardware, so you can read/write two things at once (in different memory locations). You may assume that you use a dual-ported memory module and that your code is not self-modifying; therefore the instruction-fetch and execution stages can access memory simultaneously without any restriction.
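A behavioral dual-ported memory could be sketched as below. The module and port names are made up for illustration; the point is simply that the two address ports are independent, so fetch and execute can use the memory in the same cycle.

```verilog
// Sketch of a dual-ported memory: one read port for instruction fetch,
// one read/write port for the execute stage, usable in the same cycle.
module dual_port_mem(input             clock,
                     input      [7:0]  fetch_addr,  // port A: instruction fetch
                     output reg [15:0] fetch_data,
                     input      [7:0]  exec_addr,   // port B: data access
                     input      [15:0] exec_wdata,
                     input             exec_we,
                     output reg [15:0] exec_rdata);
    reg [15:0] mem [0:255];
    always @(posedge clock) begin
        fetch_data <= mem[fetch_addr];              // port A: read
        if (exec_we) mem[exec_addr] <= exec_wdata;  // port B: write ...
        else         exec_rdata    <= mem[exec_addr]; // ... or read
    end
endmodule
```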
Implement instruction look-ahead for all the instructions (accept the fact that jumps will be incorrect). What is the speedup you get?
Modify the last version: implement instruction look-ahead correctly for both jump instructions. The conditional jumps (JMPZ, JMPNZ) can be tricky, because you cannot tell whether the jump will be taken until the instruction has executed.
Hint: after each taken jump (i.e., whenever the PC is modified by the second stage of the pipeline), you will probably need to insert a 1-cycle stall (NullOp) into the second stage, to give the first stage time to fetch the new IR.
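One way that stall might be sketched: when a jump is taken, the same clock edge that redirects the PC also overwrites the interstage register with a NullOp, squashing the instruction that was prefetched down the wrong path. The helper names (is_taken_jump, jump_target, IR2, `NullOp) are assumptions, not part of the assignment's code.

```verilog
// Sketch: bubble (NullOp) insertion after a taken jump.
// is_taken_jump and jump_target stand in for your decode logic.
always @(posedge clock) begin
    if (is_taken_jump(IR2)) begin
        PC  <= jump_target(IR2); // redirect the fetch stage ...
        IR2 <= `NullOp;          // ... and stall execute for one cycle
    end else begin
        IR2 <= mem[PC];          // normal fetch
        PC  <= PC + 1;
    end
end
```

This sketch keeps both updates in one "always" block so that PC and IR2 each have a single driver; if you keep your stages in separate blocks, you will need a handshake signal instead.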
Assuming all instructions occur equally likely, what is the speedup?
Write and run a program that calculates the gcd (greatest common divisor) of two positive numbers. Assume that the two numbers are in M[0] and M[1] before your program starts. When your program terminates, the gcd value must be in M[2]. Also, when your program terminates it should display the memory values M[0], M[1], and M[2]. Keep all intermediate values in registers. You are not responsible for the behavior of the program on non-positive inputs.
Use the following version of Euclid's algorithm to compute the gcd:
while (b > 0) {
    c = b;
    if (a < b) b = a; else b = a - b;
    a = c;
}

Compile the algorithm by hand into the most straightforward machine code. Don't do any clever programming.
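Before compiling it by hand, you can sanity-check the algorithm itself with a throwaway behavioral Verilog testbench (this is not part of the CPU, just a quick simulation of the loop above):

```verilog
// Quick behavioral check of the subtraction-based gcd loop.
module gcd_check;
    integer a, b, c;
    initial begin
        a = 128; b = 36;
        while (b > 0) begin
            c = b;
            if (a < b) b = a; else b = a - b;
            a = c;
        end
        $display("gcd(128, 36) = %0d", a);  // prints 4
    end
endmodule
```

Tracing the loop by hand on (128, 36) confirms it terminates with a = 4, the expected gcd.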
Run your program on at least the following inputs: (128, 36) and (55, 21).