Last modified: Thu Mar 4 21:49:10 CST 1999
Purpose
Discussion
Computer designers have used many techniques in their quest for faster CPU performance. One of them is instruction look-ahead: overlapping the execution of instruction i with the fetch of the next instruction, i + 1. This is essentially a two-stage instruction pipeline, with a potential speedup of two.
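A back-of-the-envelope sketch of where the factor of two comes from, assuming fetch and execute each take one clock cycle: n instructions take 2n cycles sequentially, but only n + 1 cycles when fetch and execute overlap (the first instruction fills the pipeline, and every later instruction completes one cycle after its predecessor):

```latex
\text{speedup} \;=\; \frac{2n}{n+1} \;\longrightarrow\; 2
\quad \text{as } n \to \infty
```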
In this homework, you will implement instruction look-ahead in three steps, described under "Project steps" below.
This homework requires you to think of algorithms executing in parallel, i.e., several actions occurring at the same time. Since this may be new to you, please ask if you need help.
Project steps
Modify the counter_CPU_sequencer. Use the fact that the memory read and the PC increment can be done in parallel: the idea of instruction look-ahead is to overlap the execution of instruction i with the fetch (and PC increment) of the next instruction, i + 1.
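For instance, the fetch stage might read memory and bump the PC on the same clock edge. The following is only a sketch: PC, IR, mem, and clock are placeholder names, not the identifiers from your counter_CPU_sequencer.

```verilog
// Hypothetical fetch stage: memory read and PC increment in parallel.
// All identifiers here are placeholders for whatever your design uses.
always @(posedge clock) begin
    IR <= mem[PC];   // fetch the next instruction ...
    PC <= PC + 1;    // ... while the PC is incremented in the same cycle
end
```

The nonblocking assignments (`<=`) matter: both right-hand sides read the old value of PC, so the fetch and the increment really do happen "at the same time."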
If you followed the style of the initial counter_machine program, then you already have formal parallelism between stages, using multiple "always" blocks. In the counter_machine program, however, that parallelism is sequentialized by signals (wire [cyc-1 : 0] step;), and you will need to change the signaling to make the two stages really run in parallel.
A hint: each stage of your pipeline can be simulated by an "always" block. Initialize the second stage of your pipeline to execute a NullOp (a 1-cycle stall) as its first instruction; after that, both stages should do their work at each posedge.
Second hint: you will probably need to introduce an interstage register to pass along the IR contents.
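Putting the two hints together, a skeleton might look like the sketch below. The register width, the `NullOp macro, and the execute logic are all assumptions; your instruction encoding will differ.

```verilog
// Sketch: two pipeline stages as two "always" blocks, with an
// interstage register (IR2) passing the fetched instruction along.
reg [7:0] IR2;              // interstage register (width is a placeholder)

initial IR2 = `NullOp;      // `NullOp: your no-op encoding -- stage 2
                            // starts with a 1-cycle stall

// Stage 1: fetch the next instruction and advance the PC.
always @(posedge clock) begin
    IR2 <= mem[PC];
    PC  <= PC + 1;
end

// Stage 2: execute whatever stage 1 latched on the previous edge.
always @(posedge clock) begin
    execute(IR2);           // placeholder for your decode/execute logic
end
```

Because stage 1 writes IR2 with a nonblocking assignment and stage 2 only reads it, the execute stage always sees the instruction fetched one cycle earlier.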
You must be careful that there is no interference between registers in the two stages. For example, you cannot access (read or write) the same register in both stages during the same (Verilog) clock period.
A word about memory accesses: a dual-ported memory has a double dose of addressing hardware, so you can read/write two things at once (in different memory locations). You may assume that you use a dual-ported memory module and that your code is not self-modifying; therefore the instruction-fetch and execution stages can access memory simultaneously without any restriction.
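A behavioral dual-ported memory could be sketched as below. The module and port names are made up for illustration; the point is simply that the two address ports are independent, so fetch and execute can use the memory in the same cycle.

```verilog
// Sketch of a dual-ported memory: one read port for instruction fetch,
// one read/write port for the execute stage, usable in the same cycle.
module dual_port_mem(input             clock,
                     input      [7:0]  fetch_addr,  // port A: instruction fetch
                     output reg [15:0] fetch_data,
                     input      [7:0]  exec_addr,   // port B: data access
                     input      [15:0] exec_wdata,
                     input             exec_we,
                     output reg [15:0] exec_rdata);
    reg [15:0] mem [0:255];
    always @(posedge clock) begin
        fetch_data <= mem[fetch_addr];              // port A: read
        if (exec_we) mem[exec_addr] <= exec_wdata;  // port B: write ...
        else         exec_rdata    <= mem[exec_addr]; // ... or read
    end
endmodule
```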
Implement instruction look-ahead for all the instructions (accept the fact that jumps will be incorrect). What is the speedup you get?
Modify the last version: implement instruction look-ahead correctly for both jump instructions. The conditional jumps (JMPZ, JMPNZ) can be tricky, because you cannot tell whether the jump will be taken until the instruction has executed.
Hint: after each taken jump (i.e., whenever the PC is modified by the second stage of the pipeline), you will probably need to insert a 1-cycle stall (NullOp) into the second stage, to give the first stage time to fetch the new IR.
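One way that stall might be sketched: when a jump is taken, the same clock edge that redirects the PC also overwrites the interstage register with a NullOp, squashing the instruction that was prefetched down the wrong path. The helper names (is_taken_jump, jump_target, IR2, `NullOp) are assumptions, not part of the assignment's code.

```verilog
// Sketch: bubble (NullOp) insertion after a taken jump.
// is_taken_jump and jump_target stand in for your decode logic.
always @(posedge clock) begin
    if (is_taken_jump(IR2)) begin
        PC  <= jump_target(IR2); // redirect the fetch stage ...
        IR2 <= `NullOp;          // ... and stall execute for one cycle
    end else begin
        IR2 <= mem[PC];          // normal fetch
        PC  <= PC + 1;
    end
end
```

This sketch keeps both updates in one "always" block so that PC and IR2 each have a single driver; if you keep your stages in separate blocks, you will need a handshake signal instead.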
Assuming all instructions occur equally likely, what is the speedup?
Write and run a program that calculates the gcd (greatest common divisor) of two positive numbers. Assume that the two numbers are in M[0] and M[1] before your program starts. When your program terminates, the gcd value must be in M[2]. Also, when your program terminates it should display the memory values M[0], M[1], and M[2]. Keep all intermediate values in registers. You are not responsible for the behavior of the program on non-positive inputs.
Use the following version of Euclid's algorithm to compute the gcd:
while (b > 0) {
    c = b;
    if (a < b) b = a; else b = a - b;
    a = c;
}

Compile the algorithm by hand into the most straightforward machine code. Don't do any clever programming.
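Before compiling it by hand, you can sanity-check the algorithm itself with a throwaway behavioral Verilog testbench (this is not part of the CPU, just a quick simulation of the loop above):

```verilog
// Quick behavioral check of the subtraction-based gcd loop.
module gcd_check;
    integer a, b, c;
    initial begin
        a = 128; b = 36;
        while (b > 0) begin
            c = b;
            if (a < b) b = a; else b = a - b;
            a = c;
        end
        $display("gcd(128, 36) = %0d", a);  // prints 4
    end
endmodule
```

Tracing the loop by hand on (128, 36) confirms it terminates with a = 4, the expected gcd.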
Run your program on at least the following inputs: (128, 36) and (55, 21).