CS 222/322 - Machine Organization
Final Exam, Winter 1998
Wednesday, 18 March, 1:30-3:30 PM
7 problems on pages 1-2

Com Sci 222 and 322

  1. This problem covers some of the ideas in problem 2.3 of the text, comparing a simple load/store instruction set to a memory-memory instruction set for instruction bytes fetched and data bytes fetched. Assume that load/store instructions are 4 bytes long and memory-memory instructions are 8 bytes long. All data are 4 bytes long. Can load/store win on instruction bytes fetched? Can there be a tie? Can memory-memory win? Answer the same three questions for data bytes fetched. (A byte-counting sketch follows problem 3.)
  2. Look at the table describing the simple DLX pipeline that I provided on the Web. Create a new column for the table, defining a memory-memory data transfer instruction (it transfers a word of data from one memory address to another). Stick to the simple spirit of the table--no forwarding at all. Compare the performance of the new instruction to a LOAD followed by a STORE. How does the comparison change with aggressive forwarding?
  3. Describe 3 ways in which a larger cache block size can improve uniprocessor performance, and 3 ways in which a larger cache block size can harm performance. One sentence should suffice for each of the 6 points. You can solve this problem without knowing the write method. (A sketch of the average-memory-access-time relation follows below.)
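
For problem 1, here is a minimal sketch of the kind of byte counting involved. The statement A = B + C and the two instruction sequences are illustrative assumptions, not part of the problem; only the byte sizes come from the problem statement.

    # Count instruction and data bytes for one assumed statement, A = B + C,
    # under the two ISAs; sizes follow the problem statement (load/store
    # instructions 4 bytes, memory-memory instructions 8 bytes, data 4 bytes).
    INSTR_BYTES = {"load_store": 4, "mem_mem": 8}
    DATA_BYTES = 4

    # Each entry is (assumed instruction, number of data memory accesses).
    load_store_seq = [("LW R1,B", 1), ("LW R2,C", 1), ("ADD R3,R1,R2", 0), ("SW A,R3", 1)]
    mem_mem_seq = [("ADD A,B,C", 3)]  # reads B and C, writes A

    def bytes_fetched(seq, isa):
        instr = len(seq) * INSTR_BYTES[isa]
        data = sum(n for _, n in seq) * DATA_BYTES
        return instr, data

    print(bytes_fetched(load_store_seq, "load_store"))  # (16, 12)
    print(bytes_fetched(mem_mem_seq, "mem_mem"))        # (8, 12)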
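
For problem 3, one standard lens on the tradeoffs is the average-memory-access-time relation sketched below; the function and the numbers are illustrative, not part of the problem.

    # Average memory access time (AMAT); all arguments in the same time unit.
    # A larger block can change the hit time, the miss rate, and the miss penalty.
    def amat(hit_time, miss_rate, miss_penalty):
        return hit_time + miss_rate * miss_penalty

    # Hypothetical numbers, purely for illustration:
    print(amat(hit_time=1.0, miss_rate=0.05, miss_penalty=40.0))  # 3.0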

  4. In a modern microprocessor, the memory hierarchy provides three different places to look for a given memory address: TLB, cache, and main memory. If each of these were completely independent, there would be 8 possible states for a memory address in the microprocessor. But not all 8 states are possible. Which 3 are impossible if the cache uses physical addresses? Which of these becomes possible if the cache uses virtual addresses? (The eight nominal states are enumerated in a sketch after problem 6.)
  5. In this problem, all frequencies are based on the uniprocessor dynamic instruction stream. You have a computer with 100 microprocessors. You are running a program that is 5% inherently sequential. The remaining 95% of the program is capable of using all 100 processors, but at a cost in synchronization and contention overhead. With P>1 processors in action, the CPI increases by an overhead term proportional to $P^2$. What number of processors will run the program as fast as possible? What is the speedup over sequential processing with that number of processors? (A parameterized sketch of this model follows problem 6.)
  6. You have a multiprocessor using write-back, write-invalidate, write-allocate for snoopy cache coherence. In addition to the snoopy bus, there is a second nonsnoopy bus, where each message is read by only one CPU or by the memory. What messages can go on the nonsnoopy bus? How does this change if you switch to write-update?
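
For problem 4, the eight nominal states are simply the hit/miss combinations across the three structures. The sketch below only enumerates them; deciding which combinations are possible is the problem.

    from itertools import product

    # The 2**3 = 8 nominal states of one memory address: whether it currently
    # hits in the TLB, hits in the cache, and is resident in main memory.
    for tlb, cache, memory in product(("hit", "miss"), repeat=3):
        print(f"TLB {tlb:4}  cache {cache:4}  memory {memory:4}")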
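
For problem 5, here is a parameterized sketch of the execution-time model, assuming the overhead enters as a CPI multiplier of the form 1 + c*P^2 on the parallel portion. The coefficient c is a hypothetical stand-in, not a value given in the problem.

    # Time to run the program on P processors, normalized so the sequential
    # run takes 1.0; 5% is inherently serial, the rest is spread over P
    # processors with an assumed CPI multiplier of (1 + c*P**2) when P > 1.
    def run_time(P, c, serial_fraction=0.05):
        parallel = 1.0 - serial_fraction
        cpi_factor = 1.0 + c * P**2 if P > 1 else 1.0
        return serial_fraction + parallel * cpi_factor / P

    # With a hypothetical coefficient, scan P = 1..100 for the fastest run:
    c = 0.001  # purely illustrative; the exam's coefficient is not reproduced here
    best_P = min(range(1, 101), key=lambda P: run_time(P, c))
    print(best_P, 1.0 / run_time(best_P, c))  # processors used, speedup over P = 1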

Com Sci 322 only

  1. In problem 5, I chose to make the synchronization and contention overhead proportional to $P^2$ rather than to P for two reasons--one has to do with realism and the other has to do with making the problem interesting. What were my two reasons? That is, why is it superficially reasonable to expect the overhead to be proportional to $P^2$? And, how does part of the problem trivialize when overhead is proportional to P?
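
For comparison with the problem-5 sketch above, the linear-overhead variant discussed here would use a CPI multiplier of the form 1 + c*P; as before, c is a hypothetical stand-in.

    # Same model as the problem-5 sketch, but with overhead linear in P.
    def run_time_linear(P, c, serial_fraction=0.05):
        parallel = 1.0 - serial_fraction
        cpi_factor = 1.0 + c * P if P > 1 else 1.0
        return serial_fraction + parallel * cpi_factor / P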
