## Practice Questions (and Answers) for Final Exam

CS 154, Winter 2020, Matni
IMPORTANT NOTE: These questions are NOT representative of EVERYTHING you need to study for the final exam! First of all, this is focused on post-midterm material. You should also review your midterm practice exam questions, all lab assignments questions and all class materials (lecture slides), including the examples and demos starting from the start of the quarter. The textbook is an important source of material and extra questions.

Given the following basic single-cycle CPU model, answer the following questions:


1. Given that: (a) registers $\mathbf{\$ a 0}=\mathbf{- 1 2 8}$ and $\mathbf{\$ s \mathbf { 0 }}=\mathbf{1 3 0}$, and (b) the instruction at the input is add $\$ \mathbf{t 0}, \$ \mathbf{s} 0, \$ a 0$, and (c) the instruction is in memory address $0 \times 00404400$,
a. what are the values for wires/buses A thru N? Give all your answers in an appropriately-sized hexadecimal.
b. what are the values of the controls RegDst, RegWrite, ALUSrc, Zero, MemRead, MemWrite, MemtoReg, PCSrc?
c. what operation is the ALU performing (not the code, but the function)?
2. Same as (1) for the next instruction in memory, which is addi $\mathbf{\$ v 0} \mathbf{0} \mathbf{\$ a 3}, \mathbf{5 1 0}$, where $\mathbf{\$ v 0}=3, \$ \mathbf{3}=2$.
3. Same as (2) for the next instruction in memory, which is $\mathbf{s w} \mathbf{\$ t 3}, \mathbf{4 4 ( \$ \mathbf { t 4 } ) ,}$ where $\mathbf{\$ t 3}=7, \mathbf{\$ t 4}=0 \times 70000004$.
4. Assume that logic blocks needed to implement a processor's datapath, like the one in the figure above in Question 1, have the following latencies:

$$
\begin{aligned}
& \text { Inst.Mem }=200 \mathrm{ps} \\
& \text { Add }=70 \mathrm{ps} \\
& \text { Mux }=20 \mathrm{ps} \\
& \text { ALU }=90 \mathrm{ps} \\
& \text { Regs }=90 \mathrm{ps} \\
& \text { DataMem. }=250 \mathrm{ps} \\
& \text { Sign-Ext. }=15 \mathrm{ps} \\
& \text { Shift-Left- } 2=10 \mathrm{ps}
\end{aligned}
$$

a. If the only thing we need to do in a processor is fetch consecutive instructions, what would the cycle time be?
b. What would the cycle time be for an add instruction? for an andi instruction? for a sw instruction?
c. Could we utilize a clock whose speed is 2.0 GHz for this design? Why or why not?
5. Assume a single-cycle design CPU fetches the following instruction word:

10101100011000100000000000010100 , which resides in memory address $\mathbf{0 x 0 0 5 1 1 2 3 4}$.
Assume that data memory is all zeros and that the processor's registers (as referenced by their MIPS register numbers) have the following values at the beginning of the cycle in which the above instruction word is fetched:

$$
\mathrm{r} 0=0, \mathrm{r} 1=-1, \mathrm{r} 2=2, \mathrm{r} 3=3, \mathrm{r} 4=-4, \mathrm{r} 5=10, \mathrm{r} 6=6, \mathrm{r} 8=8, \mathrm{r} 12=2 \mathrm{r} 31=-16
$$

a. What is the MIPS instruction that was fetched?
b. What function, if any, is the ALU performing for this instruction?
c. What is the new PC address after this instruction is executed?
d. What are the values of all 4 data inputs for the "Registers" unit? Give your answer in appropriate-length hexadecimals and use $\mathbf{X}$ for don't care values.
6. Assume that the individual stages of a MIPS CPU datapath have the following latencies:

$$
\mathbf{I F}=250 \mathrm{ps}, \mathbf{I D}=350 \mathrm{ps}, \mathbf{E X}=150 \mathrm{ps}, \mathbf{M E M}=300 \mathrm{ps}, \mathbf{W B}=200 \mathrm{ps}
$$

a. What is the clock cycle time in a pipelined and non-pipelined processor?
b. What is the approximate clock cycle speed-up from non-pipelined to pipelined?
7. Assume a 5 -stage pipelined MIPS datapath that takes the following instructions:

```
add $t5, $t2, $t1
lw $t3, 4($t5)
lw $t2, 0($t2)
or $t3, $t5, $t3
sw $t3, 0($t5)
```

a. Identify all hazards (all types).
b. If the hardware does not have hazard or forwarding detection, (manually) insert nop instructions to ensure correct execution.
c. Draw a multiple-clock-cycle diagram like the generalized one shown here (or something like Figure 4.52, on Page 305 in the book) and show all the forwards and stalls, if any, needed to have in order to correctly execute these instructions.


ANSWERS:
See table for answers for Questions 1, 2, and 3:

| Wire/Bus Control | Instruction 1 <br> add \$t0, \$s0, \$a0 | Instruction 2 <br> addi \$v0, \$a3, 510 | $\begin{aligned} & \text { Instruction } 3 \\ & \text { sw } \$ \mathrm{t} 3,44(\$ \mathrm{t}) \end{aligned}$ |
| :---: | :---: | :---: | :---: |
| A | 0x10 | 0x07 | 0x0C |
| B | 0x04 | 0x02 | 0x0B |
| C | 0x08 | X | X |
| D | 0x00000002 | 0x00000201 | X |
| E | X | $510=0 \times 01 \mathrm{FE}$ | $44=0 \times 002 \mathrm{C}$ |
| F | X | 0x000001FE | 0x0000002C |
| G | $130=0 \times 00000082$ | 0x00000002 | 0x70000004 |
| H | $-128=0 \times F F F F F F 80$ | 0x00000003 | 0x00000007 |
| I | 0xFFFFFF80 | 0x000001FE | 0x0000002C |
| J | 0x00000002 | 0x00000201 | 0x700000030 |
| K | X | X | X |
| L | $\mathrm{PC}=0 \times 00404400$ | $\mathrm{PC}=0 \times 00404404$ | $\mathrm{PC}=0 \times 00404408$ |
| M | $\mathrm{PC}+4=0 \times 00404404$ | $\mathrm{PC}+4=0 \times 00404408$ | $\mathrm{PC}+4=0 \times 0040440 \mathrm{C}$ |
| N | $\mathrm{PC}+4=0 \times 00404404$ | $\mathrm{PC}+4=0 \times 00404408$ | $\mathrm{PC}+4=0 \mathrm{x} 0040440 \mathrm{C}$ |
| RegDst | 1 | 0 | 0 |
| RegWrite | 1 | 1 | 0 |
| ALUSrc | 0 | 1 | 1 |
| Zero | 0 | 0 | 0 |
| MemRead | 0 | 0 | 0 |
| MemWrite | 0 | 0 | 1 |
| MemtoReg | 0 | 0 | X |
| PCSre | 0 | 0 | 0 |
| ALU Op type | Add | Add | Add |

1. 

a. $\quad$ Add $+\mathrm{Mux}=90 \mathrm{ps}$
b. For add: $\mathrm{IM}+\mathrm{Reg}+\mathrm{Mux}+\mathrm{ALU}+\mathrm{Mux}=420 \mathrm{ps}$

For andi: $\mathrm{IM}+\mathrm{Reg}+$ SignExt + Mux + ALU + Mux $=435 \mathrm{ps}$
For sw: IM + Reg + SignExt + Mux + ALU + DM + Mux $=685 \mathrm{ps}$
c. No, because a 2.0 GHz clock has a clock cycle of 500 ps which is too short for all instructions to work with.
2.
a. sw \$v0, 20(\$v1)
b. Add
c. $\mathrm{PC}+4=0 \times 00511238$
d. For "Read Reg1" (rs), the value is 3 in 5 -bits $=0 \times 03$.

For "Read Reg2" (rt), the value is 2 in 5 -bits $=0 \times 02$.
For "Write Reg" (rt), the value is 2 in 5 -bits $=0 x 02$.
For "Write Data", there is no value that we care about, so X.
3.
a. Non-pipelined is 1250 ps (sum of all stages' latencies) and pipelined is 350 ps (same as the longest stage latency).
b. About 3.6
4.
a. The $\$ t 5$ reg between add and the $1^{\text {st }} \mathrm{lw}$.

The $\$ \mathrm{t} 3$ reg between the $1^{\text {st }} \mathrm{lw}$ and the or.
The $\$ 43$ reg between the or and the sw.
add \$t5, \$t2, \$t1
lw \$t3, 4(\$t5)
lw \$t2, 0 (\$t2)
or \$t3, \$t5, \$t3
sw \$t3, 0(\$t5)
b. If there is NO forwarding, then we have to insert bubble delays (nop instructions):

3 nops between add and $1^{\text {st }} \mathbf{l w}$
2 nops between $2^{\text {nd }} \mathbf{l w}$ and or
3 nops between or and sw
c. 3 forwards done, no stalling: forward $\$ t 5$ from EX/MEM pipeline reg to EX input (add to $1^{\text {st }} \mathbf{l w}$ ) forward $\$ \mathrm{t} 3$ from MEM/WB pipeline reg to EX input ( $1^{\text {st }} \mathbf{l w}$ to or) forward $\$$ t3 from EX/MEM pipeline reg to EX input (or to sw)


