You must be aware that the data forwarding hardware discussed previously is implicitly assumed. This hardware does not provide any support for executing branch instructions, and it is therefore left out of the pictures.
The reason that we cannot use the adder in the ALU stage is "tardiness": it takes too long to bring PC1 and the offset to the ALU stage and add them there (if we did that, the CPU would fetch 3 instructions before the branch takes place).
Note that this is a significant amount of overhead (price) that we must pay to speed up branching.
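To see why the stage that produces the branch target matters so much, here is a minimal cycle-by-cycle sketch in Python. It rests on simplifying assumptions that are not from these notes: one fetch per cycle, no stalls, a PC that advances by 1 per fetch, a branch that is always taken, and a branch that loads its target into the PC at the end of its `resolve_stage`-th cycle (IF = stage 1). The names `fetch_trace`, `wrong_path_fetches` and `resolve_stage` are hypothetical. In this model, resolving at stage k leaves k - 1 instructions fetched after the branch, so the 3 extra fetches mentioned above correspond to `resolve_stage = 4`; which hardware stage that is depends on the exact datapath in these notes.

```python
def fetch_trace(branch_addr: int, target: int, resolve_stage: int, cycles: int = 6) -> list[int]:
    """Addresses fetched in cycles 1..cycles; the (always-taken) branch is fetched in cycle 1.

    Hypothetical model: one fetch per cycle, no stalls, PC advances by 1 per fetch,
    and the branch loads `target` into the PC at the end of its `resolve_stage`-th
    cycle (IF = stage 1).
    """
    fetched = []
    pc = branch_addr
    for cycle in range(1, cycles + 1):
        fetched.append(pc)  # IF: fetch whatever the PC points at this cycle
        # End of cycle: the branch overwrites the PC only once it has resolved.
        pc = target if cycle == resolve_stage else pc + 1
    return fetched


def wrong_path_fetches(branch_addr: int, target: int, resolve_stage: int) -> int:
    """Instructions fetched after the branch but before the target.

    Assumes `target` does not coincide with any fall-through address.
    """
    trace = fetch_trace(branch_addr, target, resolve_stage, cycles=resolve_stage + 1)
    return trace.index(target) - 1


for stage in (2, 3, 4):
    print(stage, fetch_trace(100, 200, stage), wrong_path_fetches(100, 200, stage))
# 2 [100, 101, 200, 201, 202, 203] 1
# 3 [100, 101, 102, 200, 201, 202] 2
# 4 [100, 101, 102, 103, 200, 201] 3
```

Each stage earlier that the target reaches the PC saves one wasted fetch, which is the payoff that justifies the extra hardware cost.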
Using the value in IR1 is not timely, because the output of IR1 equals the constant only at the END of the CPU cycle.
To make this work, we have to update the PSR in the middle of the CPU cycle and the PC at the end of the CPU cycle.
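One way to picture the mid-cycle PSR update is a single cycle in which a compare sits in the ALU stage while a conditional branch is being decided. The sketch below is an assumed interpretation, not the datapath from these notes: it models only a zero flag, a hypothetical "branch if zero", and hypothetical names (`PSR`, `overlap_cycle`, `psr_mid_cycle`). It shows that if the PSR is written in the middle of the cycle, the branch logic sees the fresh flag before the PC is loaded at the end of the cycle; if the PSR were written at the end, the branch would read a stale flag.

```python
from dataclasses import dataclass


@dataclass
class PSR:
    z: bool  # zero flag; the only condition code modeled here (assumption)


def overlap_cycle(psr: PSR, cmp_a: int, cmp_b: int, fall_through: int,
                  target: int, psr_mid_cycle: bool) -> tuple[PSR, int]:
    """One CPU cycle: a compare (cmp_a - cmp_b) is in the ALU stage while a
    hypothetical 'branch if zero' is being decided.

    Returns (psr, next_pc) as they stand after the end of the cycle.
    """
    new_z = (cmp_a - cmp_b) == 0
    # First half of the cycle: the compare produces its flags.
    if psr_mid_cycle:
        psr = PSR(z=new_z)  # PSR latched in the middle of the cycle
    # Second half: the branch logic reads whatever the PSR holds now.
    taken = psr.z
    # End of the cycle: the PC is loaded with the target or the fall-through address.
    next_pc = target if taken else fall_through
    if not psr_mid_cycle:
        psr = PSR(z=new_z)  # flags arrive too late for this branch
    return psr, next_pc


print(overlap_cycle(PSR(z=False), 5, 5, 101, 200, psr_mid_cycle=True))
print(overlap_cycle(PSR(z=False), 5, 5, 101, 200, psr_mid_cycle=False))
```

With equal operands the first call branches to 200, while the second falls through to 101 even though the compare found the operands equal.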
I will show you the two types of branch instructions: unconditional and conditional.