CS355 Syllabus

Reducing the branch delay of branch instructions

Branch Instruction Processing
- In order to solve a problem, you gotta understand it first - and I mean really understand it, not just have a half-baked understanding of it
- So why did it take so long to branch?
  - The branch instruction contains an offset that needs to be added to the value of the PC to form the target address - the branch instruction "jumps" to this target address
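A minimal Python sketch of the target-address computation, assuming a 32-bit PC and a signed (two's complement) offset that is already in byte units. Whether the CS355 CPU scales the offset, or adds it to a PC that has already been incremented, is not stated here, so treat both as assumptions:

```python
# Assumption: 32-bit addresses; offset is a signed byte offset (no scaling).
MASK32 = 0xFFFFFFFF

def branch_target(pc: int, offset: int) -> int:
    """Return pc + offset, wrapped to 32 bits like a hardware adder would."""
    return (pc + offset) & MASK32

# A positive offset branches forward, a negative one branches backward (e.g. a loop)
print(hex(branch_target(0x1000, 0x20)))   # 0x1020
print(hex(branch_target(0x1000, -0x20)))  # 0xfe0
```

The mask mimics a fixed-width adder: the sum simply wraps around instead of growing past 32 bits.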
Side note:
- You may or may not have noticed that BRANCH instructions DO NOT use registers as source operands, nor do they update registers
- So the data forwarding hardware does not help BRANCH instructions at all
- Because the CPU diagram with the data forwarding hardware is crowded (not to mention messy), I will discuss the issues using the original basic pipeline
  You should be aware that the data forwarding hardware discussed previously is implicitly assumed to be present, but this hardware does not provide any support for executing branch instructions, so it is left out of the pictures.
What is needed to execute a (conditional or unconditional) BRANCH instruction?
- Value of the PC (PC1)
- Value of the Offset (IR1)
- Condition codes (needed only by conditional branches)

When are the needed values first available in the CPU?

Information needed to execute a (conditional) branch instruction:

Value of the program counter (PC): needed to compute the target address of a relative branch

Value of the (relative) branch offset

This value of the branch offset is available in the IR1 register in the ID stage
This value can also be obtained from the source operand 2 field in the instruction code
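Because the offset field is narrower than a register, it has to be sign-extended before it is added to the PC. The sketch below assumes a hypothetical layout in which the offset occupies the low 16 bits of the instruction word in IR1; the real position and width of the source operand 2 field may differ:

```python
# Hypothetical field layout (not taken from the CS355 instruction format):
# the branch offset sits in the low OFFSET_BITS bits of the instruction word.
OFFSET_SHIFT = 0
OFFSET_BITS = 16

def extract_offset(ir1: int) -> int:
    """Pull the offset field out of the instruction word and sign-extend it."""
    field = (ir1 >> OFFSET_SHIFT) & ((1 << OFFSET_BITS) - 1)
    sign_bit = 1 << (OFFSET_BITS - 1)
    # Two's complement: if the top bit of the field is set, the value is negative
    return field - (1 << OFFSET_BITS) if field & sign_bit else field

print(extract_offset(0x12340020))  # 32
print(extract_offset(0x1234FFE0))  # -32
```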

Value of the flags (N, Z, V, C) computed by the compare instruction that precedes the conditional branch instruction
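A compare instruction is essentially a subtraction whose result is thrown away and only the flags are kept. A minimal sketch of how CMP a, b could set N, Z, V and C on 32-bit values; the carry convention (C = 1 means "no borrow", as on ARM) is an assumption, since some CPUs define C the other way around:

```python
MASK32 = 0xFFFFFFFF

def compare_flags(a: int, b: int) -> dict:
    """Flags produced by CMP a, b, i.e. by computing a - b on 32-bit values."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    n = (diff >> 31) & 1          # N: result is negative (top bit set)
    z = int(diff == 0)            # Z: result is zero (a == b)
    c = int(a >= b)               # C: no unsigned borrow (ARM-style convention)
    # V: signed overflow, i.e. a and b have different signs
    # and the sign of the result differs from the sign of a
    v = (((a ^ b) & (a ^ diff)) >> 31) & 1
    return {"N": n, "Z": z, "V": v, "C": c}

print(compare_flags(5, 7))  # {'N': 1, 'Z': 0, 'V': 0, 'C': 0}
print(compare_flags(7, 7))  # {'N': 0, 'Z': 1, 'V': 0, 'C': 1}
```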

The following figure depicts the location of the (conditional) branch instruction and its preceding compare instruction during pipelined execution:
The figure above also shows where the data needed to execute the branch instruction are located:
- The offset in IR(ID) must be extracted
- Then added to the value in PC1
- The condition codes produced by the ALU, together with the branch condition, determine whether the branch is taken.
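The three steps above can be sketched as one function that picks the next PC. The branch mnemonics and their flag tests follow the ARM-style signed conditions and are assumptions for illustration; the CS355 instruction set may name them differently. The fall-through address is passed in rather than computed, because how far the PC advances per instruction is not stated here:

```python
# Hypothetical mnemonics mapped to ARM-style flag tests (signed comparisons).
CONDITIONS = {
    "BRA": lambda f: True,                                  # unconditional: always taken
    "BEQ": lambda f: f["Z"] == 1,                           # equal
    "BNE": lambda f: f["Z"] == 0,                           # not equal
    "BLT": lambda f: f["N"] != f["V"],                      # less than
    "BGE": lambda f: f["N"] == f["V"],                      # greater or equal
    "BGT": lambda f: f["Z"] == 0 and f["N"] == f["V"],      # greater than
    "BLE": lambda f: f["Z"] == 1 or f["N"] != f["V"],       # less or equal
}

def next_pc(opcode: str, pc1: int, offset: int, flags: dict, fallthrough: int) -> int:
    """Add the (sign-extended) offset to PC1, then use the branch condition
    and the flags to choose between the target and the fall-through address."""
    target = (pc1 + offset) & 0xFFFFFFFF
    taken = CONDITIONS[opcode](flags)
    return target if taken else fallthrough

flags = {"N": 1, "Z": 0, "V": 0, "C": 0}   # e.g. flags from CMP 5, 7
print(hex(next_pc("BLT", 0x1000, 0x20, flags, 0x1004)))  # 0x1020 (taken)
print(hex(next_pc("BGE", 0x1000, 0x20, flags, 0x1004)))  # 0x1004 (not taken)
```

An unconditional branch is just the case where the condition test always succeeds, so the flags are never consulted.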

Hardware used to speed up branch instruction execution
- Now that we know what needs to be done, it is relatively easy to add (new) hardware to speed up the execution of the branch instruction.
- The following figure shows the wiring scheme:
- Let's look at how the improved pipeline executes branch instructions...
  I will show you the two types of branch instructions: unconditional and conditional
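To see why the values listed earlier are all available at the same moment, it helps to track which stage each instruction occupies in every cycle. The sketch below assumes a classic 5-stage pipeline (IF, ID, EX, MEM, WB), no stalls, and a compare immediately followed by the branch; the instruction names are illustrative only:

```python
# Assumed stage names of a classic 5-stage pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timeline(instrs):
    """Return (cycle, {instruction: stage}) rows for instructions issued
    one per cycle with no stalls."""
    rows = []
    total_cycles = len(instrs) + len(STAGES) - 1
    for cycle in range(1, total_cycles + 1):
        occupancy = {}
        for i, name in enumerate(instrs):
            stage = cycle - 1 - i          # instruction i enters IF in cycle i + 1
            if 0 <= stage < len(STAGES):
                occupancy[name] = STAGES[stage]
        rows.append((cycle, occupancy))
    return rows

for cycle, occ in timeline(["CMP", "BEQ"]):
    print(cycle, occ)
```

In cycle 3 the compare is in EX, where the ALU produces the flags, while the branch is in ID, where PC1 and the offset in IR1 are available. Under these assumptions, that is the earliest cycle in which added hardware beside the ID stage could resolve the branch.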