Executing an unconditional branch instruction using the modified ID stage

The unconditional branch instruction does not use the flag values and always updates the PC with the value PC + offset.
Executing the unconditional branch is easier to understand, so I will present it first.
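As a minimal Python sketch of that rule (the `Flags` record and its N/Z/V/C fields are hypothetical, chosen only for illustration), the next-PC function for the unconditional branch receives the flags but never reads them, so every flag combination yields the same target:

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Flags:
    # Hypothetical condition-code bits; the unconditional branch never reads them.
    N: bool = False
    Z: bool = False
    V: bool = False
    C: bool = False


def bra_next_pc(pc: int, offset: int, flags: Flags) -> int:
    # Unconditional branch: always taken, so the flags are ignored
    # and the new PC is simply PC + offset.
    return pc + offset


# Try all 16 flag combinations: the target never changes.
targets = {bra_next_pc(100, 20, Flags(*bits)) for bits in product([False, True], repeat=4)}
print(targets)  # {120}
```

A conditional branch would differ only in that its "taken" decision depends on the flags; that is exactly the part the unconditional branch skips.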
I will use the following program to show you the branch delay using the modified ID stage:
(The execution does not use data forwarding and I will omit the data forwarding circuits in the diagrams to keep the material simple)

Executing an unconditional branch instruction using the modified ID stage - Example

Start of cycle 1: IF stage is fetching the branch instruction:

End of cycle 1: BRA Label is fetched into IR(ID)

Because we always copy the PC into the ID stage, the PC value is available there as a source operand!
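The idea can be sketched as a hypothetical IF/ID pipeline register (the class and field names are illustrative, not taken from the CPU in the diagrams). Which PC value gets copied (the branch's own address or the already-incremented PC) depends on the design, so here it is simply a stored field; the sample values 12 and +44 mirror the demo listing:

```python
from dataclasses import dataclass


@dataclass
class IFIDRegister:
    # Hypothetical IF/ID pipeline register contents.
    ir: str   # the fetched instruction, kept as text for readability
    pc: int   # copy of the PC made when the instruction was fetched


def id_branch_target(reg: IFIDRegister) -> int:
    # Decode the offset field (here simply parsed from the text form).
    offset = int(reg.ir.split()[1])
    # Because the PC copy travels with the instruction into the ID stage,
    # an adder in ID can form PC + offset without waiting for a later stage.
    return reg.pc + offset


reg = IFIDRegister(ir="bra +44", pc=12)
print(id_branch_target(reg))  # 56
```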

Start of cycle 2: ID stage computes PC + offset, IF stage fetches add r2,r1,r2:

When executing an unconditional branch instruction, the MUX always selects PC + offset (= Label) as the new PC value.
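A minimal sketch of that MUX and a hypothetical control signal driving its select input (the names `pc_mux` and `take_branch_signal`, and the sample addresses 104 and 120, are illustrative assumptions):

```python
def pc_mux(sequential_pc: int, branch_target: int, take_branch: bool) -> int:
    # 2-to-1 multiplexer in front of the PC register:
    #   take_branch=False -> the next sequential address
    #   take_branch=True  -> PC + offset computed in the ID stage
    return branch_target if take_branch else sequential_pc


def take_branch_signal(opcode: str) -> bool:
    # Hypothetical control logic: for an unconditional branch the select
    # line is always 1; no other opcode in this sketch drives it.
    return opcode == "bra"


print(pc_mux(104, 120, take_branch_signal("bra")))  # 120
print(pc_mux(104, 120, take_branch_signal("add")))  # 104
```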

End of cycle 2: PC is updated to address Label, and add r2,r1,r2 is fetched into IR(ID)

The CPU made the branch in 2 clock periods! There was a one-instruction branch delay: add r2,r1,r2 had already been fetched before the PC was updated.
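The two cycles above can be replayed with a small cycle-by-cycle model of the IF and ID stages. It is a sketch under stated assumptions, not the lab CPU: addresses advance by `step` (1 by default, matching the numbering in the demo listing), the PC copy stored with an instruction is that instruction's own address (as 12 + 44 = 56 in the listing suggests), and `"target"` is a hypothetical placeholder for the instruction at Label.

```python
def simulate(program: dict[int, str], start_pc: int, cycles: int, step: int = 1):
    """Return (IR(ID), PC) at the end of each clock cycle.

    Assumptions: addresses advance by `step` per instruction, and the PC copy
    stored with an instruction is that instruction's own address.
    """
    pc = start_pc
    id_addr, id_instr = None, None  # PC copy and IR(ID)
    history = []
    for _ in range(cycles):
        # IF stage: fetch the instruction the PC points at during this cycle.
        fetched_addr, fetched = pc, program.get(pc, "nop")
        # ID stage + MUX: a "bra" in ID selects PC + offset; otherwise
        # the PC just advances to the next sequential address.
        if id_instr is not None and id_instr.startswith("bra"):
            pc = id_addr + int(id_instr.split()[1])
        else:
            pc += step
        # End of cycle: the fetched instruction is latched into IR(ID).
        id_addr, id_instr = fetched_addr, fetched
        history.append((id_instr, pc))
    return history


program = {12: "bra +44", 13: "add r2,r1,r2", 14: "add r3,r1,r3", 56: "target"}
for cycle, (ir_id, pc) in enumerate(simulate(program, 12, 3), start=1):
    print(f"end of cycle {cycle}: IR(ID) = {ir_id:<14} PC = {pc}")
```

At the end of cycle 2, IR(ID) holds add r2,r1,r2 while the PC already holds the target 56: that one fetched instruction is the branch delay.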

DEMO (using Aaron's pipelined CPU)

Execute this command on a lab machine:

/home/cs355001/demo/pipeline/6-speedup-bra

Program being executed:

0:  10 62    // mov r1,#62
    18 1     // mov r2,#1
    26 1     // mov r3,#1
    34 1     // mov r4,#1
    42 1     // mov r5,#1
    50 1     // mov r6,#1
    58 1     // mov r7,#1
    0 0      // nop
    0 0      // nop
    0 0      // nop
    0 0      // nop
    0 0      // nop
12: 192 44   // bra +44
    16 10    // add r2,r1,r2   (R2=R1+R2)  <-- Only this instruction is executed
    24 11    // add r3,r1,r3   (R3=R1+R3)
    32 12    // add r4,r1,r4   (R4=R1+R4)
    40 13    // add r5,r1,r5   (R5=R1+R5)
    48 14    // add r6,r1,r6   (R6=R1+R6)
    56 15    // add r7,r1,r7   (R7=R1+R7)
56: 0 1      // <---- bra target (56 = 111000)
    0 2
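The register values the demo should show can be worked out by hand. The sketch below does that arithmetic, assuming (as the listing's comment states) that exactly one instruction after bra +44 executes, and ignoring whatever the instructions at the target do. Only R2 changes, to 62 + 1 = 63.

```python
def execute_delay_slot(regs: dict[str, int],
                       after_branch: list[tuple[str, str, str]]) -> dict[str, int]:
    """Apply only the instruction right after the branch (the delay slot)."""
    regs = dict(regs)  # work on a copy
    dst, src1, src2 = after_branch[0]  # add dst,src1,src2
    regs[dst] = regs[src1] + regs[src2]
    return regs


# Register contents after the mov instructions at the start of the listing.
regs = {"r1": 62, "r2": 1, "r3": 1, "r4": 1, "r5": 1, "r6": 1, "r7": 1}
# The first two adds that follow "bra +44" in memory; only the first one runs.
after_branch = [("r2", "r1", "r2"), ("r3", "r1", "r3")]
print(execute_delay_slot(regs, after_branch))
```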

Demo: delayed branching in a real (SPARC) CPU

NOTE: the branch delay can be avoided by stalling the IF stage (which prevents the CPU from fetching the instruction after the branch).
Most CPUs (e.g., ARM) avoid using a delayed branch instruction.
However, the SPARC CPU does have a delayed branch.
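The two options differ only in what occupies the slot right after the branch. A minimal sketch (the function name and the `"delayed"`/`"stall"` policy strings are illustrative assumptions):

```python
def slot_after_branch(next_sequential: str, policy: str) -> str:
    """What occupies the pipeline slot right after an unconditional branch."""
    if policy == "delayed":
        # Delayed branch (SPARC-style): the already-fetched instruction executes.
        return next_sequential
    if policy == "stall":
        # Stalled IF stage: nothing is fetched, so a bubble (no-op) flows down instead.
        return "bubble"
    raise ValueError(f"unknown policy: {policy!r}")


for policy in ("delayed", "stall"):
    print(policy, ["bra +44", slot_after_branch("add r2,r1,r2", policy), "target"])
```

In this pipeline the branch takes the same 2 clock periods either way; with a delayed branch the slot can do useful work, provided the compiler or programmer places a useful instruction there.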

I can show you the delayed branch on my home machine that has a SPARC CPU:

~/.home2
User: cs355001 (my own passwd)
cd /home/cs355001/demo/delay-branch
/home/cs255000/bin/as255s sparc-DELAY-BRANCH
/home/cs255000/bin/sparc     (load prog and run it)