- Assembly Pipeline Machine Delay Slot Before Sturgis
Pipelining increases the CPU instruction throughput, that is, the number of instructions completed per unit of time. But it does not reduce the execution time of an individual instruction. In fact, it usually slightly increases the execution time of each instruction due to overhead in the pipeline control.
The increase in instruction throughput means that a program runs faster and has lower total execution time.
Limitations on practical depth of a pipeline arise from:
1 Delay Slots (Optional) A machine has a five-stage pipeline consisting of fetch, decode, execute, mem and write-back stages. The machine uses delay slots to handle control dependences. Jump targets, branch targets and destinations are resolved in the execute stage. (a) What is the number of delay slots needed to ensure correct operation?
- Pipeline latency. The fact that the execution time of each instruction does not decrease puts limitations on pipeline depth. Once the clock cycle is as small as the sum of the clock skew and latch overhead, no further pipelining is useful, since there is no time left in the cycle for useful work.
- Imbalance among pipeline stages. Imbalance among the pipe stages reduces performance, since the clock can run no faster than the time needed for the slowest pipeline stage.
- Pipeline overhead. Pipeline overhead arises from the combination of pipeline register delay (setup time plus propagation delay) and clock skew.
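The effect of overhead on useful pipeline depth can be illustrated with a small model. This is only a sketch under stated assumptions: the logic delay is split perfectly evenly across the stages (no imbalance), every stage pays the same fixed overhead, and the numbers (320 ns of logic, 5 ns of overhead) and the function names are illustrative choices, not part of the text above.

```python
def cycle_time(logic_ns, depth, overhead_ns):
    """Clock cycle when logic_ns is split evenly over `depth` stages
    and each stage pays a fixed overhead (assumed model)."""
    return logic_ns / depth + overhead_ns


def ideal_speedup(logic_ns, depth, overhead_ns):
    """Throughput gain relative to a single stage of logic_ns with no overhead."""
    return logic_ns / cycle_time(logic_ns, depth, overhead_ns)


# Illustrative numbers only: 320 ns of logic, 5 ns of overhead per stage.
for depth in (1, 2, 4, 8, 16, 32, 64):
    print(depth, round(ideal_speedup(320.0, depth, 5.0), 2))
```

In this model the speedup can never exceed logic_ns / overhead_ns (here 320 / 5 = 64), no matter how deep the pipeline is made, because the cycle time cannot drop below the overhead.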
Consider a nonpipelined machine with 6 execution stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50 ns, and 50 ns.
- Find the instruction latency on this machine.
- How much time does it take to execute 100 instructions?
Solution:
Instruction latency = 50 + 50 + 60 + 60 + 50 + 50 = 320 ns
Time to execute 100 instructions = 100 * 320 = 32000 ns
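The nonpipelined arithmetic can be checked with a few lines of Python. This is a minimal sketch; the names `STAGE_NS`, `nonpipelined_latency` and `nonpipelined_total` are made up for illustration, and it assumes each instruction finishes completely before the next one starts.

```python
# Stage lengths in nanoseconds, taken from the example.
STAGE_NS = [50, 50, 60, 60, 50, 50]


def nonpipelined_latency(stages):
    """One instruction passes through every stage in turn."""
    return sum(stages)


def nonpipelined_total(stages, n_instructions):
    """Instructions run back to back with no overlap."""
    return n_instructions * nonpipelined_latency(stages)


print(nonpipelined_latency(STAGE_NS))     # 320
print(nonpipelined_total(STAGE_NS, 100))  # 32000
```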
Suppose we introduce pipelining on this machine. Assume that when introducing pipelining, the clock skew adds 5 ns of overhead to each execution stage.
- What is the instruction latency on the pipelined machine?
- How much time does it take to execute 100 instructions?
Solution:
Remember that in the pipelined implementation all pipe stages must have the same length, namely the length of the slowest stage plus the overhead. With 5 ns of overhead this comes to:
Length of a pipelined stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns
Instruction latency = 6 stages * 65 ns = 390 ns
Time to execute 100 instructions = 65*6*1 + 65*1*99 = 390 + 6435 = 6825 ns (the first instruction takes the full 390 ns; each of the remaining 99 completes one 65 ns cycle after its predecessor)
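The pipelined timing can be sketched the same way. The function names below are hypothetical, and the model assumes an ideal pipeline with no stalls: the first instruction needs one cycle per stage, and every later instruction completes one cycle after the previous one.

```python
STAGE_NS = [50, 50, 60, 60, 50, 50]  # stage lengths from the example
OVERHEAD_NS = 5                      # clock-skew overhead per stage


def pipelined_cycle(stages, overhead):
    """Every stage is stretched to the slowest stage plus overhead."""
    return max(stages) + overhead


def pipelined_latency(stages, overhead):
    """One instruction still visits every stage, one cycle each."""
    return len(stages) * pipelined_cycle(stages, overhead)


def pipelined_total(stages, overhead, n_instructions):
    """Fill the pipeline once, then complete one instruction per cycle."""
    cycles = len(stages) + (n_instructions - 1)
    return cycles * pipelined_cycle(stages, overhead)


print(pipelined_cycle(STAGE_NS, OVERHEAD_NS))       # 65
print(pipelined_latency(STAGE_NS, OVERHEAD_NS))     # 390
print(pipelined_total(STAGE_NS, OVERHEAD_NS, 100))  # 6825
```

Note that the latency of a single instruction (390 ns) is higher than on the nonpipelined machine (320 ns), even though the total time for 100 instructions is much lower.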
- What is the speedup obtained from pipelining?
Solution:
Speedup is the ratio of the average instruction time without pipelining to the average instruction time with pipelining.
(Here we do not consider any stalls introduced by the different types of hazards, which we will look at in the next section.)
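Average instruction time not pipelined = 320 ns
Average instruction time pipelined = 65 ns (in steady state, one instruction completes every cycle)
Speedup for 100 instructions = 32000 / 6825 = 4.69
For a long instruction stream, where the time to fill the pipeline becomes negligible, the speedup approaches 320 / 65 = 4.92.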
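As a quick check, the speedup can be computed for any instruction count. This sketch reuses the figures from the example (320 ns nonpipelined latency, 6 stages, 65 ns pipelined cycle); the constant and function names are illustrative, and stalls are ignored as stated above.

```python
NONPIPELINED_LATENCY_NS = 320  # sum of the six stage lengths
PIPELINED_CYCLE_NS = 65        # slowest stage (60 ns) plus 5 ns overhead
N_STAGES = 6


def speedup(n_instructions):
    """Ratio of total nonpipelined time to total pipelined time (no stalls)."""
    nonpipelined = n_instructions * NONPIPELINED_LATENCY_NS
    pipelined = (N_STAGES + n_instructions - 1) * PIPELINED_CYCLE_NS
    return nonpipelined / pipelined


print(round(speedup(100), 2))  # 4.69
# The limit for very long instruction streams is the ratio of the two times:
print(round(NONPIPELINED_LATENCY_NS / PIPELINED_CYCLE_NS, 2))  # 4.92
```

The speedup stays below the stage count of 6 because the stages are unbalanced and each one carries 5 ns of overhead.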