Question:
If it takes 5 ns to read an instruction from memory, 2 ns to decode the instruction, 3 ns to read the register file, 4 ns to perform the computation required by the instruction, and 2 ns to write the result into the register file, what is the maximum clock rate of the processor?

s3a · Answer

Solution:
The time for an instruction to pass through the processor must be greater than the clock cycle time of the processor. The total time to execute an instruction is just the sum of the times to perform each step, or 16 ns. The maximum clock rate is 1/cycle time, or 62.5 MHz.

s3a · Answer

What I would like to know is:
(1) Why must the time for an instruction to pass through the processor be greater than the clock cycle time of the processor?

(2) Why is 1/cycle time equal to the maximum clock rate instead of the minimum? I would think that one would claim that all those things can be done in 16 ns if the processor is at least at a certain discussed speed/clock frequency.

If you need more information from me, just ask.

anonymous · Answer

(1)
If you assume that a single instruction is executed in a single clock cycle, all those things mentioned must be done within that clock cycle (unless your processor is pipelined, but I assume it isn't).

Before the next instruction on the next clock edge can be executed, the instruction should be read and decoded, the registers should be read, some computation must be done and the result should be written back to a register. All those steps take ( 5 + 2 + 3 + 4 + 2 = 16) ns.

If, for example, the write-back isn't done because the clock cycle is shorter than those 16 ns, the result of the instruction won't be stored in the registers. That means it's not accessible to any other instruction, so the whole instruction execution was just a waste of time.

(2)
The cycle time and frequency are what's called reciprocal. A high frequency means a short cycle time and a low frequency means a long cycle time.

For example, if the frequency is larger than 62.5 MHz (let's say 100 MHz), what's the cycle time?
$$\frac{1}{100 \cdot 10^{6}\ \text{Hz}} = 10 \cdot 10^{-9}\ \text{s}$$ or 10 ns. As I tried to explain in (1), a cycle time of 10 ns is not enough to execute the instruction.

Any frequency above 62.5 MHz will have a cycle time of less than 16 ns, and any frequency lower than 62.5 MHz will have a cycle time of more than 16 ns. A longer cycle time won't cause problems for executing an instruction and is perfectly acceptable (as long as you don't care about performance). Therefore, any frequency up to 62.5 MHz is acceptable, and 62.5 MHz is the maximum.
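To make the arithmetic in (1) and (2) concrete, here is a minimal Python sketch. It assumes the non-pipelined case from the question, where one whole instruction has to finish inside one clock cycle. The function names are made up for illustration.

```python
# Step delays from the question, in nanoseconds.
STAGE_NS = {"fetch": 5, "decode": 2, "register read": 3, "execute": 4, "write back": 2}

def min_cycle_ns(stage_ns):
    """Shortest usable cycle when a whole instruction must finish in one cycle."""
    return sum(stage_ns.values())

def max_clock_mhz(cycle_ns):
    """Frequency is the reciprocal of the cycle time (1/ns is GHz, so scale to MHz)."""
    return 1000.0 / cycle_ns

def clock_is_safe(freq_mhz, stage_ns):
    """True if one cycle at freq_mhz is long enough for every step to finish."""
    return 1000.0 / freq_mhz >= min_cycle_ns(stage_ns)

print(min_cycle_ns(STAGE_NS))        # 16
print(max_clock_mhz(16))             # 62.5
print(clock_is_safe(100, STAGE_NS))  # False: a 10 ns cycle is too short
print(clock_is_safe(50, STAGE_NS))   # True: a 20 ns cycle is slower but still works
```

Every frequency at or below 62.5 MHz passes the check, which is why 62.5 MHz is a maximum and not a minimum.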

s3a · Answer

Okay so, to phrase it in my words to see if I understand it properly, a processor needs to be slow enough to not leave one of the tasks/sub-instructions undone before its next cycle begins.

For example, if the processor were 125 MHz, each cycle would have a “cut off time” of 8 ns so if an instruction's tasks/sub-instructions take longer than that, the instruction would not be completed properly and would therefore be considered undone in a practical sense since doing a job sloppily is pointless.

Assuming there are no physics/engineering limitations other than the time it takes for all sub-instructions of an instruction to complete, the greatest frequency you can give your processor (because you want it to run more instructions per unit of time) is one that is slow enough to not exclude any of the sub-instructions of an instruction (which in this case means that the processor would need to be slow enough to take at least 16 ns per cycle).

So from an optimizations (of the performance/being-able to function at all in the first place) perspective, engineers want the HIGHEST frequency that is LOW ENOUGH to not leave any sub-instructions undone.

Basically, there is not only an upper bound for the clock cycle but a lower one too.

Is everything I said above correct?

P.S.
Sorry for writing a lot of semi-repetitive stuff.

s3a · Answer

Also, having read more about this, I would like to ask if what I called tasks/sub-instructions in the post above is actually referred to as "micro-operations". Is that also correct?

anonymous · Answer

Sounds like you understand it :)

The tasks/sub-instructions are not quite micro-ops. I know them as pipeline stages. Some processors take complex instructions and divide those instructions internally into several simpler ones: the micro-operations.

E.g. you could divide a call to a function into 1) storing the current PC and 2) jumping to the address. Storing the PC and the jumping are the micro-instructions in this case.

s3a · Answer

Yay for understanding the main point! :)

Speaking of pipelining (which I don't know much about), could you tell me how pipelining differs from having n > 1 cores processing instructions at the same time?

To elaborate, based on what I researched, pipelining is about getting more than one instruction to be run at once but, all my life I've been thinking that a single-core processor cannot do two things at once so, what am I missing?

Also, without getting too specific, could you please relate this to Intel's hyper-threading technology?

anonymous · Answer

In a way, pipelining is executing more instructions at the same time. It's not so much about improving the execution time of a single instruction (in fact, it'll probably be worse... More on that later), but the main focus of pipelining is increasing the throughput: the number of instructions per second.

In short, using your example, pipelining comes down to this:
Operation 5 is fetching an instruction from memory,
while operation 4 is decoding the instruction,
while operation 3 is reading the register file,
while operation 2 is performing a computation,
while operation 1 is writing the result into the register file.

So yes, different operations are executed at the same time, but with pipelining, each of those operations is in a different part of its execution. You can't have, e.g., two operations that are performing a computation at the same time.
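The overlap listed above can be sketched in a few lines of Python. This assumes an ideal five-stage pipeline in which a new instruction enters every cycle and nothing ever stalls. Instructions and cycles are numbered from 0 here, and the helper names are only illustrative.

```python
STAGES = ["fetch", "decode", "register read", "execute", "write back"]

def stage_in_cycle(instruction, cycle):
    """Stage occupied by `instruction` during `cycle`, or None if it is not in flight."""
    index = cycle - instruction  # instruction i enters the pipeline in cycle i
    return STAGES[index] if 0 <= index < len(STAGES) else None

def snapshot(cycle, n_instructions):
    """Map each in-flight instruction to the stage it occupies in one cycle."""
    result = {}
    for i in range(n_instructions):
        stage = stage_in_cycle(i, cycle)
        if stage is not None:
            result[i] = stage
    return result

# Cycle 4 is the first cycle in which all five stages are busy.
print(snapshot(4, 8))
# {0: 'write back', 1: 'execute', 2: 'register read', 3: 'decode', 4: 'fetch'}
```

In any single cycle, each stage appears at most once in the snapshot, which is the "no two operations computing at once" restriction.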

So what's the deal with the clock frequency? Instead of doing all five parts in a single cycle, one part is done in a single cycle. That means the clock frequency should be able to handle the slowest stage, which would be 5 ns in your example. That would give you a clock frequency of 200 MHz. The downside is that one single instruction now takes 5 * 5 ns (remember that there are five stages in our pipeline) = 25 ns, which is a bit slower than the 16 ns.

But if you look at throughput, it's a different story. In the original processor, one operation is completed every 16 ns. But in the pipelined processor, one operation is completed every cycle or every 5 ns. 
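Here is a small Python sketch of that comparison, using the stage times from the original question. It assumes an ideal pipeline (one stage per cycle, no stalls, no extra delay between stages), and the function names are made up for illustration.

```python
STAGE_NS = [5, 2, 3, 4, 2]  # fetch, decode, register read, execute, write back

def single_cycle(stage_ns):
    """Non-pipelined: the cycle must cover all stages of one instruction."""
    cycle = sum(stage_ns)
    return {"cycle_ns": cycle, "latency_ns": cycle, "clock_mhz": 1000.0 / cycle}

def pipelined(stage_ns):
    """Pipelined: the cycle only has to cover the slowest stage."""
    cycle = max(stage_ns)
    return {
        "cycle_ns": cycle,
        "latency_ns": cycle * len(stage_ns),  # one instruction visits every stage
        "clock_mhz": 1000.0 / cycle,
    }

print(single_cycle(STAGE_NS))  # {'cycle_ns': 16, 'latency_ns': 16, 'clock_mhz': 62.5}
print(pipelined(STAGE_NS))     # {'cycle_ns': 5, 'latency_ns': 25, 'clock_mhz': 200.0}
```

The latency of one instruction gets worse (25 ns instead of 16 ns), but once the pipeline is full an instruction finishes every `cycle_ns`.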

(see also the Wiki page on `Instruction pipeline`)


There are some more issues though. If instruction i will write to register x and instruction i+1 needs to read register x, there is a problem. The processor will need to wait a bit (called stalling) before it can continue with instruction i+1. If I understand hyper-threading correctly (I'm not that familiar with the details of hyper-threading), it tries to avoid the processor stalling by determining a smart schedule of two processes (which share the same physical processor), trying to make sure that as few stalls as possible occur.

s3a · Answer

I'm stuck at "That means the clock frequency should be able to handle the slowest stage, which would be 5 ns in your example.". Isn't the slowest stage 2 ns?

anonymous · Answer

The slowest stage is the one that requires the longest time. If something is done in 2 ns, it finishes faster than something else that needs 5 ns.

s3a · Answer

"The slowest stage is the one that requires the longest time. If something is done in 2 ns, it finishes faster than something else that needs 5 ns."
Lol, oops. :)

"But if you look at throughput, it's a different story. In the original processor, one operation is completed every 16 ns. But in the pipelined processor, one operation is completed every cycle or every 5 ns."
Here when you say "operation", you mean operation = instruction = aggregation of all sub instructions, right?

To phrase it in my own words in order to see if I understand what you said: essentially, in the case of the problem posted initially, if the processor is not pipelined, it just takes "the amount of time that it should" to complete a single sub-instruction whereas, if the processor is pipelined, it takes the amount of time of the slowest sub-instruction, except that it does five of them at a time instead of one so, if you average the speed, you get:

Average speed when not pipelined: 16 ns / 1 instruction = 16 ns / instruction

Average speed when pipelined: 25 ns / 5 instructions = 5 ns / instruction

Also, if the processor is not pipelined, it actually completes one instruction = five sub-instructions per cycle whereas, if the processor is pipelined, it completes a fifth of five different instructions per cycle.

Is everything I said in this latest post correct? (Please correct any mistake I could have made, no matter how small the mistake is.)

anonymous · Answer

That sounds about right, except for the average speed when pipelined. Actually, it depends on how you define 'average speed'. If you look at a single instruction, that one instruction is finished in 25 ns. 

But if you look at multiple instructions, you'll notice that one instruction is finished every 5 ns (that's what is called 'throughput': the number of instructions that are finished every second).
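A rough way to see both views at once is to count the total time for a run of n instructions. This sketch assumes an ideal five-stage pipeline with a 5 ns cycle and no stalls, so the first instruction needs five cycles and every later one finishes one cycle after its predecessor. The function names are hypothetical.

```python
N_STAGES = 5
PIPELINED_CYCLE_NS = 5      # slowest stage
SINGLE_CYCLE_NS = 16        # sum of all stages

def pipelined_total_ns(n):
    """Fill the pipeline once (N_STAGES cycles), then one instruction finishes per cycle."""
    return (N_STAGES + n - 1) * PIPELINED_CYCLE_NS

def single_cycle_total_ns(n):
    """Without pipelining, every instruction takes one full 16 ns cycle."""
    return n * SINGLE_CYCLE_NS

for n in (1, 5, 1000):
    avg = pipelined_total_ns(n) / n
    print(n, pipelined_total_ns(n), avg, single_cycle_total_ns(n))
# 1 25 25.0 16
# 5 45 9.0 80
# 1000 5020 5.02 16000
```

So a single instruction costs 25 ns, but the average per instruction approaches 5 ns as the run gets longer.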

s3a · Answer

Actually, I DID mean "throughput" when I said "average speed" ... so, I guess I get it all!

Thanks again!

s3a · Answer

Actually, I do have one last quick question:

When the solution says that "the time for an instruction to pass through the processor must be greater than the clock cycle time of the processor.", that sounds like it's saying the instruction should take more nanoseconds than a cycle's amount of nanoseconds but, is what that statement is trying to say just that the instruction must be processed before the cycle is over?

anonymous · Answer

That "the time for an instruction to pass through the processor must be greater than the clock cycle time of the processor" doesn't sound right to me. If the time for an instruction, say `Ti` is larger than the clock cycle time, say `Tc`, then there is a problem, like I mentioned in reply to your first question. If `Ti = 16 ns` and `Tc = 10 ns` (so we have that `Ti > Tc`), part of the instruction will not be executed.

But yeah, what's probably meant is that an instruction should be finished before the cycle is over.

s3a · Answer

Alright, thank you very much. :)