If the limit has been reached for generating clock signals, should we
switch to asynchronous circuit design?
For the most part, the clock rate of CPUs has stopped progressing
because of power dissipation issues, not because we cannot make the
clock signal go faster. Secondarily, the wire wall (wires are getting
slower as gates are getting faster) means that more clock cycles are
needed to talk to remote parts of the chip. And finally, the memory
wall means that even if we sped up the clock rates, {Donning Nomex}
little performance would drop to the bottom line, given the vast
latency of a main memory read.
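To put rough numbers on the power argument: dynamic CMOS power scales
roughly as P = a*C*V**2*f, and raising the frequency usually forces
the supply voltage up as well, so power grows faster than linearly
with clock rate. A minimal C sketch of that scaling; the activity
factor, switched capacitance, and voltage-vs-frequency curve below
are illustrative assumptions, not figures from this post:

    #include <stdio.h>

    /* Rough dynamic-power model P = a * C * V^2 * f.
       All constants are illustrative assumptions. */
    int main(void)
    {
        double a = 0.15;    /* activity factor (assumed)               */
        double C = 250e-9;  /* total switched capacitance, F (assumed) */
        double f, V, P;

        for (f = 2e9; f <= 6e9; f += 1e9) {
            V = 0.8 + 0.1 * (f / 1e9 - 2.0); /* assume V rises with f */
            P = a * C * V * V * f;
            printf("f = %3.0f GHz  V = %4.2f V  P = %6.1f W\n",
                   f / 1e9, V, P);
        }
        return 0;
    }

Under these assumed numbers, going from 2 GHz to 6 GHz takes power
from about 48 W to over 300 W, which is how a fixed power budget caps
the clock rate long before the circuits run out of speed.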
So, in effect, that limit has only been reached under the assumption
that power dissipation is limited (to about 100 Watts). If somebody
comes up with a scheme whereby 1 kW of power can be removed from a
chip of 13mm**3, and it costs about $10 in volume, then the clock rate
race will be "ON" again.
But even if the previous paragraph comes true, there is good reason to
believe that more performance can be placed on a die via
multiprocessors than through ever faster/bigger CPUs with more cache/
predictors/function-units that deliver ever less advancement per unit
area or per unit power (the marginal performance per Watt of these
additions/extensions is often negative right now).
For now the CPU makes multiple cycles per clock tick (that's what the
CPU multiplier is for).
How long can that be a solution?
Basically, as long as the input clock can be detected with less than a
handful of picoseconds of (short term) jitter, the PLL multipliers can
multiply that frequency up to at least 10 GHz (maybe as high as 30
GHz) with adequate end point jitter control. The Cray {1, XMP, YMP,
2,...} computers kept refrigerator-sized boxes within a fraction of
a nanosecond of uncontrolled skew. All it takes is the power needed to
run the clock distribution network and a determined engineering staff
to distribute the clocks.
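To put numbers on the multiplier scheme: what matters is how the
reference-clock jitter compares with the multiplied-up cycle time. A
small C sketch (the reference frequency, multiplier, and jitter figure
are assumed for illustration, not taken from this post):

    #include <stdio.h>

    /* Illustrative PLL multiplier arithmetic; the reference clock,
       multiplier, and jitter figure below are assumptions. */
    int main(void)
    {
        double f_ref  = 100e6;  /* 100 MHz reference clock (assumed)   */
        double mult   = 100.0;  /* PLL multiplication factor (assumed) */
        double jitter = 3e-12;  /* ~3 ps short-term jitter (assumed)   */

        double f_core = f_ref * mult;  /* 10 GHz core clock */
        double period = 1.0 / f_core;  /* 100 ps cycle time */

        printf("core clock : %4.1f GHz\n", f_core / 1e9);
        printf("cycle time : %4.1f ps\n",  period * 1e12);
        printf("jitter     : %4.1f%% of a cycle\n",
               100.0 * jitter / period);
        return 0;
    }

At 10 GHz a handful of picoseconds is only a few percent of the 100 ps
cycle; at 30 GHz the same jitter is roughly 10% of the cycle, which is
where "adequate end point jitter control" starts to bind.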
It is that power that contributes to the lack of clock scaling you see
today.
Mitch Alsup
No longer at AMD.