Maker Pro

If clocks are too slow, then switch to asynchronous?


Skybuck

Hello,

If the limit for generating clock signals has been reached, should we
switch to asynchronous circuit design?

For now the CPU runs multiple core cycles per external clock tick
(that's what the CPU multiplier is for).

How long can that be a solution?

Bye,
Skybuck.
 

Rene Tschaggelar

Skybuck said:
Hello,

If the limit for generating clock signals has been reached, should we
switch to asynchronous circuit design?

For now the CPU runs multiple core cycles per external clock tick
(that's what the CPU multiplier is for).

How long can that be a solution?


To the contrary, actually. Asynchronous reception usually means there
has to be a clock at a power-of-two multiple of the bit rate.

Rene
 

MooseFET

Hello,

If the limit for generating clock signals has been reached, should we
switch to asynchronous circuit design?

For now the CPU runs multiple core cycles per external clock tick
(that's what the CPU multiplier is for).

How long can that be a solution?

Asynchronous designs are *way* harder to do. It is much harder to
automate the process.

When you go through a register, the setup and hold times can be
checked at the input, and then the timing of the output can be assumed
for further checking. In the asynchronous case, you have to follow
through all the logic paths and figure out the delays at each step. If
there are many parts and many paths, the number of computations gets
huge.
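
A toy illustration of why the asynchronous check blows up (the netlist
and the delay numbers below are invented): with registers, a timing tool
only checks each register-to-register segment in isolation, whereas a
fully asynchronous check has to enumerate every source-to-sink path and
sum the delays along each one.

    # Toy static-timing sketch: enumerate all paths through a small gate
    # network and sum the per-gate delays (netlist and delays invented).
    netlist = {             # gate -> gates it drives
        "in": ["g1", "g2"],
        "g1": ["g3", "g4"],
        "g2": ["g3", "g4"],
        "g3": ["out"],
        "g4": ["out"],
        "out": [],
    }
    delay_ps = {"in": 0, "g1": 35, "g2": 40, "g3": 30, "g4": 55, "out": 0}

    def all_paths(node, graph):
        """Yield every path from `node` to a sink (depth-first)."""
        if not graph[node]:
            yield [node]
            return
        for nxt in graph[node]:
            for rest in all_paths(nxt, graph):
                yield [node] + rest

    for p in all_paths("in", netlist):
        print(" -> ".join(p), sum(delay_ps[g] for g in p), "ps")
    # The path count grows multiplicatively with every level of
    # reconvergent fan-out; registers cut the graph so each segment
    # can be checked on its own.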

Some people are going with a very fast clock and declocking sections
of the chip when they are not needed. This way they can lower the
average power to prevent overheating without lowering performance in
most cases. They include a bit of logic that slows things down if the
CPU gets too hot.

There is a new direction where the grain size of the declocking is
made very small. This gets most of the reduction in power that an
asynchronous design could give, without making the design so much
harder. I predict that the next step on this path will be the local
monitoring of temperature.
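
A minimal sketch of that kind of throttle loop (the thresholds and the
sensor/clock-control interfaces below are all invented; real chips do
this in hardware):

    # Hypothetical thermal-throttle loop: drop the clock multiplier when
    # the die gets hot, restore it when it cools.  All values invented.
    import random, time

    def read_die_temp_c():            # stand-in for an on-die sensor
        return 60 + random.random() * 40

    def set_multiplier(mult):         # stand-in for a clock-control register
        print(f"multiplier set to x{mult}")

    MAX_MULT, MIN_MULT = 40, 10
    mult = MAX_MULT
    for _ in range(10):
        t = read_die_temp_c()
        if t > 95 and mult > MIN_MULT:
            mult -= 2                 # too hot: slow down
        elif t < 80 and mult < MAX_MULT:
            mult += 2                 # cool again: speed back up
        set_multiplier(mult)
        time.sleep(0.01)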
 

Skybuck

To the contrary, actually. Asynchronous reception usually means there
has to be a clock at a power-of-two multiple of the bit rate.

Rene

Huh?

Asynchronous CPUs should not need a clock.

It's like dominoes: use the completion of one stage to signal the next.

Bye,
Skybuck.
 

Rene Tschaggelar

Skybuck said:
Huh?

Asynchronous CPUs should not need a clock.

It's like dominoes: use the completion of one stage to signal the next.

Oops, I was thinking about a UART and SPI.
What speed are you talking about? I know synchronous
circuits at 3 Gbit/s. Beyond that?

Rene
 

sirinath

Does anybody know which research groups are looking into this area?
 

MitchAlsup

If the limit for generating clock signals has been reached, should we
switch to asynchronous circuit design?

For the most part, the clock rate of CPUs has stopped progressing
because of power dissipation issues, not because we cannot make the
clock signal go faster. Secondarily, the wire wall (wires are getting
slower as gates are getting faster) means that more clock cycles are
necessary to talk to remote parts of the chip. And finally, the memory
wall means that even if we sped up the clock rates, {Donning Nomex}
little performance drops to the bottom line due to the vast latencies
of a main memory read.

So, in effect, that limit has only been reached under the assumption
that power dissipation is limited (to about 100 Watts). If somebody
comes up with a scheme whereby 1 kW of power can be removed from a
chip of 13 mm**3, and it costs about $10 in volume, then the clock rate
race will be "ON" again.

But even if the second paragraph becomes true, there is good reason to
believe that more performance can be placed on a die via
multiprocessors than through ever faster/bigger CPUs with more cache/
predictors/function-units that deliver ever less advancement per unit
area or per unit power (the marginal performance per watt of such
additions is often negative right now).

For now the CPU runs multiple core cycles per external clock tick
(that's what the CPU multiplier is for).

How long can that be a solution?

Basically, as long as the input clock can be detected with less than a
handful of picoseconds of (short-term) jitter, the PLL multipliers can
multiply that frequency up to at least 10 GHz (maybe as high as 30
GHz) with adequate end-point jitter control. The Cray {1, XMP, YMP,
2, ...} computers kept refrigerator-sized boxes within a fraction of
a nanosecond of uncontrolled skew. All it takes is the power needed to
run the clock distribution network and a determined engineering staff
to distribute the clocks.

It is that power that contributes to the lack of clock scaling you see
today.
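
A rough back-of-the-envelope check on those numbers (the reference
clock and jitter figures below are assumed, not from the post): a few
picoseconds of jitter is only a few percent of a 10 GHz clock period.

    # Back-of-the-envelope PLL multiplication figures (assumed values).
    ref_hz    = 100e6        # assumed reference clock
    target_hz = 10e9         # target core clock
    jitter_s  = 3e-12        # assumed short-term jitter

    multiplier = target_hz / ref_hz               # PLL multiplication factor
    period_s   = 1.0 / target_hz                  # core clock period
    print(f"multiplier      : x{multiplier:.0f}")
    print(f"core period     : {period_s * 1e12:.0f} ps")
    print(f"jitter fraction : {jitter_s / period_s:.1%} of a cycle")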

Mitch Alsup
No longer at AMD.
 

Stefan Monnier

If somebody comes up with a scheme whereby 1 kW of power can be removed
from a chip of 13 mm**3, and it costs about $10 in volume, then the clock
rate race will be "ON" again.

I'm not even sure that's true. The $10 cost would be dwarfed by the cost
of the 1 kW of power (plus air-conditioning, ...).
Yes, there would surely be some interest in such monsters, but such
a renewed "clock rate race" would probably stay confined to a fairly small
market compared to what we saw at the end of the last century.


Stefan
 

acd

Hello,

If the limit for generating clock signals has been reached, should we
switch to asynchronous circuit design?

For now the CPU runs multiple core cycles per external clock tick
(that's what the CPU multiplier is for).

How long can that be a solution?

Bye,
Skybuck.


An issue of the IEEE Journal of Solid-State Circuits from last year had
an asynchronous flip-flop and logic design style. As an example they
used a multiplier. I was shocked by the overhead required for the
asynchronous handshaking. Comparing this with the aggressive 11-FO4 SOI
(if I am not mistaken) design of the Cell's SPEs, the synchronous design
gets us much further.
The on-chip clock generation is, I think, in principle no harder than
the handshaking of an asynchronous circuit.

Andreas
 

Robert Myers

For the most part, the clock rate of CPUs has stopped progressing
because of power dissipation issues, not because we cannot make the
clock signal go faster. Secondarily, the wire wall (wires are getting
slower as gates are getting faster) means that more clock cycles are
necessary to talk to remote parts of the chip. And finally, the memory
wall means that even if we sped up the clock rates, {Donning Nomex}
little performance drops to the bottom line due to the vast latencies
of a main memory read.

So, in effect, that limit has only been reached under the assumption
that power dissipation is limited (to about 100 Watts). If somebody
comes up with a scheme whereby 1 kW of power can be removed from a
chip of 13 mm**3, and it costs about $10 in volume, then the clock rate
race will be "ON" again.

Hmmm. My brief review of the subject a couple of years back led me to
the perception that one of the reasons for going asynchronous is that
it can result in lower-power operation for comparable performance. I
also came away with the perception that asynchronous isn't common
because it isn't common; i.e., little design experience, inadequate
tools, formidable design challenges.

You've proposed two walls: a power wall and a memory wall. The memory
wall has been pounded to the ends of the earth and I'd rather not go
there again. If you could beat the power wall with asynchronous
operation, I'll bet there's a market.

Robert.
 

sirinath

Asynchronous is the way forward. There are various synchronisation
mechanisms. Recently I was reading an article from Sun Labs about a
processor called FleetZero which they have made.
 

sirinath

Hi,

Is there any possibility of a Ph.D. studentship position there?

Regards, Suminda
 

MooseFET

Asynchronous is the way forward. There are various synchronisation
mechanisms. Recently I was reading an article from Sun Labs about a
processor called FleetZero which they have made.


I disagree. I don't see it as a path to any major breakthroughs. I
think it is tuning to a local maximum.

Google has been having trouble posting. Before I go further I will
post this.
 

MooseFET

I disagree. I don't see it as a path to any major breakthroughs. I
think it is tuning to a local maximum.

Google has been having trouble posting.

It seems to be working so I will say more.

You can get about as much reduction in power by using a fine-grained
declocking of the chip. Declocking allows all the normal design
methods to be used and avoids the trouble of following the propagation
delays through all paths.

Asynchronous design only reduces the number of transistors and the
power consumption by a nearly fixed percentage. It doesn't make the
growth in each follow a slower curve. To break the growth off the
curve it is on, we need a technology that gets away from using one
logic gate for each logic operation.

To explain what I mean by this, take the case of an AND logic gate
implemented with a relay. The coil is connected to one signal and the
NO (normally-open) contact is connected to the other. The NC contact is
perhaps grounded and the COM is the output. This makes a logic gate
that does the needed function. If you need to implement (A and B),
(A and C) and (A and D), you would be tempted to put in three relays
and need about three times the power. You could, however, use one relay
that has three sets of contacts and require less than three times the
power. A silicon version of this sort of thing is what would allow us
to break off the current power growth curve.
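
A tiny model of that relay argument (the "power" bookkeeping below just
counts energized coils): three separate relays need three coils all
driven by A, while one relay with three contact sets produces the same
three outputs from a single coil.

    # Toy relay model: the output is B when the coil (A) is energized,
    # otherwise ground (0).  "Power" is just the count of energized coils.

    def relay_and(a, b):
        """One relay per product term: coil on A, NO contact on B."""
        return b if a else 0

    def shared_relay(a, *signals):
        """One coil on A driving one contact set per signal."""
        return tuple(s if a else 0 for s in signals)

    A, B, C, D = 1, 1, 0, 1

    # Separate relays: three coils, all energized whenever A is high.
    separate = (relay_and(A, B), relay_and(A, C), relay_and(A, D))

    # Shared relay: one coil, three contact sets, same three outputs.
    shared = shared_relay(A, B, C, D)

    assert separate == shared
    print("outputs:", shared, "| coils energized:", 3 * A, "vs", 1 * A)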
 
This reminds me of something, actually a true case: there was a clock that counted days, and after a while its count was totally wrong. The solution, after days of investigation, was to make it count in Gray code; after that there were no more mistakes. Synchronous or not, one mistake in a binary count ripples through many bits, whereas with Gray code only one bit changes at a time.
 

Joseph H Allen

Skybuck said:
If the limit for generating clock signals has been reached, should we
switch to asynchronous circuit design?

How does async design compare to latch-based skew-tolerant design? With
skew-tolerant design, you care only about the propagation delay through
the latch, and don't care so much about the clock. This lets you borrow
time from a shallow pipeline stage for use in a deeper stage. Anyway, it
seems that this methodology has many of the advantages of async design,
but without its problems: mainly that you don't have to worry about
glitches.
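
A concrete (invented) budget for that time borrowing: with transparent
latches on a 1 ns clock and an assumed 0.5 ns transparency window, a
1.2 ns stage can overrun the edge as long as the following 0.7 ns stage
absorbs the 0.2 ns it borrowed.

    # Hypothetical time-borrowing budget with transparent latches.
    # Numbers are invented; the check is simply that no stage's
    # accumulated lateness exceeds the latch's transparent window.
    clock_ns  = 1.0
    window_ns = 0.5                    # assumed transparency window
    stages_ns = [1.2, 0.7, 0.9, 1.1]   # combinational delay of each stage

    borrowed = 0.0
    for i, d in enumerate(stages_ns):
        arrival = borrowed + d                   # inherited lateness plus this stage's delay
        borrowed = max(0.0, arrival - clock_ns)  # overrun borrowed from the next stage
        ok = borrowed <= window_ns
        print(f"stage {i}: delay {d} ns, borrows {borrowed:.1f} ns"
              f" -> {'ok' if ok else 'FAIL'}")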

Another question I have is logic size: yes, with async design you do not
have a large global clock network, but the async design elements tend to
be larger (to avoid hazards). It would be interesting to see a comparison
between the best clocked logic and the best async logic, both with scan
chains, or whatever is used to check for silicon defects.
 

Torben Ægidius Mogensen

Another question I have is logic size: yes, with async design you do not
have a large global clock network, but the async design elements tend to
be larger (to avoid hazards).

Quite true. But in some cases you can make the async circuit smaller,
as you don't need to optimize for rare worst-case delays.

An example: the simplest adder is a ripple-carry adder, but that can
in the worst case take O(N) time to settle (where N is the number of
bits). Hence, sync designs tend to use carry-lookahead or carry-select
adders that have a worst-case propagation of O(log(N)), but are
considerably larger than ripple-carry adders. However, for random
inputs the longest carry chain in a ripple-carry adder averages only
about log2(N) bits, so an async ripple-carry adder that signals
completion can on average be about as fast as (or faster than) a sync
carry-lookahead or carry-select adder. And smaller too.
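
A quick simulation of that average-case behaviour (uniformly random
operands are assumed): the longest run of propagating carries in an
N-bit add averages close to log2(N) bits, far below the N-bit worst
case.

    # Estimate the average longest carry-propagation chain in an N-bit
    # ripple-carry add of two uniformly random operands.
    import math, random

    def longest_carry_chain(a, b, n):
        """Longest run of bit positions a single carry ripples through."""
        longest = run = 0
        for i in range(n):
            x, y = (a >> i) & 1, (b >> i) & 1
            if x & y:               # generate: a new carry is born here
                run = 1
            elif (x ^ y) and run:   # propagate: the carry ripples onward
                run += 1
            else:                   # kill: no carry leaves this position
                run = 0
            longest = max(longest, run)
        return longest

    N, TRIALS = 64, 10_000
    avg = sum(
        longest_carry_chain(random.getrandbits(N), random.getrandbits(N), N)
        for _ in range(TRIALS)
    ) / TRIALS
    print(f"N={N}: average longest chain ~ {avg:.1f} bits"
          f" (log2(N) = {math.log2(N):.1f})")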


Torben
 