krw
[email protected] says...
[...]
The way I had imagined it was that the registers of the virtual CPUs
that are not currently running would be in a different place than the
ones that are actually being used. My concern was not increasing the
fan in and out of the busses on the ALU so that there would be no
increase in the loading and hence delay in those circuits.
If they're "somewhere" else, they have to be un/re/loaded. That
takes substantial time.
Yes, it may take a clock cycle to do the register swapping. Reducing
the number of registers on the bus allows those clock cycles to be at
a higher frequency, so I think the advantage will outweigh the
disadvantage. BTW: I'm assuming several CPUs and lots of sets of
registers are on one chip.
It's going to take a *LOT* more than a clock cycle. You have to find
all the data in the file and you can't broadside that much data.
I was thinking in terms of a not very pipelined CPU so that the switch
over could happen in a few cycles. The registers currently being
written would have to stay in place until the write finished. This is
part of why I'm assuming a fairly simple CPU.
Why bother then? If you're giving it that much dead time simply do
things serially. You're essentially allowing time for a complete
context switch.
I don't see how you come to that conclusion.
Because that's how it's done? You have another source/destination
accessing the register file.
Yes and when a multiply doesn't draw an amp.
What does an amp matter at <1V?
My 32 bit -> 16 bit integer sqrt() for the 8051 doesn't use a lookup
table and yet is fairly quick about it. It uses two observations:
1 - The sum of the first N odd numbers is N^2
2 - If you multiply X by 4, sqrt(X) doubles, and both operations are shifts.
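
For anyone following along, here is a rough sketch of how those two observations can combine into a shift-and-subtract square root. It is portable C, not 8051 assembly, and it is not the routine described above; the name isqrt32 and the loop layout are assumptions made for illustration. Each pass shifts two more bits of X into a remainder (X grows by a factor of 4), doubles the root with a shift, then tries to subtract the next odd number, 2*root + 1.

```c
#include <stdint.h>

/* Hypothetical illustration: 32-bit in, 16-bit out, no lookup table. */
uint16_t isqrt32(uint32_t x)
{
    uint32_t rem = 0;   /* bits seen so far, minus root squared */
    uint32_t root = 0;  /* floor(sqrt()) of the bits seen so far */
    int i;

    for (i = 0; i < 16; i++) {
        uint32_t odd;

        /* Observation 2: bring in the next two bits of x (times 4),
           which doubles the root. Both are plain shifts. */
        rem = (rem << 2) | (x >> 30);
        x <<= 2;
        root <<= 1;

        /* Observation 1: root^2 is the sum of the first 'root' odd
           numbers, so the next one to try is 2*root + 1. */
        odd = (root << 1) | 1;
        if (rem >= odd) {
            rem -= odd;
            root |= 1;
        }
    }
    return (uint16_t)root;
}
```

The loop always runs 16 times, so the cost is fixed regardless of the input, and nothing in it needs a multiply or divide.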
There is nothing in an 8051 that can be considered "fairly quick".
....and I rather like 8051s.