On Sat, 08 Sep 2007 15:16:12 -0700, John Larkin
On Sat, 08 Sep 2007 08:21:15 -0500, John Fields
On Fri, 07 Sep 2007 13:44:15 -0700, John Larkin
On Fri, 07 Sep 2007 15:06:31 -0400, Spehro Pefhany
On Fri, 07 Sep 2007 10:21:50 -0500, Vladimir Vassilevsky
John Larkin wrote:
Hmmm, seems like all 0's must be the lockup for an all XOR feedback.
OK, but you missed my point, which was that it's possible to
eliminate the lockup state by forcing it to be part of the sequence.
I don't follow that. One state, all 0's usually, is the lockup. How do
you force that to be part of the sequence?
You can add a lockup state detector, a big OR gate or something, and
jam in a "1" if the whole register ever gets to the all-0's state, but
then the all-0's state is not part of the sequence, because it never
happens again.
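
[Editor's sketch] The lockup and the "big OR gate" fix described above can be modelled in a few lines. This is only an illustration, not anyone's actual circuit: the 4-bit width, the tap positions (bits 3 and 2), and the function names are assumptions chosen for the example.

```python
WIDTH = 4                      # assumed register length, for illustration only
MASK = (1 << WIDTH) - 1

def xor_feedback(state):
    """XOR of the two tapped stages (bits 3 and 2, an assumed maximal-length choice)."""
    return ((state >> 3) ^ (state >> 2)) & 1

def step_plain(state):
    """Shift left by one, feeding the XOR result into bit 0."""
    return ((state << 1) | xor_feedback(state)) & MASK

def step_with_detector(state):
    """Same register, but a zero detector jams a 1 into the feedback."""
    all_zero = 1 if state == 0 else 0
    return ((state << 1) | xor_feedback(state) | all_zero) & MASK

def run(step, start, n):
    """Return the first n states, beginning with start."""
    states = [start]
    for _ in range(n - 1):
        states.append(step(states[-1]))
    return states

print(run(step_plain, 0, 4))          # the plain register never leaves all-0's
print(run(step_with_detector, 0, 6))  # the detector kicks it out, and 0 does not recur
```

With the detector, the register escapes all-0's on the next clock, but the normal sequence still contains only the 15 nonzero states, which is the objection raised above.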
It can happen at startup, though. You have to ensure a nonzero
initial state.
VLV
If you care about fault tolerance you will ensure recovery from a zero
state which occurs at any time.
Best regards,
Spehro Pefhany
That gets into the philosophical issue: should we attempt to detect
and correct for transient hardware errors in digital systems? That can
apply to config bits in FPGAs (do we check them on a regular basis?),
registers in uPs (including PC, SP, etc), values in counters,
whatever.
We generally assume that if it's broke, it's broke.
---
The point here, though, is that the machine will get itself unbroke
if it ever accidentally gets into what would normally have been the
lock-up state.
And my point is that it shouldn't "accidentally" get into a broken
state, any more than the program counter of a CPU should accidentally
find itself in never-never land.
---
It shouldn't, but it can [get into a "broken" state] if that broken
state is allowed to exist. For instance, a glitch on a power supply
rail can cause any number of problems, including putting a shift
register in a prohibited state and causing a circuit to hang.
Power supply rails shouldn't glitch, and a decent system will either
work correctly through a brownout, or reset/restart properly if it
can't.
My circuit (Not "mine" in the sense that I invented it; I didn't.)
side-steps the problem by forcing the potentially problematical
normally prohibited state to be part of the sequence.
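
[Editor's sketch] The circuit itself is not shown in this part of the thread. One commonly described way to make all-0's a member of the sequence is to XOR the normal feedback with the NOR of every stage except the last, which stretches a maximal-length 2^n - 1 cycle to 2^n states. The model below assumes that technique, a 4-bit register, and taps at bits 3 and 2; it may not match the circuit being discussed.

```python
WIDTH = 4                      # assumed register length, for illustration only
MASK = (1 << WIDTH) - 1
LOW_BITS = MASK >> 1           # every stage except the last one

def step_full_cycle(state):
    """One clock of an XOR-feedback register with the extra NOR term."""
    fb = ((state >> 3) ^ (state >> 2)) & 1          # normal XOR feedback
    low_all_zero = 1 if (state & LOW_BITS) == 0 else 0
    # The NOR term inverts the feedback only in states 1000 and 0000,
    # which splices 0000 into the cycle between 1000 and 0001.
    return ((state << 1) | (fb ^ low_all_zero)) & MASK

def sequence(start, n):
    """Return the first n states, beginning with start."""
    states = [start]
    for _ in range(n - 1):
        states.append(step_full_cycle(states[-1]))
    return states

states = sequence(1, 16)
print([format(s, "04b") for s in states])
print(len(set(states)))        # number of distinct states visited
```

In this model every one of the 16 states has a successor on the single cycle, so there is no state left over to lock up in, whatever state a glitch leaves the register in.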
---
If a digital system is unreliable, the cause should be found and
fixed.
---
I agree, and my circuit is just one way to make the circuit more
reliable by totally eliminating the lock-up state.
---
The problem with kluges like this is the same as the problem with
watchdog timers: they hide the real problem, and so keep it from getting
fixed.
---
It's hardly a kluge, and I have trouble understanding why someone as
ostensibly intelligent as you profess to be can't see that the
circuit eliminates a potential problem. Either that or you're
miffed about something.
We (me and my guys) have had this discussion, about whether we should
try to anticipate/sense/fix states that "can't happen", in the sense
that there's no logical path to the bad state. Our consensus is that
such sensing/fixing is fruitless, since any non-trivial system has
astronomically more hazardous states (like the enormous state space of
a uP program and its variables, or the megabits of configuration ram
in an FPGA) than can even be analyzed, much less repaired. If a
counter botches its state sequence, find out why and fix it. If a uP
program crashes, ditto. And don't design asynchronous state machines
that have small, lurking probabilities of screwing up. Things like
watchdog timers hide the design errors, so you neither fix things nor
learn.
---
I always turn off the watchdog timer on test units, and protos
delivered to customers. I only enable it after we're sure we don't
need it.
Because it may fix a system hangup caused by who-knows-what, things
that analysis and a few months of running didn't reveal, huge ESD
shots or something.
It is, after all, just one more thing that can go wrong.
I've never seen a watchdog timer cause a problem by malfunctioning on
its own. I did recently code a diagnostic routine that, given a
request to take and average zero ADC samples, would divide by zero,
hang, and trip the watchdog. I found the bug after some units had been
shipped, by reading the code, and as far as I know the bug was never
tripped. Cases like that justify enabling a watchdog *after*
considerable testing has not found provokable bugs that might trip it.
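
[Editor's sketch] The averaging bug described above is the kind a one-line guard removes. The routine below is hypothetical: `average_adc` and `read_sample` are invented names, and in Python an unguarded zero count would raise ZeroDivisionError rather than hang as the original firmware did.

```python
def average_adc(read_sample, count):
    """Average `count` readings from `read_sample`; return None if count is not positive."""
    if count <= 0:
        # Reject the request instead of dividing by zero.
        return None
    total = 0
    for _ in range(count):
        total += read_sample()
    return total / count

# Usage with a stand-in sample source (a real system would read hardware here).
fake_readings = iter([100, 102, 98, 100])
print(average_adc(lambda: next(fake_readings), 4))
print(average_adc(lambda: 0, 0))
```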
When my opinions differ from yours, you like to explain it by
conjecturing some sort of emotional crisis on my part. Do you treat
everybody that way?