It's pretty rare that a chip has a bug that is not documented, as
opposed to a faulty device. Certainly someone has to find unintended
features of ICs, but more often than not it is the manufacturer,
although Joerg had some frustrating times with a TI LDO.
That said, I review all the documents for a chip thoroughly, just as
John notes below, including any errata. My datasheet error rate (finding
incorrect or even missing information in a datasheet) is pretty high;
probably about 1 in 5 datasheets. I would rather the information be
missing than wrong - I know a number of people who have been bitten
by such things. I then talk to the manufacturer to get the right answer;
refusal to accept/fix an error in the datasheet is grounds to ditch the
part.
First, I read the datasheet and appnotes carefully for any hint of
gotchas. If we're doing anything unusual or risky or sole-sourced, we
test a few actual parts as well. If a part/mfr is deemed acceptable,
we enter that mfr and his part number into our materials database as
acceptable for purchase to satisfy our in-house part number. We know
which assemblies use each part, and can control it if, for example,
only an OnSemi part should be used on a particular assembly.
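
As a rough illustration of that kind of approved-source record (a
sketch only; the class names, part numbers and assembly names below are
all made up for the example and have nothing to do with any real
materials database):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourcePart:
    """One manufacturer's part that may satisfy an in-house part number."""
    manufacturer: str
    mpn: str  # manufacturer's part number (hypothetical values below)


@dataclass
class HousePart:
    """In-house part number plus the sources approved to satisfy it."""
    house_pn: str
    approved: set = field(default_factory=set)
    # assembly name -> the subset of approved sources allowed on it
    restrictions: dict = field(default_factory=dict)

    def approve(self, source):
        self.approved.add(source)

    def restrict(self, assembly, sources):
        """Limit one assembly to a subset of the approved sources."""
        sources = set(sources)
        unknown = sources - self.approved
        if unknown:
            raise ValueError(f"not approved for {self.house_pn}: {unknown}")
        self.restrictions[assembly] = sources

    def may_purchase(self, source, assembly):
        """True if this source may be bought for this assembly."""
        allowed = self.restrictions.get(assembly, self.approved)
        return source in allowed


# Hypothetical data: two approved sources, one assembly limited to OnSemi.
part = HousePart("HP-0001")
onsemi = SourcePart("OnSemi", "EXAMPLE-1")
second = SourcePart("SecondSourceCo", "EXAMPLE-2")
part.approve(onsemi)
part.approve(second)
part.restrict("ASSY-100", [onsemi])
```

Here may_purchase(second, "ASSY-100") is refused, while the same part is
fine on any assembly that carries no restriction.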
I (and probably any designer of reasonable experience) use that same
discipline. Software guys'n'gals might not understand the shudders the
words 'single-source' and 'allocation' cause to a hardware designer.
And engineers around here *only* design with approved parts. To create
a new library/inventory part, they have to get approval from the parts
czar, who happens just now to be me.
It's a good thing to have an anal-retentive parts czar for a lot of
reasons. The two primary ones (my view, YMMVG) are that a part I am
already using is known to operate within a spec (which may or may not be
the datasheet, but it's the devil I know), and that using the same part
gives me economies of scale, quite apart from the administrative and
engineering cost of adding a part: new schematic part, new symbol (all
of which must be checked of course), footprint etc.
Again, (for Jan's benefit) this is a discipline that hardware designers
live with daily.
*PLUS* schematics and pcb layouts are group reviewed before release.
Amen.
How many people have other people read their code?
Well, code reviews can happen, but in general (as with all
generalisations, it's not necessarily true of any one person) software
types tend to be more defensive about their designs than hardware types. I've
been in design reviews where asbestos underwear might have been a
reasonable clothing choice, but all the comments were criticism of the
design and methodology, not the person. I've also noticed this tends to
afflict pure HDL people as well (as opposed to a board designer who also
does HDL). Obviously, YMMVG.
With a defensive posture, the review breaks down long before any
reasonable work could be done. I've let hardware designers go because
they couldn't separate criticism of a design from criticism of
themselves as people, and therefore simply argued because they felt
insulted rather than arguing the technical merit of their approach.
Well, we test things pretty hard. If a bug does turn up, we find out
why and document the facts, and the actions to be taken, in an ECO. We
know the configuration of *every single* product in the field, and
notify users if the bug affects them.
I've written a huge amount of code to test hardware, and it is (imo)
perhaps the toughest code I've had to write. It has to trap all possible
errors (and report unusual events I had not foreseen) without crashing /
unloading the driver / other side effects; indeed the test code has to
be so robust it keeps on running even in the event of a critical failure
of the test item *so I can know what failed*.
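
To make the idea concrete, here is a minimal sketch of a test runner
that records every outcome and carries on regardless. It is not the
code I actually used; CheckFailed, run_all and the three fake checks are
invented for the illustration, and the "hardware" is simulated.

```python
class CheckFailed(Exception):
    """Raised by a check when a measurement is out of limits (hypothetical)."""


def run_all(checks):
    """Run every check in order; one failure never stops the rest."""
    results = []
    for name, func in checks:
        try:
            func()
            results.append((name, "PASS", ""))
        except CheckFailed as exc:
            # A failure we anticipated: record what was out of limits.
            results.append((name, "FAIL", str(exc)))
        except Exception as exc:
            # An event we did not foresee: record it and keep running.
            results.append((name, "ERROR", f"{type(exc).__name__}: {exc}"))
    return results


# Simulated checks standing in for real hardware access.
def check_register_readback():
    written, read_back = 0x5A, 0x5A
    if read_back != written:
        raise CheckFailed(f"wrote {written:#04x}, read {read_back:#04x}")


def check_supply_rail():
    measured = 3.1  # volts, made-up reading
    if not 3.2 <= measured <= 3.4:
        raise CheckFailed(f"3V3 rail out of range: {measured} V")


def check_bus_response():
    raise TimeoutError("no response from unit under test")


report = run_all([
    ("register readback", check_register_readback),
    ("supply rail", check_supply_rail),
    ("bus response", check_bus_response),
])
for name, outcome, detail in report:
    print(f"{name:20s} {outcome:5s} {detail}")
```

The point is that the third check still runs, and gets reported with its
cause, after the second one has failed.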
Typical production code just bombs out with 'failed' - like that's
really a lot of information. At one place we had some CompactPCI
boards we had designed, and I rewrote the drivers so they wouldn't just
stop on encountering an error; they instead maintained a status word
that could be queried, so that *if* the driver encountered problems I
could find out what the problem was. That code was
ultimately found to be quite useful and ended up in the shipping product.
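
A status word of that sort might look something like the sketch below.
This is an illustration in Python rather than the original driver code;
the flag names, the Driver class and the flaky_bus stand-in are all
assumptions made up for the example.

```python
from enum import IntFlag


class Status(IntFlag):
    """Sticky error bits (hypothetical names, one bit per error class)."""
    OK = 0
    BUS_TIMEOUT = 1
    PARITY = 2
    FIFO_OVERRUN = 4


class Driver:
    """Keeps going after an error and remembers what went wrong."""

    def __init__(self, read_word):
        self._read_word = read_word  # callable doing the real bus access
        self.status = Status.OK

    def read(self, address):
        """Return the word at address, or None if the access failed."""
        try:
            return self._read_word(address)
        except TimeoutError:
            self.status |= Status.BUS_TIMEOUT
        except ValueError:
            self.status |= Status.PARITY
        return None

    def query_status(self, clear=False):
        """Report the accumulated status word, optionally clearing it."""
        current = self.status
        if clear:
            self.status = Status.OK
        return current


def flaky_bus(address):
    """Stand-in for hardware: two addresses misbehave."""
    if address == 0x10:
        raise TimeoutError("target did not respond")
    if address == 0x20:
        raise ValueError("parity error")
    return address ^ 0xFF


drv = Driver(flaky_bus)
values = [drv.read(a) for a in (0x00, 0x10, 0x20, 0x01)]
word = drv.query_status()
```

After those four reads the driver is still alive, and the status word
has both the timeout and parity bits set, so the caller can tell which
kinds of error occurred rather than just seeing 'failed'.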
It's no illusion that professional hardware design consistently
produces solid products and that professional software design often
produces bloated, buggy crap. As I said, I use hardware disciplines to
produce code, and that code inherits the simplicity and reliability of
the discipline.
Perhaps discipline is the key; software can simply edit and rebuild, but
a hardware change is not quite as simple. For that reason we have to
strive all the time to 'get it right the first time', a discipline some
in software see as not applicable (perhaps with some justification,
although I don't subscribe to that view).
Software should be *more* reliable than hardware, since software has
no inherent failure modes, isn't subject to temperature changes, power
glitches, parts variability or EMI, and is precisely reproducible
times a million copies... no solder joints, no part tolerances. Yet
it's the hardware that's usually most reliable. Software is buggy
because of miserable programming methodologies and practices. Mine's
not.
This is sad: FPGA design, these days, is dominated by struggling
with the software tools, trying to get the compilers to grudgingly
agree to do what you know you want done on-chip, and then trying to
get the compilers to run to completion without crashing. See
comp.arch.fpga: it's mostly about struggling against the tools. The
FPGAs themselves - the hardware - work fine. Xilinx 9.1 is just out -
a 1.5 gig download - and SP1 is already out, another gigabyte. I
wonder if they've fixed any of the memory leaks.
I still wish for FPGA design tools that will let me override the
optimisation settings on a per module (at least) basis - I might have
very good reasons for not desiring to optimise a gate away. Indeed some
tools are, in a phrase I have coined many a time, 'just smart enough to
be stupid'.
Schematic and layout tools have their issues, but given the type of
feedback those vendors get, they aren't nearly as bad as some synthesis
tools.
Cheers
PeteS