In sci.electronics.repair Arno said:
It is extremely unlikely for a slow chemical process to achieve this
level of synchronicity. So unlikely that it would be fair to call
it impossible.
Your array died from a different cause that would affect all drives
simultaneously, such as a power spike.
Yes, they did not all die from contact oxidation at the very same moment. I
cannot even tell whether they all died in the same month. The array might have
been running in degraded mode with one drive dead; then after some time a second
drive died, but it kept running on the one remaining drive. Only when
the last one crossed the Styx did the entire array go dead. I don't use
Windows, so my machines are never turned off unless there is a real need.
They are rarely updated once they are up and running, so there are
no reboots. Typical uptime is more than a year.
I don't know, though, how I could have missed a degradation alert if there was
one. All 3 drives in the array simply failed to start after the reboot. There
were some media errors reported before the reboot, but all the drives still
worked. Then the system was rebooted and all 3 drives failed with the same
"click of death."
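For what it's worth, a degraded array is easy to spot if something actually looks for it. Below is a minimal sketch, assuming Linux software RAID (md) and the usual /proc/mdstat layout; I never said which RAID implementation the array used, so treat that as an assumption. The function name and the sample text are made up for illustration.

```python
import re

# Assumption: Linux md software RAID, where /proc/mdstat shows a status
# such as "[3/2] [UU_]" (3 members expected, 2 working, "_" = missing).
ARRAY_LINE = re.compile(r"^(md\S*)\s*:")
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def find_degraded(mdstat_text):
    """Return (array, expected, working, flags) for arrays missing a member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        m = ARRAY_LINE.match(line)
        if m:
            current = m.group(1)  # remember which array the next status belongs to
            continue
        s = STATUS.search(line)
        if s and current:
            expected, working = int(s.group(1)), int(s.group(2))
            if working < expected or "_" in s.group(3):
                degraded.append((current, expected, working, s.group(3)))
            current = None
    return degraded


# Made-up sample: a 3-way mirror limping along on one member.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdc1[2](F) sdb1[1](F) sda1[0]
      976630336 blocks super 1.2 [3/1] [U__]

unused devices: <none>
"""

if __name__ == "__main__":
    for name, expected, working, flags in find_degraded(SAMPLE):
        print(f"{name}: {working} of {expected} members working {flags}")
```

On a real machine the text would come from reading /proc/mdstat, and the check would have to run from cron or similar to be of any use.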
The mechanism here is not that oxidation itself killed the drives. It never
happens that way. Oxidation was the root cause of the failure, but the drives
actually committed suicide, the way the body's immune system kills the body
when it overreacts to some kind of hemorrhagic fever.
The probable sequence is something like this:
- Drives run for a long time with the majority of the files never
accessed, so it doesn't matter whether the part of the disk where they
are stored is bad or not
- When the system is rebooted, RAID array assembly is performed
- During this assembly a number of sectors on a
drive are found to be defective and the drive tries to remap them
- Remapping involves rewriting service information
- Read/write operations are unreliable because of the failing head
contacts, so the service areas become filled with garbage
- Once the vital service information is damaged the drive is
essentially dead, because its controller cannot read the data it needs to
even start the disk
- The controller's only hope of recovery is to repeat the read
in the hope that it might somehow succeed. This is the infamous
"click of death" sound: the drive tries to read the info again and
again. There is no way it can recover because the data is
trashed.
- Drives do NOT fail while they run; the failure happens on the next
reboot. The damage that kills the drives on that reboot
happened well before it, though.
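The sequence above can be sketched as a toy model. This is only an illustration of the argument, not a description of real drive firmware: the sector counts, the failure rate, and the function name are all invented, and "service area" is reduced to a single flag.

```python
import random


def simulate(total_sectors=1000, hot_sectors=50, bad_sectors=20,
             write_failure_rate=0.3, seed=1):
    """Toy model: latent defects stay hidden until a full read forces remaps."""
    rng = random.Random(seed)

    # Latent defects sit only in the cold part of the disk that nobody reads.
    cold = range(hot_sectors, total_sectors)
    bad = set(rng.sample(cold, bad_sectors))

    # Phase 1: normal running. Only hot sectors are read, so no defect is hit.
    hits_while_running = sum(1 for s in range(hot_sectors) if s in bad)

    # Phase 2: a full read (for example at array assembly) touches every sector.
    remap_attempts = 0
    service_area_ok = True
    for s in range(total_sectors):
        if s in bad:
            remap_attempts += 1
            # Each remap rewrites service data through the unreliable heads;
            # one bad write is enough to trash it in this model.
            if rng.random() < write_failure_rate:
                service_area_ok = False

    return hits_while_running, remap_attempts, service_area_ok


if __name__ == "__main__":
    hits, remaps, ok = simulate()
    print(f"defects hit while running: {hits}")
    print(f"remap attempts during full read: {remaps}")
    print(f"service area intact: {ok}")
```

The point of the model is the ordering: nothing goes wrong in phase 1 no matter how many defects there are, and every remap in phase 2 is another chance to destroy the service data.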
The same suicide can also happen when some old file that has not been accessed
for ages is read. That read attempt triggers the suicide chain.