[plug] Machine crashing (long sorry)

Craig Ringer craig at postnewspapers.com.au
Tue Feb 3 17:36:42 WST 2004


> The only time I've encountered this, it was due to severe filesystem
> corruption from failing hard disks.

To be more specific, I was seeing random crashes and unprompted reboots
from the machine, especially during the weekly backups. I was on holiday
in New Zealand trying to deal with it over the phone, and it was only upon
my return that I was able to test properly and confirm the FS
corruption. Western Digital's tools indicated that the drive was peachy.
As the motherboard was slightly suspect, we actually replaced that (+CPU
and RAM) first - but the problem recurred within a couple of weeks.

This time, the PSU, disks and ATA cabling were replaced - as we had no
idea at that time what was wrong and needed the machine going reliably.

Only once the machine failed yet again - the same way - did I figure it
out, as I found out how to use smartctl to query the drive attributes.
The reallocated sector count was at something like 2000; once I ran the
WD tools on the drive ("problems were found, but have been fixed") it
said something like 2400. Another tools run and it went up to 2600. The
disk corrupted data quite fast, too. 
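
For anyone wanting to watch the same attribute, here is a rough Python
sketch that pulls the raw reallocated sector count out of the attribute
table. It assumes the column layout that smartmontools prints for
`smartctl -A`. The sample text and the helper names (raw_attribute,
read_report) are made up for illustration and are not output from the
failed drive.

```python
import re
import subprocess

# Illustrative sample of a `smartctl -A` attribute table. The values are
# invented for this sketch; they are not from the drive discussed here.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   180   180   140    Pre-fail  Always       -       2000
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4123
"""


def raw_attribute(report, name):
    """Return the integer RAW_VALUE for attribute `name`, or None if absent."""
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the name;
        # the header row starts with "ID#" and is skipped by isdigit().
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def read_report(device):
    """Run `smartctl -A` on `device` (usually needs root) and return its text.

    smartctl's exit status is a bit mask, so a non-zero status is not
    treated as an error here.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    print(raw_attribute(SAMPLE, "Reallocated_Sector_Ct"))
```

Logging that number once a day makes a climbing count obvious long
before the vendor tools admit to anything.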

I plugged the old disk into a test system and found similar issues with
it - pretty much confirming the source of the problem. The nice folks at
Austin accepted both disks as returns for credit against a pair of
Seagate Barracudas - which have been doing good service in the machine
ever since.

Note that even when the disk was almost unusable due to the rate of bad
sectors, an extended SMART self-test (smartctl -x /dev/device, when not
accessing /dev/device AT ALL) still reported "I'm OK".
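
Put another way, the trend in the reallocated sector count was a better
signal than the self-test verdict. A minimal Python sketch of that rule
of thumb follows, using the approximate counts quoted in this message.
The function names and the zero-growth threshold are hypothetical
choices for illustration, not anything the drive vendors document.

```python
def reallocation_growth(readings):
    """Return the increase between each pair of consecutive readings."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


def is_suspect(readings, max_growth=0):
    """True if the count rose by more than `max_growth` between any two
    consecutive readings. The default of 0 is an assumed threshold: any
    growth at all is treated as a warning sign."""
    return any(delta > max_growth for delta in reallocation_growth(readings))


if __name__ == "__main__":
    # Approximate counts from this message: ~2000, then ~2400, then ~2600.
    observed = [2000, 2400, 2600]
    print(reallocation_growth(observed))
    print(is_suspect(observed))
```

A drive whose count sits still is probably fine on this measure; one
that keeps climbing between checks deserves suspicion whatever the
self-test says.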

Thankfully, Seagate's disk tools appear much better than WD's. Not as
good as IBM's tools, but ... well, with an IBM drive you'd /need/ those
disk tools ;-)

-- 
Craig Ringer



