[plug] server failing with bizarre disk errors

Craig Ringer craig at postnewspapers.com.au
Wed Apr 9 19:02:21 WST 2003


Jon: looks like you might be right about the bad disk after all. I
finally thought of running the SMART diagnostics:

(  5)Reallocated Sector Ct   0x0033   196   196   140       54

on hda. That's not death on a HDD but it sure isn't right, given:
(  4)Start Stop Count        0x0032   100   100   040       34
( 12)Power Cycle Count       0x0032   100   100   000       33
(  9)Power On Hours          0x0032   099   099   000       1332

Let's see... 54 reallocated sectors in 1332 power-on hours (about 55.5
days): pretty close to one bad sector PER DAY over the drive's (very
short) lifetime.
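
For anyone wanting to check the per-day figure, here is a minimal sketch of the arithmetic in Python. It assumes the raw value of Power On Hours really is in hours (some drives count in other units) and that the raw Reallocated Sector Ct is a plain count of sectors; the variable names are just for illustration.

```python
# Raw values copied from the hda attribute table in this post.
reallocated_sectors = 54   # attribute 5, raw value
power_on_hours = 1332      # attribute 9, raw value (assumed to be hours)

# Convert powered-on time to days, then work out the reallocation rate.
days_powered = power_on_hours / 24
sectors_per_day = reallocated_sectors / days_powered

print(f"{days_powered:.1f} days powered on")
print(f"{sectors_per_day:.2f} reallocated sectors per day")
```

That comes out at 55.5 days and roughly 0.97 sectors per day.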

I can only guess that hdb has problems due to a read on hda blocking 
access to the ATA bus, and that's what is causing the apparent multiple 
failure. That's the only thing I can think of that could cause the 
random distribution of errors between hda and hdb.

Oh well, I needed a RAID array anyway.

-----------------------

access:/home/craig# smartctl -v /dev/hda
Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
Attribute                    Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000b   200   200   051       0
(  3)Spin Up Time            0x0007   172   161   021       4408
(  4)Start Stop Count        0x0032   100   100   040       34
(  5)Reallocated Sector Ct   0x0033   196   196   140       54
(  7)Seek Error Rate         0x000b   100   253   051       0
(  9)Power On Hours          0x0032   099   099   000       1332
( 10)Spin Retry Count        0x0013   100   253   051       0
( 11)Calibration Retry Count 0x0013   100   253   051       0
( 12)Power Cycle Count       0x0032   100   100   000       33
(196)Reallocated Event Count 0x0032   199   199   000       1
(197)Current Pending Sector  0x0012   200   200   000       0
(198)Offline Uncorrectable   0x0012   200   200   000       0
(199)UDMA CRC Error Count    0x000a   200   253   000       0
(200)Unknown Attribute       0x0009   200   200   051       0

access:/home/craig# smartctl -v /dev/hdb
Vendor Specific SMART Attributes with Thresholds:
Revision Number: 16
Attribute                    Flag     Value Worst Threshold Raw Value
(  1)Raw Read Error Rate     0x000b   200   200   051       0
(  3)Spin Up Time            0x0007   113   100   021       3408
(  4)Start Stop Count        0x0032   100   100   040       64
(  5)Reallocated Sector Ct   0x0033   200   200   140       0
(  7)Seek Error Rate         0x000b   200   200   051       0
(  9)Power On Hours          0x0032   097   097   000       2357
( 10)Spin Retry Count        0x0013   100   253   051       0
( 11)Calibration Retry Count 0x0013   100   253   051       0
( 12)Power Cycle Count       0x0032   100   100   000       62
(196)Reallocated Event Count 0x0032   200   200   000       0
(197)Current Pending Sector  0x0012   200   200   000       0
(198)Offline Uncorrectable   0x0012   200   200   000       0
(199)UDMA CRC Error Count    0x000a   200   253   000       0
(200)Unknown Attribute       0x0009   200   200   051       0
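
Reading these tables by eye is error-prone, so here is a rough Python sketch that parses rows in the layout shown above and flags anything that looks worrying. The function names, the regular expression, and the choice of "watched" attribute IDs (5, 197, 198) are my own assumptions for illustration, not anything smartctl provides, and other smartctl versions may print a different layout.

```python
import re

# Matches attribute rows in the layout pasted above, e.g.
# (  5)Reallocated Sector Ct   0x0033   196   196   140       54
ROW = re.compile(
    r"^\(\s*(\d+)\)(.+?)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$"
)

# Assumption: a nonzero raw count on these IDs hints at media trouble.
WATCHED_IDS = {5, 197, 198}


def parse_rows(text):
    """Return {attribute_id: dict of fields} for every attribute row in text."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # shell prompt, header, or blank line
        attr_id, name, flag, value, worst, threshold, raw = match.groups()
        rows[int(attr_id)] = {
            "name": name.strip(),
            "flag": flag,
            "value": int(value),
            "worst": int(worst),
            "threshold": int(threshold),
            "raw": int(raw),
        }
    return rows


def find_warnings(rows):
    """List warnings: value at/below threshold, or nonzero watched raw count."""
    found = []
    for attr_id, row in sorted(rows.items()):
        if row["threshold"] > 0 and row["value"] <= row["threshold"]:
            found.append(f"{row['name']}: value at or below threshold")
        elif attr_id in WATCHED_IDS and row["raw"] > 0:
            found.append(f"{row['name']}: raw count {row['raw']}")
    return found


# Three rows copied from the hda table.
SAMPLE = """\
(  5)Reallocated Sector Ct   0x0033   196   196   140       54
(  9)Power On Hours          0x0032   099   099   000       1332
(197)Current Pending Sector  0x0012   200   200   000       0
"""

for message in find_warnings(parse_rows(SAMPLE)):
    print(message)
```

On the three hda rows in the sample, this flags only the reallocated sector count. None of the normalised values on either drive has reached its threshold, which is why the raw counts are the more useful signal here.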



