[plug] Ext3: attempt to access beyond end of device

Craig Ringer craig at postnewspapers.com.au
Thu May 12 12:57:07 WST 2005


On Thu, 2005-05-12 at 12:17 +0800, Cameron Patrick wrote:

>   # smartctl -a /dev/hd(whatever)
>   [...]
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   057   051   006    Pre-fail  Always       -       127719436
>   3 Spin_Up_Time            0x0003   098   096   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   089   060   030    Pre-fail  Always       -       836514076
>   9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6883
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       16
> 194 Temperature_Celsius     0x0022   036   048   000    Old_age   Always       -       36
> 195 Hardware_ECC_Recovered  0x001a   057   051   000    Old_age   Always       -       127719436
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
>   [...]
> 
> You can also ask smartctl to get the drive to run a self test and
> various other things.

To run the short self-test:
	$ smartctl -t short /dev/hda
To wait until it's finished (10 minutes here, or however long smartctl
says it'll take):
	$ sleep 10m
Then, to look at the results:
	$ smartctl -a /dev/hda
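
The run/wait/read cycle above can be scripted. Below is a minimal Python sketch. It assumes smartctl's `-t` output announces the wait with a line like "Please wait 2 minutes for test to complete." (the exact wording may differ between smartmontools versions). If no such line is found, it falls back to the 10 minutes used above. The helper names `wait_minutes` and `run_self_test` are hypothetical.

```python
import re
import subprocess
import time

# Assumption: "smartctl -t" prints e.g. "Please wait 2 minutes for test to complete."
WAIT_RE = re.compile(r"Please wait (\d+) minutes? for test to complete")


def wait_minutes(test_output, default=10):
    """Return the wait smartctl announced, or `default` if none was found."""
    match = WAIT_RE.search(test_output)
    return int(match.group(1)) if match else default


def run_self_test(device, kind="short"):
    """Start a self-test, sleep for the announced time, return `smartctl -a` output."""
    # No check=True: smartctl's exit status is a bit mask that can be
    # non-zero even when the command itself worked.
    started = subprocess.run(["smartctl", "-t", kind, device],
                             capture_output=True, text=True)
    time.sleep(wait_minutes(started.stdout) * 60)
    report = subprocess.run(["smartctl", "-a", device],
                            capture_output=True, text=True)
    return report.stdout
```

Usage (as root): `print(run_self_test("/dev/hda"))`. If the self-test log in the report says the test is still in progress, wait a little longer and look again.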

You can also use '-t long' if the short test finds nothing wrong but
you're still suspicious, or if you want to be thorough. In my
experience the short test catches about 1/2 to 2/3 of the bad disks
I've run into; the long test has caught every single one.

> One of the important fields here is
> Reallocated_Sector_Ct.  If it's non-zero, you have a drive with bad
> sectors and might want to contemplate buying a new one.
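
Here is a minimal Python sketch of checking that field automatically. It parses the attribute table from (unquoted) `smartctl -a` output into a map from attribute name to raw value, then reports a non-zero Reallocated_Sector_Ct. The parsing rule is an assumption based on the tables in this thread: rows start with a numeric ID, and RAW_VALUE is everything from the tenth column on. The names `parse_attributes` and `raw_int` are hypothetical.

```python
import re

# Sample rows copied from the WD1200JB table in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       5
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(smartctl_output):
    """Map ATTRIBUTE_NAME -> RAW_VALUE (as a string) for each attribute row."""
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is column 10 onwards (it may contain spaces).
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def raw_int(raw):
    """Leading integer of a raw value ("467h+36m" -> 467), or None if there is none."""
    match = re.match(r"\d+", raw)
    return int(match.group()) if match else None


if __name__ == "__main__":
    count = raw_int(parse_attributes(SAMPLE).get("Reallocated_Sector_Ct", ""))
    if count:
        print(f"{count} reallocated sectors: consider replacing the drive")
```

On the sample rows this prints "5 reallocated sectors: consider replacing the drive".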

Other critical attributes are:

    - Temperature_Celsius, if shown. You'll usually want the raw value.
      Very high == bad.
    - Offline_Uncorrectable (raw value). In my experience a non-zero
      value here pretty much means it's bin time for the disk: every
      disk I've had with a non-zero value here has been SERIOUSLY
      failing.
    - UDMA_CRC_Error_Count (raw value). I've seen this go through
      the roof with bad cables and, in one case, a bad controller.
    - Hardware_ECC_Recovered (raw value). This often seems to get very
      high when the disk is going bad, especially due to heat.
    - Spin_Retry_Count (raw value). A non-zero value here is a key
      indication that the drive motor is failing (though 1 or 2 might
      be OK, since they can be caused by bad power, etc.).
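
One way to turn the list above into an automated check is sketched below in Python. It takes raw values as plain integers (for example, as parsed from the attribute table) and compares each one against a limit. Only two limits come from this post: zero for Offline_Uncorrectable, and "1 or 2" for Spin_Retry_Count. The temperature and UDMA CRC limits are hypothetical placeholders to tune for your own disks. Hardware_ECC_Recovered is left out because "very high" has no fixed cut-off, and its raw value means different things on different vendors' disks. `check_critical` and `LIMITS` are hypothetical names.

```python
# Largest raw value treated as OK for each attribute.
# Offline_Uncorrectable and Spin_Retry_Count follow the post; the other two
# limits are HYPOTHETICAL placeholders, not vendor-specified thresholds.
LIMITS = {
    "Temperature_Celsius": 55,   # hypothetical "very high" cut-off, degrees C
    "Offline_Uncorrectable": 0,  # any non-zero value: bin time
    "UDMA_CRC_Error_Count": 50,  # hypothetical; rising counts point at cables
    "Spin_Retry_Count": 2,       # 1 or 2 may just be bad power
}


def check_critical(raw_values, limits=LIMITS):
    """Return one warning string per attribute whose raw value exceeds its limit.

    Attributes missing from raw_values (not reported by the drive) are skipped.
    """
    warnings = []
    for name, limit in limits.items():
        value = raw_values.get(name)
        if value is not None and value > limit:
            warnings.append(f"{name}: raw value {value} exceeds {limit}")
    return warnings
```

For the disk quoted at the top of this thread (temperature 36, all other listed raw values 0), `check_critical(...)` returns an empty list.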

I find `smartctl -H' to be essentially useless. At least with Western
Digital disks, it often reports PASSED on disks that are so utterly
screwed that even the partition table can't be read and that can't even
spin up half the time. Seagate disks seem a little more honest about
warning you when they think they might be dying, and I haven't had a
Maxtor die on me yet so I can't comment on those.
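
Given that, a script shouldn't take PASSED at face value. The Python sketch below reports PASSED only when the attribute checks are also clean. It assumes the ATA-style `smartctl -H` line "SMART overall-health self-assessment test result: PASSED"; other drive types word this differently. `overall_verdict` and the "SUSPECT" label are hypothetical.

```python
import re

# Assumption: ATA drives report e.g.
# "SMART overall-health self-assessment test result: PASSED"
HEALTH_RE = re.compile(r"self-assessment test result:\s*([A-Z]+)")


def overall_verdict(health_output, attribute_warnings):
    """Combine the drive's own verdict with warnings from checking raw attributes."""
    match = HEALTH_RE.search(health_output)
    reported = match.group(1) if match else "UNKNOWN"
    if reported != "PASSED":
        return reported       # FAILED (or unparseable): believe the bad news
    if attribute_warnings:
        return "SUSPECT"      # drive says PASSED but the attributes disagree
    return "PASSED"
```

The design choice is one-sided trust: a FAILED from the drive is always believed, but a PASSED only stands if nothing in the raw attributes contradicts it.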

Here are the vendor-specific SMART attribute tables from my two desktop
disks, which are healthy as far as I know (I haven't run tests on them
recently). The output is in its original, unwrapped form, so it should
look fine as long as your mail client doesn't soft-wrap it.

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6Y120P0
Serial Number:    Y41GMYVE
Firmware Version: YAR41BW0
User Capacity:    122,942,324,736 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Thu May 12 12:48:42 2005 WST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[...]

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   202   191   063    Pre-fail  Always       -       12632
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       760
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   246   187    Pre-fail  Always       -       40822
  9 Power_On_Minutes        0x0032   240   240   000    Old_age   Always       -       467h+36m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   251   251   000    Old_age   Always       -       1181
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       30
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       1331
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       1
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   196   195   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

[...]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1227         -
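
The self-test log can be checked the same way. The Python sketch below parses log lines like the one above and returns every entry whose status isn't "Completed without error". That deliberately includes aborted or in-progress tests, which are worth a human look. The regular expression assumes the column layout shown here, and `parse_selftest_log` and `needs_attention` are hypothetical names.

```python
import re

# Columns: Num, Test_Description, Status, Remaining, LifeTime(hours), LBA_of_first_error.
# Assumes the description is followed by 2+ spaces, as in the log above.
LOG_LINE = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test log entry found in `text`."""
    entries = []
    for line in text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            entries.append({
                "num": int(m.group(1)),
                "test": m.group(2),
                "status": m.group(3),
                "remaining_pct": int(m.group(4)),
                "lifetime_hours": int(m.group(5)),
                "first_error_lba": m.group(6),
            })
    return entries


def needs_attention(entries):
    """Entries whose status is anything other than a clean completion."""
    return [e for e in entries if e["status"] != "Completed without error"]
```

For the Maxtor log above, `parse_selftest_log` yields a single "Extended offline" entry at 1227 hours, and `needs_attention` returns an empty list.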

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1200JB-75CRA0
Serial Number:    WD-WMA8C2759841
Firmware Version: 16.06V16
User Capacity:    120,000,000,000 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu May 12 12:54:30 2005 WST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[...]

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   102   095   021    Pre-fail  Always       -       5658
  4 Start_Stop_Count        0x0032   099   099   040    Old_age   Always       -       1588
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       5
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10716
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1512
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       4
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

-- 
Craig Ringer
