[plug] Using a HDD with Badsectors

Bernd Felsche bernie at innovative.iinet.net.au
Mon Nov 27 13:17:16 WST 2006


"Mark J Gaynor" <mark at mjg.id.au> writes:
>On 27/11/2006 at 9:04 AM Bernd Felsche wrote:
>>"Mark J Gaynor" <mark at mjg.id.au> writes:
>>>On 26/11/2006 at 12:42 PM Chris Caston wrote:

>>The problem relates to the need to remove heat that's inevitably
>>produced by the equipment when operating.
>>
>>Unless you're buying real, server-class hardware, it's rare to find
>>equipment that's designed and built for optimum heat dissipation.
>>And it's a marvel to behold consumer installations of heat-sensitive
>>equipment.

>This was what I was alluding to without an in-depth explanation. The
>majority of people have never worked in a conditioned environment
>where things start to happen once the heat exchange begins to fail.

>Consumer goods are designed to fail and usually do just after the
>warranty has expired. I don't believe it's worth the trouble for $60-70
>and your time at the same $60-70 per hour to rectify things.

My time is charged at a little more than that. :-)

It doesn't matter btw how cold your server room is. I've seen people
put solid rack panels in front of _obvious_ cooling air inlets to
servers... and cook half a dozen drives within a month.

>My experience with drives starting to give errors is to go get a new
>device before total failure occurs and you lose all the data on that
>drive. Time is money and is something a lot of hobbyists don't
>calculate into the cost.

It depends how *hard* you try to ask the drive about defects. The
fine print of the specifications has a "tolerable" rate of failures
which the drive will often happily map silently to another sector.
If it runs out of spare sectors, that's when you get the reported
errors. 

What really needs to be done is to monitor the rate of sector/read
failures... to detect increasing media faults. The temperature, if
in a stabilised environment, should also remain predictable, and
return to within a narrow range when idle.
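
As a rough illustration of watching the rate rather than the raw
count, here is a minimal Python sketch. It assumes you already log a
cumulative failure count with a timestamp at regular intervals. The
function names, the sample figures and the "twice the baseline"
threshold are all made up for the example and are not taken from any
drive specification.

```python
from datetime import datetime


def failure_rate_per_day(samples):
    """Return failures per day between consecutive samples.

    samples: list of (datetime, cumulative_count), sorted by time.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        days = (t1 - t0).total_seconds() / 86400.0
        if days <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((c1 - c0) / days)
    return rates


def is_accelerating(rates, factor=2.0):
    """True if the latest rate exceeds `factor` times the mean of the
    earlier rates. `factor` is an arbitrary illustrative threshold."""
    if len(rates) < 2:
        return False
    earlier = rates[:-1]
    baseline = sum(earlier) / len(earlier)
    return rates[-1] > 0 and rates[-1] > factor * baseline


# Hypothetical weekly readings of a cumulative failure counter.
samples = [
    (datetime(2006, 11, 1), 0),
    (datetime(2006, 11, 8), 7),
    (datetime(2006, 11, 15), 14),
    (datetime(2006, 11, 22), 49),
]
rates = failure_rate_per_day(samples)
print(rates, is_accelerating(rates))
```

A steady rate is what the spec allows for; a jump in the last interval
relative to the earlier ones is the thing to act on.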

>Bottom line for me is once a drive starts to fail, you look at the
>replacement process as soon as possible.

Older SCSI drives used to set up new spare sectors at a low-level
format. So if the failure rate was still within spec, you could save
all the data off it, reformat at drive level and get a fresh start.

The rate of sector failure is crucial.

SMART information usually includes the number of "soft" failures.
One should keep a regular watch on that number to determine the rate
over a lifetime on mission-critical systems.
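
One way to collect that number is to parse the attribute table that
"smartctl -A" (from smartmontools) prints. The sketch below is only
that: the sample text is invented, attribute names and the meaning of
raw values vary between vendors, and which attribute best reflects
"soft" failures on a given drive is an assumption you would need to
check against its documentation.

```python
def parse_smart_attributes(text):
    """Map attribute name -> integer raw value from text laid out like
    the table printed by `smartctl -A`. Rows that do not start with a
    numeric ID, or whose raw value is not an integer, are skipped."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 10 columns; RAW_VALUE is the 10th.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                pass
    return attrs


# Invented sample in the usual column layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
194 Temperature_Celsius     0x0022   038   045   000    Old_age   Always       -       38 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

attrs = parse_smart_attributes(SAMPLE)
print(attrs.get("Reallocated_Sector_Ct"), attrs.get("Temperature_Celsius"))
```

Logging these values with a timestamp from a scheduled job gives you
the history needed to work out a rate.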
-- 
/"\ Bernd Felsche - Innovative Reckoning, Perth, Western Australia
\ /  ASCII ribbon campaign | "If we let things terrify us,
 X   against HTML mail     |  life will not be worth living."
/ \  and postings          | Lucius Annaeus Seneca, c. 4BC - 65AD.



