[plug] re: RAID5 & Hot Spares
Craig Ringer
craig at postnewspapers.com.au
Thu Jun 12 14:45:13 WST 2003
> I know this is an IBM h/w RAID we're talking about but just to be safe
> it may pay to ensure that any drive errors are appropriately
> acknowledged and the ailing drive kicked out of the array. I am
> thinking here in terms of what UWA faced recently with a MegaRaid
> controller - a drive was going toes up. The individual drive logs
> apparently said so, but the controller was happy to leave the drive in
> the array to cause mischief! Not Pretty (tm). Having said that, I
> don't know how you'd go about artificially creating fake bad blocks on a
> working drive to test whether or not it gets tossed out in your
> situation. Anyone?
Well ... if it weren't for the expensive SCSI drives, I'd probably do
something quite horrible. Ask the boss for a spare drive to do some
testing with - mentioning that it's going to destroy the drive in the
process. Remove the top of the drive, give it a light, short scratch on
a platter with a screwdriver, and replace the top. Instant bad sectors
;-) Of course, the damage can only be done with the server /off/.
<rant> Speaking of RAID that isn't so great, I'm less than thrilled with
3ware. Despite excellent first impressions, the card has been appalling.
It's eaten my data - twice - though the service guys think that was a
hardware fault (I'm awaiting a warranty replacement while the server
sits idle; they wouldn't send me the replacement until they got the
original - wonderful service, I say). That problem must be specific
to our setup and card, since I can't possibly imagine a RAID controller
being that bad.
However, the card lacks any way to let you query the drives' SMART data
(no LUN >0 reads like some SCSI arrays, no custom tools to do it). Their
support folks responded with "the card will do this for you". I had a
drive fail in the array (separate from the problem with the card itself;
it was another WD 120G JB dying) and the card failed to notice until the
drive was totally f**ed. We're talking 400+ bad sectors, and almost
totally unusable. When I ran a disk test, it started with 400-ish bad
sectors and finished with 440 (according to SMART). Clearly, the card's
SMART monitoring and drive testing don't work as well as they should - but
there's no way for the user to monitor the drives directly. In retrospect,
I think I would've gone for software RAID and saved $1200.
Hopefully they'll fix these issues, but they don't seem to be in the
business of listening to customers, so maybe not.
</rant>