[plug] RAID, and what it can't solve

Craig Ringer craig at postnewspapers.com.au
Tue Jun 22 09:57:05 WST 2004


On Tue, 2004-06-22 at 09:36, Scott Middleton wrote:

> We just finished installing Linux on a computer with 12 SATA 240GB HDD
> on an escalade 9500 PCI-X with Hot Swap Caddies.

Allow me to express my extreme envy. I'm using an Escalade 8500 (which
is really a 7500 + some PATA->SATA bridge chips, hot-swap fixes and
tweaked firmware), and lack the hot-swap caddies. The PCI-X and 250GB
disks I have :-) . A 9500-8 would be really nice, as the 8500-8, while a
solid card, has some performance quirks that I understand are resolved
in the 9500-8.

You know how to use SMART on the Escalade controllers, right? I think I
posted about it a while back, so it should be in the archives. It's
something that's well worth knowing, as the monitoring that SMART can
provide is immensely better than what the controller will provide by
itself. I'm looking at logging and graphing disk temperatures and disk
error rates using SMART right now. It's also great when a disk drops out
of the array for some reason, as:

smartctl -d 3ware,portnumber -t short /dev/sda
or
smartctl -d 3ware,portnumber -t long /dev/sda

(with portnumber replaced by the controller port the disk is attached
to) will help confirm whether there's a definite fault on the drive. You
can't run the SMART tests while the array is live AFAIK (the drive would
time out and fall out of the array). The disk(s) must not be accessed by
the OS for the duration of the tests, which typically means unmounting
all volumes and suspending any LVM VGs that have PVs on that array.
Still, it's _much_ better than having to shut down and move the disks to
an on-board SATA controller for laborious one-by-one testing like you'd
have to do otherwise - with smartctl you can have all disks in an array
test themselves in parallel and without moving them from their ports.
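
As a rough sketch of the parallel testing described above, the loop
below starts a self-test on each port in turn; the drives then run
their tests on their own, all at once. The port count (12), the
/dev/sda device node and the DRY_RUN switch are assumptions for
illustration, so adjust them for your own controller. By default it
only prints the commands it would run.

```sh
#!/bin/sh
# Start a SMART self-test on every port of a 3ware controller.
# Assumptions: ports are numbered 0..PORTS-1 and the array appears as DEVICE.
PORTS=${PORTS:-12}
DEVICE=${DEVICE:-/dev/sda}
TEST=${TEST:-short}      # "short" or "long"
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = actually run them

start_selftests() {
    port=0
    while [ "$port" -lt "$PORTS" ]; do
        cmd="smartctl -d 3ware,$port -t $TEST $DEVICE"
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "$cmd"
        else
            $cmd    # returns once the drive has accepted the test
        fi
        port=$((port + 1))
    done
}

start_selftests
```

Once the tests have had time to finish, something like
"smartctl -d 3ware,portnumber -l selftest /dev/sda" should show the
result for each port.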

You _can_ run normal SMART queries on disks that are part of a live and
active array, though. The ability to anticipate a disk failure and
_choose_ a time to rebuild onto a hot spare is invaluable.
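
For the live queries, "smartctl -d 3ware,portnumber -H /dev/sda" gives
the drive's overall health verdict and "-A" dumps the attribute table.
Here is a minimal sketch of pulling the temperature out of that table
for logging. It assumes the drive reports attribute 194
(Temperature_Celsius), which not every drive does, and that the array
appears as /dev/sda; the function names are just made up for the
example.

```sh
#!/bin/sh
# Extract the raw value of SMART attribute 194 (Temperature_Celsius)
# from "smartctl -A" output read on stdin. The raw value is the 10th
# column of the attribute table.
drive_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Print "timestamp port temperature" for one controller port.
# Assumption: the array appears as /dev/sda.
log_temp() {
    port=$1
    temp=$(smartctl -d 3ware,"$port" -A /dev/sda | drive_temp)
    echo "$(date '+%Y-%m-%d %H:%M:%S') port$port ${temp:-unknown}"
}
```

Calling log_temp for each port from cron and appending the output to a
file would give you something to graph later.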

> One hot spare and RAID
> 5 the client has a total disk space of 2.27TB. Takes a long time to
> format :) This computer can afford to lose 2 HDDs but not at the same
> time. One can fail and once the array has finished rebuilding the other
> can fail. If 2 HDD fail the then Array is degraded and Data loss is
> likely. Load testing this was a bitch since it took hours to rebuild the
> array every time we purposefully flattened it.

I know that feeling. I abused my original array pretty badly when
testing the server - yanking a disk, reinserting it while the array was
rebuilding, waiting until it finished rebuilding and yanking a different
disk, etc. It took _forever_ but it was worth it, because I could have
some expectation that the array would handle failure correctly.
Unfortunately I wasn't able to do as much testing with this new array.

--
Craig Ringer



