[plug] Raid-5 recovery
Andrew Howell
andrew at it.net.au
Sun Dec 28 00:58:51 WST 2003
Have you tried using mdadm to monitor your array?
It can e-mail or run a program when an event occurs.
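
A minimal sketch of the sort of program mdadm could run, assuming monitor
mode is started with its --program option and that the program receives the
event name, the md device and (for some events) the component device as
arguments. The script itself, the DRY_RUN switch and the decision to halt on
a "Fail" event are assumptions for illustration, not a tested setup.
"DegradedArray" is deliberately not in the list, since it is also reported
when monitoring starts on an array that is already degraded, and halting then
would stop you from ever resyncing.

```python
#!/usr/bin/env python3
"""Hypothetical handler for events passed on by `mdadm --monitor --program`."""
import subprocess
import sys

# Events on which to halt before a second disk gets the chance to drop out.
HALT_EVENTS = {"Fail"}

# Leave True while testing so the command is only printed, never run.
DRY_RUN = True


def decide_action(event, md_device, component=None):
    """Return the command to run for this event, or None to do nothing."""
    if event in HALT_EVENTS:
        return ["/sbin/shutdown", "-h", "now"]
    return None


def handle(argv):
    """argv is [event, md_device] or [event, md_device, component]."""
    event, md_device = argv[0], argv[1]
    component = argv[2] if len(argv) > 2 else None
    command = decide_action(event, md_device, component)
    if command is None:
        return None
    print("%s on %s (%s): %s" % (event, md_device, component, " ".join(command)))
    if not DRY_RUN:
        subprocess.call(command)
    return command


if __name__ == "__main__" and len(sys.argv) >= 3:
    handle(sys.argv[1:])
```

It would be hooked up with something like
mdadm --monitor --scan --program=/usr/local/sbin/raid_event.py
where the path is only a placeholder.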
Andrew
Brad Campbell wrote:
> G'day all,
> I just went through the anguish of a 2 channel failure on a 5 disk
> 780GB raid. Most of the info I located on the web was wayyyy out of
> date for the current tools and md drivers.
>
> In case it might save someone the anguish in the future.. some tips.
>
> In my case the issue is caused by IDE channels locking up. Normally if
> I'm recording satellite overnight I may get a channel lock up and drop
> a drive. Cold restart and resync and we are sweet. (I'm waiting on
> some PATA-SATA adaptors to work around this issue but they are on
> backorder in the US). Last night I dropped two channels in about 30
> minutes, which caused the array to be marked as bad. (More than one
> failed disk and you're toast.)
>
> So.. check the syslog and dmesg event counts and determine which was
> the first disk to fail.
>
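
The event count check above can be sketched as below. The disk whose
superblock stopped being updated first should show the lowest event counter.
The log line format in the regular expression and the hexadecimal reading of
the counter are assumptions about what the md driver prints, so compare them
with your own dmesg output first. The sample lines are made up.

```python
import re

# Assumed kernel log format; check it against your own dmesg output.
EVENT_RE = re.compile(r"md: (\S+)'s event counter: ([0-9a-fA-F]+)")


def event_counts(log_text):
    """Map each device to the last event counter logged for it."""
    counts = {}
    for device, counter in EVENT_RE.findall(log_text):
        counts[device] = int(counter, 16)  # assumed to be hexadecimal
    return counts


def failure_order(log_text):
    """Devices sorted from lowest (stalest) to highest event counter."""
    counts = event_counts(log_text)
    return sorted(counts, key=lambda dev: counts[dev])


# Made-up sample lines, for illustration only.
SAMPLE = """\
md: hda1's event counter: 0000002a
md: hdc1's event counter: 0000002a
md: hde1's event counter: 00000027
md: hdg1's event counter: 00000029
md: hdi1's event counter: 0000002a
"""

if __name__ == "__main__":
    # The first name printed is the candidate for "first disk to fail".
    print(failure_order(SAMPLE)[:2])
```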
> Use lsraid -p -R to scan all disks and spit out a raidtab.
> Ensure your raidtab matches this disk mapping exactly before you do
> anything else.
>
> Change the first failed disk from raid-disk to failed-disk in your
> raidtab.
>
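
The raidtab change described above could be scripted roughly as follows,
assuming the usual layout where each "device" line is followed by its
"raid-disk" line. The function name and the sample raidtab (three disks
rather than five, to keep it short) are made up for illustration. Keep a copy
of the original file whatever you do.

```python
def mark_failed(raidtab_text, failed_device):
    """Return raidtab text with failed_device's raid-disk line changed
    to failed-disk. All other lines are passed through untouched."""
    out = []
    current = None  # device named by the most recent "device" line
    for line in raidtab_text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "device":
            current = words[1]
        elif (len(words) >= 2 and words[0] == "raid-disk"
              and current == failed_device):
            line = line.replace("raid-disk", "failed-disk", 1)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample raidtab, for illustration only.
SAMPLE_RAIDTAB = """\
raiddev /dev/md0
    raid-level            5
    nr-raid-disks         3
    nr-spare-disks        0
    persistent-superblock 1
    chunk-size            64
    device                /dev/hda1
    raid-disk             0
    device                /dev/hdc1
    raid-disk             1
    device                /dev/hde1
    raid-disk             2
"""

if __name__ == "__main__":
    print(mark_failed(SAMPLE_RAIDTAB, "/dev/hdc1"))
```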
> Run mkraid --force --dangerous-no-resync and read the stern warning.
> After sufficient checks and a big lump in your throat run
> mkraid --really-force --dangerous-no-resync
>
> If all went well you will have a running array. If not then I guess
> you have lost the contents of the array. I have not seen any other way
> to make this work. I even contemplated hand editing the raid
> superblocks to mark the second failed disk as ok but could find no
> info on it.
>
> Mount the fs read-only and check to see if everything is where it
> should be. Run fsck read-only just in case.
>
> And if all went well, mark the failed disk as raid-disk in your
> raidtab and use raidhotadd to add it back into the array.
> Three hours' worth of resyncing later and you're back online.
> <Wipes sweat from brow>
>
> All the info out there about ckraid and the kernel doing the work for
> you is well out of date.
>
> If someone could suggest a cheap way of backing up 720GB of data I
> might sleep easier at night. Short of that I'm going to have to find
> something that watches the syslog for a raid disk failure message and
> shuts the machine down before it gets a chance to fail a second disk.
>
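
As an alternative to watching the syslog, a watcher could poll /proc/mdstat
and look for an array whose [total/working] status shows fewer working disks
than it should have. This is only a sketch: the function name and the sample
text are made up, and what to do on a hit (mail, shutdown) is left out.

```python
import re

# Matches the "[5/4]" style status: configured disks / working disks.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of arrays with fewer working disks than configured."""
    degraded = []
    current = None  # array named by the most recent "mdX : ..." line
    for line in mdstat_text.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0].startswith("md") and words[1] == ":":
            current = words[0]
            continue
        match = STATUS_RE.search(line)
        if current and match:
            wanted, working = int(match.group(1)), int(match.group(2))
            if working < wanted:
                degraded.append(current)
            current = None
    return degraded


# Made-up sample of /proc/mdstat contents, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 hdi1[4] hdg1[3] hde1[2](F) hdc1[1] hda1[0]
      781422592 blocks level 5, 64k chunk, algorithm 2 [5/4] [UU_UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the text would come from reading /proc/mdstat every minute
or so.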
> Brad (waiting for raid resizing to be added to EVMS so I can add an
> extra couple of drives into the array, it should happen early 2004
> fingers crossed!)
> _______________________________________________
> plug mailing list
> plug at plug.linux.org.au
> http://mail.plug.linux.org.au/cgi-bin/mailman/listinfo/plug
--
Andrew Howell
Director
Informed Technology
E-mail: andrew at it.net.au
Ph: 08 9380 4244 Fax: 08 9380 4354