[plug] Raid-5 recovery

Andrew Howell andrew at it.net.au
Sun Dec 28 00:58:51 WST 2003


Have you tried using mdadm to monitor your array?

It can e-mail or run a program when an event occurs.
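
A rough sketch of the "run a program" route. `mdadm --monitor` accepts `--program` alongside `--mail`, and mdadm invokes that program with the event name, the md device and, for some events, the component device. The handler below is hypothetical and not part of mdadm. Its policy of halting the box on a `Fail` or `FailSpare` event is an assumption based on Brad's wish further down. `DegradedArray` is deliberately left out, because it can fire at startup on an already-degraded array and would cause a shutdown loop.

```python
#!/usr/bin/env python3
"""Hypothetical handler for `mdadm --monitor --program=...`.

mdadm runs it as: <event> <md-device> [<component-device>]
"""
import subprocess
import sys

# Events treated as "a member just dropped out" (an assumption; adjust to taste).
FAILURE_EVENTS = {"Fail", "FailSpare"}


def decide(argv):
    """Return a shutdown command (list of args) for failure events, else None."""
    if len(argv) < 2:
        return None
    event, md_device = argv[0], argv[1]
    component = argv[2] if len(argv) > 2 else "unknown"
    if event in FAILURE_EVENTS:
        # Halt now, before a second disk can drop out and take the array with it.
        reason = f"{event} on {md_device} ({component})"
        return ["shutdown", "-h", "now", reason]
    return None


def main():
    cmd = decide(sys.argv[1:])
    if cmd is not None:
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main()
```

It might be wired up with something like `mdadm --monitor --scan --mail=root --program=/usr/local/sbin/md-handler.py --daemonise`, where the script path is made up for illustration.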

Andrew

Brad Campbell wrote:

> G'day all,
> I just went through the anguish of a two-channel failure on a 5-disk
> 780GB RAID-5 array. Most of the info I located on the web was wayyyy
> out of date for the current tools and md drivers.
>
> In case it saves someone the same anguish in future, here are some tips.
>
> In my case the problem is caused by IDE channels locking up. Normally,
> if I'm recording satellite overnight, I may get a channel lock-up and
> drop a drive. Cold restart, resync, and we're sweet. (I'm waiting on
> some PATA-SATA adaptors to work around this issue, but they're on
> backorder in the US.) Last night I dropped two channels in about 30
> minutes, which caused the array to be marked as bad. (With more than
> one failed disk in a RAID-5 array, you're toast.)
>
> So... check syslog, dmesg and the event counts, and work out which disk
> failed first.
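
One way to read "event counts": each md superblock carries an event counter that is bumped on array state changes. A member that has dropped out stops being updated, so the stalest (lowest) count marks the first disk to fail. The sketch below assumes you have already copied the counts for each member into a dict by hand. The device names and numbers are hypothetical.

```python
def first_failed(event_counts):
    """Return the member whose superblock event count is lowest.

    event_counts maps device name -> event counter (values you copied
    from each disk's superblock; the ones used below are made up).
    """
    if not event_counts:
        raise ValueError("no devices given")
    return min(event_counts, key=event_counts.get)


def failure_order(event_counts):
    """List members from stalest (dropped first) to freshest."""
    return sorted(event_counts, key=event_counts.get)


if __name__ == "__main__":
    counts = {"/dev/hde1": 412, "/dev/hdg1": 398, "/dev/hdi1": 412,
              "/dev/hdk1": 405, "/dev/hdm1": 412}
    print("first to fail:", first_failed(counts))
    print("order:", failure_order(counts))
```

Only the lowest entry matters here. In this made-up example the second-lowest member is the second failed disk, which is the one the forced rebuild is meant to trust.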
>
> Use lsraid -p -R to scan all disks and spit out a raidtab.
> Ensure your raidtab matches this disk mapping exactly before you do
> anything else.
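
The "matches exactly" check can be made mechanical. This is a minimal sketch that parses the raidtab keywords used here (`raiddev`, `device`, `raid-disk`, `spare-disk`, `failed-disk`) into a device-to-slot map and lists any differences. It treats `failed-disk` as a data slot, so a file that only differs by a disk marked as failed still compares equal. The function names are made up for illustration.

```python
SLOT_KEYS = ("raid-disk", "spare-disk", "failed-disk")


def disk_map(raidtab_text):
    """Map (md device, member device) -> (slot kind, slot number)."""
    mapping = {}
    array = member = None
    for raw in raidtab_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split words
        if len(fields) != 2:
            continue
        key, value = fields
        if key == "raiddev":
            array, member = value, None
        elif key == "device":
            member = value
        elif key in SLOT_KEYS and array and member:
            # failed-disk still occupies a data slot; spares are numbered apart.
            kind = "spare" if key == "spare-disk" else "raid"
            mapping[(array, member)] = (kind, int(value))
    return mapping


def mismatches(generated, current):
    """Return sorted keys whose slot differs or that appear in only one file."""
    a, b = disk_map(generated), disk_map(current)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Feed it the lsraid output and your current /etc/raidtab. An empty list means the mappings agree.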
>
> Change the first failed disk from raid-disk to failed-disk in your 
> raidtab.
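
A small sketch of that edit, assuming raidtab's usual layout where each `device` line is followed by its `raid-disk` line. Only the entry for the named device is touched, and indentation is preserved. It refuses to do anything if no matching entry is found. `mark_failed` is a hypothetical helper, not part of raidtools.

```python
def mark_failed(raidtab_text, device):
    """Return raidtab text with `device`'s raid-disk line turned into failed-disk.

    Only the raid-disk line that follows `device <device>` (skipping blank
    and comment lines) is changed; everything else is kept verbatim.
    """
    out = []
    pending = False  # True right after the matching "device" line
    changed = False
    for line in raidtab_text.splitlines(keepends=True):
        fields = line.split("#", 1)[0].split()
        if fields[:1] == ["device"]:
            pending = fields[1:2] == [device]
        elif pending and fields[:1] == ["raid-disk"]:
            line = line.replace("raid-disk", "failed-disk", 1)
            pending = False
            changed = True
        elif fields:
            pending = False  # any other keyword ends this device's entry
        out.append(line)
    if not changed:
        raise ValueError(f"no raid-disk entry found for {device}")
    return "".join(out)
```

Because of the `ValueError`, a typo in the device name fails loudly instead of silently leaving the raidtab unchanged.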
>
> Run mkraid --force --dangerous-no-resync and read the stern warning.
> After sufficient checks, and with a big lump in your throat, run
> mkraid --really-force --dangerous-no-resync
>
> If all went well, you will have a running array. If not, then I guess
> you have lost the contents of the array. I have not seen any other way
> to make this work. I even contemplated hand-editing the RAID
> superblocks to mark the second failed disk as OK, but could find no
> info on how to do it.
>
> Mount the filesystem read-only and check that everything is where it
> should be. Run fsck in read-only mode too, just in case.
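
A dry-run sketch of the read-only check. It assumes an ext2/ext3 filesystem, where `fsck -n` is passed through to e2fsck and means "open read-only and answer no to every question". The device and mount point defaults are hypothetical. Nothing is executed unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def readonly_check_commands(md_device="/dev/md0", mountpoint="/mnt/recovered"):
    """Build the read-only verification commands (paths are placeholders).

    Assumes ext2/ext3, where `fsck -n` makes no changes to the filesystem.
    """
    return [
        ["mount", "-o", "ro", md_device, mountpoint],
        ["fsck", "-n", md_device],
    ]


def run(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

With other filesystems the read-only flag differs, so check that checker's own manual page first.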
>
> And if all went well, mark the failed disk as raid-disk again in your
> raidtab and use raidhotadd to add it back into the array.
> Three hours' worth of resyncing later, you're back online.
> <Wipes sweat from brow>
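
To keep an eye on those three hours, the rebuild progress can be pulled out of /proc/mdstat. This sketch assumes the 2.4/2.6-era layout, where a progress line reads like `recovery = 12.6% (...) finish=143.2min` under the `mdN :` line it belongs to. The exact format may vary between kernels.

```python
import re

DEVICE_RE = re.compile(r"^(md\d+)\s*:")
PROGRESS_RE = re.compile(r"(recovery|resync)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min")


def rebuild_progress(mdstat_text):
    """Return {md device: (kind, percent done, minutes left)} from mdstat text."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)  # progress lines belong to the last mdN seen
            continue
        m = PROGRESS_RE.search(line)
        if m and current:
            progress[current] = (m.group(1), float(m.group(2)), float(m.group(3)))
    return progress


def read_mdstat(path="/proc/mdstat"):
    """Read the live status file (Linux only)."""
    with open(path) as f:
        return f.read()
```

Call `rebuild_progress(read_mdstat())` periodically. An empty dict means no resync or recovery is running.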
>
> All the info out there about ckraid and the kernel doing the work for 
> you is well out of date.
>
> If someone could suggest a cheap way of backing up 720GB of data, I
> might sleep easier at night. Short of that, I'm going to have to find
> something that watches the syslog for a RAID disk failure message and
> shuts the machine down before a second disk gets a chance to fail.
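
A minimal sketch of such a watcher. It assumes that the kernel's md failure message looks like `raid5: Disk failure on hdg1, disabling device`, and that kernel messages land in /var/log/messages. Both are assumptions, so check them against your own logs. mdadm's monitor mode, suggested above, is probably the sturdier route. This version tails the file, ignores log rotation, and halts on the first match.

```python
import re
import subprocess
import time

# Assumed wording of the md failure message; adjust to match your logs.
FAILURE_RE = re.compile(r"raid\d+: Disk failure on (\S+?), disabling device")


def failed_devices(lines):
    """Yield the member device named in each md failure message."""
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            yield m.group(1)


def follow(path, poll=1.0):
    """Yield complete lines appended to `path`, like `tail -f` (no rotation)."""
    with open(path) as f:
        f.seek(0, 2)  # start at the current end of the file
        buf = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll)
                continue
            buf += chunk
            if buf.endswith("\n"):  # only hand out whole lines
                yield buf
                buf = ""


def watch(path="/var/log/messages"):
    """Halt the machine on the first disk-failure message (hypothetical policy)."""
    for dev in failed_devices(follow(path)):
        subprocess.run(["shutdown", "-h", "now", f"md member {dev} failed"])
        return dev
```

Run `watch()` as root from an init script. Only the parsing half (`failed_devices`) is safe to try on an old log file.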
>
> Brad (waiting for raid resizing to be added to EVMS so I can add an 
> extra couple of drives into the array, it should happen early 2004 
> fingers crossed!)
> _______________________________________________
> plug mailing list
> plug at plug.linux.org.au
> http://mail.plug.linux.org.au/cgi-bin/mailman/listinfo/plug


-- 
Andrew Howell
Director
Informed Technology
E-mail: andrew at it.net.au
Ph: 08 9380 4244  Fax: 08 9380 4354
