[plug] Urgent RAID Help needed

James Bromberger james at rcpt.to
Wed Jul 31 18:59:12 WST 2002


On Wed, Jul 31, 2002 at 03:09:52PM +0800, Adrian Woodley wrote:
> Have a look at James B's "Incase of Emergency Break Glass" on the
> http://www.james.rcpt.to/programs/debian/raid1/
> You may need to install a new system on a spare drive and use that to recover
> the data on the RAID disk.
 

Hum. Very timely thread for me. I had an email message from the raid subsystem
on my machine yesterday saying:


> From: mdadm monitoring <root at phobe>
> To: root at phobe
> Subject: Fail event on /dev/md/5:phobe
>
> This is an automatically generated mail message from mdadm
> running on phobe
> A Fail event had been detected on md device /dev/md/5.
> It could be related to sub-device /dev/ide/host0/bus1/target0/lun0/part7.
> Faithfully yours, etc.


Not the kind of message you like to read.

Interesting that mdadm used devfs paths. Anyways, checking the log showed:

> Jul 30 03:03:27 phobe kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Jul 30 03:03:27 phobe kernel: hdc: dma_intr: error=0x40 { UncorrectableError },
> LBAsect=51759668, sector=19533784
> Jul 30 03:03:27 phobe kernel: end_request: I/O error, dev 16:07 (hdc), 
> sector 19533784
> Jul 30 03:03:27 phobe kernel: raid1: Disk failure on ide/host0/bus1/target0/lun0/part7, disabling device.
> Jul 30 03:03:27 phobe kernel: ^IOperation continuing on 1 devices
> Jul 30 03:03:27 phobe kernel: raid1: ide/host0/bus1/target0/lun0/part7: 
> rescheduling block 19533784
> Jul 30 03:03:27 phobe kernel: md: recovery thread got woken up ...
> Jul 30 03:03:27 phobe kernel: md: updating md5 RAID superblock on device
> Jul 30 03:03:27 phobe kernel: md: (skipping faulty ide/host0/bus1/target0/lun0/part7 )
> Jul 30 03:03:27 phobe kernel: md: ide/host0/bus0/target0/lun0/part7 [events: 
> 0000001f]<6>(write) ide/host0/bus0/target0/lun0/part7's sb offset: 62037760
> Jul 30 03:03:27 phobe kernel: raid1: ide/host0/bus0/target0/lun0/part7: 
> redirecting sector 19533784 to another mirror
> Jul 30 03:03:27 phobe kernel: md5: no spare disk to reconstruct array! -- 
> continuing in degraded mode
> Jul 30 03:03:27 phobe kernel: md: recovery thread finished ...



First some background. If you've read my RAID page, then you'll see I have 
two disks split into identical sets of partitions. Now, if a disk were bad, 
I would have expected all partitions on that disk to have gone bad, not 
just partition 7 (which, of course, has my main fileshare on it).

Suspecting that it just got its knickers in a knot, I tried to add it back to 
the array (raidhotadd /dev/md5 /dev/ide/host0/...). I got a 'disk busy' 
message. So tonight I tried again. First I did a 'raidhotremove /dev/md5 
/dev/ide/...' to make sure the failed physical partition was not part of the 
md5 device, and then a corresponding raidhotadd...
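
A minimal sketch of that remove-then-add sequence in Python, assuming the 
raidtools commands take the md device followed by the member partition (as 
in the invocations above). The helper names readd_commands and readd are 
hypothetical, and the partition path is the one named in the mdadm mail, 
not necessarily the exact path typed at the time. It defaults to a dry run 
that only prints the commands:

```python
import shlex
import subprocess


def readd_commands(md_device, partition):
    """Return the raidtools commands that drop and then re-add a mirror half."""
    return [
        ["raidhotremove", md_device, partition],
        ["raidhotadd", md_device, partition],
    ]


def readd(md_device, partition, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on error."""
    for cmd in readd_commands(md_device, partition):
        if dry_run:
            print(shlex.join(cmd))
        else:
            # Needs root and the raidtools binaries; check=True raises on failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Assumed example path, copied from the mdadm failure mail.
    readd("/dev/md5", "/dev/ide/host0/bus1/target0/lun0/part7")
```

Removing first matters here: the earlier 'disk busy' reply came from trying 
the add while the failed partition was still listed in the array.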


And it has duly said:


> Jul 31 18:16:09 phobe kernel: md: syncing RAID array md5
> Jul 31 18:16:09 phobe kernel: md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
> Jul 31 18:16:09 phobe kernel: md: using maximum available idle IO bandwith (but
> not more than 100000 KB/sec) for reconstruction.
> Jul 31 18:16:09 phobe kernel: md: using 124k window, over a total of 62037760 blocks.


Which now looks good. Cat'ing /proc/mdstat shows:

> Personalities : [raid1]
> read_ahead 1024 sectors
>  md5 : active raid1 ide/host0/bus1/target0/lun0/part7[2] ide/host0/bus0/target0/lun0/part7[0]
>      62037760 blocks [2/1] [U_]
>      [===================>.]  recovery = 96.4% (59866752/62037760) finish=1.7min speed=20373K/sec


So it is all looking good. After a further 1.7 minutes, this now says:


> md5 : active raid1 ide/host0/bus1/target0/lun0/part7[1] ide/host0/bus0/target0/lun0/part7[0]
>       62037760 blocks [2/2] [UU]
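
The [2/1] [U_] versus [2/2] [UU] fields are what tell a degraded mirror from 
a healthy one, so they are easy to check from a script. A rough sketch, 
assuming the /proc/mdstat layout shown above (the function name 
degraded_arrays is made up for illustration, and other kernel versions may 
format this file differently):

```python
import re

# Matches the "[2/1] [U_]" pair: configured devices, active devices, per-device flags.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
HEADER_RE = re.compile(r"\s*(md\d+)\s*:")


def degraded_arrays(mdstat_text):
    """Return {array_name: flags} for every array that is missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            total, active, flags = status.groups()
            if int(active) < int(total) or "_" in flags:
                degraded[current] = flags
            current = None  # status line consumed; wait for the next array header
    return degraded


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        print(degraded_arrays(f.read()))
```

Run against the two outputs above, this would flag md5 during the resync and 
report nothing once both halves show as U.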




And thus it looks good. I was about to purchase a new drive and replace it, 
and may still do so. Both these drives were purchased at the same time, so 
I guess each is about as likely to fail as the other. Anyways, I guess I 
will watch for further failures and see what happens.


Regards,

  James
(This is all with the stock standard 2.4.18 kernel from the Debian package, 
with no software recompiled or changed in any way except for config files)

-- 
 James Bromberger <james_AT_rcpt.to> www.james.rcpt.to
 Remainder moved to http://www.james.rcpt.to/james/sig.html
 The Australian Linux Technical Conference 2003: http://www.linux.conf.au/