[plug] handling failed non-redundant storage in a server

Craig Ringer craig at postnewspapers.com.au
Thu Feb 12 11:56:46 WST 2004


On Thu, 2004-02-12 at 11:43, Sham Chukoury wrote:
> On Thu, 2004-02-12 at 11:09, Craig Ringer wrote:
> 
> <snip>
> 
> > I was wondering if there's any way to deal with this - to remove the
> > processes I know will never recover, unmount the dead volume without
> > causing any harm to other parts of the system, etc. While I'll be able
> > to reboot this evening, surely there's a way of dealing with this sort
> > of thing without a reboot? 
> 
> Hmmm... Cycling init levels? :)

'fraid not. It's not that simple - these processes are in D state
(uninterruptible sleep, waiting for I/O). It's not just a matter of a
process needing a restart or a volume needing unmounting. Any attempt to
unmount the volume simply causes the umount process to block too. Even
`sync` blocks permanently.

> Or... have you tried killing the unrecoverable processes?
> kill[all] (-9) (pid|name)

Yes. As they're waiting for I/O they can't be killed, even by a kill -9.
The reason, as I understand it, is that they're running in kernel mode
at the moment and can't be killed until they leave the kernel I/O
routines - which they never will, because the disk is no longer there.
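
For anyone wanting to see which processes are stuck like this, here is a
rough sketch in Python. It assumes a Linux /proc filesystem where each
/proc/<pid>/stat line has the form "pid (comm) state ..."; the function
names parse_stat_line and find_blocked are made up for illustration. It
only reports processes, it cannot unblock them.

```python
import glob


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, command name, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_blocked():
    """Return a sorted list of (pid, comm) for processes in D state."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away or the line was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in find_blocked():
        print(pid, comm)
```

A process will pass through D state briefly during normal disk I/O, so
only entries that stay listed across repeated runs are likely to be
genuinely stuck.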

> As to unmounting the dead volume.. find out which processes think
> they've got open files on it, using lsof, and kill those processes, then
> try unmounting.

Processes using the volume aren't the issue. The volume can't be
unmounted because the kernel can't sync the filesystem - the device is
no longer there.

To give you an idea of what I mean - if I run
	`dd if=/dev/sdc of=/tmp/diskstart bs=1M count=1` 
I get the error:
	dd: opening `/dev/sdc': No such device or address
because the device is no longer present - it's been disabled by the RAID
controller and the driver has made the kernel aware of this. Yet this
entirely absent device has a mounted filesystem and files open on that
filesystem.
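
The same check can be sketched for every mounted filesystem at once.
This is only an illustration under some assumptions: a Linux system with
/proc/mounts, enough privileges to open the device nodes read-only, and
a driver that fails the open immediately the way mine does for the dd
above. The names parse_mounts, probe_device and find_dead_mounts are
hypothetical, and mount points with escaped characters are not handled.

```python
import errno
import os

# Error codes treated as "the device is gone". ENXIO is what produces
# the "No such device or address" message from dd.
GONE = {errno.ENXIO, errno.ENODEV}


def parse_mounts(text):
    """Return (device, mount point) pairs for /dev-backed mounts."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("/dev/"):
            mounts.append((fields[0], fields[1]))
    return mounts


def probe_device(path):
    """Try a read-only open; return None on success, else the errno."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError as e:
        return e.errno
    os.close(fd)
    return None


def find_dead_mounts(mounts_text, probe=probe_device):
    """List mounts whose backing device can no longer be opened."""
    return [(dev, mnt) for dev, mnt in parse_mounts(mounts_text)
            if probe(dev) in GONE]


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = ""
    for dev, mnt in find_dead_mounts(text):
        print(dev, "is mounted on", mnt, "but the device is gone")
```

The probe is passed in as a parameter so the logic can be tried against
a canned mounts table without touching any real devices.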

> You mean something like this?
> http://www.digital-explosion.co.uk/index.php?articleID=31

Eek. No, not that well cooled - water can stay well away from my
servers. I'm talking about a 5U railmount server case with most of the
back taken up by fans - cooling by stupid amounts of airflow. As much as
I'd love a quieter cooling solution, it's just not practical.

Craig Ringer