[plug] handling failed non-redundant storage in a server

Thu Feb 12 13:25:03 WST 2004

On Thu, 2004-02-12 at 11:56, Craig Ringer wrote:

> Yes. As they're waiting for I/O they can't be killed, even by a kill -9.
> The reason, as I understand it, is that they're running in kernel mode
> at the moment and can't be killed until they leave the kernel I/O
> routines - which they never will, because the disk is no longer there.

Ah, right. =\

Heh, well, kill -9 is one of the most powerful things I have against
stubborn processes. If that doesn't work and changing init levels isn't
an option, I don't see how to get out of the mess without a good
old-fashioned reboot.
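
For what it's worth, the stuck processes can at least be identified
before rebooting. On Linux, a process blocked in kernel I/O shows up in
state "D" (uninterruptible sleep) in /proc/<pid>/stat, which is why the
signal is never delivered. A minimal sketch, assuming a Linux procfs;
the function names parse_stat and uninterruptible are my own, not from
any existing tool:

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or entry unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

This only reports the processes; it can't unstick them. Anything it
lists will ignore kill -9 until the kernel I/O call returns.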

> because the device is no longer present - it's been disabled by the RAID
> controller and the driver has made the kernel aware of this. Yet this
> entirely absent device has a mounted filesystem and files open on that
> filesystem.

Yeah, I get ya now. So you need some mechanism at the kernel level to
make it 'give up' on the device and flush all the buffered I/O still
waiting to be done. That sounds almost like a fix, but it would be
dangerous if activated accidentally elsewhere.

> Eek. No, not that well cooled - water can stay well away from my
> servers. I'm talking about a 5U railmount server case with most of the
> back taken up by fans - cooling by stupid amounts of airflow. As much as
> I'd love a quieter cooling solution, it's just not practical.

Heheh. :)

:)