[plug] handling failed non-redundant storage in a server

Thu Feb 12 11:48:52 WST 2004

On Thu, 2004-02-12 at 11:34, James Devenish wrote:
> Since the load number basically indicates processes that are queued (not
> sure what it truly indicates under Linux), it is possible to have a high
> load average without noticeably slow performance, if those processes
> don't have long-lasting I/O or computation requirements. I guess, in
> your case, the hung processes are 'in the middle of something' but each
> time the kernel looks at those processes, it decides to ignore them and
> try again later.

Thanks for the explanation - I'd never clearly understood what, exactly,
the load average meant.
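
On Linux the load average counts tasks in uninterruptible sleep (state
D, typically blocked on I/O) as well as runnable ones, which would
explain a high load from hung processes that use no CPU. A minimal
sketch of checking this, assuming the usual /proc/loadavg and
/proc/<pid>/stat text layouts; the function names and the sample values
in the usage note are made up for illustration:

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    # Fourth field is "currently runnable / total scheduling entities".
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(fields[4]),
    }


def task_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split after the last ')' rather than on whitespace.
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def count_blocked(stat_lines):
    """Count tasks in uninterruptible sleep (state 'D')."""
    return sum(1 for line in stat_lines if task_state(line) == "D")
```

For example, parse_loadavg("4.02 3.97 3.50 1/80 11206")["load1"] gives
4.02, and count_blocked() over the stat lines of the hung processes
shows how many of them are contributing to that figure.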

>  Would prefer to have
> programmes die "cleanly" with 'file not found' than hang.

There are times when it's appropriate for a program to hang. For one
thing, the kernel doesn't know whether the device will come back: it
could be misbehaving temporarily, or it could be taking a while to spin
up after going into sleep mode.

A way for the admin to designate a device 'dead' would be nice. The
kernel would then switch to reporting I/O errors on access attempts
instead of blocking processes. An optional timeout to switch
non-responding devices over to 'dead' mode would also be useful.
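
The behaviour described above can be modelled in userspace. This is only
a sketch of the policy, not a kernel interface: DeviceGate, mark_dead()
and access() are hypothetical names, and a thread genuinely stuck in
uninterruptible I/O is abandoned rather than cancelled.

```python
import errno
import threading
import time


class DeviceGate:
    """Hypothetical model of a 'dead device' switch with optional timeout."""

    def __init__(self, timeout=None):
        # Seconds to wait before declaring the device dead.
        # None means block indefinitely, like the current behaviour.
        self.timeout = timeout
        self.dead = False
        self._lock = threading.Lock()

    def mark_dead(self):
        """Admin action: make all further access fail immediately."""
        with self._lock:
            self.dead = True

    def access(self, operation):
        """Run operation() against the device, or raise EIO if it is dead."""
        if self.dead:
            raise OSError(errno.EIO, "device marked dead")
        result = {}

        def worker():
            try:
                result["value"] = operation()
            except Exception as exc:
                result["error"] = exc

        # Daemon thread, so a stalled operation cannot keep the program alive.
        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        thread.join(self.timeout)
        if thread.is_alive():
            # Timed out: switch the device over to 'dead' mode.
            self.mark_dead()
            raise OSError(errno.EIO, "device timed out; now marked dead")
        if "error" in result:
            raise result["error"]
        return result["value"]
```

With DeviceGate(timeout=30), the first access that stalls for more than
30 seconds fails with EIO, and every later access fails at once instead
of blocking. With timeout=None, only an explicit mark_dead() call
changes the behaviour.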

> You clearly work in a noisy environment! Do you have your own streaming
> audio server that we can connect to, in order to listen in on your
> workplace? ;-)

It just wouldn't be the same with a volume control. Hmm... I think I
want one of those for here!

It's not really /that/ noisy, but as a long-term work environment it's
very uncomfortable, enough so that I prefer to work with earplugs.

Craig Ringer