[plug] hot freeze - not a contradiction in terms
Denis Brown
dsbrown at cyllene.uwa.edu.au
Wed Dec 13 11:19:25 WST 2006
At 08:34 PM 12/12/2006, Gavin Chester wrote:
>With this hot spell I have seen my workstation emulating a windoze PC in
>the sense that twice today without warning it has lost all keyboard
>input and apparent disc I/O so that I could not even open a VT or do
>anything except a full reboot.
Some random thoughts...
RAIDed drives? If not, then the drive(s) could be a possibility. If they
are RAIDed, however, I would think the array should fail gracefully, or at
least leave messages behind in something like /var/log/messages.
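
As a rough sketch of checking the log after a freeze, something like the
following could pull out lines that look drive- or RAID-related. The keyword
list and the function names are my own guesses, not anything your kernel is
guaranteed to log, so adjust them to what actually shows up on your system.

```python
import re

# Hypothetical keyword list (an assumption): tune it to what your kernel
# and RAID layer really write to the log.
PATTERN = re.compile(r"I/O error|\bmd\d*:|\braid|\bata\d+", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the lines that mention any of the keywords above."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan a log file; reading /var/log/messages usually needs root."""
    with open(path, errors="replace") as log:
        return suspicious_lines(log)
```

Calling scan_log() after the reboot and looking at the last few entries
before the freeze would be the idea. An empty result would fit with the
system dying before it could log anything.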
Although, if the problem is drive-related, bear in mind that drives per se
have a fairly high thermal mass. When they get hot, they "stay hot", so a
subsequent reboot - if that is what it takes - should see the system crash
again very shortly thereafter, even if you let it "cool down" for, say,
five minutes.
If it were me, I would be thinking more in terms of memory or other
mobo-related components, since they would cool down faster and give a
longer period of operation before overheating again. Sort of supporting
that is the thought that, if a drive failed and loaded garbage in place of
a required application, module, etc., then only that application, module
or whatever should be affected - the remainder of the processes should just
keep marching along. Ergo, you should still have shell access, etc.
Of course, I can think of heaps of things that would contradict that, swap
being one. Load something dodgy out of swap and all bets are off.
If it is mobo-related then that might explain why the system has no time to
record in logs where it hurts before it dies.
Temperature / voltage monitoring of the mobo may be a profitable avenue to
pursue, especially if the logging can be done via a serial port to a dumb
terminal - you may be able to see some trends leading up to failure?
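
A minimal sketch of that kind of logging, assuming a Linux box whose sensor
driver exposes readings under /sys/class/hwmon as temp*_input files (in
millidegrees Celsius), and a serial port at /dev/ttyS0 already set up with
stty to match the terminal. The paths, the interval and the function names
are all assumptions for illustration. Some drivers put the files one level
deeper, and voltages (in*_input) could be read the same way.

```python
import glob
import os
import time


def read_temps(root="/sys/class/hwmon"):
    """Return {sensor file relative to root: temperature in degrees C}."""
    temps = {}
    pattern = os.path.join(root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as sensor:
                millidegrees = int(sensor.read().strip())
        except (OSError, ValueError):
            continue  # sensor missing or unreadable; skip it
        temps[os.path.relpath(path, root)] = millidegrees / 1000.0
    return temps


def format_line(temps, now=None):
    """Build one timestamped log line from a dict of readings."""
    stamp = time.strftime("%H:%M:%S", time.localtime(now))
    readings = " ".join(
        f"{name}={value:.1f}C" for name, value in sorted(temps.items())
    )
    return f"{stamp} {readings}"


def log_forever(port="/dev/ttyS0", interval=10):
    """Write a reading to the serial port every `interval` seconds."""
    with open(port, "w") as tty:
        while True:
            tty.write(format_line(read_temps()) + "\r\n")
            tty.flush()  # push each line out before the box can freeze
            time.sleep(interval)
```

Running log_forever() as root and leaving the terminal on would mean the
last few lines on the screen show what the temperatures were doing just
before the freeze, even when nothing makes it to disk.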
HTH,
Denis