[plug] hot freeze - not a contradiction in terms
    Denis Brown 
    dsbrown at cyllene.uwa.edu.au
       
    Wed Dec 13 11:19:25 WST 2006
    
    
  
At 08:34 PM 12/12/2006, Gavin Chester wrote:
>With this hot spell I have seen my workstation emulating a windoze PC in
>the sense that twice today without warning it has lost all keyboard
>input and apparent disc I/O so that I could not even open a VT or do
>anything except a full reboot.
Some random thoughts...
RAIDed drives?   If not, then the drive(s) could be the culprit.   If they 
are RAIDed, however, I would expect a failure to be handled gracefully, or 
at least to leave messages behind in something like /var/log/messages.
That said, if the problem is drive-related, bear in mind that drives have a 
fairly high thermal mass.   Once they get hot, they "stay hot", so after a 
reboot (if that is what it takes) I would expect the system to crash again 
very shortly afterwards, even if you let it "cool down" for, say, five minutes.
If it were me, I would be thinking more in terms of memory or other 
mobo-related components, since they would cool down faster and give a longer 
period of operation before overheating again.   Loosely supporting that is 
the thought that, if a drive failed and loaded garbage in place of a 
required application, module, etc., then only that application, module or 
whatever should be affected; the remaining processes should just keep 
marching along.   Ergo you should still have shell access, etc.
Of course, I can think of heaps of exceptions to that, swap being one: 
load something dodgy out of swap and all bets are off.
If it is mobo-related, that might also explain why the system has no time 
to log where it hurts before it dies.
Temperature/voltage monitoring of the mobo may be a profitable avenue to 
pursue, especially if the readings can be logged via a serial port to a dumb 
terminal, since local log files may not survive the crash.   You may then be 
able to see some trends leading up to the failure.
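For what it's worth, here is a minimal sketch of such a logger.   It assumes 
a Linux kernel that exposes sensor readings through the hwmon sysfs 
interface under /sys/class/hwmon (temperatures in millidegrees Celsius, 
voltages in millivolts); on some older kernels the files sit under 
hwmonN/device/ instead, and lm_sensors' `sensors` command is the usual front 
end.   The function names are hypothetical.   It just prints one timestamped 
line of every temperature and voltage reading at a fixed interval.

```python
import glob
import os
import time


def read_hwmon(base="/sys/class/hwmon"):
    """Return {"chip/sensor": (value, unit)} for every temperature (deg C)
    and voltage (V) input found under the hwmon sysfs directory `base`."""
    readings = {}
    for path in sorted(glob.glob(os.path.join(base, "hwmon*", "*_input"))):
        name = os.path.basename(path)
        # hwmon reports temperatures in millidegrees C and voltages in mV,
        # so both are divided by 1000.
        if name.startswith("temp"):
            unit = "C"
        elif name.startswith("in"):
            unit = "V"
        else:
            continue  # skip fans, current, power, etc.
        try:
            with open(path) as f:
                raw = int(f.read().strip())
        except (OSError, ValueError):
            continue  # an unreadable sensor is simply ignored
        chip = os.path.basename(os.path.dirname(path))
        readings[f"{chip}/{name[:-len('_input')]}"] = (raw / 1000.0, unit)
    return readings


def format_line(readings, now=None):
    """Format readings as one timestamped line (now=None means current time)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{k}={v:.2f}{u}" for k, (v, u) in sorted(readings.items()))
    return f"{stamp} {fields}"


def log_forever(interval=30):
    """Print a line every `interval` seconds; flush so nothing sits in a
    buffer when the machine locks up."""
    while True:
        print(format_line(read_hwmon()), flush=True)
        time.sleep(interval)
```

To get it onto the dumb terminal, one option is to configure the port with 
something like `stty -F /dev/ttyS0 9600` and run `log_forever()` with 
standard output redirected to /dev/ttyS0.   The last few lines left on the 
terminal's screen should then show the readings just before the lockup.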
HTH,
Denis
    
    