[plug] hot freeze - not a contradiction in terms
Gavin Chester
sales at ecosolutions.com.au
Wed Dec 13 11:47:48 WST 2006
On Wed, 2006-12-13 at 11:19 +0900, Denis Brown wrote:
> At 08:34 PM 12/12/2006, Gavin Chester wrote:
> >With this hot spell I have seen my workstation emulating a windoze PC in
> >the sense that twice today without warning it has lost all keyboard
> >input and apparent disc I/O so that I could not even open a VT or do
> >anything except a full reboot.
>
> Some random thoughts...
> RAIDed drives? If not, then the drive(s) could be a possibility. If RAID,
> however, it should fail gracefully or leave messages behind, I would think,
> in something like /var/log/messages.
They are running in a 2-drive LVM setup off an LSI Logic controller card.
>
> Although, if the problem is drive-related then drives per se have a fairly
> high thermal mass. When they get hot, they "stay hot" and a subsequent
> reboot - if that is what it takes - should see the system crash again very
> shortly thereafter, assuming you let it "cool down" for say five minutes.
Maybe that figures :-/ The second freeze came after a couple of hours'
use, and I had rebooted straight after the first freeze. This time it
has been running solid for 14 hrs, after I first rested it for a couple
of hours, and we haven't yet reached the hottest part of the day.
> If it was me, I would be thinking more in terms of memory or other mobo
> related components since they would cool down faster and give a longer
> period of operation before overheating again. Sort of supporting that is
> the thought that, if a drive failed and loaded garbage in place of a
> required application, module, etc., then only that application, module or
> whatever should be affected - the remainder of the processes should just
> keep marching along. Ergo you should still have shell access, etc.
>
> Of course I can think of heaps of things that would contravene that, swap
> being one. Load something dodgy out of swap and all bets are off.
>
> If it is mobo-related then that might explain why the system has no time to
> record in logs where it hurts before it dies.
>
> Temperature / voltage monitoring of the mobo may be a profitable avenue to
> pursue, especially if the logging can be done via a serial port to a dumb
> terminal - you may be able to see some trends leading up to failure?
>
> HTH,
> Denis
Thanks for the info :-)
Gavin
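
Denis's suggestion of logging temperatures to a serial terminal could be
sketched roughly as below. This is only an illustration under assumptions
that are not from the thread: that the kernel exposes sensors under
/sys/class/hwmon (the lm-sensors drivers are loaded), that the serial port
is /dev/ttyS0, and that its speed has already been set (e.g. with stty).
The function names are made up for the example.

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN/tempM": degrees C} for each readable temp*_input file."""
    readings = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                millideg = int(f.read().strip())  # sysfs reports millidegrees C
        except (OSError, ValueError):
            continue  # sensor unreadable right now; skip it
        chip = os.path.basename(os.path.dirname(path))
        name = os.path.basename(path)[: -len("_input")]
        readings[f"{chip}/{name}"] = millideg / 1000.0
    return readings


def format_line(readings, now=None):
    """Build one timestamped log line from a readings dict."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    if not readings:
        return f"{stamp} no sensors found"
    fields = " ".join(f"{k}={v:.1f}C" for k, v in sorted(readings.items()))
    return f"{stamp} {fields}"


def log_forever(out, interval=10, iterations=None,
                hwmon_root="/sys/class/hwmon"):
    """Write a reading to `out` every `interval` seconds.

    `iterations=None` means run until the machine (or the user) stops it.
    Each line is flushed immediately so the last reading before a freeze
    has the best chance of reaching the terminal.
    """
    count = 0
    while iterations is None or count < iterations:
        out.write(format_line(read_temps(hwmon_root)) + "\r\n")
        out.flush()
        time.sleep(interval)
        count += 1


# Example use (assumed device name, port already configured):
#     with open("/dev/ttyS0", "w") as port:
#         log_forever(port)
```

Because the output goes to a separate terminal rather than to disc, the
last few lines should survive a hard freeze, which is the point of doing
it over serial. Voltages could probably be added the same way from the
in*_input files, if the board's driver provides them.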