Fw: [plug] How to diagnose a crashing Linux server?
Denis Brown
dsbrown at cyllene.uwa.edu.au
Tue Jun 3 12:49:27 WST 2003
At 11:53 3/06/2003 +0800, Richard Mortimer wrote in part:
>Greetings Folks,
>
>As a follow-up to this email that I sent just over a week ago - I resorted
>to "plan B" which was to schedule a CRON job to reboot every night.
<snip>
> > 0:00 /usr/libexec/gcon
> > 0:00 /usr/libexec/bono
> > 0:00 metacity --sm-sav
> >
> > JLM> metacity --sm-sav = small window manager using gtk2
> > The other two I'm not sure about, but you need to issue ps au to see the owner.
>
>Is anyone familiar with these two processes "gcon" and "bono"?
Sounds like Bonobo-activation to me (not that I've knowingly played with
it :-). Try this link for some info. gcon seems related, so neither of
them may be anything to worry about...
http://www-106.ibm.com/developerworks/library/l-gn-cor/
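To follow up the "ps au to see the owner" suggestion quoted above, here is a minimal Python sketch that reports which users own processes matching "gcon" or "bono". It assumes a procps-style `ps`; the function names are hypothetical. It reads `ps aux` rather than `ps au`, because the `x` flag also lists processes with no controlling terminal, which is where background helpers like these usually show up.

```python
import subprocess

def current_ps_text():
    """Capture `ps aux` output (assumes a procps-style ps on the PATH)."""
    return subprocess.run(["ps", "aux"], capture_output=True,
                          text=True, check=True).stdout

def owners_of(ps_text, fragments):
    """Map each command-name fragment to the set of users running a matching process.

    Expects the `ps aux` layout: a header row, then 11-column rows where USER
    is the first column and COMMAND (which may contain spaces) is the last.
    """
    owners = {frag: set() for frag in fragments}
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # at most 11 fields, so COMMAND stays whole
        if len(fields) < 11:
            continue
        user, command = fields[0], fields[10]
        for frag in fragments:
            if frag in command:
                owners[frag].add(user)
    return owners
```

Typical use: `owners_of(current_ps_text(), ["gcon", "bono"])`. A fragment mapped to an empty set is not currently running.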
<snip>
(From previous material describing the hardware.)
Hardware RAID, eh? I recall very recent problems here at UWA where the
main mail server went down and didn't come back up quickly. It happened
several times in the space of a few days. The problem was reported to be
that the hardware RAID (AMI MegaRaid) controller was masking drive
errors. Can you examine the individual drive error logs or run S.M.A.R.T.
checks on the drives to determine their state of health?
"... it turns out that you have to use the RAID
configuration software and look at each physical disk individually, and then
find the one with media errors in its history. For some reason the RAID
controller doesn't fail the disk from the array when it gets media errors,
but instead stupidly leaves it in there to cause problems!"
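A minimal Python sketch of the two checks suggested above: reading each drive's S.M.A.R.T. health verdict, and scanning a per-disk listing from the RAID tool for drives with media errors in their history. It assumes smartmontools' `smartctl` is installed and run as root. The `-d megaraid,N` pass-through and the listing format parsed by `disks_with_media_errors` are assumptions that may not apply to an AMI MegaRaid of this vintage; all function names are hypothetical.

```python
import re
import subprocess

# Verdict lines printed by `smartctl -H`: ATA drives report
# "SMART overall-health self-assessment test result: PASSED",
# SCSI/SAS drives report "SMART Health Status: OK".
def smart_health(smartctl_output):
    """Return True (healthy), False (failing) or None (no verdict found)."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result:" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        if line.startswith("SMART Health Status:"):
            return line.split(":", 1)[1].strip() == "OK"
    return None

def check_disk(device, extra_args=()):
    """Run `smartctl -H` on one device, e.g. check_disk("/dev/sda", ("-d", "megaraid,0"))."""
    proc = subprocess.run(["smartctl", "-H", *extra_args, device],
                          capture_output=True, text=True)
    # smartctl sets bits in its exit status to flag problems, so a non-zero
    # return code is not treated as an error here; stdout is parsed instead.
    return smart_health(proc.stdout)

# ASSUMED listing format, modelled on the per-drive blocks in LSI MegaCli's
# "-PDList" output; adjust these patterns to whatever your RAID tool prints.
SLOT_RE = re.compile(r"^Slot Number:\s*(\d+)")
MEDIA_RE = re.compile(r"^Media Error Count:\s*(\d+)")

def disks_with_media_errors(listing):
    """Return {slot: count} for each physical disk whose media error count is non-zero."""
    suspects = {}
    slot = None
    for raw in listing.splitlines():
        line = raw.strip()
        m = SLOT_RE.match(line)
        if m:
            slot = int(m.group(1))
            continue
        m = MEDIA_RE.match(line)
        if m and slot is not None and int(m.group(1)) > 0:
            suspects[slot] = int(m.group(1))
    return suspects
```

Whether `smartctl` can see through the controller at all depends on the card and its driver, so a `None` from `check_disk` means "no verdict", not "healthy". In that case the RAID vendor's own per-disk view is the fallback.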
Sorry if this is "been there, done that" territory.
Cheers,
Denis