[plug] How to diagnose a crashing Linux server?
Craig Ringer
craig at postnewspapers.com.au
Tue May 20 10:41:08 WST 2003
> I have a RH8.0 webserver which is crashing every 4 to 5 days. I've never had
> a server crash on me before, and was hoping someone could go through a basic
> "check list" of what I should be looking for.
What happens when it crashes? Does it kernel panic, just become
unreachable, or something else?
Do you have console access, or the ability to attach a null-modem cable
to another machine you control? If so, try to find out why it's
crashing by watching the console. I suggest adding an entry to
/etc/syslog.conf like:
*.* /dev/tty12
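to send all system messages to tty12, which can be very helpful when
tracking down faults. Remember to send syslogd a SIGHUP (or restart the
syslog service) afterwards so it rereads its configuration.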
If you can attach a null-modem cable, connect it to another machine and
boot the server with "console=ttyS0" or "console=tty1 console=ttyS0" on
the kernel command line (the kernel wants the bare device name, without
the /dev/ prefix; the serial console defaults to 9600 baud, 8N1). On the
other machine, use a program like Minicom or (on a Windows box)
HyperTerminal to open the serial port and watch the console output. If
the server dumps anything like a kernel panic, you can capture it by
setting your terminal program to log to a file, and that will help your
diagnosis a lot.
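A minimal sketch of a boot entry, assuming the server boots with GRUB legacy (LILO users would put the same options in an append="..." line instead). VERSION, the (hd0,0) root partition and root=LABEL=/ are placeholders to copy from your existing entry, and 9600 baud, 8N1 is an assumption that must match the terminal program's settings on the other machine:

```
# /boot/grub/grub.conf (GRUB legacy): hypothetical extra boot entry.
# VERSION, (hd0,0) and root=LABEL=/ are placeholders; copy the real
# values from your existing entry.
title Linux (serial console)
        root (hd0,0)
        kernel /vmlinuz-VERSION ro root=LABEL=/ console=tty1 console=ttyS0,9600n8
        initrd /initrd-VERSION.img
```

Kernel messages go to every console listed; the last console= option becomes /dev/console, so init and boot-script output ends up on the serial line.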
You can also add an /etc/inittab entry that runs a getty on the serial
line, letting you log in over the serial port even if you lose TCP/IP
access. That's great (a) if you kill the ssh server while upgrading it,
and (b) if something goes badly wrong on the server.
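For example, assuming agetty from util-linux is installed at /sbin/agetty and the port is ttyS0 at 9600 baud (adjust both to suit), an entry along these lines should work; the "T0" id is arbitrary:

```
# /etc/inittab: keep a login prompt on the first serial port in runlevels 2-5.
# -L marks the line as local (no carrier detect), which suits a null-modem cable.
T0:2345:respawn:/sbin/agetty -L ttyS0 9600 vt100
```

Run "telinit q" so init rereads /etc/inittab, and add ttyS0 to /etc/securetty if you want to be able to log in as root on that port.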
As for general debugging: make sure the CPU isn't overheating, and if
you've added any new RAM recently, see if you can borrow some different
RAM to test with. Check syslog for disk errors (though if they're on the
primary disk, they probably won't get written to the log on disk; that's
another good use of a serial console).
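To make the syslog check less tedious, here is a small Python sketch that prints log lines that look like disk errors. The patterns are assumptions based on common wording of kernel IDE/SCSI error messages, not a complete list, and /var/log/messages is only the usual default path on Red Hat; you can equally point it at a console log captured over the serial line:

```python
import re
import sys
from collections.abc import Iterable, Iterator

# Illustrative patterns (an assumption, not an exhaustive list): typical
# wording of kernel disk-error messages. Adjust for your drivers and kernel.
DISK_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"dma_intr: (status|error)="),
    re.compile(r"DriveStatusError|UncorrectableError|BadCRC"),
    re.compile(r"Medium Error", re.IGNORECASE),
]


def find_disk_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line that matches any disk-error pattern, minus its line ending."""
    for line in lines:
        if any(pattern.search(line) for pattern in DISK_ERROR_PATTERNS):
            yield line.rstrip("\r\n")


def main(path: str) -> None:
    """Print suspected disk errors from the log file at `path`."""
    try:
        # errors="replace" stops a stray non-UTF-8 byte from aborting the scan
        with open(path, encoding="utf-8", errors="replace") as log:
            for hit in find_disk_errors(log):
                print(hit)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)


if __name__ == "__main__":
    # Default to Red Hat's main syslog file; a serial-console capture works too.
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages")
```

Stripping "\r\n" rather than just "\n" keeps lines clean when the input is a capture file written by a terminal program, which often leaves carriage returns in place.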
If you can give us more information about what's happening, that would help.
Craig