[plug] How to diagnose a crashing Linux server?

Craig Ringer craig at postnewspapers.com.au
Tue May 20 10:41:08 WST 2003


> I have a RH8.0 webserver which is crashing every 4 to 5 days. I've never had
> a server crash on me before, and was hoping someone could go through a basic
> "check list" of what I should be looking for.

What happens when it crashes? Is it a kernel panic, does it just become 
unreachable, or something else?
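Knowing whether the box panics or just drops off the network, and exactly when, narrows things down a lot. One way to get the timing is to poll it from another machine. A minimal Python sketch, assuming the web server answers on TCP port 80; the host, port and 60-second interval are placeholders, and `is_reachable`/`watch` are made-up names for illustration:

```python
import socket
import sys
import time
from datetime import datetime


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, no route, etc.
        return False


def watch(host: str, port: int = 80, interval: float = 60.0) -> None:
    """Print a timestamped line every time reachability changes."""
    last = None
    while True:
        up = is_reachable(host, port)
        if up != last:
            state = "UP" if up else "DOWN"
            print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {host}:{port} {state}",
                  flush=True)
            last = up
        time.sleep(interval)


if __name__ == "__main__":
    watch(sys.argv[1], int(sys.argv[2]) if len(sys.argv) > 2 else 80)
```

A failed connection to port 80 only says the web service stopped answering; whether the whole machine died is what the console (below) will tell you.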

Do you have console access, or the ability to attach a null-modem cable 
to another machine you control? If so, try to get some info on why it's 
crashing by watching the console. I suggest adding an entry to 
/etc/syslog.conf like:

*.*				/dev/tty12

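to dump all system messages to tty12 (restart syslogd so it rereads the 
file, then press Alt-F12 at the console to watch). It can be very helpful 
when tracking down faults.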

If you can attach a null-modem cable, connect it to another machine and 
boot the server with "console=ttyS0" or "console=tty1 console=ttyS0" on 
the kernel command line (the second form sends kernel messages to both the 
screen and the serial port). You should be able to use a program like 
Minicom (or HyperTerminal, if the other machine is a 'doze box) to access 
the serial port and watch the console output. If the machine dumps anything 
like a kernel panic, you can capture it (set your terminal app to log to a 
file) and that will help your diagnostics a lot.
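If Minicom isn't handy on the other machine, even a small script can do the logging. A minimal Python sketch, assuming a Linux box on the far end of the cable, the port at /dev/ttyS0, and the console at 9600 baud 8N1 (the kernel's default when console=ttyS0 is given without options); the device, log file name and function names are placeholders:

```python
import os
import sys
import termios


def configure_serial(fd: int, speed: int = termios.B9600) -> None:
    """Put the tty on fd into raw 8N1 mode at the given speed."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = termios.tcgetattr(fd)
    iflag = 0                                              # no input translation
    oflag = 0                                              # no output processing
    cflag = termios.CS8 | termios.CREAD | termios.CLOCAL  # 8 bits, receiver on, ignore modem lines
    lflag = 0                                              # non-canonical, no echo
    cc[termios.VMIN] = 1                                   # block until at least 1 byte arrives
    cc[termios.VTIME] = 0                                  # no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW,
                      [iflag, oflag, cflag, lflag, speed, speed, cc])


def capture(fd: int, log_path: str, echo=None) -> None:
    """Append everything read from fd to log_path until EOF.

    If echo (a binary stream) is given, also copy the data there.
    """
    with open(log_path, "ab") as log:
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:        # EOF; a real tty normally never returns this
                break
            log.write(chunk)
            log.flush()          # get it on disk before anything else goes wrong
            if echo is not None:
                echo.write(chunk)
                echo.flush()


if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/ttyS0"
    log_path = sys.argv[2] if len(sys.argv) > 2 else "console.log"
    fd = os.open(dev, os.O_RDONLY | os.O_NOCTTY)
    try:
        configure_serial(fd)
        capture(fd, log_path, echo=sys.stdout.buffer)
    except KeyboardInterrupt:
        pass                     # Ctrl-C to stop watching
    finally:
        os.close(fd)
```

Leave it running until the next crash; because the log is flushed after every read, whatever the server printed on the way down should already be in the file.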

You can also add an /etc/inittab entry to run a getty on the serial 
line, allowing you to log in over the serial port even if you lose 
TCP/IP access. That's great if (a) you kill the ssh server while 
upgrading it, or (b) something goes badly wrong on the server.
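Once the getty line is in, it's worth confirming init will actually run it. A small Python sketch that parses inittab's id:runlevels:action:process format and reports any respawning getty on ttyS0. The agetty entry it prints is only an example of the usual form, the function names are made up for illustration, and root logins on the serial line will usually also need ttyS0 listed in /etc/securetty.

```python
import sys
from typing import List, NamedTuple


class InittabEntry(NamedTuple):
    ident: str      # unique id, e.g. "S0"
    runlevels: str  # e.g. "2345"
    action: str     # e.g. "respawn"
    process: str    # command line init runs


def parse_inittab(text: str) -> List[InittabEntry]:
    """Parse id:runlevels:action:process lines, skipping comments and blanks."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # maxsplit=3 so a ':' inside the command itself is preserved
        parts = line.split(":", 3)
        if len(parts) == 4:
            entries.append(InittabEntry(*parts))
    return entries


def serial_gettys(entries: List[InittabEntry],
                  port: str = "ttyS0") -> List[InittabEntry]:
    """Return respawning getty entries whose command line names the port."""
    return [
        e for e in entries
        if e.action == "respawn" and "getty" in e.process
        and port in e.process.split()
    ]


if __name__ == "__main__":
    with open("/etc/inittab") as f:
        found = serial_gettys(parse_inittab(f.read()))
    if found:
        for e in found:
            print(f"serial getty {e.ident}: {e.process}")
    else:
        # Example entry only; adjust speed and terminal type to your setup.
        print("No getty on ttyS0. Example entry:")
        print("  S0:2345:respawn:/sbin/agetty -L 9600 ttyS0 vt100")
        print("then run 'init q' so init rereads /etc/inittab.")
        sys.exit(1)
```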

As for general debugging: make sure the CPU isn't overheating, and if 
you've added any new RAM recently, see if you can borrow some different 
RAM to test with. Check syslog for disk errors (though if they're on the 
primary disk, they probably won't get written to the log on disk; that's 
another good use of a serial console).
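For the syslog check, a quick filter saves paging through the whole file. A minimal Python sketch, assuming Red Hat's default /var/log/messages; the regular expression is a hypothetical starting set based on the kind of messages IDE and SCSI drivers log, so check your own logs and adjust it:

```python
import re
import sys
from typing import Iterable, Iterator

# Hypothetical starting patterns; extend them to match what your drivers log.
DISK_ERROR = re.compile(
    r"I/O error|UncorrectableError|DriveStatusError|SeekComplete Error|SCSI error",
    re.IGNORECASE,
)


def disk_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield the log lines that look like disk errors, without trailing newlines."""
    for line in lines:
        if DISK_ERROR.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as f:
        for hit in disk_errors(f):
            print(hit)
```

The caveat above still applies: errors on the disk that holds /var/log may never reach the file, so an empty result doesn't prove the disk is healthy.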

If you can give some more info on what's happening, it'd be helpful.

Craig