[plug] server keeps dropping out

Mark O'Shea mark at musicalstoat.co.uk
Thu Mar 25 11:30:28 WST 2004


On Thu, 25 Mar 2004, Jon Miller wrote:

> I'd like to get some input from the group on a problem that is a constant pain in the rear.  I have 2 sites that each have a Linux gateway: one runs Red Hat Linux 7.2 and the other Debian (Linux version 2.4.18-bf2.4).  They mainly perform gateway services in conjunction with mail services.  They sit behind Cisco routers that have the firewall and VPN feature sets turned on.
> Sporadically, the servers go through periods where they freeze up and have to be rebooted.  For instance, the Debian server normally had to be rebooted every Monday; now it's every day, and sometimes twice a day.  The Red Hat server requires a reboot at least once a day.  The servers themselves are still running; it's just that the services either drop out or go into what appears to be a zombie state (loaded but not functioning).
> I've searched through both the /var/log/messages and /var/log/syslog looking for clues as to why they are doing this, but nothing is listed other than when they are rebooted.
> My plans are to run the following:
> memory tests
> harddisk test
> CPU tests
> In most cases, after running these tests, nothing shows up as failed or failing.
>
> I'm looking for some sort of application that can be installed and record events when there are changes. The log must be sent to my server so that in the event there is a change I would have at least the last state of the server prior to it freezing.
>
> Is there any online testing that can be performed while the server is up to get an indication of what is going on?
>
> Specs on servers
> Server A
> AMD 1.0 GHz CPU
> 256MB SDRAM
> WD 100EB ATA HDD
> 40x CD-ROM
> NetGear FA310 TX NIC
>
> Server B
> Intel Pentium III 500MHz CPU
> WD136BA ATA HDD
> 40x CD-ROM
> 256MB SDRAM
> NetGear FA310 TX NIC
>
> Will be replacing both servers soon.
>
> Thanks
>

When this happens, can you still log in from the console?  If you can, what
happens when you bring the NICs down and then back up again?  Does it start
to work again?

If so, then it may well be the NICs that are giving you woes.  These cards
have been known to do that under random bursts of high (or not even that
high) traffic, then work fine for ages, and the same model may never give
any of your colleagues problems (I've got a couple that have never played up
for me).  There may well be better drivers around.  Donald Becker did the
2.2 kernel tulip drivers, but you can find 2.4 ones at
http://sourceforge.net/projects/tulip
Or you could try swapping the NICs for ones from a different manufacturer
(try not to go cheap).
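
Before swapping hardware, it may help to compare the interface error
counters from before and after one of these dropouts.  The following is only
a rough sketch, assuming the usual 16-column layout of /proc/net/dev on
Linux; the helper names parse_net_dev and error_delta are made up for
illustration and are not part of any existing tool.

```python
#!/usr/bin/env python3
"""Snapshot per-interface counters from /proc/net/dev (Linux only)."""

# Column order after the "iface:" prefix, as assumed for /proc/net/dev.
RX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed"]

# Counters that tend to be interesting when a NIC or driver misbehaves.
ERROR_KEYS = ("rx_errs", "rx_drop", "rx_frame",
              "tx_errs", "tx_drop", "tx_colls", "tx_carrier")


def parse_net_dev(text):
    """Return {iface: {"rx_errs": int, ...}} from /proc/net/dev contents."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < 16:
            continue  # unexpected layout, skip rather than guess
        numbers = [int(v) for v in values[:16]]
        counters = {}
        for key, value in zip(RX_FIELDS, numbers[:8]):
            counters["rx_" + key] = value
        for key, value in zip(TX_FIELDS, numbers[8:]):
            counters["tx_" + key] = value
        stats[name.strip()] = counters
    return stats


def error_delta(before, after, keys=ERROR_KEYS):
    """Return how much each error counter grew between two snapshots."""
    delta = {}
    for iface in before:
        if iface in after:
            delta[iface] = {k: after[iface][k] - before[iface][k]
                            for k in keys}
    return delta


if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as handle:
            snapshot = parse_net_dev(handle.read())
    except OSError as exc:
        print("could not read /proc/net/dev:", exc)
    else:
        for iface, counters in sorted(snapshot.items()):
            print(iface, {k: counters[k] for k in ERROR_KEYS})
```

Run it once while things are healthy and again from the console when the
services have dropped out.  Error or carrier counts that climb on the
affected interface would point towards the NIC or its driver; counters that
stay flat suggest looking elsewhere.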

If of course my first paragraph isn't what is happening then disregard
this ;)
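
On the request in the quoted message for something that ships the last known
state to another machine before a freeze: one possible approach is a small
heartbeat that sends a status line to a remote syslog server at regular
intervals, so the last line received shows roughly when and in what state
the box stopped responding.  This is a minimal sketch using only the Python
standard library; the function names, the 60 second interval and the
hostname in the usage note are assumptions for illustration, and the
receiving syslogd has to be set up to accept messages from the network.

```python
#!/usr/bin/env python3
"""Send a periodic heartbeat line to a remote syslog server over UDP."""

import logging
import logging.handlers
import os
import time


def status_line():
    """Build a short status message with the current load averages."""
    try:
        load1, load5, load15 = os.getloadavg()
        load = "%.2f/%.2f/%.2f" % (load1, load5, load15)
    except (AttributeError, OSError):
        load = "unavailable"  # platform without load average support
    return "heartbeat load=%s" % load


def make_remote_logger(host, port=514):
    """Return a logger whose only handler sends to host:port via UDP syslog."""
    logger = logging.getLogger("heartbeat.%s.%d" % (host, port))
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep heartbeats out of any local handlers
    if not logger.handlers:
        handler = logging.handlers.SysLogHandler(address=(host, port))
        logger.addHandler(handler)
    return logger


def run(host, port=514, interval=60):
    """Send one heartbeat every `interval` seconds until interrupted."""
    logger = make_remote_logger(host, port)
    while True:
        logger.info(status_line())
        time.sleep(interval)
```

Calling run("loghost.example.org") on each gateway (the hostname is a
placeholder) would start the heartbeat.  UDP syslog is fire-and-forget, so a
gap in the heartbeats on the log server marks the outage window but cannot
say why it happened; extending status_line with whatever else seems relevant
is the obvious next step.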

Regards,
-- 
Mark O'Shea



