[plug] Compaq SMP problem (PLUG list repost with updates)
Beau Kuiper
kuiperba at cs.curtin.edu.au
Mon Jun 12 22:48:31 WST 2000
Hi,
Try replacing the eepro network card with a cheap network card (an rtl8139 will
do) and see if it still fails. There has been quite a bit of discussion on the
linux-net mailing list about the eepro100 and tulip network cards and their big
problems. If this fixes it, then look for a different network card that gives
decent performance but isn't based on the tulip or the eepro100.
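
Since /proc dumps from both the healthy and the slow state already exist, one
way to judge whether a card swap helps is to compare the per-interface counters
in /proc/net/dev between two snapshots. The following is only a rough sketch,
not a tested tool: it assumes the 16-column /proc/net/dev layout (8 receive
and 8 transmit counters), and the function names and the two snapshot file
names given on the command line are hypothetical.

```python
import sys

# Column names assumed from the /proc/net/dev header (8 receive, 8 transmit).
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]
KEYS = ["rx_" + f for f in RX_FIELDS] + ["tx_" + f for f in TX_FIELDS]


def parse_net_dev(text):
    """Return {interface: {counter_name: value}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        # Split on the colon first: large counters may follow it with no space.
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) != len(KEYS):
            continue  # not the layout this sketch assumes
        stats[name.strip()] = dict(zip(KEYS, (int(v) for v in fields)))
    return stats


def counter_deltas(before, after, iface):
    """Return only the counters that changed for one interface."""
    b, a = before[iface], after[iface]
    # A negative delta would suggest a counter wrap or a reboot in between.
    return {k: a[k] - b[k] for k in KEYS if a[k] != b[k]}


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: script.py healthy_snapshot sick_snapshot eth0
    with open(sys.argv[1]) as f:
        healthy = parse_net_dev(f.read())
    with open(sys.argv[2]) as f:
        sick = parse_net_dev(f.read())
    for key, delta in counter_deltas(healthy, sick, sys.argv[3]).items():
        print(key, delta)
```

If the error, collision or carrier counters climb much faster when the machine
is slow than when it is fresh, that would point at the card, the driver or the
link rather than at memory use.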
Beau Kuiper
kuiperba at cs.curtin.edu.au
On Mon, 12 Jun 2000, Raven wrote:
> Hi all,
>
> Here I am with a story about a 'sick' Compaq.
>
> The hardware:
>
> Compaq SP750 with dual 733MHz Pentium III Xeon processors
> 1GB of Rambus memory
> Adaptec 7899 SCSI controller
> Matrox 16MB G400 dual head card
> Intel EtherPro100 (I believe, I will confirm the driver)
> The machine has a full duplex link to a switch with
> Gigabit connectivity to our Solaris servers (a Cabletron SSR8000).
>
> The problem:
>
> The problem is with network performance. After some period of time,
> network performance drops off to almost nothing. FTP transfers that crank
> through at 8-10 MByte/sec when the machine is 'fresh' drop off to
> sub-modem speeds, i.e. < 2 KByte/sec, when it gets 'sick'.
>
> The drop-off can happen after a few hours of operation, or it can happen
> after a week. There are no other major symptoms; everything other than
> network related operations seems to perform OK. The only common factor is
> that the system has allocated most or all of its memory for some purpose
> (not unusual for a Unix system).
>
> The story so far:
>
> I have had a look at the messages output and the machine seems to
> recognize everything OK and there doesn't seem to be anything that looks
> relevant to when the machine gets sick. The kernel .config checks out
> for an SMP kernel (according to the SMP FAQ).
>
> Kernels that have shown the problem so far are 2.2.14, 2.3.99-pre6 and
> 2.2.15. They are compiled with fewest options needed to support required
> system functionality. Kernels 2.2.16 and 2.4.0-test1 have not been tried
> yet.
>
> The kernel currently used is 2.2.15. The most recent build of this
> kernel performed OK for about 5-6 days and then required a shutdown for
> building mains power maintenance. This kernel is being used now in SMP
> mode and has been OK for 6 days so far, but the machine will be going
> down for hard disk maintenance tomorrow.
>
> One time when the machine got sick, the interface was downed, the network
> card module was unloaded and reloaded, and the interface was brought back
> up. This had no effect; the machine still ran slowly until the next reboot.
>
> A dump of the /proc tree was taken when the system was operating
> normally and on a couple of occasions where the system was running
> slowly. There is no information there, that we are aware of, that might
> indicate the source of the problem.
>
> Today I checked the network card settings and found that the Linux
> machine has forced 100 Mbps full-duplex operation. I have had the switch
> changed to the same setting. Could this cause such a problem?
>
> The next steps:
>
> After reading the linux-kernel list FAQ I have replaced egcs-1.1.2 with
> gcc-2.7.2.3, downloaded 2.2.16 and compiled a new SMP kernel. This
> kernel will be used after the reboot tomorrow. We also plan on using the
> 'nosmp' option to see if that makes any difference.
>
> Trying a different network card will also be done (can't say exactly
> when).
>
> In the meantime, can anyone suggest what might be causing this problem,
> or suggest any other things to try?
>
> --
> ,-._|\ Ian Kent
> / \ Perth, Western Australia
> *_.--._/ E-mail: ian.kent at pobox.com, raven at plug.linux.org.au
> v Web: http://pobox.com/~ian.kent