[plug] Compaq SMP problem (PLUG list repost with updates)

Beau Kuiper kuiperba at cs.curtin.edu.au
Mon Jun 12 22:48:31 WST 2000


Hi,

Try replacing the eepro network card with a cheap network card (rtl8139 will
do) and see whether the problem still occurs. There is quite a bit of
discussion on the linux-net mailing list about the eepro100 and tulip network
cards and their big problems. If this fixes it, then search for a different
network card that gives decent performance but isn't based on the tulip or
the eepro100.
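
To make the before/after comparison between cards less subjective, one
option is to snapshot the interface counters in /proc/net/dev while the
machine is healthy and again once it is slow. The following is only a rough
Python sketch, not a tested diagnostic: the helper names (FIELDS,
parse_net_dev, counter_deltas, read_snapshot) are made up for illustration,
and it assumes the 16-column layout of /proc/net/dev (8 receive counters
followed by 8 transmit counters), which may differ on other kernels.

```python
# Sketch: compare per-interface counters between two /proc/net/dev snapshots.
# Assumption: each interface line has 8 receive fields then 8 transmit fields.

FIELDS = [
    "rx_bytes", "rx_packets", "rx_errs", "rx_drop",
    "rx_fifo", "rx_frame", "rx_compressed", "rx_multicast",
    "tx_bytes", "tx_packets", "tx_errs", "tx_drop",
    "tx_fifo", "tx_colls", "tx_carrier", "tx_compressed",
]


def parse_net_dev(text):
    """Return {interface: {field: int}} parsed from /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines have no "iface:" prefix
        name, _, rest = line.partition(":")
        values = rest.split()
        # Skip anything that does not match the assumed 16-counter layout.
        if len(values) != len(FIELDS) or not all(v.isdigit() for v in values):
            continue
        stats[name.strip()] = dict(zip(FIELDS, map(int, values)))
    return stats


def counter_deltas(before, after, iface):
    """Return how much each counter for iface grew between two snapshots."""
    return {f: after[iface][f] - before[iface][f] for f in FIELDS}


def read_snapshot(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as fh:
        return parse_net_dev(fh.read())


# Hypothetical usage on the affected machine:
#   healthy = read_snapshot()
#   ... run a large FTP transfer, wait until the machine gets slow ...
#   sick = read_snapshot()
#   print(counter_deltas(healthy, sick, "eth0"))
```

If error, collision or carrier counters climb sharply while the machine is
slow, that would point toward the card, driver or link settings; if they
stay flat, the cause is more likely elsewhere.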

Beau Kuiper
kuiperba at cs.curtin.edu.au

On Mon, 12 Jun 2000, Raven wrote:
> Hi all,
> 
> Here I am with a story about a 'sick' Compaq.
> 
> The hardware:
> 
>     Compaq SP750 with dual 733MHz Pentium III Xeon processors
>     1GB of Rambus memory
>     Adaptec 7899 SCSI controller
>     Matrox 16MB G400 dual head card
>     Intel EtherPro100 (I believe, I will confirm the driver)
>     The machine has a full duplex link to a switch with
>     Gigabit connectivity to our Solaris servers (a Cabletron SSR8000).
> 
> The problem:
> 
> The problem is with network performance. After some period of time,
> network performance drops off to almost nothing. FTP transfers that crank
> through at 8-10 Mbyte/sec when the machine is 'fresh' drop off to
> sub-modem speeds, i.e. < 2 Kbyte/sec, when it gets 'sick'.
> 
> The drop-off can happen after a few hours of operation, or it can happen
> after a week. There are no other major symptoms; everything other than
> network-related operations seems to perform OK. The only common factor is
> that the system has allocated most or all of its memory for some purpose
> (not unusual for a Unix system).
> 
> The story so far:
> 
> I have had a look at the messages output and the machine seems to
> recognize everything OK and there doesn't seem to be anything that looks
> relevant to when the machine gets sick. The kernel .config checks out
> for an SMP kernel (according to the SMP FAQ).
> 
> Kernels that have shown the problem so far are 2.2.14, 2.3.99-pre6 and
> 2.2.15. They are compiled with the fewest options needed to support the
> required system functionality. Kernels 2.2.16 and 2.4.0-test1 have not
> been tried yet.
> 
> The kernel currently used is 2.2.15. An earlier build of this kernel
> performed OK for about 5-6 days and then required a shutdown for
> building mains power maintenance. The current build is running in SMP
> mode and has been OK for 6 days so far, but the machine will be going
> down for hard disk maintenance tomorrow.
> 
> On one occasion when the machine got sick, the interface was brought
> down, the network card module was unloaded and reloaded, and the
> interface was brought back up. This had no effect; the machine still ran
> slowly until the next reboot.
> 
> A dump of the /proc tree was taken when the system was operating
> normally and on a couple of occasions where the system was running
> slowly. There is no information there, that we are aware of, that might
> indicate the source of the problem.
> 
> Today I checked the network card settings and found that the Linux
> machine is forced to 100 Mbps full-duplex operation. I have had the
> switch changed to the same setting. Could a mismatch between the two
> have caused such a problem?
> 
> The next steps:
> 
> After reading the linux-kernel list FAQ I have replaced egcs-1.1.2 with
> gcc-2.7.2.3, downloaded 2.2.16 and compiled a new SMP kernel. This
> kernel will be used after the reboot tomorrow. We also plan on using the
> 'nosmp' option to see if that makes any difference.
> 
> We will also try a different network card (can't say exactly when).
> 
> In the meantime, can anyone suggest what might be causing this problem,
> or suggest any other things to try?
> 
> --
>    ,-._|\    Ian Kent
>   /      \   Perth, Western Australia
>   *_.--._/   E-mail: ian.kent at pobox.com, raven at plug.linux.org.au
>         v    Web: http://pobox.com/~ian.kent


