[plug] Compaq SMP problem

Mike from West Australia erazmus at wantree.com.au
Fri Jun 9 12:06:42 WST 2000


Hi,

Sounds like a memory shuffling issue - I have seen something similar
on an old DOS network card/config (SMC, I think).

As more traffic went through, the s/w kept shifting more and more
buffers and allocating more memory for them - it never ended up
flushing the buffers correctly but instead allocated new buffers...
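
To illustrate the failure mode described above, here is a minimal
sketch in Python. It is not the actual driver code (which is unknown);
the class names LeakyPool and RecyclingPool are made up for the example.
It only shows how memory use grows with traffic when buffers are never
flushed and reused.

```python
class LeakyPool:
    """Hypothetical handler: allocates a new buffer per packet, never frees."""

    def __init__(self):
        self.buffers = []

    def handle_packet(self, payload):
        buf = bytearray(payload)      # fresh allocation every time
        self.buffers.append(buf)      # the "flush" step never happens


class RecyclingPool:
    """Hypothetical handler: flushes each buffer and puts it back for reuse."""

    def __init__(self):
        self.free = []
        self.allocated = 0

    def handle_packet(self, payload):
        if self.free:
            buf = self.free.pop()     # reuse a flushed buffer
        else:
            buf = bytearray()         # allocate only when none is free
            self.allocated += 1
        buf[:] = payload              # fill the buffer with the packet data
        buf.clear()                   # flush once the packet is processed
        self.free.append(buf)


def simulate(packets):
    """Return (buffers held by the leaky pool, buffers allocated by the other)."""
    leaky, recycling = LeakyPool(), RecyclingPool()
    for _ in range(packets):
        leaky.handle_packet(b"x" * 1500)
        recycling.handle_packet(b"x" * 1500)
    return len(leaky.buffers), recycling.allocated


if __name__ == "__main__":
    print(simulate(1000))
```

With the leaky version the number of live buffers tracks the number of
packets seen, so the slowdown would get worse with uptime and traffic.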

I believe the system config and/or a new driver fixed it. The tech report
said something about the IRQ response being 'nested', which slowed
the machine to a crawl - sounds very similar to what you have. The
thing was, the old machine never indicated it had run out of memory;
perhaps yours is the same to some degree.
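
Since the machine may never report that it is out of memory, one way to
look for a slow leak is to compare memory counters from a healthy and a
sick snapshot. The sketch below assumes the snapshots contain
'Name: value kB' lines as found in /proc/meminfo; the function names and
the sample figures are made up for illustration, not taken from the
actual machine.

```python
def parse_meminfo(text):
    """Return {field: first integer value} for each 'Name: value ...' line."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        # Skip lines without a colon or without a leading number.
        if sep and tokens and tokens[0].isdigit():
            fields[name.strip()] = int(tokens[0])
    return fields


def diff_snapshots(healthy, sick):
    """Return {field: sick - healthy} for every field whose value changed."""
    a, b = parse_meminfo(healthy), parse_meminfo(sick)
    return {k: b[k] - a[k] for k in a if k in b and b[k] != a[k]}


# Made-up sample snapshots (values in kB), for illustration only.
HEALTHY = """MemTotal:  1030000 kB
MemFree:    600000 kB
Buffers:     40000 kB
Cached:     300000 kB
"""

SICK = """MemTotal:  1030000 kB
MemFree:      4000 kB
Buffers:     41000 kB
Cached:     310000 kB
"""

if __name__ == "__main__":
    for field, delta in diff_snapshots(HEALTHY, SICK).items():
        print(f"{field}: {delta:+d} kB")
```

If free memory has collapsed without buffers or cache growing by a
similar amount, that would be consistent with memory being held
somewhere that these counters do not show, though it would not prove it.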

Is it possible the s/w IRQ processing for one CPU keeps a copy of that
for the other during network activity, and the o/s loses track of it from
that point on in terms of memory allocation for network buffers?
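
A rough way to see how the network card's interrupts are spread over
the two CPUs is to pull the per-CPU counts out of a saved copy of
/proc/interrupts. This is only a sketch: the function name is invented,
the sample text is made up, and it assumes the usual layout of a
CPU0/CPU1 header line followed by one line per IRQ.

```python
def per_cpu_counts(text, device):
    """Return the per-CPU interrupt counts for `device`, or None if absent."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())            # header line: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        counts = fields[1:1 + ncpu]         # one counter per CPU
        # Remaining fields: controller type and device names (maybe shared).
        names = [f.strip(",") for f in fields[1 + ncpu:]]
        if (len(counts) == ncpu
                and all(c.isdigit() for c in counts)
                and device in names):
            return [int(c) for c in counts]
    return None


# Made-up sample in the style of /proc/interrupts on a two-CPU box.
SAMPLE = """           CPU0       CPU1
  0:     512340     511987    IO-APIC-edge  timer
 19:      50000         12   IO-APIC-level  eth0
NMI:          0
ERR:          0
"""

if __name__ == "__main__":
    print(per_cpu_counts(SAMPLE, "eth0"))
```

Comparing these counts between a fresh and a sick snapshot would show
whether interrupt delivery changes when the slowdown starts. A lopsided
split on its own is not proof of a fault.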

What happens if you disable one of the CPUs (completely)?

Rgds

Mike




At 11:04 PM 8/6/2000 +0800, you wrote:
>
>Hi all,
>
>Here I am with the story about our 'sick' Compaq.
>
>The hardware:
>
>    Compaq SP750 with dual 733MHz Pentium III Xeon processors
>    1GB of Rambus memory
>    Adaptec 7899 SCSI controller
>    Matrox 16MB G400 dual head card
>    Intel EtherPro100 (I believe, I will confirm the driver)
>    The machine has a full duplex link to a switch with
>    Gigabit connectivity to our Solaris servers.
>
>The problem:
>
>The problem is with network performance. After some period of time,
>network performance drops off to almost nothing. FTPs that crank
>through at 8-10 Mbyte/sec when the machine is 'fresh' drop off to
>sub-modem speeds, i.e. < 2 KBytes/sec, when it gets 'sick'.
>
>The drop-off can happen after a few hours of operation, or it can happen
>after a week. No other major symptoms; everything other than
>network-related operations seems to perform OK. The only common factor
>seen to date is that the system has allocated most or all of its memory
>for some purpose (not unusual for a Unix system).
>
>
>The story so far:
>
>I have had a quick look at the dmesg output and the machine seems to
>recognise everything OK. The kernel .config checks out for an SMP kernel
>(according to the SMP FAQ, brief check).
>
>Kernels that have shown the problem so far are 2.2.14, 2.3.99.pre6 and
>2.2.15. They are compiled with the fewest options needed to support the
>required system functionality. Kernels 2.2.16 and 2.4.0-test1 have not
>been tried yet as they are not yet stable.
>
>The kernel currently used is 2.2.15. The most recent build of this
>kernel performed OK for about 5-6 days and then required a shutdown for
>building mains power maintenance. This kernel will be used again in SMP
>mode until the problem occurs. Next the same system kernel config with
>the 'nosmp' boot option will be used.
>
>One time when the machine got sick, the interface was downed, the network
>card module unloaded and reloaded, and the interface brought back up.
>This had no effect; the machine still ran slow until the next reboot.
>
>A dump of the /proc tree was taken when the system was operating
>normally and on a couple of occasions when the system was running
>slowly. There is no information there, that we are aware of, that might
>indicate the source of the problem.
>
>There doesn't seem to be anything in the messages file that looks
>relevant when the machine gets sick.
>
>Trying a different network card will also be done (can't say exactly
>when).
>
>Ian K
>
>
>
>


