[plug] Query: Very problematic memory behaviour with SSH/Debian

Craig Ringer craig at postnewspapers.com.au
Sun Dec 29 22:05:50 WST 2002


Hi James.

With the information provided, I can't offer any useful suggestions as 
to what could be causing the problem, but I can suggest a few things to 
try for when you have access to the machine again. I don't know how much 
you know and don't know, so some or all of what I suggest may be obvious - 
but hey, better said than not.

A Linux system will, as a matter of course, fill up its RAM under heavy 
I/O, but most of the RAM used is for cache only and is freely discarded 
when a better use comes up for it. When you run "free -m" you will see 
something like:

             total       used       free     shared    buffers     cached
Mem:          250        245          4          0         30        149
-/+ buffers/cache:        66        184
Swap:        1443         18       1424

In this case, while I only have 4M RAM "free", for all intents and 
purposes I have 184M available for use.
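
If you want to pull that "available" figure out of the output directly, 
something like this should work - though note the field position is an 
assumption based on the older procps "free" format shown above:

  free -m | awk '/buffers\/cache/ { print "available: " $4 " MB" }'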

The thrashing, however, is definitely NOT normal. I presume no tweaks 
have been made to the contents of /proc/sys/vm? Try running the scp in 
a login session with a very limited memory space (using ulimit), e.g.

  ulimit -m 20000 -n 50 -v 50000

to set a maximum of ~20M RAM use, 50M total VM and 50 open files. If 
the scp runs the same from within a shell with these limits imposed, you 
can be pretty sure it's not directly ssh's fault and can start looking at 
things like dodgy kernel patches.
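
For example, to impose the limits and run the copy in one go (the file 
and host names here are just placeholders):

  bash -c 'ulimit -m 20000 -n 50 -v 50000; scp /tmp/testfile user@otherhost:/tmp/'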

Consider a kernel upgrade - even temporarily, it can be a useful test. 
Also consider booting the machine in single-user mode ("linux 1" or 
similar at the LILO/GRUB boot prompt) and testing the copy from 
there - carefully.
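
With LILO, that means typing something like the following at the boot 
prompt (the image label "linux" is an assumption - use whatever label 
your lilo.conf defines); with GRUB, append "1" or "single" to the kernel 
line in edit mode instead:

  LILO boot: linux 1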

As for the X11 memory use, that actually sounds pretty normal. On my 
system, the output of "top", trimmed to just the X server line, is:

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
30558 root       9 -10  272M  10M  2996 S <   0.7  4.2   0:21 XFree86

which roughly reflects your comments. I also use the NVidia drivers. I'm 
not too hot on the Linux VM stuff myself, but I think that SIZE includes 
memory-mapped files, dynamically linked libraries and executables, even if 
they're not in RAM - whereas RSS is the actual current RAM use. Anybody 
feel free to correct me on this. Basically, there's no reason to stress 
there.
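
You can also get the same numbers out of "ps"; something like this 
should do it, though the process name may be "X" rather than "XFree86" 
depending on how your display manager starts it:

  ps -o pid,vsz,rss,args -C XFree86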

What does "top" say about X on the system in question? running "top" and 
hitting "M" to sort by RAM usage, then running commands in another vt 
can be very informative at times - especially if you set top to update 
more frequently, say "top -d 0.1" to update 10x/second. Consider doing 
this during an scp and watching what happens to what processes.
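
Alongside top, a running "vmstat" is useful for spotting thrashing - the 
si/so columns show pages swapped in and out per second, and they should 
stay near zero during a healthy copy:

  vmstat 1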

> I wonder if there could be a fault with the disk that would lead to
> these symptoms.

If so, you'd be seeing errors in syslog somewhat reminiscent of:

Dec 29 11:02:04 access kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec 29 11:02:04 access kernel: hdb: dma_intr: error=0x84 { DriveStatusError BadCRC }

or similar looking errors about DMA transfer timeouts.

You can (as root) run 'dmesg -n 8' to set the kernel to print almost all 
printk messages to the console, too. Not too useful under X, but handy 
from a vt. That will cause those disk error messages to be printed for 
sure, if they're happening.
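
You can also grep the existing logs for anything disk-related. On Debian 
the kernel messages usually land in /var/log/kern.log or /var/log/syslog 
(adjust the paths if your syslog.conf says otherwise):

  grep -iE 'dma_intr|i/o error|badcrc' /var/log/kern.log /var/log/syslog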

Hope something I've said might be of some use.

Craig Ringer


