[plug] Query: Very problematic memory behaviour with SSH/Debian

Sun Dec 29 22:05:50 WST 2002

Hi James.

With the information provided, I can't offer any useful suggestions as 
to what could be causing the problem, but I can suggest a few things to 
try for when you have access to the machine again. I don't know how much 
you know and don't know so some/all of what I suggest may be obvious - 
but hey, better said than not.

A Linux system will as a matter of course fill up its RAM under heavy 
IO, but most of the RAM used is for cache only and is freely discarded 
when a better use comes up for it. When you run a "free -m" you will see 
something like:

             total       used       free     shared    buffers     cached
Mem:          250        245          4          0         30        149
-/+ buffers/cache:        66        184
Swap:        1443         18       1424

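In this case, while I only have 4M RAM "free", for all intents and 
purposes I have 184M available for use (the "free" figure on the 
"-/+ buffers/cache" line), since buffers and cache are given up on 
demand.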
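
To make the arithmetic explicit, here is a rough sketch that does the 
same sum from "free -m" style output. The name avail_mb is made up for 
illustration, and it assumes the column layout shown above (free, 
buffers and cached as fields 4, 6 and 7 of the "Mem:" line); other 
versions of free may lay the columns out differently.

```shell
# avail_mb: read "free -m" output on stdin and print free + buffers + cached.
# Assumes the layout: Mem: total used free shared buffers cached
avail_mb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

# Typical use (assumes free prints the layout above):
#   free -m | avail_mb
```

Fed the figures above, this adds 4 + 30 + 149. That can land a megabyte 
or so off the "-/+ buffers/cache" figure, presumably because each 
column is rounded to whole megabytes before it is printed.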

The thrashing, however, is definitely NOT normal. I presume no tweaks 
have been made to the contents of /proc/sys/vm? Try running the scp in 
a login session with a very limited memory space (using ulimit), e.g.

  ulimit -m 20000 -n 50 -v 50000

to set a maximum of ~20M RAM use, 50M total VM and 50 open files (the 
memory figures are in kilobytes). If the scp behaves the same from 
within a shell with these limits imposed, you can be pretty sure it's 
not directly ssh's fault and can start looking at things like dodgy 
kernel patches.
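
If you'd rather not cripple your whole login session, one way to do it 
is to set the limits inside a subshell so they only apply to the one 
command. This is just a sketch: run_limited is a name I've invented, 
the file and host in the example are placeholders, and I've only 
included the VM and open-file limits (add another ulimit line for -m if 
you want it).

```shell
# run_limited VM_KB NOFILE CMD...: run CMD in a subshell with a virtual
# memory cap (in kilobytes) and an open-file cap. The limits die with the
# subshell, so the calling shell is left untouched.
run_limited() {
    vm_kb=$1
    nofile=$2
    shift 2
    (
        ulimit -v "$vm_kb" || exit 1
        ulimit -n "$nofile" || exit 1
        "$@"
    )
}

# Example (file and host names are placeholders):
#   run_limited 50000 50 scp bigfile user@otherhost:/tmp/
```

The limits are set one per ulimit call because not every shell accepts 
several options at once.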

Consider a kernel upgrade - even temporarily, it can be a useful test. 
Also consider booting the machine in single-user mode ("linux 1" or 
similar on the boot prompt from LILO/GRUB) and testing the copy from 
there - carefully.

As for the X11 memory use, that actually sounds pretty normal. On my 
system, the output of "top", trimmed to just display X, is:

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
30558 root       9 -10  272M  10M  2996 S <   0.7  4.2   0:21 XFree86

which roughly reflects your comments. I also use the NVidia drivers. I'm 
not too hot on the Linux VM stuff myself, but I think that SIZE includes 
mmap()ed files, dynamically linked libraries and executables, even if 
they're not in RAM - whereas RSS is the actual current RAM use. Anybody 
feel free to correct me on this. Basically, there's no reason to stress 
there.
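
If you want the two numbers without going through top, the kernel 
exposes them per process in /proc/<pid>/status as VmSize (everything 
mapped) and VmRSS (what is actually resident). A small sketch, with a 
function name of my own choosing:

```shell
# vm_summary FILE: print the VmSize (total mapped) and VmRSS (resident)
# lines from a /proc/<pid>/status style file.
vm_summary() {
    awk '$1 == "VmSize:" || $1 == "VmRSS:" { print $1, $2, $3 }' "$1"
}

# Example, assuming the X server's PID is 30558 as in the top output above:
#   vm_summary /proc/30558/status
```

A large gap between the two, as with X here, just means a lot is mapped 
but not much of it is sitting in RAM.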

What does "top" say about X on the system in question? Running "top" and 
hitting "M" to sort by RAM usage, then running commands in another vt, 
can be very informative at times - especially if you set top to update 
more frequently, say "top -d 0.1" to update 10x/second. Consider doing 
this during an scp and watching what happens to which processes.

> I wonder if there could be a fault with the disk that would lead to
> these symptoms.

If so, you'd be seeing errors in syslog somewhat reminiscent of:

Dec 29 11:02:04 access kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec 29 11:02:04 access kernel: hdb: dma_intr: error=0x84 { DriveStatusError BadCRC }

or similar looking errors about DMA transfer timeouts.

You can (as root) run 'dmesg -n 8' to set the kernel to print almost all 
printk messages to the console, too. Not too useful under X, but handy 
from a vt. That will cause those disk error messages to be printed for 
sure, if they're happening.
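
To check the existing logs for that sort of thing, something along 
these lines should do. It's only a sketch: ide_errors is a made-up 
name, the pattern is my guess at how the timeout messages are worded, 
and where the log lives depends on your syslog setup.

```shell
# ide_errors: filter kernel log lines on stdin down to IDE DMA complaints
# (dma_intr status/error lines and anything mentioning a timeout) for any
# hdX device. The pattern is a guess at the wording, not an exhaustive list.
ide_errors() {
    grep -Ei 'kernel: hd[a-z]: .*(dma_intr|timeout)'
}

# Example, assuming the log is /var/log/syslog (it may be
# /var/log/messages or elsewhere on your system):
#   ide_errors < /var/log/syslog
```

No output means no matching lines, which is what you'd hope for.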

Hope something I've said might be of some use.

Craig Ringer