[plug] Query: Very problematic memory behaviour with SSH/Debian
Craig Ringer
craig at postnewspapers.com.au
Sun Dec 29 22:05:50 WST 2002
Hi James.
With the information provided, I can't offer any useful suggestions as
to what could be causing the problem, but I can suggest a few things to
try for when you have access to the machine again. I don't know how much
you know and don't know so some/all of what I suggest may be obvious -
but hey, better said than not.
A Linux system will as a matter of course fill up its RAM under heavy
IO, but most of the RAM used is for cache only and is freely discarded
when a better use comes up for it. When you run a "free -m" you will see
something like:
             total       used       free     shared    buffers     cached
Mem:           250        245          4          0         30        149
-/+ buffers/cache:         66        184
Swap:         1443         18       1424
In this case, while I only have 4M RAM "free", for all intents and
purposes I have 184M available for use.
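
To make the arithmetic explicit, here is a rough Python sketch that adds
free + buffers + cached from the "Mem:" line. It assumes the old column
layout shown above (newer versions of free print different columns), and
the function name is just something I made up for illustration. On the
sample it gives 183 rather than 184, presumably because free rounds each
column to whole megabytes separately.

```python
def available_mb(free_output):
    """Return free + buffers + cached (MB) from old-style `free -m` output."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Column order: total used free shared buffers cached
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            return free + buffers + cached
    raise ValueError("no 'Mem:' line found")


SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:           250        245          4          0         30        149
-/+ buffers/cache:         66        184
Swap:         1443         18       1424
"""

print(available_mb(SAMPLE))  # 4 + 30 + 149 = 183
```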
The thrashing, however, is definitely NOT normal. I presume no tweaks
have been made to the contents of /proc/sys/vm? Try running the scp in
a login session with a very limited memory space (using ulimit), e.g.
ulimit -m 20000 -n 50 -v 50000
to set a maximum of ~20M RAM use, 50M total VM and 50 open files. If
the scp runs the same from within a shell with these limits imposed, you
can be pretty sure it's not directly ssh's fault and can start looking at
things like dodgy kernel patches.
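
If you'd rather not change the limits of your login shell, the same idea
can be sketched in Python with the standard resource module, applying the
limits only to one child process. This is a minimal sketch, not a tested
recipe: run_limited is a name I invented, and as far as I know many
kernels don't actually enforce the RSS limit, so the VM limit is the one
most likely to bite.

```python
import resource
import subprocess

KB = 1024  # ulimit -m and -v take kilobytes; setrlimit takes bytes


def run_limited(cmd, rss_kb=None, vm_kb=None, nofile=None):
    """Run cmd as a child process with the given resource limits applied."""
    def apply_limits():
        # Runs in the child between fork() and exec(); the parent is unaffected.
        if rss_kb is not None:
            resource.setrlimit(resource.RLIMIT_RSS, (rss_kb * KB, rss_kb * KB))
        if vm_kb is not None:
            resource.setrlimit(resource.RLIMIT_AS, (vm_kb * KB, vm_kb * KB))
        if nofile is not None:
            resource.setrlimit(resource.RLIMIT_NOFILE, (nofile, nofile))

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)


# Equivalent of "ulimit -m 20000 -n 50 -v 50000" for one command
# (the command itself is a placeholder):
# result = run_limited(["scp", ...], rss_kb=20000, vm_kb=50000, nofile=50)
```

The returned object carries the exit status and captured output, so you
can compare a limited run against an unlimited one.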
Consider a kernel upgrade - even temporarily, it can be a useful test.
Also consider booting the machine in single-user mode ("linux 1" or
similar on the boot prompt from LILO/GRUB) and testing the copy from
there - carefully.
As for the X11 memory use, that actually sounds pretty normal. On my
system, the output of "top", trimmed to just display X, is:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
30558 root       9 -10  272M  10M  2996 S <   0.7  4.2   0:21 XFree86
which roughly reflects your comments. I also use the NVidia drivers. I'm
not too hot on the Linux VM stuff myself, but I think that SIZE includes
memory-mapped files, dynamically linked libraries and executables, even
if they're not in RAM - whereas RSS is the actual current RAM use.
Anybody feel free to correct me on this. Basically, there's no reason to
stress there.
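
The same two numbers can be read straight from /proc, which is handy if
you want to log them over time. A rough sketch, assuming the VmSize and
VmRSS lines in /proc/<pid>/status correspond to what top shows as SIZE
and RSS (I believe they do, but treat that as an assumption). The function
names are mine, and kernel threads have no Vm lines, so this will raise a
KeyError for them.

```python
def vm_size_and_rss_kb(status_text):
    """Extract (VmSize, VmRSS) in kB from the text of /proc/<pid>/status."""
    values = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            values[key] = int(rest.split()[0])  # "  278528 kB" -> 278528
    return values["VmSize"], values["VmRSS"]


def read_status(pid="self"):
    """Return the raw status text for a pid (defaults to this process)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()


# Example: size, rss = vm_size_and_rss_kb(read_status(30558))
```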
What does "top" say about X on the system in question? Running "top" and
hitting "M" to sort by RAM usage, then running commands in another vt
can be very informative at times - especially if you set top to update
more frequently, say "top -d 0.1" to update 10x/second. Consider doing
this during an scp and watching which processes grow.
> I wonder if there could be a fault with the disk that would lead to
> these symptoms.
If so, you'd be seeing errors in syslog somewhat reminiscent of:
Dec 29 11:02:04 access kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec 29 11:02:04 access kernel: hdb: dma_intr: error=0x84 { DriveStatusError BadCRC }
or similar looking errors about DMA transfer timeouts.
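
If the log is large, a small filter saves some scrolling. The following
is only a sketch: the pattern is my guess at what such messages look like
(IDE device names hda, hdb, ... followed by dma_intr, "error" or
"timeout") and is certainly not a complete list, and find_ide_errors is a
made-up name.

```python
import re

# Kernel messages from an IDE disk that mention dma_intr, an error or a
# timeout. The pattern is a guess based on the messages quoted above.
IDE_TROUBLE = re.compile(r"kernel: (hd[a-z]): .*(dma_intr|error|timeout)",
                         re.IGNORECASE)


def find_ide_errors(lines):
    """Return (device, line) pairs for log lines that look like disk errors."""
    hits = []
    for line in lines:
        match = IDE_TROUBLE.search(line)
        if match:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits


# Example (the log path is an assumption; it varies between systems):
# for device, line in find_ide_errors(open("/var/log/syslog")):
#     print(device, line)
```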
You can (as root) run 'dmesg -n 8' to set the kernel to print almost all
printk messages to the console, too. Not too useful under X, but handy
from a vt. That will cause those disk error messages to be printed for
sure, if they're happening.
Hope something I've said might be of some use.
Craig Ringer