[plug] odd file manager hangs

Craig Ringer craig at postnewspapers.com.au
Wed Aug 18 14:54:49 WST 2004


Hi folks

I'm running into some odd trouble on my core server, related to users'
file browsers. Sometimes, when a user tries to open a commonly used
shared directory, /netstore/current_stories, their file browser hangs
for anywhere from a few seconds to a minute. This only seems to happen
when several people are active in that directory, and only during busy
periods on the server.

The directory is accessed by a bunch of users over Samba, and a number
of users running apps locally on the server (via remote X). The hanging
file browser issue has only been spotted on the remote X users'
machines, never in SMB file browsers. I've also never seen 'ls' pause
for an unusually long time when listing the directory.

I have not yet been able to reproduce the problem in testing while
running strace on a process or running it under gdb, but I have observed
it on users' machines. Trust me, this is not for lack of trying - it
seems that just before I get to attach a debugger or trace the program,
the hang stops.

The only "in process" info I've collected so far is info on open file
handles (thanks, James!) with fuser. Most of the time the directory
looks like I'd expect:


[root@bucket root]# fuser -v /netstore/current_stories

                     USER        PID ACCESS COMMAND
/netstore/current_stories
                     root       6153 ..c..  smbd
                     root       9778 ..c..  smbd
                     root      13091 ..c..  afpd
                     root      15977 ..c..  smbd
                     root      18765 f....  fam
                     root      22595 ..c..  smbd
                     root      27396 ..c..  afpd
                     root      28007 ..c..  afpd
                     root      30517 ..c..  smbd

but on one occasion - shortly after which a user reported that they'd
seen the freeze again - I saw the following instead:

[root@bucket root]# fuser -v /netstore/current_stories

                     USER        PID ACCESS COMMAND
/netstore/current_stories
                     root       6153 ..c..  smbd
                     root       9778 ..c..  smbd
                     aja       10566 f....  nautilus
                     aja       10572 f....  nautilus
                     aja       10573 f....  nautilus
                     aja       10574 f....  nautilus
                     aja       10575 f....  nautilus
                     aja       10576 f....  nautilus
                     aja       10578 f....  nautilus
                     aja       10579 f....  nautilus
                     aja       10580 f....  nautilus
                     aja       10581 f....  nautilus
                     aja       10582 f....  nautilus
                     aja       10583 f....  nautilus
                     root      15977 ..c..  smbd
                     root      18765 f....  fam
                     root      21128 ..c..  smbd
                     jen       23525 f....  nautilus
                     jen       23539 f....  nautilus
                     jen       23540 f....  nautilus
                     jen       23541 f....  nautilus
                     jen       23546 f....  nautilus
                     jen       23547 f....  nautilus
                     jen       23548 f....  nautilus
                     jen       23549 f....  nautilus
                     jen       23550 f....  nautilus
                     jen       23552 f....  nautilus
                     jen       23553 f....  nautilus
                     jen       23556 f....  nautilus
                     root      28007 ..c..  afpd
                     root      30517 ..c..  smbd

(Aja and Jen are two of the users here who use graphical file browsers
and have this problem).

It's odd that /both/ users' file browsers show up as having the
directory open at once, when normally I never see any file browser
holding it open. Perhaps there's some odd race/conflict with directory
listing?
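
Since the interesting state in the second fuser listing is so short-lived,
one option might be to poll fuser and log a per-user summary whenever a
nautilus process turns up. What follows is only a minimal sketch under
stated assumptions: the function names summarise_holders and watch_holders
are made up for illustration, the one-second default interval is arbitrary,
and the 2>&1 assumes fuser sends its verbose table to stderr (as the psmisc
version does).

```shell
#!/bin/sh
# Read `fuser -v` output on stdin and print "count user command" lines.
# A process line is recognised by a numeric PID in the third-from-last
# field, so the header line and the bare path line are skipped.
summarise_holders() {
    awk 'NF >= 4 && $(NF-2) ~ /^[0-9]+$/ { n[$(NF-3) " " $NF]++ }
         END { for (k in n) print n[k], k }' | sort -k2
}

# Hypothetical polling loop: every INTERVAL seconds, take a snapshot of
# who has DIR open and print a timestamped summary if nautilus appears.
watch_holders() {
    dir=$1
    interval=${2:-1}
    while :; do
        snap=$(fuser -v "$dir" 2>&1)
        case $snap in
            *nautilus*)
                date '+%Y-%m-%d %H:%M:%S'
                printf '%s\n' "$snap" | summarise_holders
                ;;
        esac
        sleep "$interval"
    done
}
```

Running watch_holders /netstore/current_stories as root with its output
appended to a log file would give timestamps that could be matched against
the times users report a freeze. Polling once a second could still miss a
very brief burst, so this is a coarse net at best.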

I've tried running

while true; do ls >&/dev/null; done

and then accessing the dir with a file manager, but it doesn't seem to
make a difference: the file manager works normally.
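
A single ls loop lists whatever directory the shell happens to be in and
only imitates one reader. A slightly closer imitation of several people
browsing at once might be a handful of parallel loops running ls -l (which
also stats every entry) against the directory by name. This is a sketch
only: hammer_dir and its default worker count and duration are invented
for illustration, and there is no evidence that concurrent listing is
what actually triggers the hang.

```shell
#!/bin/sh
# Start WORKERS background loops that repeatedly run `ls -l` on DIR,
# let them run for SECS seconds, then stop them and report.
hammer_dir() {
    dir=$1
    workers=${2:-4}
    secs=${3:-10}
    pids=
    i=0
    while [ "$i" -lt "$workers" ]; do
        # Each worker is a subshell looping until it is killed.
        ( while :; do ls -l "$dir" >/dev/null 2>&1; done ) &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$secs"
    # $pids is deliberately unquoted so each PID is a separate argument.
    kill $pids 2>/dev/null
    wait $pids 2>/dev/null
    echo "ran $workers listers on $dir for ${secs}s"
}
```

For example, hammer_dir /netstore/current_stories 8 60 would keep eight
listers going for a minute while someone opens the directory in a file
manager.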

I've also seen this hang at least once under Konqueror when testing with
it on my local login, but it was too brief to get a debugging trace of
any sort from. Of course.

There are no interesting messages in syslog or dmesg that might point to
the problem.

So ... has anybody else seen anything like this? Any ideas, guesses, or
voodoo rituals?

--
Craig Ringer



