[plug] the GNOME panel that just won't die
Craig Ringer
craig at postnewspapers.com.au
Tue Jun 15 14:54:48 WST 2004
Hi all
I'm running into a real head-scratcher here, and was hoping to get some
assistance or ideas.
A user here has had the GNOME panel hang a couple of times recently. Odd
and very annoying, but killing it and restarting it has always done the
trick.
Not so this time.
The panel just wouldn't die. It looks like the GNOME segfault handler
ran when the panel crashed, flagging it as traced (T state). This
happened before, too - or at least the process was in T state when I killed it.
Mentally swearing at GNOME, I tried to kill the panel as I had the
previous time - but it didn't die. Not even kill -9 would kill it. A
little more looking around revealed a GNOME segfault handler process
whose parent (ppid) is the panel ... but that segfault handler is a
defunct process (Z state).
from "ps aux":
USER       PID %CPU %MEM   VSZ   RSS TTY      STAT START   TIME COMMAND
aja      28611  0.0  0.5 21196 10960 ?        S    09:12   0:03 gnome-panel
jen       6442  0.0  0.0     0     0 ?        Z    14:13   0:00 [gnome_segv2] <defunct>
[craig]$ ps -e --format "pid user ppid wchan cmd" | grep jen
 6442 jen    2648 exit   [gnome_segv2] <defunct>
 2648 jen       1 finish gnome-panel --sm-config-prefix /gnome-panel-0iv5Wq/ --sm-client-id 110a000004000107517969800000242930001 --profile default
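For anyone wanting to spot the same situation on their own boxes, here is a minimal sketch, assuming a Linux /proc where /proc/<pid>/status carries the usual Name, State, Pid, PPid and TracerPid lines. The function name list_stuck is hypothetical, not an existing tool. It prints every process in the stopped/traced (T or t) or zombie (Z) state together with its parent PID and TracerPid; a non-zero TracerPid would mean something is ptrace-attached to the process, which would fit the "flagged as traced" theory. It hasn't been tried on 2.6.3 specifically.

```shell
#!/bin/sh
# list_stuck: hypothetical helper (not a standard tool) that lists every
# process in the stopped/traced (T, t) or zombie (Z) state.
# Assumes Linux /proc/<pid>/status with Name, State, Pid, PPid and
# TracerPid lines; a non-zero TracerPid means a process is ptrace-attached.
list_stuck() {
    printf '%-7s %-7s %-6s %-7s %s\n' PID PPID STATE TRACER NAME
    for status in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read, so errors
        # from awk are discarded; an empty state simply prints nothing.
        awk '
            /^Name:/      { name = $0; sub(/^Name:[ \t]*/, "", name) }
            /^State:/     { state = $2 }
            /^Pid:/       { pid = $2 }
            /^PPid:/      { ppid = $2 }
            /^TracerPid:/ { tracer = $2 }
            END {
                if (state ~ /^[TtZ]$/)
                    printf "%-7s %-7s %-6s %-7s %s\n", pid, ppid, state, tracer, name
            }
        ' "$status" 2>/dev/null
    done
}
```

Running list_stuck here should show the gnome_segv2 zombie with the panel's PID in its PPID column. A zombie only leaves the process table once its parent reaps it with wait() or the parent itself goes away (at which point init adopts and reaps it), so as long as the panel stays stuck, the zombie stays too.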
I've worked around the problem for now by logging the user out, killing
gconfd and the bonobo-activation-server, and logging them back in. GNOME
could no longer see the 'zombie' panel, so it started a new one, which is
working OK. I still have an unkillable process on my server, though, and
that is not something that makes me happy.
The machine in question used to be RH8, but has evolved over time. It
runs kernel 2.6.3 (soon to be 2.6.6, as it's going down for an upgrade soon
anyway). I didn't want to go to 2.6, but we needed some of the disk
elevator improvements quite badly. Aside from a few upgraded apps, it's
otherwise mostly RH8. GNOME is the stock, unmodified RH8 GNOME. The
machine has ample RAM (ECC DDR) and storage (RAID) and is otherwise 100%
rock solid, so the chances of this being hardware related are slim to none.
Uptime is 102 days. We've done better than that before, but not all that
much better. It's not as if we're using WinNT, where long uptime is a
plausible explanation for "it's going insane", though. (I just rebooted the
NT box last week after over 90 days of uptime - not too bad for a Windows server.)
As you can imagine, this is driving me nuts. One process (the segfault
handler) has exited but won't leave the process table, and the other (the
panel) won't finish terminating.
As an aside, we have a nice shiny KDE 3.2.1, but the users don't like it,
so they're sticking with GNOME. This is painful for me, as GNOME 2.0.1
is ... less than entirely stable, and upgrading GNOME has proved
impractical. Maybe GARNOME has improved since I last tried. Any opinions
or comments on the "virtual server on server using UML" approach? I'm
_very_ tempted to keep the terminal users in a UML so that their
environment can easily be cloned, upgraded, forked off for a testbed,
etc. I'd be very interested in any info on production experience with
UML, especially in environments with interactive, memory-heavy processes.
--
Craig Ringer