[plug] the GNOME panel that just won't die
Craig Ringer
craig at postnewspapers.com.au
Wed Jun 16 20:17:18 WST 2004
James Devenish wrote:
> In message <40D02611.80505 at postnewspapers.com.au>
> on Wed, Jun 16, 2004 at 06:50:57PM +0800, Craig Ringer wrote:
>
>>After a while it suddenly stopped responding over the network. 'ifdown
>>eth2; ifup eth2' helped briefly, but then it stopped responding again.
>>'ethtool eth2' reported link was fine, and the interface has a
>>statically assigned IP. When the machine started reporting 'no route
>>to host' for /some/ packets (completely losing the rest) when pinging
>>another machine,
>
> Hmm, interesting. I've encountered a similar-sounding problem with
> kernels <2.6. There's a particular machine with Intel EtherExpress Pro
> cards (IIRC) that's always given trouble, regardless of kernel version.
I use a PCI-X e1000 on this machine. That's _very_ interesting. Are you
using the eepro100 (Donald Becker) or e100 (Intel) driver for your eepro/100?
> Mostly, problems under load (though it doesn't take much to make it act
> "heavily loaded", because it's bad at I/O on the whole). Occasionally it
> says "too much work" and stops responding on an interface.
I didn't see anything along those lines in dmesg.
What I do see, shortly before the crash, is a bunch of errors like this:
Jun 16 17:56:28 bucket mount.smbfs[15421]: tdb(/var/lib/samba/gencache.tdb): tdb_lock failed on list 10 ltype=0 (Bad file descriptor)
Jun 16 17:56:28 bucket mount.smbfs[15421]: [2004/06/16 17:56:28, 0] tdb/tdbutil.c:tdb_log(724)
from smbfs. I was noticing problems with smbfs (well, even more problems
than smbfs usually causes) before the crash. Still, I think it likely
that these errors are just related to the general networking failure.
This is also interesting:
Jun 16 17:54:19 bucket kernel: NETDEV WATCHDOG: eth2: transmit timed out
(repeated several times over the half hour before the crash)
and there are _lots_ of errors from gdm, afpd, imapd, pop3d, and smbfs
about timeouts. Nothing else obviously raises a red flag, but the logs
are _very_ noisy, so it'll take some careful filtering to analyse them
properly.
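As a starting point for that filtering, here is a rough Python sketch. It
assumes the traditional "Mon DD HH:MM:SS host tag[pid]: message" syslog
layout seen in the excerpts above. The names filter_log, watchdog_counts
and NOISY_TAGS are made up for illustration, and the daemon tags are a
guess at how gdm, afpd, imapd, pop3d and smbfs identify themselves in the
log, so adjust them to whatever actually shows up.

```python
import re
from collections import Counter

# Traditional syslog layout: "Jun 16 17:54:19 host tag[pid]: message".
# This layout is an assumption based on the excerpts quoted in this post.
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[\d+\])?:\s?(?P<msg>.*)$"
)

# Hypothetical tag names for the daemons whose timeout errors are
# probably symptoms of the network failure rather than its cause.
NOISY_TAGS = {"gdm", "afpd", "imapd", "pop3d", "mount.smbfs"}


def filter_log(lines, noisy=NOISY_TAGS):
    """Yield (stamp, tag, msg) for parseable lines whose tag is not noisy."""
    for line in lines:
        m = SYSLOG_RE.match(line.rstrip("\n"))
        if m and m.group("tag") not in noisy:
            yield m.group("stamp"), m.group("tag"), m.group("msg")


def watchdog_counts(lines):
    """Count kernel 'NETDEV WATCHDOG' transmit timeouts per interface."""
    counts = Counter()
    for _stamp, tag, msg in filter_log(lines):
        m = re.match(r"NETDEV WATCHDOG: (\S+): transmit timed out", msg)
        if tag == "kernel" and m:
            counts[m.group(1)] += 1
    return counts
```

Both functions take any iterable of lines, so an open log file can be
passed straight in (which file syslog writes to varies by distribution).
Lines that don't match the layout are silently skipped, which is fine for
a first pass but would hide anything logged in a different format.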
I didn't see anything interesting in dmesg when examining it before the
crash, either, but I was in rather a hurry...
> I can connect
> via a different interface, but then TCP connections only last a few
> minutes before they stall. I can keep starting new TCP connections for a
> while, but eventually the interface will fail like the previous one.
> There is certainly some way of temporarily recovering the interfaces
> from the console, yet not the existing connections, so it still acts
> pretty screwed until rebooted.
That does sound vaguely similar, yes. Odd. I didn't get a chance to try
another interface (the two eepro/100 interfaces are currently unused,
and I didn't have time to fiddle around) but my experience with the
gigabit interface matches what you mention, in terms of a temporary
recovery and weird stalling.
--
Craig Ringer