[plug]: weird box behavior
Craig Ringer
craig at postnewspapers.com.au
Fri Oct 26 18:42:09 WST 2001
Hi all
Thought those of you who are hardware minded might have some ideas about
this, since I'm utterly stumped.
It's downright weird. I hope I'm not imposing on those on the list by asking
this but (despite the fact you'll save me from insanity) it might be
interesting.
Symptoms:
I've got a machine here that is crashing randomly every couple of days,
usually when I'm not here and it's really important that it work. I've had a
kernel panic (needed it up so told illiterate staffer to press reset sw,
hence no idea more than that of what happened). The dhcp daemon seems to
silently die (no core, no log msgs) sometimes. Enlightenment segfaults
sometimes. X goes insane (cursor leaping in random directions whenever mouse
moves) - and no I DO NOT run gpm - whenever I switch back from a console.
XFree86v4 has graphical oddities you have to see to believe and is unusable,
despite the official support for the card and chipset.
Tests run:
I've run K7burn for a full day without an error. (funnily, the system ran
like a dog...)
Ditto memtest.
Passes a memtest86 overnight without problems.
Disk I/O testing reveals no problems, hdparm settings affect nothing.
Tried running it without X for a few days but other symptoms persisted.
Tried unloading all non-critical modules, removing all but 128m of RAM (tried
each DIMM, too), shutting down non-critical services, and aliasing
auto-loading non-critical modules to "null". No luck.
Turned everything I could off in BIOS, after going to "failsafe" mode.
It's surprisingly slow for a Duron 850 - my 650 @home is faster even if I
remove the extra 256m of ram and bring it down to 128.
Flame retardant: I know running all these services on one box is dumb, I'd love to change
it, but I don't have the space, a spare box or 2, or other resources to
separate these things out. I have _no_ available machines to replace this
with except a couple of even-less-reliable p100s.
Hardware & Software
The box is an AMD Duron 850 with 256M of RAM. VIA KT133 chipset, AC97 onboard
audio. 2 NICs, a RealTek 8029 based card and a LinkSys EtherFast 10/100. Oh
and a Cirrus Logic 6465 AGP video card. It's running Debian sid (was potato,
then woody, finally sid, in the desperate hope that a recent software fix
would eliminate the problems). Tried custom kernels to no avail.
Services
The box runs a DHCP server, acts as a 'net gateway/firewall, runs Squid and a
caching BIND (ports blocked to internet and daemons not bound to
internet-exposed-nic's IP). This makes it quite an important box. I'd like to
go back to woody if this proves to be a hardware problem.
Problem
Wholesale hardware replacement is not an option. Not even for this box. Such is
the response from on high.
Important machine.
*ARRGGGGGGHHHHHHH* sanity failing.
Any suggestions on further tests, possible cause/fixes, etc would be much
appreciated. I'd attach a screenshot of XFree86v4's behavior but for the
obvious problems with that. Suffice to say, it starts up fine. Move the mouse
and chunks of screen slide with it, smearing out a trail. Blocks of screen
"teleport" around when graphics are updated. Etc.