[plug]: weird box behavior

Craig Ringer craig at postnewspapers.com.au
Fri Oct 26 18:42:09 WST 2001


	Hi all
Thought those of you who are hardware minded might have some ideas about 
this, since I'm utterly stumped.
It's downright weird. I hope I'm not imposing on those on the list by asking 
this but (apart from the fact you'll save me from insanity) it might be 
interesting.

	Symptoms:

I've got a machine here that crashes randomly every couple of days, 
usually when I'm not here and it's really important that it works. I've had a 
kernel panic (I needed it up, so I told a non-technical staffer to press the 
reset switch, hence I have no more idea than that of what happened). The DHCP 
daemon seems to silently die (no core, no log messages) sometimes. 
Enlightenment segfaults sometimes. X goes insane (cursor leaping in random 
directions whenever the mouse moves) - and no, I DO NOT run gpm - whenever I 
switch back from a console. XFree86 v4 has graphical oddities you have to see 
to believe and is unusable, despite the official support for the card and 
chipset.
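
For the silently dying DHCP daemon, one way to at least timestamp the deaths 
would be a small watcher along these lines. This is only a minimal sketch in 
Python, not something that has been run on this box: the process name "dhcpd", 
the log file name and the 10-second poll interval are all assumptions, and it 
only records when the process vanishes from /proc, not why.

```python
import os
import time


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose argv[0] basename equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0")[0].decode(errors="replace")
        except OSError:
            continue  # process exited (or is unreadable) while we were looking
        if os.path.basename(argv0) == name:
            pids.append(int(entry))
    return sorted(pids)


def watch(name, interval=10, log_path="daemon-watch.log"):
    """Poll forever, appending a timestamped line whenever `name` goes up or down."""
    was_running = bool(find_pids(name))
    while True:
        running = bool(find_pids(name))
        if running != was_running:
            state = "up" if running else "DOWN"
            with open(log_path, "a") as log:
                log.write("%s %s %s\n" % (time.ctime(), name, state))
            was_running = running
        time.sleep(interval)
```

Calling watch("dhcpd") from a spare console would then leave a line in 
daemon-watch.log each time the daemon disappears or comes back, which could be 
lined up against the times of the other symptoms.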

	Tests run:

I've run K7burn for a full day without an error. (funnily, the system ran 
like a dog...)
Ditto memtest.
Passes a memtest86 overnight without problems.
Disk I/O testing reveals no problems, hdparm settings affect nothing.
Tried running it without X for a few days but other symptoms persisted.
Tried unloading all non-critical modules, removing all but 128M of RAM (tried 
each DIMM, too), shutting down non-critical services, and aliasing 
auto-loading non-critical modules to "null". No luck.
Turned everything I could off in BIOS, after going to "failsafe" mode.

It's surprisingly slow for a Duron 850 - my 650 at home is faster even if I 
remove the extra 256M of RAM and bring it down to 128M.

Flame retardant: I know running all these services on one box is dumb, and 
I'd love to change it, but I don't have the space, a spare box or 2, or other 
resources to separate these things out. I have _no_ available machines to 
replace this with except a couple of even-less-reliable P100s.
	
	Hardware & Software

The box is an AMD Duron 850 with 256M of RAM. VIA KT133 chipset, AC97 onboard 
audio. 2 NICs, a RealTek 8029 based card and a LinkSys EtherFast 10/100. Oh, 
and a Cirrus Logic 6465 AGP video card. It's running Debian sid (was potato, 
then woody, finally sid in a desperate hope that a recent software fix 
would eliminate the problems. Tried custom kernels to no avail).
	
	Services

The box runs a DHCP server, acts as a 'net gateway/firewall, runs Squid and a 
caching BIND (ports blocked to the internet and daemons not bound to the 
internet-exposed NIC's IP). This makes it quite an important box. I'd like to 
go back to woody if this proves to be a hardware problem.

	Problem

Wholesale hardware replacement is not an option, not even for this box. Such 
is the response from on high.
Important machine.
*ARRGGGGGGHHHHHHH* sanity failing.


Any suggestions on further tests, possible causes/fixes, etc. would be much 
appreciated. I'd attach a screenshot of XFree86 v4's behavior but for the 
obvious problems with that. Suffice to say, it starts up fine. Move the mouse 
and chunks of screen slide with it, smearing out a trail. Blocks of screen 
"teleport" around when graphics are updated. Etc.


