[plug] re: server freezing
Denis Brown
dsbrown at cyllene.uwa.edu.au
Mon Jul 7 22:15:33 WST 2003
On Mon, 7 Jul 2003, Jon Miller wrote:
> Just did a line-by-line look at dmesg and saw something that I find
> interesting, instead of stating it found 2 CPU processors it's actually
> saying found 4 processors. There are only 2 Xeon processors.
>
G'day Jon.
Hey, can I have one of those? A spare Xeon or two might be ideal :-)
Seriously, as Craig said, it would be unlikely to be hardware, but I'd
like to contribute one thing... I have one of its smaller brothers, an
x205, and the interesting thing about this one is its network card (chipset),
which is made by Broadcom. Fine chipset, but it has obviously suffered a painful
birth process. I got to a situation where my Debian installation, with a
custom-built kernel and the driver from Broadcom's website, would come up after a
cold boot (shutdown -h now followed by a power cycle) but would not come up
cleanly after a warm boot (shutdown -r now). The problem was that the MAC
address was all zeroes after the restart, which is not so good for routing, etc.!
While that may seem irrelevant to your situation, along the way I came to
see quite a lot of IBM's web pages and, through Googling, a lot of
Broadcom driver code and user experiences (not all complimentary, I might
add). It is obvious there have been firmware issues, apart from driver
issues. I am currently awaiting approval from IBM to upgrade the Broadcom
chipset firmware to the latest revision (2.33) from its current 2.24c
level. "Approval" is necessary because the firmware is "supported only
on" something like the 335 model, not the lowly 205, and I'm not keen to
anger Mr IBM when warranty time may come around in the future :-)
While this would not necessarily explain the keyboard lockups, it might
nonetheless be worth a look. The current firmware revision can be found in
/proc/net/nicinfo/ethx.info, where "x" will be 0 for the first card, 1 for
the second (if fitted) and so on.
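If you have several cards to check, a small script can pull the relevant lines out of every
/proc/net/nicinfo/eth*.info file at once. This is only a sketch: it assumes the Broadcom driver
creates one plain-text file per interface as described above, and because I don't know the exact
field names inside those files, it simply filters lines containing a keyword ("version" by
default, a guess you may need to adjust) rather than parsing a specific field.

```python
import glob
import os


def read_nicinfo(base_dir="/proc/net/nicinfo", keyword="version"):
    """Return {interface: [matching lines]} for every eth*.info file in base_dir.

    Assumption: one plain-text file per interface (eth0.info, eth1.info, ...).
    The field names are unknown, so lines are matched on a case-insensitive
    keyword instead of a specific field.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(base_dir, "eth*.info"))):
        iface = os.path.basename(path)[: -len(".info")]  # "eth0.info" -> "eth0"
        with open(path, errors="replace") as fh:
            matches = [line.rstrip("\n") for line in fh
                       if keyword.lower() in line.lower()]
        results[iface] = matches
    return results


if __name__ == "__main__":
    info = read_nicinfo()
    if not info:
        print("no /proc/net/nicinfo/eth*.info files found (driver not loaded?)")
    for iface, lines in info.items():
        print(iface)
        for line in lines:
            print("   ", line)
```

If the keyword turns out to be wrong for your driver version, calling `read_nicinfo(keyword="")` returns every line of each file, so you can see what is actually there.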
The other thoughts are the BIOS revision (I guess you've been there and done
that, though) and the creation of a cron job that just appends an entry
(date, time?) to a text file every so often. Post-freeze you could see
whether the guts of the system were still working, even though it had shut
up shop as far as keyboard and monitor activity was concerned. That is
assuming you don't get the serial console idea (someone's suggestion,
Craig's?) going beforehand.
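The cron-job heartbeat could look something like the following. It is a minimal sketch, assuming
a hypothetical script name (heartbeat.py) and log path; the script appends one timestamp per run
and calls os.fsync so the entry is more likely to reach the disk before a hard power-off. After a
freeze and reboot, last_heartbeat() tells you the last time the box was alive enough to run cron.

```python
import datetime
import os
import sys

# Hypothetical crontab entry (every 5 minutes; paths are examples only):
#   */5 * * * * /usr/bin/python3 /usr/local/bin/heartbeat.py /var/log/heartbeat.log

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def append_heartbeat(path, now=None):
    """Append one timestamped line to the log file and force it to disk."""
    now = now or datetime.datetime.now()
    with open(path, "a") as fh:
        fh.write(now.strftime(TIME_FORMAT) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # don't leave the entry sitting in a buffer
    return now


def last_heartbeat(path):
    """Return the newest timestamp in the log, or None if it is missing or empty."""
    try:
        with open(path) as fh:
            lines = [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        return None
    if not lines:
        return None
    return datetime.datetime.strptime(lines[-1], TIME_FORMAT)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        append_heartbeat(sys.argv[1])
    else:
        print("usage: heartbeat.py LOGFILE", file=sys.stderr)
```

Comparing the last logged time with when you noticed the freeze shows whether the kernel and cron
kept running after the keyboard and monitor died. If entries continue right up to the reboot, the
system itself was alive and the problem is more likely confined to the console side.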
Oh, the other thought on this class of IBM boxes is their use of a snazzy
system health check - SmartPath, LightPath or somesuch they call it.
It is supposed to report problems via some LEDs inside the box, right
down to fan failures, etc. I presume that's another "been there, done
that" exercise. :-(
HTH,
Denis