[plug] re: server freezing
Jon Miller
jlmiller at mmtnetworks.com.au
Wed Jul 9 19:18:59 WST 2003
>>/sbin/lspci
[root@gfpmsql root]# /sbin/lspci
00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13)
00:00.1 Host bridge: ServerWorks: Unknown device 0012
00:00.2 Host bridge: ServerWorks: Unknown device 0000
00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05)
00:0f.3 ISA bridge: ServerWorks: Unknown device 0225
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
02:01.0 Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (rev 02)
02:02.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
05:03.0 RAID bus controller: IBM Netfinity ServeRAID controller
>>/sbin/lspci -vv
00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00:00.1 Host bridge: ServerWorks: Unknown device 0012
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00:00.2 Host bridge: ServerWorks: Unknown device 0000
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
Subsystem: IBM: Unknown device 0240
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2000ns min), cache line size 08
Interrupt: pin A routed to IRQ 26
Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Region 1: I/O ports at 2200 [size=256]
Region 2: Memory at febff000 (32-bit, non-prefetchable) [size=4K]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [5c] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
Subsystem: ServerWorks CSB5 South Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 64
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 82 [Master PriP])
Subsystem: ServerWorks CSB5 IDE Controller
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, cache line size 08
Region 0: I/O ports at 01f0 [size=8]
Region 1: I/O ports at 03f4
Region 2: I/O ports at 0170 [size=8]
Region 3: I/O ports at 0374
Region 4: I/O ports at 0700 [size=16]
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05) (prog-if 10 [OHCI])
Subsystem: ServerWorks OSB4/CSB5 USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (20000ns max), cache line size 08
Interrupt: pin A routed to IRQ 11
Region 0: Memory at febfe000 (32-bit, non-prefetchable) [size=4K]
00:0f.3 ISA bridge: ServerWorks: Unknown device 0225
Subsystem: ServerWorks: Unknown device 0230
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=4
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=4
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=4
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Capabilities: [60] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=4
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
02:01.0 Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (rev 02)
Subsystem: Intel Corp. PRO/1000 XT Server Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), cache line size 08
Interrupt: pin A routed to IRQ 18
Region 0: Memory at fbfe0000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at fbfc0000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at 2300 [size=32]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device.
Command: DPERE- ERO+ RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
02:02.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
Subsystem: Adaptec 29160LP Low Profile Ultra160 SCSI Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (10000ns min, 6250ns max), cache line size 08
Interrupt: pin A routed to IRQ 20
BIST result: 00
Region 0: I/O ports at 2400 [disabled] [size=256]
Region 1: Memory at fbfbf000 (64-bit, non-prefetchable) [size=4K]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
Subsystem: IBM: Unknown device 026f
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (16000ns min), cache line size 08
Interrupt: pin A routed to IRQ 29
Region 0: Memory at fbfa0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [40] PCI-X non-bridge device.
Command: DPERE- ERO- RBC=0 OST=0
Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data
Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
Address: 0000000100000000 Data: 2f58
05:03.0 RAID bus controller: IBM Netfinity ServeRAID controller
Subsystem: IBM: Unknown device 0259
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, cache line size 08
Interrupt: pin A routed to IRQ 22
Region 0: Memory at f4000000 (32-bit, prefetchable) [size=64M]
Expansion ROM at <unassigned> [disabled] [size=32K]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>>[root@gfpmsql root]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:   16506895   16507517   16507378   16507241    IO-APIC-edge  timer
  1:          1          1          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
 11:          0          0          0          0   IO-APIC-level  usb-ohci
 12:         12          8          8         13    IO-APIC-edge  PS/2 Mouse
 14:          1          0          0          1    IO-APIC-edge  ide0
 18:       8502       8519       8504       8532   IO-APIC-level  eth0
 20:          4          4          4          4   IO-APIC-level  aic7xxx
 22:       5211       5230       5251       5209   IO-APIC-level  ips
 29:     332445     332628     332689     332663   IO-APIC-level  eth1
NMI:          0          0          0          0
LOC:   66029161   66029191   66029191   66029191
ERR:          0
MIS:          0
>>[root@gfpmsql root]# uname -a
Linux gfpmsql 2.4.18-14smp #1 SMP Wed Sep 4 12:34:47 EDT 2002 i686 i686 i386 GNU/Linux
>>[root@gfpmsql root]# lsmod
Module                  Size  Used by    Not tainted
autofs                 13700   0  (autoclean) (unused)
tg3                    48392   1
e1000                  56332   1
iptable_filter          2412   0  (autoclean) (unused)
ip_tables              15608   1  [iptable_filter]
st                     31440   0  (unused)
mousedev                5688   0  (unused)
keybdev                 2976   0  (unused)
hid                    22404   0  (unused)
input                   6240   0  [mousedev keybdev hid]
usb-ohci               22056   0  (unused)
usbcore                80512   1  [hid usb-ohci]
ext3                   73024   2
jbd                    56752   2  [ext3]
ips                    45088   3
aic7xxx               138452   0  (unused)
sd_mod                 13552   6
scsi_mod              110344   4  [st ips aic7xxx sd_mod]
Hope this helps.
While gathering this information the server froze, with no errors or
messages.
Jon
On Tue, 2003-07-08 at 01:28, Craig Ringer wrote:
> > IBM x235
> > 4 x 73GB SCSI U320 Drives
> > 2 GB memory
> > 2 x 10/100/1000 NIC
> > ServeRAID-5i RAID controller.
>
> I have a dual Xeon running RH8 with a gigabit NIC (and 2x 10/100 NICs)
> that's quite happy, but I'm using nice Intel NICs. I've heard bad things
> about Broadcom - perhaps you might want to see if you can borrow an
> Intel NIC (buy a 10/100 or see if you can get a 10/100/1000 on loan,
> whatever).
>
> I'm also operating with 2 GB of RAM. The disk subsystem is different
> (SATA RAID - was a PITA at first, but now works like a dream) but that
> shouldn't really matter. Are the disks in RAID, and if so what type? If
> they're not in a RAID array, try doing SMART queries on them (it
> shouldn't happen, but sometimes even really top quality drives are DOA
> or close to it).
>
> I've observed a problem similar to that which you describe in the past,
> and it turned out to be caused by the system trying to swap pages back
> in from a swap partition on a dying HDD. I ended up replacing the entire
> (basic PC hardware) machine. A few months later, the new machine started
> doing the same thing - but that time I was getting syslog messages (DMA
> errors etc) that clued me in to the problem. I think the first time the
> bad areas must've been /only/ on swap space or rarely used bits of disk,
> so I didn't get any useful messages. The disk tested "OK" with the
> manufacturer's disk utils, but proved stuffed when installed and thrashed
> with bonnie++ overnight. Anyway, what I'm trying to say, fighting
> against 1:30-am-itis, is "even if they're good disks, test them and make
> sure you're not encountering a defective HDD." Most
> *cough*westerndigital*cough* manufacturers' disk tools don't suck, and
> are capable of querying the drive SMART data (though they don't say as
> much), so that tends to be a good start.
>
> Your server probably has BIOS serial console support, as well as support
> for IPMI. Most Xeon systems do AFAIK. I suggest that you look into these
> and see if you can get more diagnostic information from it.
>
> Also - posting /proc/interrupts and the output of both 'lspci' and
> 'lspci -vvv' can be exceedingly useful when reading "it just crashes"
> questions. Perhaps you could post this info?
>
> People: please post detailed hardware info when dealing with potential
> hardware issues such as lockups, crashes, unexplained stalling, and the
> like. Think PCI devices list, interrupts, `uname -a`, loaded modules,
> storage info (e.g. RAID type if any), etc. Someone always has to ask for
> it anyway.
>
> Oh .... if you decide the server is inexplicably FUBAR and it's easier
> to replace than fix, can I have the old one? ;-)
>
> *grin*
>
> Craig Ringer
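
For anyone following the thread: the diagnostics Craig asks for above (PCI
device list, interrupts, `uname -a`, loaded modules) can be gathered in one
pass. This is only a minimal sketch. The function name `collect_diag` and the
default report file name are made up for illustration, and it assumes `lspci`
lives in /sbin as it does on this box. It calls sync after every command
because the server froze partway through gathering this information by hand.

```shell
#!/bin/sh
# Sketch: collect the hardware details requested in this thread into one file.
# collect_diag and diag-report.txt are hypothetical names, not an existing tool.
collect_diag() {
    report=${1:-diag-report.txt}
    # Start with an empty report; give up if the file cannot be written.
    : > "$report" || return 1
    for cmd in "uname -a" "/sbin/lspci" "/sbin/lspci -vv" \
               "cat /proc/interrupts" "lsmod"; do
        printf '>> %s\n' "$cmd" >> "$report"
        # $cmd is left unquoted on purpose so the shell splits it into
        # program and arguments. A missing tool is noted, not fatal.
        $cmd >> "$report" 2>&1 || printf '(command failed: %s)\n' "$cmd" >> "$report"
        printf '\n' >> "$report"
        # Push what we have to disk in case the machine locks up mid-run.
        sync
    done
}
```

Run it as root with something like `collect_diag /root/diag-report.txt` and
paste the resulting file into the post. Writing to local disk will not help if
the disk subsystem itself is what hangs, so a serial console or a network
mount may be a safer destination.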
--
Jon Miller <jlmiller at mmtnetworks.com.au>
MMT Networks Pty Ltd