[plug] re: server freezing

Jon Miller jlmiller at mmtnetworks.com.au
Wed Jul 9 19:18:59 WST 2003


>>/sbin/lspci
[root@gfpmsql root]# /sbin/lspci
00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13)
00:00.1 Host bridge: ServerWorks: Unknown device 0012
00:00.2 Host bridge: ServerWorks: Unknown device 0000
00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05)
00:0f.3 ISA bridge: ServerWorks: Unknown device 0225
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
02:01.0 Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (rev 02)
02:02.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
05:03.0 RAID bus controller: IBM Netfinity ServeRAID controller

>>/sbin/lspci -vv

00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.1 Host bridge: ServerWorks: Unknown device 0012
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:00.2 Host bridge: ServerWorks: Unknown device 0000
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: IBM: Unknown device 0240
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (2000ns min), cache line size 08
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: I/O ports at 2200 [size=256]
        Region 2: Memory at febff000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
        Subsystem: ServerWorks CSB5 South Bridge
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 64

00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93) (prog-if 82 [Master PriP])
        Subsystem: ServerWorks CSB5 IDE Controller
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64, cache line size 08
        Region 0: I/O ports at 01f0 [size=8]
        Region 1: I/O ports at 03f4
        Region 2: I/O ports at 0170 [size=8]
        Region 3: I/O ports at 0374
        Region 4: I/O ports at 0700 [size=16]

00:0f.2 USB Controller: ServerWorks OSB4/CSB5 USB Controller (rev 05) (prog-if 10 [OHCI])
        Subsystem: ServerWorks OSB4/CSB5 USB Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (20000ns max), cache line size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at febfe000 (32-bit, non-prefetchable) [size=4K]

00:0f.3 ISA bridge: ServerWorks: Unknown device 0225
        Subsystem: ServerWorks: Unknown device 0230
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0

00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=4
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=4
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=4
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Capabilities: [60] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=4
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-

02:01.0 Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (rev 02)
        Subsystem: Intel Corp. PRO/1000 XT Server Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (63750ns min), cache line size 08
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at fbfe0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at fbfc0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at 2300 [size=32]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [e4] PCI-X non-bridge device.
                Command: DPERE- ERO+ RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000

02:02.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
        Subsystem: Adaptec 29160LP Low Profile Ultra160 SCSI Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (10000ns min, 6250ns max), cache line size 08
        Interrupt: pin A routed to IRQ 20
        BIST result: 00
        Region 0: I/O ports at 2400 [disabled] [size=256]
        Region 1: Memory at fbfbf000 (64-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
        Subsystem: IBM: Unknown device 026f
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (16000ns min), cache line size 08
        Interrupt: pin A routed to IRQ 29
        Region 0: Memory at fbfa0000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
                Address: 0000000100000000  Data: 2f58

05:03.0 RAID bus controller: IBM Netfinity ServeRAID controller
        Subsystem: IBM: Unknown device 0259
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64, cache line size 08
        Interrupt: pin A routed to IRQ 22
        Region 0: Memory at f4000000 (32-bit, prefetchable) [size=64M]
        Expansion ROM at <unassigned> [disabled] [size=32K]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
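
For anyone reading the listing above for IRQ conflicts, a rough way to pull the "routed to IRQ" lines out of `lspci -vv` output and group devices by IRQ might look like the following. This is only a sketch: the helper names `irq_map` and `shared_irqs` are made up for illustration, the sample text is a trimmed copy of a few entries from this post, and the regular expressions assume the older `bus:slot.func` line format shown here.

```python
import re

# A few entries copied (and trimmed) from the lspci -vv output in this post.
SAMPLE = """\
00:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
        Interrupt: pin A routed to IRQ 26
00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
        Latency: 64
02:01.0 Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (rev 02)
        Interrupt: pin A routed to IRQ 18
02:08.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
        Interrupt: pin A routed to IRQ 29
05:03.0 RAID bus controller: IBM Netfinity ServeRAID controller
        Interrupt: pin A routed to IRQ 22
"""

# Device header lines start in column 0 with bus:slot.func (assumed format).
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.+)$")
IRQ_RE = re.compile(r"routed to IRQ (\d+)")


def irq_map(lspci_text):
    """Return {irq: [slot, ...]} for every device that reports an IRQ."""
    irqs = {}
    slot = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            slot = header.group(1)  # remember which device we are inside
            continue
        found = IRQ_RE.search(line)
        if found and slot is not None:
            irqs.setdefault(int(found.group(1)), []).append(slot)
    return irqs


def shared_irqs(irqs):
    """Return only the IRQs claimed by more than one device."""
    return {irq: slots for irq, slots in irqs.items() if len(slots) > 1}


if __name__ == "__main__":
    mapping = irq_map(SAMPLE)
    for irq in sorted(mapping):
        print(irq, mapping[irq])
    print("shared:", shared_irqs(mapping))
```

In the full listing above every device that reports an interrupt has its own IRQ (26, 11, 18, 20, 29 and 22), so nothing here points to IRQ sharing.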


>>[root@gfpmsql root]# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:   16506895   16507517   16507378   16507241    IO-APIC-edge  timer
  1:          1          1          1          1    IO-APIC-edge  keyboard
  2:          0          0          0          0          XT-PIC  cascade
  8:          1          0          0          0    IO-APIC-edge  rtc
 11:          0          0          0          0   IO-APIC-level  usb-ohci
 12:         12          8          8         13    IO-APIC-edge  PS/2 Mouse
 14:          1          0          0          1    IO-APIC-edge  ide0
 18:       8502       8519       8504       8532   IO-APIC-level  eth0
 20:          4          4          4          4   IO-APIC-level  aic7xxx
 22:       5211       5230       5251       5209   IO-APIC-level  ips
 29:     332445     332628     332689     332663   IO-APIC-level  eth1
NMI:          0          0          0          0
LOC:   66029161   66029191   66029191   66029191
ERR:          0
MIS:          0
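
To compare interrupt load between devices, the per-CPU counts in a table like the one above can be summed per row. The snippet below is a minimal sketch, assuming the 2.4-style layout shown here (one count column per CPU, then the controller type, then the device name). The function name `interrupt_totals` is made up for illustration and the sample is a subset of the rows from this post.

```python
# A subset of the /proc/interrupts rows from this post.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:   16506895   16507517   16507378   16507241    IO-APIC-edge  timer
 18:       8502       8519       8504       8532   IO-APIC-level  eth0
 22:       5211       5230       5251       5209   IO-APIC-level  ips
 29:     332445     332628     332689     332663   IO-APIC-level  eth1
NMI:          0          0          0          0
ERR:          0
"""


def interrupt_totals(text):
    """Return {label: total_count} summed across all CPUs.

    The label is the device name when the row has one (e.g. "eth1"),
    otherwise the row key (e.g. "NMI").
    """
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        # Whatever follows the counts and the controller type is the name.
        names = fields[len(counts) + 1:]
        label = " ".join(names) if names else key.strip()
        totals[label] = sum(counts)
    return totals


if __name__ == "__main__":
    totals = interrupt_totals(SAMPLE)
    for label in sorted(totals, key=totals.get, reverse=True):
        print(label, totals[label])
```

On the figures above, eth1 (IRQ 29, the Broadcom NIC in the lspci listing) has taken far more interrupts than eth0 (IRQ 18, the Intel NIC). That only shows which interface carries the traffic; it does not identify a fault.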


>>[root@gfpmsql root]# uname -a
Linux gfpmsql 2.4.18-14smp #1 SMP Wed Sep 4 12:34:47 EDT 2002 i686 i686 i386 GNU/Linux

>> [root@gfpmsql root]# lsmod
Module                  Size  Used by    Not tainted
autofs                 13700   0  (autoclean) (unused)
tg3                    48392   1 
e1000                  56332   1 
iptable_filter          2412   0  (autoclean) (unused)
ip_tables              15608   1  [iptable_filter]
st                     31440   0  (unused)
mousedev                5688   0  (unused)
keybdev                 2976   0  (unused)
hid                    22404   0  (unused)
input                   6240   0  [mousedev keybdev hid]
usb-ohci               22056   0  (unused)
usbcore                80512   1  [hid usb-ohci]
ext3                   73024   2 
jbd                    56752   2  [ext3]
ips                    45088   3 
aic7xxx               138452   0  (unused)
sd_mod                 13552   6 
scsi_mod              110344   4  [st ips aic7xxx sd_mod]


Hope this helps.

The server froze while I was gathering this information, with no errors
or messages.
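
Since a freeze can hit part-way through collecting this output, one option is to run the commands from a script that writes each result to a file and syncs it to disk before moving on. The following is a sketch only, not something that was run on this server: the `collect` helper and the `diag.txt` file name mentioned below are made up for illustration, and the command list simply repeats the commands used in this post.

```python
import os
import subprocess

# The commands used in this post; adjust paths for the local system.
COMMANDS = [
    ["/sbin/lspci"],
    ["/sbin/lspci", "-vv"],
    ["cat", "/proc/interrupts"],
    ["uname", "-a"],
    ["lsmod"],
]


def collect(commands, out_path):
    """Run each command and append its output to out_path.

    The file is flushed and fsync'ed after every command so that results
    already gathered should still be on disk if the machine locks up later.
    """
    with open(out_path, "a") as out:
        for cmd in commands:
            out.write(">> " + " ".join(cmd) + "\n")
            try:
                result = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                )
                out.write(result.stdout)
            except OSError as exc:
                # Command missing or not executable: note it and carry on.
                out.write("(could not run: %s)\n" % exc)
            out.write("\n")
            out.flush()
            os.fsync(out.fileno())
```

Calling `collect(COMMANDS, "diag.txt")` would then leave one file with everything gathered up to the point of any freeze.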

Jon


On Tue, 2003-07-08 at 01:28, Craig Ringer wrote:
> > IBM x235
> > 4 x 73GB SCSI U320 Drives
> > 2 GB memory
> > 2 x 10/100/1000 NIC
> > ServeRAID-5i RAID controller.
> 
> I have a dual Xeon running RH8 with a gigabit NIC (and 2x 10/100 NICs)
> that's quite happy, but I'm using nice Intel NICs. I've heard bad things
> about Broadcom - perhaps you might want to see if you can borrow an
> Intel NIC (buy a 10/100 or see if you can get a 10/100/1000 on loan, 
> whatever).
> 
> I'm also operating with 2 GB of RAM. The disk subsystem is different 
> (SATA RAID - was a PITA at first, but now works like a dream) but that 
> shouldn't really matter. Are the disks in RAID, and if so what type? If 
> they're not in a RAID array, try doing SMART queries on them (it 
> shouldn't happen, but sometimes even really top quality drives are DOA 
> or close to it).
> 
> I've observed a problem similar to that which you describe in the past, 
> and it turned out to be caused by the system trying to swap pages back 
> in from a swap partition on a dying HDD. I ended up replacing the entire 
> (basic PC hardware) machine. A few months later, the new machine started 
> doing the same thing - but that time I was getting syslog messages (DMA 
> errors etc) that clued me in to the problem. I think the first time the 
> bad areas must've been /only/ on swap space or rarely used bits of disk, 
> so I didn't get any useful messages. The disk tested "OK" with the 
> manufacturer's disk utils, but proved stuffed when installed and thrashed
> with bonnie++ overnight. Anyway, what I'm trying to say, fighting
> against 1:30-am-itis, is "even if they're good disks, test them and make
> sure you're not encountering a defective HDD." Most
> *cough*westerndigital*cough* manufacturers' disk tools don't suck, and
> are capable of querying the drive SMART data (though they don't say as
> much), so that tends to be a good start.
> 
> Your server probably has BIOS serial console support, as well as support 
> for IPMI. Most Xeon systems do AFAIK. I suggest that you look into these 
> and see if you can get more diagnostic information from it.
> 
> Also - posting /proc/interrupts and the output of both 'lspci' and 
> 'lspci -vvv' can be exceedingly useful when reading "it just crashes" 
> questions. Perhaps you could post this info?
> 
> People: please post detailed hardware info when dealing with potential 
> hardware issues such as lockups, crashes, unexplained stalling, and the 
> like. Think PCI devices list, interrupts, `uname -a`, loaded modules,
> storage info (e.g. RAID type if any), etc. Someone always has to ask for
> it anyway.
> 
> Oh .... if you decide the server is inexplicably FUBAR and it's easier 
> to replace than fix, can I have the old one? ;-)
> 
> *grin*
> 
> Craig Ringer
-- 
Jon Miller <jlmiller at mmtnetworks.com.au>
MMT Networks Pty Ltd



