[plug] Dying Samba Server

Daniel Pearson (Flashware Solutions) daniel at flashware.net
Mon Nov 6 11:21:11 WST 2006


Daniel Pearson (Flashware Solutions) wrote:
> Tomasz Grzegurzko wrote:
>> On 11/2/06, Daniel Pearson (Flashware Solutions) 
>> <daniel at flashware.net> wrote:
>>> Daniel Pearson (Flashware Solutions) wrote:
>>> > Tomasz Grzegurzko wrote:
>>> >> On 11/2/06, Daniel Pearson (Flashware Solutions)
>>> >> <daniel at flashware.net> wrote:
>>> >>>
>>> >>>  Ok, so I've got a box running Ubuntu Server running kernel
>>> >>> 2.6.15-23-amd64
>>> >>> (that I'm using as a Samba DC/FS) - and in recent weeks it seems to
>>> >>> have
>>> >>> just completely halted at least once a week.
>>> >>>
>>> >>>  I've had a look through /var/log/messages kern.log and syslog and
>>> >>> can't
>>> >>> seem to find any 'error' messages in there.. where else should I be
>>> >>> looking?
>>> >>>
>>> >>>  TIA; Dan
>>> >>>
>>> >> Is there anything on the console when it "goes"? Kernel panics, HDD
>>> >> read errors, anything like that? I've found such `hard' lockups are
>>> >> usually the result of hardware failures but narrowing them down may
>>> >> require a peek at the console to verify if anything happened before
>>> >> that freeze.
>>> > Those were my thoughts, also - there's no monitor attached to it, and
>>> > I'm never anywhere near it when it does.. it sits remotely..
>>> >
>>> > Is there a way to cat * in /var/log and search for 'error' and/or
>>> > 'hda' ?
>>> >
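One way to do that search, as a rough sketch: grep can walk /var/log by
itself, so there is no need to cat everything first. The function name
search_logs is made up for illustration; -r, -i, -n, -I and -E are
standard GNU grep options. Rotated logs that have been gzipped would
need zgrep instead.

```shell
#!/bin/sh
# search_logs DIR: hypothetical helper, not part of any package.
# Prints file:line:text for every line under DIR that mentions
# "error" or "hda", ignoring case.
search_logs() {
    # -r recurse into DIR, -i ignore case, -n show line numbers,
    # -I skip binary files, -E extended regex for the alternation
    grep -rinIE 'error|hda' "$1"
}
```

Run it as root, since some logs are not world-readable, for example
`search_logs /var/log | less`.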
>>> Ok, a grep of messages.0 for 'hda' shows..
>>>
>>>
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009401.362845] hda: status
>>> timeout: status=0xd0 { Busy }
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009421.432845] hda:
>>> dma_timer_expiry: dma status == 0x21
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009431.424727] hda: DMA
>>> timeout error
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009431.434985] hda: dma
>>> timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009431.499423] hda: status
>>> error: status=0x50 { DriveReady SeekComplete }
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009431.637266] hda: status
>>> timeout: status=0xd0 { Busy }
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009451.708247] hda:
>>> dma_timer_expiry: dma status == 0x21
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009461.700129] hda: DMA
>>> timeout error
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009461.710460] hda: dma
>>> timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009461.776676] hda: status
>>> error: status=0x50 { DriveReady SeekComplete }
>>> Oct 23 11:53:51 flashware-svr01 kernel: [1009461.911689] hda: status
>>> timeout: status=0xd0 { Busy }
>>> Oct 23 14:56:58 flashware-svr01 kernel: [    0.000000] Bootdata ok
>>> (command line is root=/dev/hda1 ro quiet splash)
>>> Oct 23 14:56:58 flashware-svr01 kernel: [    0.000000] Kernel command
>>> line: root=/dev/hda1 ro quiet splash
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   13.774740]     ide0: BM-DMA
>>> at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   14.226099] hda: ST3200827A,
>>> ATA DISK drive
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   16.605512] hda: max request
>>> size: 1024KiB
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   16.651328] hda: 390721968
>>> sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, UDMA(100)
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   16.675675] hda: cache
>>> flushes supported
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   16.675732]  hda: hda1
>>> hda2 < hda5 >
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   19.853766] EXT3-fs: hda1:
>>> orphan cleanup on readonly fs
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   19.889744] EXT3-fs: hda1: 2
>>> orphan inodes deleted
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   26.848552] Adding 2988048k
>>> swap on /dev/hda5.  Priority:-1 extents:1 across:2988048k
>>> Oct 23 14:56:58 flashware-svr01 kernel: [   26.993837] EXT3 FS on hda1,
>>> internal journal
>>> Oct 23 20:35:23 flashware-svr01 kernel: [20429.780772] hda:
>>> dma_timer_expiry: dma status == 0x21
>>> Oct 23 20:35:34 flashware-svr01 kernel: [20439.772416] hda: DMA timeout
>>> error
>>> Oct 23 20:35:34 flashware-svr01 kernel: [20439.778841] hda: dma timeout
>>> error: status=0x58 { DriveReady SeekComplete DataRequest }
>>> Oct 23 20:35:34 flashware-svr01 kernel: [20439.824050] hda: status
>>> error: status=0x50 { DriveReady SeekComplete }
>>> Oct 23 20:35:34 flashware-svr01 kernel: [20439.949375] hda: status
>>> timeout: status=0xd0 { Busy }
>>> Oct 23 20:35:54 flashware-svr01 kernel: [20460.005494] hda:
>>> dma_timer_expiry: dma status == 0x21
>>> Oct 23 20:36:04 flashware-svr01 kernel: [20469.997137] hda: DMA timeout
>>> error
>>> Oct 23 20:36:04 flashware-svr01 kernel: [20470.006199] hda: dma timeout
>>> error: status=0x58 { DriveReady SeekComplete DataRequest }
>>> Oct 23 20:36:04 flashware-svr01 kernel: [20470.069646] hda: status
>>> error: status=0x50 { DriveReady SeekComplete }
>>> Oct 23 20:36:04 flashware-svr01 kernel: [20470.203815] hda: status
>>> timeout: status=0xd0 { Busy }
>>> Oct 23 20:36:24 flashware-svr01 kernel: [20490.270181] hda:
>>> dma_timer_expiry: dma status == 0x21
>>> Oct 23 20:36:34 flashware-svr01 kernel: [20500.261823] hda: DMA timeout
>>> error
>>> Oct 23 20:36:34 flashware-svr01 kernel: [20500.272127] hda: dma timeout
>>> error: status=0x58 { DriveReady SeekComplete DataRequest }
>>> Oct 23 20:36:34 flashware-svr01 kernel: [20500.337236] hda: status
>>> error: status=0x50 { DriveReady SeekComplete }
>>> Oct 23 20:36:34 flashware-svr01 kernel: [20500.478237] hda: status
>>> timeout: status=0xd0 { Busy }
>>> Oct 23 20:36:54 flashware-svr01 kernel: [20520.544860] hda:
>>> dma_timer_expiry: dma status == 0x21
>>> Oct 23 20:37:05 flashware-svr01 kernel: [20530.536504] hda: DMA timeout
>>> error
>>> Oct 23 20:37:05 flashware-svr01 kernel: [20530.546891] hda: dma timeout
>>> error: status=0x58 { DriveReady SeekComplete DataRequest }
>>> Oct 23 20:37:05 flashware-svr01 kernel: [20530.609751] hda: status
>>> error: status=0x50 { DriveReady SeekComplete }
>>> Oct 23 20:37:05 flashware-svr01 kernel: [20530.742668] hda: status
>>> timeout: status=0xd0 { Busy }
>>> Oct 26 19:51:25 flashware-svr01 kernel: [276774.505573] hda: status
>>> timeout: status=0x80 { Busy }
>>>
>>
>>
>> That is either a bad HDD or memory. The reason I say that is because
>> I/O operations work like this: HDD->memory, memory->CPU etc. So if the
>> RAM is bad, it will *look* like HDD errors. These messages are quite
>> telltale, though: more than likely it is an HDD problem. You could try
>> disabling DMA (# hdparm -d 0 /dev/hda) and see if that helps.
>>
>> See how you go.
>> Tomasz
> root at flashware-svr01:/var/log# hdparm -d 0 /dev/hda
>
> /dev/hda:
> setting using_dma to 0 (off)
> using_dma    =  0 (off)
>
>
> Done.... I'll keep you updated as more info comes to hand!
>

Ok, this does NOT look good..


root at flashware-svr01:/var/log# reboot
bash: /sbin/reboot: Input/output error

Dead HDD? :(
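
To get a feel for how often the drive has been timing out, something
like the following might help. It is only a sketch: count_hda_timeouts
is a made-up name, and it assumes the usual one-line-per-message syslog
layout ("Mon DD HH:MM:SS host kernel: ...") rather than the wrapped
lines pasted above.

```shell
#!/bin/sh
# count_hda_timeouts FILE: hypothetical helper, not part of any package.
# Prints "Mon DD count" for each day on which FILE contains
# "hda: ... timeout" kernel messages.
count_hda_timeouts() {
    # Keep only the hda timeout lines, tally them by the first two
    # fields (month and day), then sort because awk's for-in order
    # is unspecified.
    grep -E 'hda: .*timeout' "$1" |
        awk '{ n[$1 " " $2]++ } END { for (d in n) print d, n[d] }' |
        sort
}
```

For example, `count_hda_timeouts /var/log/messages.0`. If smartmontools
happens to be installed, `smartctl -H /dev/hda` should also report the
drive's own health verdict, assuming the drive still answers at all.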