[plug] server failing with bizarre disk errors

Jon Miller jlmiller at mmtnetworks.com.au
Wed Apr 9 18:51:23 WST 2003



Jon L. Miller, MCNE, CNS
Director/Sr Systems Consultant
MMT Networks Pty Ltd
http://www.mmtnetworks.com.au

"I don't know the key to success, but the key to failure
 is trying to please everybody." -Bill Cosby



>>> craig at postnewspapers.com.au 6:20:17 PM 9/04/2003 >>>
> One thing that can cause drives to behave erratically is insufficient power. What size PSU is in the machine?

500W but you know how those things are. I'm installing a new Enermax 
550W (honestly 550W capable not the usual BS) tonight.

> I'm assuming there are good fans cooling off the drives.

Yeah. My home box doesn't have drive bay cooling and that's the cause of 
/its/ problems I suspect. Very different to this though.

> Another issue to look at is the use of drives with a small cache; they are designed for workstations, not for servers.

8MB of cache is not overly small.
JLM> Are yours the 8MB or the 2MB cache models? From what I can see on the WD site, these look like the 2MB cache models.

> When I did my training with Compaq, the one thing they stressed to us was the proper use of HDDs in servers.  They stressed that in a production environment you should NEVER, under any circumstances, use an IDE drive, because the data will create a bottleneck.  This is because a server "serves" multiple requests at once, whereas a workstation only makes requests, and does so one at a time.  For this reason a SCSI system should always be used in a server. You may want to consider this.  Yes, I know most would argue that EIDE drives are closing in on SCSI drives, but it's not the speed that is the issue; it's the number of instructions that the processor is sending to the drives.  Workstation IDE drives can only handle so many per cycle, whereas a server has to do a whole lot more in the same time.

(a) Lots of workstations need multiple accesses in progress for decent 
interactivity.
(b) Some modern ATA drives and chipsets support TCQ, allowing multiple 
in-flight commands SCSI style.
(c) SATA has solved the issue anyway.
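
To illustrate why multiple in-flight commands matter: when a drive can see several queued requests at once, it is free to service them in an order that reduces head movement. The following is only a toy model, not how any particular drive firmware schedules its queue. The block addresses, the starting head position and the simple "sweep up, then back down" ordering are all made-up assumptions for illustration.

```python
# Toy model of command queuing: compare total head travel when requests
# are serviced strictly in arrival order versus reordered into one sweep.
# Addresses and start position are illustrative values, not real drive data.

def head_travel(start, requests):
    """Total distance the head moves servicing requests in the given order."""
    travel = 0
    position = start
    for target in requests:
        travel += abs(target - position)
        position = target
    return travel


def elevator_order(start, requests):
    """Reorder queued requests: sweep upward from start, then back down."""
    upward = sorted(r for r in requests if r >= start)
    downward = sorted((r for r in requests if r < start), reverse=True)
    return upward + downward


if __name__ == "__main__":
    start = 50
    queue = [95, 10, 80, 20, 60]
    print("arrival order:", head_travel(start, queue))
    print("reordered:    ", head_travel(start, elevator_order(start, queue)))
```

With these example numbers the head travels 300 units in arrival order and 130 units after reordering. A drive that only ever sees one command at a time has no opportunity to make that kind of saving.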

While I'm sure Compaq had some good technical reasons, I wouldn't be 
surprised if they also wanted to sell more SCSI disks in their servers ;-)

JLM> lol, yeah, that could be true too. But in our tests comparing Compaq and IBM servers against a white box server, we found that as long as they were all using SCSI subsystems, there wasn't much difference in data transfer to the host.  When we put in an IDE server with the fastest drive and largest cache available at the time (before the 8MB models), we saw a large difference when transferring large amounts of data: the Compaq and IBM were much faster than the IDE server, although there wasn't much difference on smaller transfers.
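
A rough way to look for a large-versus-small transfer difference on a given box is to time local writes with different chunk sizes. This is only a sketch and is not the test described above: it measures local file writes rather than transfers to a host, and the function name, the sizes and the use of a temporary file are all assumptions made for illustration. The OS page cache will also affect the numbers, so treat the results as indicative at best.

```python
import os
import tempfile
import time


def write_throughput(total_bytes, chunk_bytes, directory=None):
    """Write about total_bytes in chunk_bytes pieces, fsync, return MB/s."""
    chunk = b"\0" * chunk_bytes
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        # buffering=0 so each write() call goes straight to the OS
        with os.fdopen(fd, "wb", buffering=0) as handle:
            written = 0
            while written < total_bytes:
                written += handle.write(chunk)
            # Ask the OS to push the data to the disk before stopping the clock
            os.fsync(handle.fileno())
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return written / elapsed / 1e6


if __name__ == "__main__":
    total = 64 * 1024 * 1024  # 64 MB per run; an arbitrary example size
    for chunk in (4 * 1024, 1024 * 1024):
        rate = write_throughput(total, chunk)
        print(f"chunk {chunk:>8} bytes: {rate:8.1f} MB/s")
```

Pass `directory=` to point the test at a filesystem on the drive in question; by default it writes to the system temp directory.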

I do see your point though - the HDD manufacturers tune the drive 
firmware for "desktop" access patterns - but that is not a reason for 
crashes and total access failures, just poor performance. The demands on 
this machine are significant, but nothing like a video editing 
workstation or even a personal database.

JLM> True, but what I'm saying is that the poorer performance may be what is causing your subsystem to "choke" on its own data.  Keep in mind that the slowest storage device in the system is the HDD.  Therefore, it is this subsystem that will "bottleneck" your system under extreme stress.  If the data isn't processed within a certain time frame, a crash or errors will be generated. UDMA has helped in that it has direct access to memory, but when it goes to write data to or read data from the drive (because it is not cached), the I/O is going to be slow when handling a large chunk of data.  It's not much of an issue with smaller data transfers.  This is why I asked whether this happens when a lot of processing is taking place.


Craig Ringer






