[plug] server failing with bizarre disk errors

Jon Miller jlmiller at mmtnetworks.com.au
Wed Apr 9 18:37:32 WST 2003


I should have been clearer about this subject, so hopefully this will clear up what I was saying, mainly about the queuing of instructions and the use of multiple devices in a system.
To make a fair comparison between modern SCSI (SCSI-3) and ATA (ATA/ATAPI-6) you have to look at two different scenarios: single-device and multiple-device environments.

Single device 
This scenario is common in desktop computers, where you connect a single device to a single adapter and perform data transfers. There is practically no difference between the two interfaces; this holds for bandwidth as well as resource usage (CPU), as both interfaces use the most efficient way to transfer data, namely DMA. This means there is no point in purchasing a (generally speaking) more expensive SCSI-based system when the cheaper ATA interface would do an equally good job.
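If you want to sanity-check that on your own hardware, here's a rough Python sketch that times a sequential read and prints MB/s. The /tmp/testfile path is just a placeholder; point it at a file bigger than your RAM, or the page cache will flatter the number.

import time

PATH = "/tmp/testfile"   # placeholder: any large file on the drive under test
CHUNK = 1024 * 1024      # read in 1 MiB chunks

total = 0
start = time.time()
with open(PATH, "rb") as f:
    while True:
        data = f.read(CHUNK)
        if not data:
            break
        total += len(data)
elapsed = time.time() - start
print("%.0f MB in %.2f s = %.1f MB/s" % (total / 1e6, elapsed, total / 1e6 / elapsed))

A modern ATA disk and a SCSI disk of similar spindle speed should land in the same ballpark here.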

Multi device 
This scenario is common in high-end desktop computers and servers, where you connect multiple devices to one or more interface adapters. This is where SCSI has major advantages over ATA:

Connectivity: The ATA interface can only address two devices, while SCSI can address eight devices (Narrow SCSI), 16 devices (Wide SCSI), 32 (Very Wide SCSI) or 126 (Fibre Channel). There are also many peripherals available for SCSI only and not for ATA.
Bandwidth: The demand for high transfer rates in servers cannot be met using current ATA interfaces, given the two-devices-per-adapter limit; even if the bus could carry more devices, there simply isn't enough bandwidth and flexibility available for serious server applications.
Efficiency: ATA devices lack the intelligence to perform command queuing as well as their SCSI counterparts, which can queue up to 256 commands per logical unit. SCSI hard disk drives aimed at the extreme-performance server market have had a lot of research and development put into optimizing seek patterns and rescheduling commands to minimize seek times and maximize throughput. This may not show up in desktop benchmarks, but under heavy server loads it is evident; see the sketch after this list.
SCSI hard disk drives also tend to be designed to work well in RAID systems, where I/O load is spread across multiple drives.
Dependability: Most high-end SCSI hard drives are quite expensive, but there are good reasons for it. They can sustain higher temperatures and stay mechanically functional despite the thermal expansion of their metal parts, and they generally have better build quality. The net result is that they are the natural choice for enterprise server applications. Connectors suitable for hot-swapping drives in RAID systems are something only SCSI boasts, and they help in maintaining large disk arrays where downtime is unacceptable.
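To make the queuing point concrete, here is a toy Python sketch. It is not how any real drive firmware works and the cylinder numbers are invented; it just compares total head travel for a backlog of 64 requests serviced in arrival order versus one elevator-style sweep:

import random

def head_travel(order, start=0):
    # total cylinders the head sweeps servicing requests in this order
    pos, travel = start, 0
    for cyl in order:
        travel += abs(cyl - pos)
        pos = cyl
    return travel

random.seed(1)
queue = [random.randrange(10000) for _ in range(64)]    # 64 queued commands

print("FIFO head travel:", head_travel(queue))          # no queuing
print("SCAN head travel:", head_travel(sorted(queue)))  # one elevator sweep

On a run like this the single sweep covers an order of magnitude less distance, and under sustained load that difference is throughput.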

I had to write this up for a project comparing the differences.



Jon L. Miller, MCNE, CNS
Director/Sr Systems Consultant
MMT Networks Pty Ltd
http://www.mmtnetworks.com.au

"I don't know the key to success, but the key to failure
 is trying to please everybody." -Bill Cosby



>>> jlmiller at mmtnetworks.com.au 6:09:22 PM 9/04/2003 >>>
One thing that can cause drives to behave erratically is not enough power; what size PSU is in the machine? I'm assuming there are good fans cooling the drives.
Another issue to look at is using drives with a small cache; they are designed for workstations, not servers.  When I did my training with Compaq, the one thing they stressed to us was the proper use of HDDs in servers: in a production environment, NEVER under any circumstances use an IDE drive, because the data will create a bottleneck.  This is because a server "serves" multiple requests at once, while a workstation only makes requests one at a time.  For this reason a SCSI system should always be used in a server; you may want to consider this.  Yes, I know most would argue that EIDE drives are closing in on SCSI drives, but it's not the raw speed that is the issue, it's the number of instructions the processor is issuing to the drives.  Workstation IDE drives can only handle so many per cycle, whereas a server drive has to do a whole lot more in the same time (the toy sketch below puts a rough number on it).
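The numbers here are invented and it only models seek distance, but it shows why interleaved requests hurt: four clients each read their own region sequentially, and the only thing that changes is whether the drive sees the requests one client at a time or mixed together.

from itertools import chain

def head_travel(order, start=0):
    # total cylinders the head sweeps servicing requests in this order
    pos, travel = start, 0
    for cyl in order:
        travel += abs(cyl - pos)
        pos = cyl
    return travel

# four clients, each reading 16 consecutive cylinders in its own region
clients = [list(range(base, base + 16)) for base in (0, 3000, 6000, 9000)]

one_at_a_time = list(chain.from_iterable(clients))           # workstation-like
interleaved = [c for group in zip(*clients) for c in group]  # server-like mix

print("one client at a time:", head_travel(one_at_a_time))
print("interleaved clients: ", head_travel(interleaved))

The interleaved case seeks across most of the disk for nearly every request, which is exactly the backlog a command-queuing SCSI drive can reorder its way out of.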
I'm not going to get into the depth of this on the list, but I suggest that since this is a production server, it be set up as one.


Jon L. Miller, MCNE, CNS
Director/Sr Systems Consultant
MMT Networks Pty Ltd
http://www.mmtnetworks.com.au 

"I don't know the key to success, but the key to failure
 is trying to please everybody." -Bill Cosby



>>> craig at postnewspapers.com.au 5:23:20 PM 9/04/2003 >>>
> If you find out what it is, please let me know, all I can tell you is that
> unplugging and replugging the cables, and replacing them doesn't work. (for
> me anyway) Maybe moving to another IDE controller will help. I've been
> having these insurmountable opportunities for close to 2 years - goes away
> for a while, and then strikes at full strength.

Already tried a cable change.

Out of interest, what chipset and HDDs do /you/ use? Mine are all 
Western Digital JBs, an 80 GB and a 120 GB in this machine.

This honestly doesn't look like anything as simple as "bad drive". I've 
had 3 servers do this in 6 months (a succession, in fact) and have 
replaced drives, RAM, motherboards, CPUs, cables, PSUs - everything.

This is only the latest machine to do it. What's worse - my home machine 
is showing signs of going the same way, but in its case there are 
clearly identifiable drive issues (two unreadable files that cause DMA 
timeouts and CRC errors; when overwritten with zeros, the problem goes away).

I've heard of laptops doing stuff like this with dodgy PM, which is why 
I've disabled power management - no change.

Craig








