[plug] 2.4.24 mremap root exploit

Craig Ringer craig at postnewspapers.com.au
Thu Feb 19 09:05:04 WST 2004


On Thu, 2004-02-19 at 08:48, Bernd Felsche wrote:
> On Thu, Feb 19, 2004 at 07:38:30AM +0800, Craig Ringer wrote:
> > On Thu, 2004-02-19 at 07:31, Craig Ringer wrote:
> 
> > > I'm currently playing with Bonnie++ to see what thrashing the system
> > > turns up. As it's running live services, the benchmarks won't be fair,
> > > but should be somewhat informative nonetheless.
> > 
> > *gack*
> 
> > I had to abort the Bonnie++ run on the RAID 5 array, because the system
> > was crawling to a halt. Load avg >15, ls blocking for 30 seconds, etc.
> > Bad stuff.
> 
> RAID5 + lots of writes == BAD on all systems.

I realise that. However, that's no reason why reads from a separate disk
array should be totally starved, just because the RAID 5 array is being
thrashed to a puddle of molten metal. 

I suspect from recent reading and a quick look at the driver that the
controller is keeping a deep controller-wide or driver-wide queue of
I/O, meaning that about 256 writes need to be serviced on the RAID5
array before the controller gets around to doing a read from the RAID1
array. I'm hoping to reduce the queue depth to about 16 I/Os to see if
that helps - after all, the kernel will still be queueing them up before
sending them to the controller, and without TCQ-capable disks there's
little point in a deep queue on the controller.
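
To put rough numbers on that, here's a back-of-the-envelope sketch in
Python. The strict-FIFO draining and the ~8ms per-I/O service time are
assumptions on my part, not measurements of this controller:

    # Back-of-the-envelope sketch, not the driver's actual scheduling:
    # assume the controller drains one shared FIFO, so a read arriving
    # behind a full backlog of writes waits for all of them first.
    SERVICE_MS = 8.0  # assumed per-I/O service time, 2004-era disk

    def worst_case_read_wait_ms(queue_depth: int) -> float:
        """Worst-case wait for a read queued behind queue_depth writes."""
        return queue_depth * SERVICE_MS

    for depth in (256, 16):
        print(f"queue depth {depth:3d}: read may wait "
              f"~{worst_case_read_wait_ms(depth) / 1000:.2f}s")

    # depth 256 -> ~2.05s, depth 16 -> ~0.13s. With a shallow controller
    # queue the surplus stays in the kernel's elevator, which can reorder
    # requests and let reads from the RAID 1 array in sooner.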

> No matter how good the operating system, RAID5 sustained-write
> performance is best rated in nanopascals; it sucks. A write has in
> effect to be replicated across all members of the array; so
> tripling/quadrupling/quintupling/... disk activity is expected.

Well, if I understand RAID 5 properly, a write must occur on at least two
disks (the data write and the parity write), preceded by a read off the
remaining disk. This means that on a 3-disk array at most one write _or_
three reads may be in flight at once. 4-disk RAID 5 apparently performs a
bit better, since some of the time a read and a write can be in flight
simultaneously, but it still sucks. AFAIK the killer difference is that
RAID 5 must perform a read before writing the parity data, in order to get
the data from the third disk to XOR against the newly written data. Icky.

AFAIK RAID 1 also requires a write to both disks, but it avoids the
read-before-write grossness.
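
To make the XOR arithmetic concrete, here's a toy Python sketch; the
strip contents and layout are made up for illustration:

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # 3-disk RAID 5: two data strips plus one parity strip per stripe.
    d0, d1 = b"old data 0......", b"old data 1......"
    parity = xor(d0, d1)              # parity = d0 XOR d1

    new_d0 = b"new data 0......"

    # Read-modify-write: read old data and old parity, then write new
    # data and new parity.
    new_parity = xor(xor(parity, d0), new_d0)

    # Reconstruct-write (the "read off the remaining disk" above): read
    # the other data strip and XOR it against the new data. Same result:
    assert new_parity == xor(d1, new_d0)

    # Either way one logical write costs at least one read plus two
    # physical writes. RAID 1 just writes the same block to both disks,
    # with no prior read.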

> Bonnie++ will likely cause any RAID5 system to resemble
> inter-galactic void.

That's to be expected; however, it should not cause the RAID1 system on
the same controller to also closely resemble said void.

> Some RAID5 systems put a large, non-volatile cache in the system so
> that the host can go away with the delusion that all writes have
> actually happened. That is, IMO, bad. One of my customers lost half a
> day's transactions because the disk controller received a reset when
> a hot-plug drive was replaced - cache was emptied and writes were
> "forgotten" resulting in a corrupted database.

Ouch. The RAID controller here has (if I remember correctly) a 128MB
buffer, but by default it only does read caching for exactly that
reason. 

I find large write caches a little less offensive when the cache is
battery-backed, but even that won't protect you from some faults (like
the controller reset you mentioned).

Craig Ringer



