Re: [squid-users] RAID is good

From: Marcus Kool <marcus.kool@dont-contact.us>
Date: Wed, 26 Mar 2008 11:30:44 -0300

The reason I started this discussion is that the statement in the wiki,
"Do not use RAID under any circumstances", is outdated at best.

Most companies will trade performance for reliability because their business
depends on internet access and they cannot afford 2-48 hours of
unavailability.

Everybody knows that EMC and HP systems are much more expensive than
a JBOD, but that is not a valid reason to say "Never use RAID".
"Never use RAID" implies that RAID is *BAD*, which is simply not true.

 From my point of view, the wiki should say something like (a small
squid.conf sketch of the JBOD case follows below):

If you want the cheapest setup, with modest performance and no availability guarantees, use a JBOD.
If you want cheap, modest performance and availability, use RAID1/RAID5 without
a sophisticated disk array (preferably with a RAID card that has
128+ MB of battery-backed write cache).
If you want the cheapest availability, use RAID5 without a sophisticated disk array.
If you want expensive, extreme performance and availability, use a sophisticated disk array.
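
For the JBOD case, a minimal squid.conf sketch (the mount points and sizes
here are invented, purely for illustration) is one cache_dir per physical
disk, so Squid spreads objects over the spindles itself:

  # one cache_dir per physical disk, no RAID layer underneath
  cache_dir aufs /cache1 40000 16 256
  cache_dir aufs /cache2 40000 16 256
  cache_dir aufs /cache3 40000 16 256

For the RAID1/RAID5 cases the same directives would simply point at
directories on the mirrored or parity-protected volume.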

-Marcus

Adrian Chadd wrote:
> And I'd completely agree with you, because you're comparing $EXPENSIVE
> attached storage (that is generally run as RAID) to $NOT_SO_EXPENSIVE
> local storage which doesn't have... well, all the fruit.
>
> The EMC disk arrays, when treated as JBODs, won't be faster. They're faster
> because you're rolling massive caches on top of RAID5+striping, or RAID1+0,
> etc.
>
> The trouble is this: none of us has access to high-end storage kit,
> so developing solutions that'll work there is just not going to happen.
>
> I've just acquired a 14-disk Compaq StorageWorks array, so at least I have
> $MANY disks to benchmark against, but it's still effectively direct-attach
> JBOD rather than hardware RAID.
>
> Want this fixed? Partner with someone who can, or do the benchmarks yourself
> and publish some results. My experience with hardware RAID5 cards attached
> to plain disk arrays (i.e. not intelligent disk shelves like EMC, etc.)
> is that RAID5 is somewhat slower for the Squid I/O patterns. I'd repeat that
> test, but I don't have a U320-enabled RAID5 card here to talk to this shelf.
>
>
>
>
> Adrian
>
> On Tue, Mar 25, 2008, Ben Hollingsworth wrote:
>>>>> One should also consider the difference between
>>>>> simple RAID and extremely advanced RAID disk systems
>>>>> (e.g. EMC and other arrays).
>>>>> External disk arrays like EMC with internal RAID5 are simply faster
>>>>> than a JBOD of internal disks.
>>> How many write cycles does EMC use to back up data after one
>>> system-issued write cycle?
>>> How many CPU cycles does EMC spend figuring out which disk the
>>> file slice is located on, _after_ Squid has already hashed the file
>>> location to figure out which disk the file is located on?
>>>
>>> Regardless of speed, unless you can provide a RAID system which has
>>> less than one hardware disk I/O read/write per system disk I/O
>>> read/write, you hit these theoretical limits.
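
(As a rough textbook figure, and assuming no write cache absorbs it: a small
RAID5 write is a read-modify-write costing about 4 physical I/Os, read old
data, read old parity, write new data, write new parity, so plain RAID5 fails
that test badly for small writes; a large battery-backed cache is exactly
what hides this.)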
>> I can't quote disk cycle numbers, but I know that our fiber-connected HP
>> EVA8000s (with ginormous caches and LUNs spread over 72 spindles, even
>> at RAID5) are one hell of a lot faster than the local disks. The 2 Gbps
>> fiber connection is the limiting factor for most of our high-bandwidth
>> apps. In our shop, Squid is pretty low-bandwidth by comparison. We
>> normally hover around 100 req/sec with occasional peaks at 200 req/sec.
>>
>>> But it's not so much a problem of human-noticeable absolute time; the
>>> underlying problem of duplicated disk I/O cycles, processor I/O cycles
>>> and processor delays remains.
>>>
>>> For now the CPU half of the problem gets masked by the
>>> single-threadedness of Squid (never thought you'd see that being a
>>> major benefit, eh?). If Squid begins using all the CPU threads, the OS
>>> will lose out on its spare CPU cycles on dual-core machines, and RAID
>>> may become a noticeable problem there.
>> Your arguments are valid for software RAID, but not for hardware RAID.
>> Most higher-end systems have a dedicated disk controller with its own
>> processor that handles nothing but the onboard RAID. A fiber-connected
>> disk array is conceptually similar, but with more horsepower. The host CPU
>> never has to worry about that overhead in this case. Perhaps for these
>> scenarios, Squid could use a config flag that tells it to put everything
>> on one "disk" (as it sees it) and not bother imposing any of its own
>> overhead for operations that will already be done by the array controller.
>>
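
As far as per-disk selection goes, a single large cache_dir on the array's
LUN already behaves much like that; again only as an invented illustration,
not a tested recommendation:

  # let the array controller do the striping/parity; Squid sees one big store
  cache_dir aufs /san/squid-cache 200000 32 512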
Received on Wed Mar 26 2008 - 08:30:59 MDT
