Re: Re: [squid-users] Squid high bandwidth IO issue (ramdisk SSD)

From: Adrian Chadd <adrian_at_squid-cache.org>
Date: Tue, 4 Aug 2009 14:57:43 +0800

How much disk IO is actually going on when the CPU shows 70% IOWAIT?
That is far too much - the time spent in IOWAIT shouldn't be anywhere
near that high. I think you really should consider trying an
alternative disk controller.
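
(If it isn't obvious which controller and driver are actually in use,
something along the lines of "lspci | grep -i -e ide -e sata -e raid"
and "dmesg | grep -i -e ahci -e libata" should show the controller
model and the driver bound to it.)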

adrian

2009/8/4 smaugadi <adi_at_binat.net.il>:
>
> Dear Adrian and Heinz,
> Sorry for the delayed reply, and thanks for all the help so far.
> I have tried changing the file system (ext2 and ext3) and changed the
> partitioning geometry (fdisk -H 224 -S 56), as I read that this would
> improve performance with SSDs.
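> (As I understand it, the point of -H 224 -S 56 is alignment: 224 heads
> x 56 sectors x 512 bytes = 6,422,528 bytes per cylinder, which is an
> exact multiple (49x) of a 128 KiB erase block, so cylinder-aligned
> partitions start on erase-block boundaries. The 128 KiB erase block
> size is an assumption; I haven't verified it for this drive.)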
> I tried ufs, aufs and even coss (downgrading to squid 2.6). (By the
> way, the average object size is 13 KB.)
> And failed!
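>
> For reference, the cache_dir lines I was testing were of this general
> form (the path and the size/L1/L2 values here are placeholders, not my
> exact settings):
>
>   cache_dir aufs /cache/sdb1 60000 16 256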
>
> System monitoring during the squid degradation showed the following:
>
> /usr/local/bin/iostat -dk -x 1 1000 sdb
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  svctm  %util
> sdb               0.00     0.00    0.00    4.00     0.00    72.00    36.00   155.13 25209.75 250.25 100.10
> sdb               0.00     0.00    0.00    4.00     0.00    16.00     8.00   151.50 26265.50 250.50 100.20
> sdb               0.00     0.00    0.00    3.00     0.00    12.00     8.00   147.49 27211.33 333.33 100.00
> sdb               0.00     0.00    0.00    4.00     0.00    32.00    16.00   144.54 28311.25 250.25 100.10
> sdb               0.00     0.00    0.00    4.00     0.00   100.00    50.00   140.93 29410.25 250.25 100.10
> sdb               0.00     0.00    0.00    4.00     0.00    36.00    18.00   137.00 30411.25 250.25 100.10
> sdb               0.00     0.00    0.00    2.00     0.00     8.00     8.00   133.29 31252.50 500.50 100.10
>
> As soon as the service time (svctm) rises above 200 ms, problems
> start, and the total time to service a request (await: time in queue
> plus service time) climbs all the way to ~32 seconds.
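>
> The numbers are at least self-consistent: at ~4 writes/s with svctm
> around 250 ms the disk is busy roughly 4 x 0.25 = 1 second of every
> second (hence ~100% util), and draining a queue of ~140 requests at
> 4/s takes about 35 seconds, which lines up with the ~30 second await.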
>
> This is from mpstat at the same time:
>
> 09:33:56 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> 09:33:58 AM  all    3.00    0.00    2.25   84.02    0.12    2.75    0.00    7.87   9782.00
> 09:33:58 AM    0    3.98    0.00    2.99   72.64    0.00    3.98    0.00   16.42   3971.00
> 09:33:58 AM    1    2.01    0.00    1.01   80.40    0.00    1.51    0.00   15.08   1542.00
> 09:33:58 AM    2    2.51    0.00    2.01   92.96    0.00    2.51    0.00    0.00   1763.50
> 09:33:58 AM    3    3.02    0.00    3.02   90.95    0.00    3.02    0.00    0.00   2506.00
>
> 09:33:58 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> 09:34:00 AM  all    0.50    0.00    0.25   74.12    0.00    0.62    0.00   24.50   3833.50
> 09:34:00 AM    0    0.50    0.00    0.50    0.00    0.00    1.00    0.00   98.00   2015.00
> 09:34:00 AM    1    0.50    0.00    0.00   98.51    0.00    1.00    0.00    0.00    544.50
> 09:34:00 AM    2    0.50    0.00    0.00   99.50    0.00    0.00    0.00    0.00    507.00
> 09:34:00 AM    3    0.50    0.00    0.00   99.00    0.00    0.50    0.00    0.00    766.50
>
> 09:34:00 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> 09:34:02 AM  all    0.12    0.00    0.25   74.53    0.00    0.12    0.00   24.97   1751.50
> 09:34:02 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1155.50
> 09:34:02 AM    1    0.00    0.00    0.50   99.50    0.00    0.00    0.00    0.00    230.50
> 09:34:02 AM    2    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    220.00
> 09:34:02 AM    3    0.00    0.00    0.50   99.50    0.00    0.00    0.00    0.00    146.00
>
> 09:34:02 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> 09:34:04 AM  all    1.25    0.00    1.50   74.97    0.00    0.00    0.00   22.28   1607.50
> 09:34:04 AM    0    5.47    0.00    5.47    0.00    0.00    0.00    0.00   89.05   1126.00
> 09:34:04 AM    1    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    158.50
> 09:34:04 AM    2    0.00    0.00    0.50   98.51    0.50    0.50    0.00    0.00    175.50
> 09:34:04 AM    3    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00    147.00
>
> Well, sometimes you eat the bear and sometimes the bear eats you.
>
> Do you have any more ideas?
> Regards,
> Adi.
>
>
>
>
> Adrian Chadd-3 wrote:
>>
>> 2009/8/2 Heinz Diehl <htd_at_fancy-poultry.org>:
>>
>>> 1. Change cache_dir in squid from ufs to aufs.
>>
>> That is almost always a good idea for decent performance under any
>> sort of concurrent load. I'd like to see proof otherwise - if someone
>> finds a case where it isn't, that indicates something which should be
>> fixed.
>>
>>> 2. Format /dev/sdb1 with "mkfs.xfs -f -l lazy-count=1,version=2 -i attr=2
>>> -d agcount=4"
>>> 3. Mount it afterwards using
>>> "rw,noatime,logbsize=256k,logbufs=2,nobarrier" in fstab.
>>
>>> 4. Use cfq as the standard scheduler with the linux kernel
>>
>> Just out of curiosity, why these particular settings? Do you have any
>> research which shows they help?
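>>
>> For what it's worth, assuming a 2.6 kernel with CFQ available, the
>> scheduler can be checked and switched per device at runtime:
>>
>>   cat /sys/block/sdb/queue/scheduler
>>   echo cfq > /sys/block/sdb/queue/scheduler
>>
>> or set globally with elevator=cfq on the kernel command line.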
>>
>>> (Btw: on my systems, squid-2.7 is noticeably _a lot_ slower than squid-3,
>>> if the object is not in cache...)
>>
>> This is an interesting statement. I can't think of any specific
>> reason why squid-2.7 should perform worse than Squid-3 in this
>> instance. This is the kind of "works by magic" stuff which deserves
>> investigation so the issue(s) can be fully understood. Otherwise you
>> may find that a regression creeps into later Squid-3 versions because
>> the issues weren't fully understood and documented, and some coder
>> makes a change which they think won't have as much of an effect as it
>> does. It has certainly happened before in squid. :)
>>
>> So, "more information please."
>>
>>
>>
>> Adrian
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Squid-high-bandwidth-IO-issue-%28ramdisk-SSD%29-tp24775448p24803136.html
> Sent from the Squid - Users mailing list archive at Nabble.com.
>
>
Received on Tue Aug 04 2009 - 06:57:53 MDT
