Res: Res: [squid-users] squid 3.2.0.5 smp scaling issues

From: Marcos <mczueira_at_yahoo.com.br>
Date: Mon, 25 Apr 2011 12:15:29 -0700 (PDT)

Thanks for your answer, David.

I'm seeing a lot of features being added to squid 3.x, but it gets slower as
new features are added. I think squid 3.2 with 1 worker should be as fast as
2.7, but it's getting slower and hungrier.

Marcos

----- Original message ----
From: "david_at_lang.hm" <david_at_lang.hm>
To: Marcos <mczueira_at_yahoo.com.br>
Cc: Amos Jeffries <squid3_at_treenet.co.nz>; squid-users_at_squid-cache.org; squid-dev_at_squid-cache.org
Sent: Friday, 22 April 2011 15:10:44
Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

ping, I haven't seen a response to this additional information that I sent out
last week.

squid 3.1 and 3.2 are a significant regression in performance from squid 2.7 or
3.0

David Lang

On Thu, 14 Apr 2011, david_at_lang.hm wrote:

> Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
>
> Ok, I finally got a chance to test 2.7STABLE9
>
> it performs about the same as squid 3.0, possibly a little better.
>
> with my somewhat stripped down config (smaller regex patterns, replacing CIDR
> blocks and names that would need to be looked up in /etc/hosts with individual
> IP addresses)
>
> 2.7 gives ~4800 requests/sec
> 3.0 gives ~4600 requests/sec
> 3.2.0.6 with 1 worker gives ~1300 requests/sec
> 3.2.0.6 with 5 workers gives ~2800 requests/sec
>
> the numbers for 3.0 are slightly better than what I was getting with the full
> ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from
> the last round of tests (with either the full or simplified ruleset)
>
> so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the
> ability to use multiple worker processes in 3.2 doesn't make up for this.
>
> the time taken seems to almost all be in the ACL evaluation, as eliminating
> all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.
>
> one theory is that even though I have IPv6 disabled on this build, the added
> space and more expensive checks needed to compare IPv6 addresses instead of
> IPv4 addresses account for the single worker drop of ~66%. that seems rather
> expensive, even though there are 293 http_access lines (and one of them uses
> external file contents in its acls, so it's a total of ~2400
> source/destination pairs, however due to the ability to shortcut the
> comparison the number of tests that need to be done should be <400)
>
> In addition, there seems to be some sort of locking between the multiple
> worker processes in 3.2 when checking the ACLs, as the test with almost no
> ACLs scales close to 100% per worker while with the ACLs it scales much more
> slowly, and above 4-5 workers actually drops off dramatically (to the point
> where with 8 workers the throughput is down to about what you get with 1-2
> workers). I don't see any conceptual reason why the ACL checks of the
> different worker threads should impact each other in any way, let alone in a
> way that limits scalability to ~4 workers before adding more workers is a net
> loss.
>
> David Lang
>
>> On Wed, 13 Apr 2011, Marcos wrote:
>>
>>> Hi David,
>>>
>>> could you run and publish your benchmark with squid 2.7?
>>> I'd like to know if there is any regression between the 2.7 and 3.x series.
>>>
>>> thanks.
>>>
>>> Marcos
>>>
>>> ----- Original message ----
>>> From: "david_at_lang.hm" <david_at_lang.hm>
>>> To: Amos Jeffries <squid3_at_treenet.co.nz>
>>> Cc: squid-users_at_squid-cache.org; squid-dev_at_squid-cache.org
>>> Sent: Saturday, 9 April 2011 12:56:12
>>> Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues
>>>
>>> On Sat, 9 Apr 2011, Amos Jeffries wrote:
>>>
>>>> On 09/04/11 14:27, david_at_lang.hm wrote:
>>>>> A couple more things about the ACLs used in my test
>>>>>
>>>>> all of them are allow ACLs (no deny rules to worry about precedence of)
>>>>> except for a deny-all at the bottom
>>>>>
>>>>> the ACL line that permits the test source to the test destination has
>>>>> zero overlap with the rest of the rules
>>>>>
>>>>> every rule has an IP based restriction (even the ones with url_regex are
>>>>> source -> URL regex)
>>>>>
>>>>> I moved the ACL that allows my test from the bottom of the ruleset to
>>>>> the top and the resulting performance numbers were up as if the other
>>>>> ACLs didn't exist. As such it is very clear that 3.2 is evaluating every
>>>>> rule.
>>>>>
>>>>> I changed one of the url_regex rules to just match one line rather than
>>>>> a file containing 307 lines to see if that made a difference, and it
>>>>> made no significant difference. So this indicates to me that it's not
>>>>> having to fully evaluate every rule (it's able to skip doing the regex
>>>>> if the IP match doesn't work)
>>>>>
>>>>> I then changed all the acl lines that used hostnames to have IP
>>>>> addresses in them, and this also made no significant difference
>>>>>
>>>>> I then changed all subnet matches to single IP address (just nuked /##
>>>>> throughout the config file) and this also made no significant difference.
>>>>>
>>>>
>>>> Squid has always worked this way. It will *test* every rule from the top
>>>> down to the one that matches. Also testing each line left-to-right until
>>>> one fails or the whole line matches.
>>>>
>>>>>
>>>>> so why are the address matches so expensive
>>>>>
>>>>
>>>> 3.0 and older IP address is a 32-bit comparison.
>>>> 3.1 and newer IP address is a 128-bit comparison with memcmp().
>>>>
>>>> If something like a word-wise comparison can be implemented faster than
>>>> memcmp() we would welcome it.
>>>
>>> I wonder if there should be a different version that's used when IPv6 is
>>> disabled. this is a pretty large hit.
>>>
>>> if the data is aligned properly, on a 64 bit system this should still only
>>> be 2 compares. do you do any alignment on the data now?
>>>
>>>>> and as noted in the e-mail below, why do these checks not scale nicely
>>>>> with the number of worker processes? If they did, the fact that one 3.2
>>>>> process is about 1/3 the speed of a 3.0 process in checking the acls
>>>>> wouldn't matter nearly as much when it's so easy to get an 8+ core system.
>>>>>
>>>>
>>>> There you have the unknown.
>>>
>>> I think this is a fairly critical thing to figure out.
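
For anyone following the memcmp() versus "2 compares" exchange quoted above,
here is a minimal sketch of the two approaches side by side. Everything in it
is an assumption made for illustration: the addr128 struct and the functions
addr_equal_memcmp(), addr_equal_words() and make_v4_mapped() are hypothetical
names, not Squid's actual address class or API. It only checks that the two
comparisons agree; whether the word-wise form is actually faster than memcmp()
on a given compiler and CPU would have to be measured.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit address container; NOT Squid's real address type. */
typedef struct {
    unsigned char bytes[16];
} addr128;

/* Baseline: equality through memcmp() over all 16 bytes. */
static int addr_equal_memcmp(const addr128 *a, const addr128 *b)
{
    return memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
}

/* Word-wise variant: load each address as two 64-bit words, then do two
 * integer compares. memcpy() is used for the loads so the code makes no
 * alignment or aliasing assumptions about the byte array. */
static int addr_equal_words(const addr128 *a, const addr128 *b)
{
    uint64_t a_hi, a_lo, b_hi, b_lo;

    memcpy(&a_hi, a->bytes, 8);
    memcpy(&a_lo, a->bytes + 8, 8);
    memcpy(&b_hi, b->bytes, 8);
    memcpy(&b_lo, b->bytes + 8, 8);

    return a_hi == b_hi && a_lo == b_lo;
}

/* Illustration helper: store an IPv4 address a.b.c.d in IPv4-mapped
 * form (::ffff:a.b.c.d), i.e. 80 zero bits, 16 one bits, then the IPv4
 * address in the last four bytes. */
static addr128 make_v4_mapped(unsigned char a, unsigned char b,
                              unsigned char c, unsigned char d)
{
    addr128 out;

    memset(out.bytes, 0, sizeof out.bytes);
    out.bytes[10] = 0xff;
    out.bytes[11] = 0xff;
    out.bytes[12] = a;
    out.bytes[13] = b;
    out.bytes[14] = c;
    out.bytes[15] = d;
    return out;
}
```

Note that for IPv4-mapped addresses the first 64-bit word is identical for
every address, so only the second word can differ. The sketch tests equality
only; ordered or subnet comparisons would also need byte-order handling.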
Received on Mon Apr 25 2011 - 19:15:36 MDT
