Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

From: <david_at_lang.hm>
Date: Mon, 25 Apr 2011 16:23:36 -0700 (PDT)

On Mon, 25 Apr 2011, Alex Rousskov wrote:

> On 04/14/2011 09:06 PM, david_at_lang.hm wrote:
>> Ok, I finally got a chance to test 2.7STABLE9
>>
>> it performs about the same as squid 3.0, possibly a little better.
>>
>> with my somewhat stripped down config (smaller regex patterns, replacing
>> CIDR blocks and names that would need to be looked up in /etc/hosts with
>> individual IP addresses)
>>
>> 2.7 gives ~4800 requests/sec
>> 3.0 gives ~4600 requests/sec
>> 3.2.0.6 with 1 worker gives ~1300 requests/sec
>> 3.2.0.6 with 5 workers gives ~2800 requests/sec
>
> Glad you did not see a significant regression between v2.7 and v3.0. We
> have heard rather different stories. Every environment is different, and
> many lab tests are misguided, of course, but it is still good to hear
> positive reports.
>
> The difference between v3.2 and v3.0 is known and has been discussed on
> squid-dev. A few specific culprits are also known, but more need to be
> identified. We are working on identifying these performance bugs and
> reducing that difference.

let me know if there are any tests that I can run that will help you.

> As for 1 versus 5 worker difference, it seems to be specific to your
> environment (as discussed below).
>
>
>> the numbers for 3.0 are slightly better than what I was getting with the
>> full ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I
>> got from the last round of tests (with either the full or simplified
>> ruleset)
>>
>> so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and
>> the ability to use multiple worker processes in 3.2 doesn't make up for
>> this.
>>
>> the time taken seems to almost all be in the ACL evaluation as
>> eliminating all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.
>
> If ACLs are the major culprit in your environment, then this is most
> likely not a problem in Squid source code. AFAIK, there are no locks or
> other synchronization primitives/overheads when it comes to Squid ACLs.
> The solution may lie in optimizing some 3rd-party libraries (used by
> ACLs) or in optimizing how they are used by Squid, depending on what
> ACLs you use. As far as Squid-specific code is concerned, you should see
> nearly linear ACL scale with the number of workers.

given that my ACLs are IP/port matches or regex matches (and I've tested
replacing the regex matches with IP matches with no significant change in
performance), what components would be used?
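
for reference, the address comparison discussed in the quoted thread
further down (memcmp() on 128 bits versus two 64-bit compares) could be
sketched roughly as below. this is only an illustration, not Squid's
actual code: the Addr128 type and both function names are made up for
the example, and it assumes the address is stored as 16 contiguous
bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit address holder (NOT Squid's real address class).
// alignas(8) keeps the bytes on a 64-bit boundary.
struct Addr128 {
    alignas(8) unsigned char bytes[16];
};

// Byte-wise equality via memcmp(), as described for 3.1 and newer.
inline bool equalMemcmp(const Addr128 &a, const Addr128 &b) {
    return std::memcmp(a.bytes, b.bytes, sizeof(a.bytes)) == 0;
}

// Word-wise equality: two 64-bit compares.
// memcpy into locals avoids strict-aliasing problems; optimizing
// compilers typically reduce each copy to a single load.
inline bool equalWords(const Addr128 &a, const Addr128 &b) {
    std::uint64_t a0, a1, b0, b1;
    std::memcpy(&a0, a.bytes, 8);
    std::memcpy(&a1, a.bytes + 8, 8);
    std::memcpy(&b0, b.bytes, 8);
    std::memcpy(&b1, b.bytes + 8, 8);
    return a0 == b0 && a1 == b1;
}
```

whether the second form is actually faster than the library memcmp()
would have to be measured on the target system.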

>
>> one theory is that even though I have IPv6 disabled on this build, the
>> added space and more expensive checks needed to compare IPv6 addresses
>> instead of IPv4 addresses accounts for the single worker drop of ~66%.
>> that seems rather expensive, even though there are 293 http_access lines
>> (and one of them uses external file contents in its acls, so it's a
>> total of ~2400 source/destination pairs, however due to the ability to
>> shortcut the comparison the number of tests that need to be done should
>> be <400)
>
> Yes, IPv6 is one of the known major performance regression culprits, but
> IPv6 ACLs should still scale linearly with the number of workers, AFAICT.
>
> Please note that I am not an ACL expert. I am just talking from the
> overall Squid SMP design point of view and from our testing/deployment
> experience point of view.

that makes sense and is what I would have expected, but in my case (lots
of ACLs) I am seeing a definite problem with more workers not completing
more work, and beyond about 5 workers I am seeing the total work being
completed drop. I can't think of any reason besides locking that this may
be the case.
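
to put a number on it, using the 3.2.0.6 figures quoted above (~1300
requests/sec with 1 worker, ~2800 requests/sec with 5 workers), the
speedup is roughly 2.15x, or about 43% efficiency per worker, where
linear scaling would be 5x and 100%. a small sketch of that arithmetic
(the struct and function names are just for illustration):

```cpp
// Speedup and per-worker efficiency relative to a single-worker baseline.
struct Scaling {
    double speedup;     // multi-worker throughput / single-worker throughput
    double efficiency;  // speedup / number of workers (1.0 == linear scaling)
};

inline Scaling scaling(double baselineRps, double multiRps, int workers) {
    const double speedup = multiRps / baselineRps;
    return {speedup, speedup / workers};
}
```

calling scaling(1300.0, 2800.0, 5) with the numbers above gives the
figures mentioned.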

>> In addition, there seems to be some sort of locking between the multiple
>> worker processes in 3.2 when checking the ACLs
>
> There are pretty much no locks in the current official SMP code. This
> will change as we start adding shared caches in a week or so, but even
> then the ACLs will remain lock-free. There could be some internal
> locking in the 3rd-party libraries used by ACLs (regex and such), but I
> do not know much about them.

what are the 3rd party libraries that I would be using?

David Lang

>
> HTH,
>
> Alex.
>
>
>>> On Wed, 13 Apr 2011, Marcos wrote:
>>>
>>>> Hi David,
>>>>
>>>> could you run and publish your benchmark with squid 2.7 ???
>>>> i'd like to know if there is any regression between 2.7 and 3.x series.
>>>>
>>>> thanks.
>>>>
>>>> Marcos
>>>>
>>>>
>>>> ----- Original Message ----
>>>> From: "david_at_lang.hm" <david_at_lang.hm>
>>>> To: Amos Jeffries <squid3_at_treenet.co.nz>
>>>> Cc: squid-users_at_squid-cache.org; squid-dev_at_squid-cache.org
>>>> Sent: Saturday, April 9, 2011 12:56:12
>>>> Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues
>>>>
>>>> On Sat, 9 Apr 2011, Amos Jeffries wrote:
>>>>
>>>>> On 09/04/11 14:27, david_at_lang.hm wrote:
>>>>>> A couple more things about the ACLs used in my test
>>>>>>
>>>>>> all of them are allow ACLs (no deny rules to worry about precedence
>>>>>> of)
>>>>>> except for a deny-all at the bottom
>>>>>>
>>>>>> the ACL line that permits the test source to the test destination has
>>>>>> zero overlap with the rest of the rules
>>>>>>
>>>>>> every rule has an IP based restriction (even the ones with
>>>>>> url_regex are
>>>>>> source -> URL regex)
>>>>>>
>>>>>> I moved the ACL that allows my test from the bottom of the ruleset to
>>>>>> the top and the resulting performance numbers were up as if the other
>>>>>> ACLs didn't exist. As such it is very clear that 3.2 is evaluating
>>>>>> every
>>>>>> rule.
>>>>>>
>>>>>> I changed one of the url_regex rules to just match one line rather
>>>>>> than
>>>>>> a file containing 307 lines to see if that made a difference, and it
>>>>>> made no significant difference. So this indicates to me that it's not
>>>>>> having to fully evaluate every rule (it's able to skip doing the regex
>>>>>> if the IP match doesn't work)
>>>>>>
>>>>>> I then changed all the acl lines that used hostnames to have IP
>>>>>> addresses in them, and this also made no significant difference
>>>>>>
>>>>>> I then changed all subnet matches to single IP address (just nuked /##
>>>>>> throughout the config file) and this also made no significant
>>>>>> difference.
>>>>>>
>>>>>
>>>>> Squid has always worked this way. It will *test* every rule from the
>>>>> top down to the one that matches. Also testing each line
>>>>> left-to-right until one fails or the whole line matches.
>>>>>
>>>>>>
>>>>>> so why are the address matches so expensive?
>>>>>>
>>>>>
>>>>> 3.0 and older IP address is a 32-bit comparison.
>>>>> 3.1 and newer IP address is a 128-bit comparison with memcmp().
>>>>>
>>>>> If something like a word-wise comparison can be implemented faster
>>>>> than memcmp() we would welcome it.
>>>>
>>>> I wonder if there should be a different version that's used when IPv6
>>>> is disabled. this is a pretty large hit.
>>>>
>>>> if the data is aligned properly, on a 64 bit system this should still
>>>> only be 2 compares. do you do any alignment on the data now?
>>>>
>>>>>> and as noted in the e-mail below, why do these checks not scale nicely
>>>>>> with the number of worker processes? If they did, the fact that one
>>>>>> 3.2
>>>>>> process is about 1/3 the speed of a 3.0 process in checking the acls
>>>>>> wouldn't matter nearly as much when it's so easy to get an 8+ core
>>>>>> system.
>>>>>>
>>>>>
>>>>> There you have the unknown.
>>>>
>>>> I think this is a fairly critical thing to figure out.
>
>
Received on Mon Apr 25 2011 - 23:29:04 MDT