Re: [squid-users] squid 3.2.0.5 smp scaling issues

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 10 Apr 2011 12:18:54 +1200

On 10/04/11 03:56, david_at_lang.hm wrote:
> On Sat, 9 Apr 2011, Amos Jeffries wrote:
>
>> On 09/04/11 14:27, david_at_lang.hm wrote:
<snip>
>>>
>>> so why are the address matches so expensive
>>>
>>
>> 3.0 and older IP address is a 32-bit comparison.
>> 3.1 and newer IP address is a 128-bit comparison with memcmp().
>>
>> If something like a word-wise comparison can be implemented faster
>> than memcmp() we would welcome it.
>
> I wonder if there should be a different version that's used when IPv6 is
> disabled. This is a pretty large hit.
>
> If the data is aligned properly, on a 64-bit system this should still
> only be two compares. Do you do any alignment on the data now?

Not specifically. We just use the system types and compare macros. I
believe they are aligned sometimes and unaligned at others.
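
As an illustration of the word-wise idea (a hypothetical helper, not Squid code): copying through memcpy into two local 64-bit words sidesteps the alignment question, and compilers typically lower such fixed-size copies to plain loads, so this is effectively two register compares.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: compare two 128-bit addresses as two 64-bit words.
// memcpy is safe for unaligned input; the comparison itself is just two
// integer equality tests.
static bool addrEqual128(const void *a, const void *b)
{
    std::uint64_t a0, a1, b0, b1;
    std::memcpy(&a0, a, 8);
    std::memcpy(&a1, static_cast<const char *>(a) + 8, 8);
    std::memcpy(&b0, b, 8);
    std::memcpy(&b1, static_cast<const char *>(b) + 8, 8);
    return a0 == b0 && a1 == b1;
}
```

Whether this actually beats the platform memcmp() would need the micro-benchmarks discussed below.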

>
>>> and as noted in the e-mail below, why do these checks not scale nicely
>>> with the number of worker processes? If they did, the fact that one 3.2
>>> process is about 1/3 the speed of a 3.0 process in checking the acls
>>> wouldn't matter nearly as much when it's so easy to get an 8+ core
>>> system.
>>>
>>
>> There you have the unknown.
>
> I think this is a fairly critical thing to figure out.
>
>>>
>>> It seems to me that all accept/deny rules in a set should be able to be
>>> combined into a tree to make searching them very fast.
>>>
>>> So, for example, if you have
>>>
>>> accept 1
>>> accept 2
>>> deny 3
>>> deny 4
>>> accept 5
>>>
>>> you need to create three trees (one with accept 1 and accept 2, one with
>>> deny 3 and deny 4, and one with accept 5) and then check each tree to see
>>> if you have a match.
>>>
>>> The types of match could be done in order of increasing cost, so if you
>>
>> The config file is specific structure configured by admin under
>> guaranteed rules of operation for access lines (top-down,
>> left-to-right, first-match-wins) to perform boolean-logic calculations
>> using ACL sets.
>> Sorting access line rules is not an option.
>> Sorting ACL values and tree-forming them is already done (regex being
>> the one exception AFAIK).
>> Sorting position-wise on a single access line is also ruled out by
>> interactions with deny_info, auth and external ACL types.
>
> It would seem that as long as you don't cross boundaries between the
> different types, you should be able to optimize within a group.
>
> Using my example above, you couldn't combine the 'accept 5' with any of
> the other accepts, but you could combine accept 1 and 2, and combine deny
> 3 and 4 together.
>
> Now, I know that I don't fully understand all the possible ACL types, so
> this may not work for some of them, but I believe that a fairly common
> use case is to have either a lot of allow rules or a lot of deny rules
> as a block (either a list of sites you are allowed to access, or a list
> of sites that are blocked), so an ability to optimize these use cases
> may be well worth it.

? Um. You seem to be talking about this kind of config (which is bad):

   acl A dstdomain .facebook.com
   acl B dstdomain .hotmail.com
   http_access allow A
   http_access allow B
   http_access deny all

Which (manually) can be reduced to this:
   acl A dstdomain .facebook.com
   acl A dstdomain .hotmail.com
   http_access allow A
   http_access deny all
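
A toy illustration of why the merge matters (std::set standing in for Squid's splay tree; the helper names are hypothetical): the first form performs a separate lookup per http_access line, while the merged form answers the same question with one search over the union.

```cpp
#include <set>
#include <string>

using DomainSet = std::set<std::string>;

// Two separate ACLs: each http_access allow line costs its own lookup.
static bool matchSeparate(const DomainSet &a, const DomainSet &b,
                          const std::string &host)
{
    return a.count(host) || b.count(host); // up to two tree searches
}

// One merged ACL: a single container, a single search, same result.
static bool matchMerged(const DomainSet &merged, const std::string &host)
{
    return merged.count(host) != 0;
}
```

With many allow lines the difference grows linearly with the number of lines, which is exactly the scaling concern raised above.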

<snip>
>>> It seems to me that a simple array that you can
>>> do a binary search on will work for the port, src, and dst trees. The
>>> URL regex is probably easiest to initially create by just doing a list
>>> of regex strings to match and working down that list, but eventually it
>>
>> This is already how we do these. But with a splay tree instead of binary.
>
> I wondered about that. I've gotten splay-tree-related warning messages.

Those warnings mean the splay tree "compression" step has detected two
exactly overlapping matches. It drops the one we think is worse, but
this has been known to go wrong in some rare cases, and the step of
figuring it out slows down the reload. The longer/worse of the two
should be removed from the config.
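
Roughly, the compression pass can be pictured like this (a simplified sketch over a sorted vector; the real code uses its splay tree and domain-suffix ordering, and the function name here is hypothetical):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified model of the "compression" step: after sorting, exactly
// overlapping entries sit adjacent, so they can be reported and dropped.
static std::vector<std::string>
compressValues(std::vector<std::string> values)
{
    std::sort(values.begin(), values.end());
    auto dupStart = std::unique(values.begin(), values.end());
    // This is the point at which Squid would emit the overlap warning
    // before discarding the redundant entries.
    values.erase(dupStart, values.end());
    return values;
}
```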

>
>>> may be best to create a parse tree so that you only have to walk down
>>> the string once to see if you have a match.
>>
>> That would be nice. Care to implement?
>> You just have to get the regex library to adjust its pre-compiled
>> patterns with OR into (existing|new) whenever a new pattern string is
>> added to an ACL.
>
> I'm actually watching the libnorm project closely, with the intention of
> leveraging it for this sort of thing. It's a project being developed by
> the maintainer of rsyslog, trying to efficiently match log entries. It
> doesn't support regex notation for defining its rules at this point,
> but I've poked Rainer in that direction, so we'll see how things go.

Where can I find this project? The only thing available by that name
seems to be a multicast optimization protocol.

Even if it does plain-text compression and matching better than regex it
will be of some use. One of the common cases we have is people matching
the URL path prefix to map a web-app into a site directory.
  Full regex is overkill for that case, and worse, it takes a lot of tests.
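
For that prefix case a plain string compare is all that is needed; a minimal sketch (hypothetical helper, not a Squid API):

```cpp
#include <string>

// Hypothetical helper: true when the URL path begins with the given
// prefix. This covers what a url_regex like ^/webapp/ is often used
// for, without compiling or executing a regex per request.
static bool pathHasPrefix(const std::string &path, const std::string &prefix)
{
    return path.compare(0, prefix.size(), prefix) == 0;
}
```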

<snip>
>> The ACL storage types are all defined in the src/acl/*Data.* source
>> files. If you wish to work on finding us some faster types or even
>> faster matching algorithm for an existing type that would be welcome.
>> We do ask for some unit/micro-benchmarks of the old vs new match()
>> function so we know the change is an actual improvement.
>
> I don't know how much I will get a chance to do any coding on this (as
> opposed to being available to test anything that you folks try; testing
> I will make time for), but I definitely agree on the need to do benchmarks.
>
> again, thanks for the info.
>
> David Lang

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.12
   Beta testers wanted for 3.2.0.6
Received on Sun Apr 10 2011 - 00:19:01 MDT

This archive was generated by hypermail 2.2.0 : Sun Apr 10 2011 - 12:00:04 MDT