Re: [squid-users] squid 3.2.0.5 smp scaling issues

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 09 Apr 2011 21:09:22 +1200

On 09/04/11 14:27, david_at_lang.hm wrote:
> A couple more things about the ACLs used in my test
>
> all of them are allow ACLs (no deny rules to worry about precidence of)
> except for a deny-all at the bottom
>
> the ACL line that permits the test source to the test destination has
> zero overlap with the rest of the rules
>
> every rule has an IP based restriction (even the ones with url_regex are
> source -> URL regex)
>
> I moved the ACL that allows my test from the bottom of the ruleset to
> the top and the resulting performance numbers were up as if the other
> ACLs didn't exist. As such it is very clear that 3.2 is evaluating every
> rule.
>
> I changed one of the url_regex rules to just match one line rather than
> a file containing 307 lines to see if that made a difference, and it
> made no significant difference. So this indicates to me that it's not
> having to fully evaluate every rule (it's able to skip doing the regex
> if the IP match doesn't work)
>
> I then changed all the acl lines that used hostnames to have IP
> addresses in them, and this also made no significant difference
>
> I then changed all subnet matches to single IP address (just nuked /##
> throughout the config file) and this also made no significant difference.
>

Squid has always worked this way. It will *test* every rule from the top
down to the one that matches. Also testing each line left-to-right until
one fails or the whole line matches.

>
> so why are the address matches so expensive
>

3.0 and older IP address is a 32-bit comparison.
3.1 and newer IP address is a 128-bit comparison with memcmp().

If something like a word-wise comparison can be implemented faster than
memcmp() we would welcome it.

> and as noted in the e-mail below, why do these checks not scale nicely
> with the number of worker processes? If they did, the fact that one 3.2
> process is about 1/3 the speed of a 3.0 process in checking the acls
> wouldn't matter nearly as much when it's so easy to get an 8+ core system.
>

There you have the unknown.

>
> it seems to me that all accept/deny rules in a set should be able to be
> combined into a tree to make searching them very fast.
>
> so for example if you have
>
> accept 1
> accept 2
> deny 3
> deny 4
> accept 5
>
> you need to create three trees (one with accept 1 and accept 2, one with
> deny3 and deny4, and one with accept 5) and then check each tree to see
> if you have a match.
>
> the types of match could be done in order of increasing cost, so if you

The config file is specific structure configured by admin under
guaranteed rules of operation for access lines (top-down, left-to-right,
first-match-wins) to perform boolean-logic calculations using ACL sets.
  Sorting access line rules is not an option.
  Sorting ACL values and tree-forming them is already done (regex being
the one exception AFAIK).
  Sorting position-wise on a single access line is also ruled out by
interactions with deny_info, auth and external ACL types.

> have acl entries of type port, src, dst, and url regex, organize the
> tree so that you check ports first, then src, then dst, then only if all
> that matches do you need to do the regex. This would be very similar to
> the shortcut logic that you use today with a single rule where you bail
> out when you don't find a match.
>
> you could go with a complex tree structure, but since this only needs to
> be changed at boot time,

Um, "boot"/startup time and arbitrary "-k reconfigure" times.
With a reverse-configuration display dump on any cache manager request.

> it seems to me that a simple array that you can
> do a binary search on will work for the port, src, and dst trees. The
> url regex is probably easiest to initially create by just doing a list
> of regex strings to match and working down that list, but eventually it

This is already how we do these. But with a splay tree instead of binary.

> may be best to create a parse tree so that you only have to walk down
> the string once to see if you have a match.

That would be nice. Care to implement?
  You just have to get the regex library to adjust its pre-compiled
patterns with OR into (existing|new) whenever a new pattern string is
added to an ACL.

>
> you wouldn't quite be able to get this fast as you would have to
> actually do two checks, one if you have a match on that level and one
> for the rules that don't specify something in the current tree (one
> check for if the http_access line specifies a port number and one for if
> it doesn't for example)

We get around this problem by using C++ types. ACLChecklist walks the
tree holding the current location, expected result, and all details
available about the transaction. Each node in the tree has a match()
function which gets called at most once per walk. Each ACL data type
provides its own match() algorithm.

That is why the following config is invalid:
  acl foo src 1.2.3.4
  acl foo port 80

>
> this sort of acl structure would reduce a complex ruleset down to ~O(log
> n) instead of the current O(n) (a really complex ruleset would be log n
> of each tree added togeather)
>
> there are cases where this sort of thing would be more expensive than
> the current, simple approach, but those would be on simple rulesets
> which aren't affected much by a few extra checks.

Um, You have pretty much described our existing code. Even with the
details of *how* hidden away the small bit exposed to admin is fairly
complex.

  I am thinking you did not understand ACLs very well earlier when
designing your config rules. With large rulsets that can easily lead to
an inefficient config and the worst-case results you seem to have achieved.
  If you care to share your actual configuration file contents I'm happy
to read through and point out any optimizations that can be made.
  Though you may want to use the above info and see if you can find any
optimizations first.

The ACL storage types are all defined in the src/acl/*Data.* source
files. If you wish to work on finding us some faster types or even
faster matching algorithm for an existing type that would be welcome.
  We do ask for some unit/micro-benchmarks of the old vs new match()
function so we know the change is an actual improvement.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.12
   Beta testers wanted for 3.2.0.6
Received on Sat Apr 09 2011 - 09:09:40 MDT

This archive was generated by hypermail 2.2.0 : Sat Apr 09 2011 - 12:00:02 MDT