Re: external acl cache level

From: Henrik Nordstrom <henrik@dont-contact.us>
Date: Tue, 23 May 2006 13:49:54 +0200

On Mon, 2006-05-22 at 23:17 -0300, Gonzalo Arana wrote:

> Ah, I see now. As long as the helper & squid follow "lower level
> number, higher priority" policy, there is no need for cache
> invalidation.

Correct.

> > To be able to make sane lookup structures it is very beneficial if the
> > data can be structured in a path like structure. This worked out quite
> > okay except that there are acl types where the acl arguments (the data in
> > the acl statement) are more important than some request details
> > (external_acl_type format tags)...
>
> I may be wrong, but reordering is needed in those cases, which is why
> I proposed 'combining' key components: letting the helper specify
> which request-tokens may be used for caching this response.

The problem with supporting dynamic reordering like this (as opposed to
the fixed cache levels) is that you then need to perform 2^N lookups
instead of a linear N. For 1-2 levels this makes no difference, but the
complexity grows quickly with the number of levels.
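
As a rough illustration of the difference (a Python sketch, not Squid
code; the function names are made up for this example): with fixed
levels the candidate cache keys are the N prefixes of the token list,
while free combining makes every non-empty subset of tokens a candidate.

```python
from itertools import combinations

def linear_keys(tokens):
    """Candidate keys with fixed cache levels: one prefix per level."""
    return [tuple(tokens[:i]) for i in range(1, len(tokens) + 1)]

def combined_keys(tokens):
    """Candidate keys when any non-empty subset of tokens may be the key."""
    keys = []
    for r in range(1, len(tokens) + 1):
        keys.extend(combinations(tokens, r))
    return keys

tokens = ["login", "host", "cookie", "myaddr"]
print(len(linear_keys(tokens)))    # 4 probes, one per level
print(len(combined_keys(tokens)))  # 15 probes, i.e. 2**4 - 1
```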

> Sure! Here is an example:
> external_acl_type useless %{Host} %| /some/helper some-argument
> acl yet_another_useless external useless %{Cookie} %| %{MYADDR}
>
> We could just demand that /some/helper should be aware of request
> levels (this is something you pointed out below). Sooner or later
> this will lead to confusions.

I honestly do not see a problem with that. We already demand that the
helper knows which arguments it will get, which implies that we demand
that the admin knows which external_acl_type definition should be
used.

But I am not a strong supporter of allowing format tags within the acl
data. These belong in the external_acl_type. Gets too messy otherwise
with high risk of misconfiguration I think, and I don't see many
practical situations where it would help.

The above should be expressed as

external_acl_type useless %{Host} %| %{Cookie} %| %{MYADDR} %| /some/helper some-argument
acl yet_another_useless external useless YetMoreUselessInfo

> Options:
> 1) To expand '%|' to some string that we know it won't be present in
> any other tags. I fear no matter which string we choose for '%|'
> expansion; that string could be present in (for instance) Cookie
> request header.

I think it is better left outside of the helper protocol. If there
should be a change then perhaps adding a startup message where Squid
announces to the helper the protocol used, allowing the helper to verify
the configuration and alert the admin on errors..

> 2) As you proposed:
> > Another approach would be to mark the arguments per their key detail level.
> Unless I misunderstand this, you are proposing that each request could
> look something like this (I know that there are cleaner ways to do it,
> this is just an example):
> 1=localhost 1=blah1 2=user_xxx 3=1.1.1.1
> where each integer represent the key level.
> With this approach, key-component level is assigned by squid
> configuration, and is not per-request (which perhaps is what is
> wanted).

I was thinking more of allowing the cache levels to be disjoint, i.e.

level1 = %LOGIN
level2 = %HOST
level3 = %LOGIN %HOST
level4 = %LOGIN acldata

instead of just a linear path like today:

level1 = %LOGIN
level2 = %LOGIN %HOST
level3 = %LOGIN %HOST acldata
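
A minimal Python sketch of how a lookup over such disjoint but fixed
levels could work (not Squid code). It assumes that lower level numbers
have higher priority and that the helper names the level its answer
applies to; the LEVELS table and function names are hypothetical.

```python
# Hypothetical level table mirroring the disjoint example above.
# Position in the list is the level number minus one.
LEVELS = [
    ("LOGIN",),
    ("HOST",),
    ("LOGIN", "HOST"),
    ("LOGIN", "acldata"),
]

def make_key(level, request):
    """Build the cache key for one level from the fields it covers."""
    fields = LEVELS[level - 1]
    return (level,) + tuple(request[f] for f in fields)

def store(cache, request, level, result):
    """Remember a helper result at the level the helper reported."""
    cache[make_key(level, request)] = result

def lookup(cache, request):
    """Probe each level once in priority order: N levels, N probes."""
    for level in range(1, len(LEVELS) + 1):
        key = make_key(level, request)
        if key in cache:
            return cache[key]
    return None

cache = {}
req = {"LOGIN": "alice", "HOST": "example.com", "acldata": "group1"}
store(cache, req, 2, "OK")           # answer depends on the host only
other = dict(req, LOGIN="bob")       # different user, same host
print(lookup(cache, other))          # found at level 2
```

Because the set of levels is known up front, the number of probes stays
linear in the number of levels even though the levels do not form a
single path.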

> 3) We could let external helper to decide key-component level by using
> something like XMLRPC or we could come up with our own protocol based
> in, say, HTTP.

The problem is again how to maintain the lookup cache efficiently if
the levels and their priorities are not static.

> This encoding/protocol/structure (whatever this should be called)
> should add support for something like HTTP's Vary: in the response,
> the helper should indicate which components of the request were taken
> into consideration for building the reply.

After years of study I have yet to come up with any lookup structure
supporting HTTP's Vary in an efficient manner. It's a very complex
thing.

It's manageable if one assumes there is a single Vary specification per
URL (which translates to external_acl_type in this discussion, I would
say), but the HTTP specifications do not impose this limitation, so
servers can respond with different Vary specifications in different
responses for the same URL, making the problem explode (and making heads
explode when trying to explain the cache effects of doing so).
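
To show why the single-specification case is manageable, here is a
simplified Python sketch (not Squid's implementation; the class and
method names are invented, and header names are assumed to be
lowercase). The lookup takes two steps: find the Vary fields recorded
for the URL, then build the variant key from exactly those fields.

```python
def variant_key(fields, request_headers):
    """Values of the headers named by Vary, in a fixed order."""
    return tuple(request_headers.get(f, "") for f in fields)

class VaryCache:
    def __init__(self):
        self.vary = {}      # url -> tuple of header names
        self.variants = {}  # (url, variant key) -> response

    def store(self, url, vary_fields, request_headers, response):
        fields = tuple(f.lower() for f in vary_fields)
        if self.vary.get(url, fields) != fields:
            # Simplifying assumption: a changed Vary specification
            # drops all older variants of this URL.
            self.variants = {k: v for k, v in self.variants.items()
                             if k[0] != url}
        self.vary[url] = fields
        self.variants[(url, variant_key(fields, request_headers))] = response

    def lookup(self, url, request_headers):
        fields = self.vary.get(url)
        if fields is None:
            return None
        return self.variants.get((url, variant_key(fields, request_headers)))
```

Once different responses for the same URL may carry different Vary
specifications, there is no longer a single field list to consult in the
first step, which is where the difficulty starts.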

> Let me see If I follow correctly: with %DATA you can switch the order
> of the arguments to external_acl, right? So you can make acl
> arguments have higher priorities than external_acl formats.

Yes.

> > Problem: %DATA has a slight problem with whitespace characters if the
> > helper is to handle arguments with whitespace AND multiple arguments in
> > the same acl type: as currently written they both look the same in the
> > %DATA expansion (a space character, appropriately encoded per the
> > helper protocol).
>
> we seem to fall into "some higher level structure is needed" again.
> Mainly because the external helper is needed to tell squid which
> arguments have been used ("combining" approach).

I am not sure we need to care about these corner cases.
external_acl_type is a protocol specification. It needs to be correct
for the helper to operate properly. Adding some more characters to the
specification doesn't make this much more complex in the general case as
the external_acl_type definition understood by the helper needs to be in
the helper documentation, and any sane admin copy-pastes it from there
instead of trying to guess how the helper operates.

But some structure may be needed for Squid to tell the helper which
argument is what. See below.

> I vote -1 for this, basically it is a headache-maker.

Is it?

How about this solution: Move the format specification from squid.conf
to the helper, allowing Squid to query the helper on startup on what
format the helper expects. This includes cache level definitions (if
used by the helper).

If the option to specify the format manually is kept, everything is
gained and nothing is lost. The main problem with the automatic config
query is that it doesn't allow easy reuse of a helper for a purpose
other than the one it was originally intended for (e.g. using %SRC
instead of %LOGIN), but the benefits in ease of configuration are
significant, especially when adding cache levels.

> Unless we let
> each 'token' of the line sent to external helper to be a 'level'.
> This would lead to potentially more hash_lookup calls (which should be
> fast anyway).

Doable, but as the applications of cache levels are rather specialized,
I think there is benefit in having the levels explicitly defined. Most
helpers won't be using this, and the ones that do may have relatively
large groups of arguments that always go together.

> > With the lack of %DATA above this approach fails if the data from the
> > acl is more important than some request details.
>
> Reordering is needed in these cases. "Combining" provides "reordering".

I definitely vote -1 to a non-linear cache lookup where the helper can
dynamically decide per request which of the request parameters were
relevant in the query, without a prior agreement between the helper and
Squid on which divisions exist and their relative priorities.

> > Another approach would be to mark the arguments per their key detail
> > level. With this approach %DATA is not needed as the request parameters
> > do not need to be sorted on their detail level and could even be
> > extended into alternate priorities. However it shares the first problem
> > above (if it is a problem..).
>
> I don't understand why this shares the first problem (about external
> helper being aware of the key levels).

Simply because it still doesn't indicate to the helper that Squid is
aware of the different cache levels. There is no change in helper
communication, just a reordering of the arguments to have the
acl-specific data earlier.

> key detail level markup could be good; but the result of any format we
> choose for indicating key level could be the result of some of the
> tags (%{Cookie}, %{EXT_USER}).

As indicated above, I wasn't thinking about communicating this to the
helper, only about internal use within Squid to define how the request
arguments map to different cache levels.

> Personally, I am against using complex structures like XML/HTTP-alike
> for the 'level' idea, mainly because it makes helpers more complex and
> more CPU/RAM intensive, and the whole idea is to keep them small &
> simple. If providing 'cache level' along with 'reordering' makes
> things too complex, the sysadmin can always fall back to just using more
> than one external acl (one external acl for each possible 'level').

My thinking as well.

> I am aware that if we implement key-token-combining, we might need to
> perform 2**N hash_lookup calls (where N is the number of tokens
> present in the external acl key). If the 'combining' idea wins, this
> O(2**N) issue must be tackled (using another data structure seems the
> most promising alternative).

I am still not convinced this dynamic behaviour is needed, and not at
all sure it would make helper implementation any easier compared to
explicit definitions. Also, as with HTTP Vary, the priority of
conflicting results from different combinations gets interesting (which
means increased risk of unexpected results).

Regards
Henrik

Received on Tue May 23 2006 - 05:50:49 MDT
