Re: [PATCH] ICAP service chains and sets

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 27 Jun 2009 16:44:37 -0600

On 06/27/2009 06:28 AM, Amos Jeffries wrote:
>>> Limit adaptation iterations to adaptation_service_iteration_limit to
>>> protect
>>> Squid from infinite adaptation loops caused by ICAP services constantly
>>> including themselves in the dynamic adaptation chain they request.
>>> When the
>>> limit is exceeded, the master transaction fails. The default limit of 16
>>> should be large enough to not require an explicit configuration in most
>>> environments yet may be small enough to limit side-effects of loops.

> I see your comment at src/adaptation/AccessCheck.cc: ~156
> TODO: should we shift and checkCandidates if and only if (!g) ?!
>
> If the TODO is about the case of self-reference you mention above then I
> would say yes we must. To drop such loops into error condition as
> quickly as possible. I understand the limit needs to be kept anyway for
> multi-stage loops. But losing the loop early saves a lot of resources.

No, this is different. I have not modified AccessCheck much. Its overall
logic is as follows:

  1. AccessCheck::check: Identify service groups that may be applicable
to the virgin message. Start checking candidates by calling
checkCandidates().

  2. AccessCheck::checkCandidates: Get the top candidate and spawn a
non-blocking ACL check for it. This ends this call. The non-blocking
check will call us back in step #3.

  3. AccessCheck::noteAnswer: If there is no match, remove the top
candidate and go back to step #2. If there is a match, go to step #4.

  4. AccessCheck::do_callback: Check that the top candidate service is
still there and send it (or a NULL answer) to whoever requested the
adaptation. Remove the top candidate from the list. Stop the job.

You may have noticed that step #4 always terminates the search. Yet, we
may not have found a valid service (e.g., due to a reconfiguration) and
still have candidate services left to check. I added a TODO because it
smelled fishy to me when I looked at the code, but my focus was not on
changing how adaptation ACLs are processed so I left the code "as is".

Your question prompted me to investigate this further and I am convinced
there is a bug. If not everything is peachy in step #4, we must go back
to step #2 again. I have now fixed this in my sources and polished
surrounding code a little (there were other small problems). The TODO is
done. The patch is attached.

This bug is difficult to reproduce because the service must disappear
while an ACL check is pending and it will only have an effect for checks
pending before reconfigure; future checks will work OK again. It is
probably possible to time an appropriate reconfiguration just right, but
I have not tried to do that.

> Also, should we prevent circular loops by way of preventing a filter
> being run twice over the same request? There have to be few cases where
> a single adaptation step needs to be repeated exactly.

It would be nice to warn about the same service appearing twice in a set
or chain (a patch for that is attached and I added it to my sources).

I would not try to forcefully prevent duplicates though because it is
not illegal and might be used in practice for "try me twice" poor-man
backup implementations or for services that do different things
depending on the message state but cannot do it all at once. I do agree
that these situations are unusual (but very handy for testing! :-).

> IMO we are finding more and more reasons to add an ERR_*_CONFIG page :)

In most cases, I would just quit (with an error message) if a
configuration error is discovered during startup. However, I was
overruled on that by the "start at any cost" crowd.

Errors that are discovered at runtime should probably generate a fairly
generic response with an error "code" or similar information for admins
to match to cache.log entries and documentation. Users do not benefit
from knowing that a Squid cache has a misconfigured ICAP service and
many admins would not want to disclose that fact to end users anyway.

>>> Fixed "canonical" Request URL maintenance when ICAP clones requests.
>>> TODO: The urlCanonical() must become HttpRequest::canonical(), hiding
>>> the
>>> often out-of-sync canonical data member.
>
> IMO: TODO: url cleanup bug needs closing to handle canonical and all
> other URL display cases properly behind a single URL object. Requires
> decent string code though or memory duplicates will blow out of
> proportion to usefulness.

Yes, those are two separate issues: cache consistency (my TODO) and
cache storage implementation (TODO: add a URL or URI class). My TODO is
much easier to implement, but was outside of this project scope.

> I took a look at the patch, and decided to take your word that it works :)

Yeah, it is difficult to find macro-level bugs in this code. It does
work in my tests. I do not think we can get much further without wider
testing.

Should I interpret your comment as "I am not going to vote on this one"?

> Please reference and close
> http://www.squid-cache.org/bugs/show_bug.cgi?id=2087
> in the commit if/when this goes in.

I will try to remember that.

> Some stuff I found in the code:
>
> in src/HttpRequest.cc please add brackets to make the code more readable:
> "loggingNeedsHistory = LogfileStatus == LOG_ENABLE &&
> alLogformatHasAdaptToken;"

Done:

  const bool loggingNeedsHistory = (LogfileStatus == LOG_ENABLE) &&
    alLogformatHasAdaptToken;

> in src/adaptation/AccessCheck.cc you create a ServiceFilter via:
> "new AccessCheck( ServiceFilter(method, vp, req, rep), cb, cbdata);"
> which takes a copy by & reference. The ServiceFilter locks the given
> request and reply but I can't see where the ServiceFilter is deleted (&
> references cannot be and the object is not RefCounted.)
> - I expect this to lead to memory leaks. Possibly big ones.

ServiceFilter instances in this context are not allocated on the heap.
Thus, we do not (and cannot) delete them. The compiler generates code to
destruct the temporary variable and the class data member. Did I
misunderstood your concern?

> in adaptation/icap/ModXact.cc when testing for HDR_X_NEXT_SERVICES I
> think it would be worth warning if a non-routable service attempted to
> hand back a routing header. How to do it efficiently may be a problem
> though.

I do not think there would be a critical performance impact if we check
using the current code. The difference between a hash and a simple array
with linear search ought to be small when you are dealing with less than
10 integers.

Another problem, IMO, is how not to warn on every response (which is far
more expensive operation than searching). I wanted to port a
warn-once-in-a-while facility from Polygraph to Squid for a long time...

Added TODO.

> This TODO needs to be enacted rather soon. We are aiming for performance
> in 3.2...
> // TODO: use HttpMsg::clone()!

Agreed. A usable clone() method has been available at least since the
eCAP addition. This is not a new TODO or a new problem. This is a nice,
small-context project for volunteers :-).

> On the side; this has brought to my attention the adaptation/forward.h.
> Can we not call that adaptation/Adaptation.h instead? the forward.h
> breaks CamelCase and will clash with src/forward.h on those recently
> vocal compilers.

"Forward" is kind of standard (e.g., <iosfwd>) and more intuitive, IMO:
We are providing forward declarations to avoid too many unimportant
dependencies via header files. I would rather move or rename
src/forward.h :-)

> notes.dox ***
> \section label takes two parameters: an ID then the text title.

Fixed.

> mixing \b and the <b></b> equivalents on alternate lines looks a bit
> mucky. Can we at least do it consistently within one document?

I have no idea where I copied that ugly mix from :-). Replaced all with
<b></b>.

> Thats it from me. Just ... no more huge project for a few decades, yes? ;)

You call this huge?! I spent more time on StringNg. ;-)

Please look at the enhanced logging patch as well. This work depends on
that patch. If the changes are approved, the enhanced logging stuff must
go into the trunk first.

Thank you,

Alex.

Received on Sat Jun 27 2009 - 22:44:51 MDT

This archive was generated by hypermail 2.2.0 : Sun Jun 28 2009 - 12:00:06 MDT