Re: [PATCH] ICAP service chains and sets

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 28 Jun 2009 17:41:44 +1200

Alex Rousskov wrote:
> On 06/27/2009 06:28 AM, Amos Jeffries wrote:
>>>> Limit adaptation iterations to adaptation_service_iteration_limit to
>>>> protect
>>>> Squid from infinite adaptation loops caused by ICAP services constantly
>>>> including themselves in the dynamic adaptation chain they request.
>>>> When the
>>>> limit is exceeded, the master transaction fails. The default limit of 16
>>>> should be large enough to not require an explicit configuration in most
>>>> environments yet may be small enough to limit side-effects of loops.
>
>> I see your comment at src/adaptation/AccessCheck.cc: ~156
>> TODO: should we shift and checkCandidates if and only if (!g) ?!
>>
>> If the TODO is about the case of self-reference you mention above then I
>> would say yes we must. To drop such loops into error condition as
>> quickly as possible. I understand the limit needs to be kept anyway for
>> multi-stage loops. But losing the loop early saves a lot of resources.
>
> No, this is different. I have not modified AccessCheck much. Its overall
> logic is as follows:
>
> 1. AccessCheck::check: Identify service groups that may be applicable
> to the virgin message. Start checking candidates by calling
> checkCandidates().
>
> 2. AccessCheck::checkCandidates: Get the top candidate and spawn a
> non-blocking ACL check for it. This ends this call. The non-blocking
> check will call us back in step #3.
>
> 3. AccessCheck::noteAnswer: If there is no match, remove the top
> candidate and go back to step #2. If there is a match, go to step #4.
>
> 4. AccessCheck::do_callback: Check that the top candidate service is
> still there and send it (or a NULL answer) to whoever requested the
> adaptation. Remove the top candidate from the list. Stop the job.
>
> You may have noticed that step #4 always terminates the search. Yet, we
> may not have found a valid service (e.g., due to a reconfiguration) and
> still have candidate services left to check. I added a TODO because it
> smelled fishy to me when I looked at the code, but my focus was not on
> changing how adaptation ACLs are processed so I left the code "as is".
>
> Your question prompted me to investigate this further and I am convinced
> there is a bug. If not everything is peachy in step #4, we must go back
> to step #2 again. I have now fixed this in my sources and polished
> surrounding code a little (there were other small problems). The TODO is
> done. The patch is attached.
>
> This bug is difficult to reproduce because the service must disappear
> while an ACL check is pending and it will only have an effect for checks
> pending before reconfigure; future checks will work OK again. It is
> probably possible to time an appropriate reconfiguration just right, but
> I have not tried to do that.
>
>
>> Also, should we prevent circular loops by way of preventing a filter
>> being run twice over the same request? There have to be few cases where
>> a single adaptation step needs to be repeated exactly.
>
> It would be nice to warn about the same service appearing twice in a set
> or chain (a patch for that is attached and I added it to my sources).
>
> I would not try to forcefully prevent duplicates though because it is
> not illegal and might be used in practice for "try me twice" poor-man
> backup implementations or for services that do different things
> depending on the message state but cannot do it all at once. I do agree
> that these situations are unusual (but very handy for testing! :-).
>

Okay. Fair enough.

>
>> IMO we are finding more and more reasons to add an ERR_*_CONFIG page :)
>
> In most cases, I would just quit (with an error message) if a
> configuration error is discovered during startup. However, I was
> overruled on that by the "start at any cost" crowd.
>
> Errors that are discovered at runtime should probably generate a fairly
> generic response with an error "code" or similar information for admins
> to match to cache.log entries and documentation. Users do not benefit
> from knowing that a Squid cache has a misconfigured ICAP service and
> many admins would not want to disclose that fact to end users anyway.
>

They have a point. We don't _have_ to make every config error fatal. But
we do need some way to make the admin aware of it such that they have
incentive to fix fast. Erros in cache.log are doing a lot but not enough.
Errors in their users faces saying they did wrong is a great incentive
to get it right, and a useful debug tool during pre-rollout tests.

But this is outside your patch scope. I should be starting a different
thread on this...

>
>>>> Fixed "canonical" Request URL maintenance when ICAP clones requests.
>>>> TODO: The urlCanonical() must become HttpRequest::canonical(), hiding
>>>> the
>>>> often out-of-sync canonical data member.
>> IMO: TODO: url cleanup bug needs closing to handle canonical and all
>> other URL display cases properly behind a single URL object. Requires
>> decent string code though or memory duplicates will blow out of
>> proportion to usefulness.
>
> Yes, those are two separate issues: cache consistency (my TODO) and
> cache storage implementation (TODO: add a URL or URI class). My TODO is
> much easier to implement, but was outside of this project scope.
>
>> I took a look at the patch, and decided to take your word that it works :)
>
> Yeah, it is difficult to find macro-level bugs in this code. It does
> work in my tests. I do not think we can get much further without wider
> testing.
>
> Should I interpret your comment as "I am not going to vote on this one"?

Um, well. I wrote that before deciding to read the code again for a
'quick' audit.

>
>> Please reference and close
>> http://www.squid-cache.org/bugs/show_bug.cgi?id=2087
>> in the commit if/when this goes in.
>
> I will try to remember that.
>
>
>> Some stuff I found in the code:
>>
>> in src/HttpRequest.cc please add brackets to make the code more readable:
>> "loggingNeedsHistory = LogfileStatus == LOG_ENABLE &&
>> alLogformatHasAdaptToken;"
>
> Done:
>
> const bool loggingNeedsHistory = (LogfileStatus == LOG_ENABLE) &&
> alLogformatHasAdaptToken;
>
>> in src/adaptation/AccessCheck.cc you create a ServiceFilter via:
>> "new AccessCheck( ServiceFilter(method, vp, req, rep), cb, cbdata);"
>> which takes a copy by & reference. The ServiceFilter locks the given
>> request and reply but I can't see where the ServiceFilter is deleted (&
>> references cannot be and the object is not RefCounted.)
>> - I expect this to lead to memory leaks. Possibly big ones.
>
> ServiceFilter instances in this context are not allocated on the heap.
> Thus, we do not (and cannot) delete them. The compiler generates code to
> destruct the temporary variable and the class data member. Did I
> misunderstood your concern?

Ah right. I had forgotten that C++ trick you seem to like using.
This works then despite the stack changes during AccessCheck callbacks?

Remember we still have not gotten to the bottom of that 10MB/minute
memory leak near the last one of these I noticed. Personally I'd rather
go the long way with new/delete and RefCount to be sure of the copies
than use this trick.

>
>> in adaptation/icap/ModXact.cc when testing for HDR_X_NEXT_SERVICES I
>> think it would be worth warning if a non-routable service attempted to
>> hand back a routing header. How to do it efficiently may be a problem
>> though.
>
> I do not think there would be a critical performance impact if we check
> using the current code. The difference between a hash and a simple array
> with linear search ought to be small when you are dealing with less than
> 10 integers.
>
> Another problem, IMO, is how not to warn on every response (which is far
> more expensive operation than searching). I wanted to port a
> warn-once-in-a-while facility from Polygraph to Squid for a long time...
>
> Added TODO.

Good point.

>
>
>> This TODO needs to be enacted rather soon. We are aiming for performance
>> in 3.2...
>> // TODO: use HttpMsg::clone()!
>
> Agreed. A usable clone() method has been available at least since the
> eCAP addition. This is not a new TODO or a new problem. This is a nice,
> small-context project for volunteers :-).
>

:-)

>
>> On the side; this has brought to my attention the adaptation/forward.h.
>> Can we not call that adaptation/Adaptation.h instead? the forward.h
>> breaks CamelCase and will clash with src/forward.h on those recently
>> vocal compilers.
>
> "Forward" is kind of standard (e.g., <iosfwd>) and more intuitive, IMO:
> We are providing forward declarations to avoid too many unimportant
> dependencies via header files. I would rather move or rename
> src/forward.h :-)

Fair point. It's not currently causing issue for most places. So its out
of scope here if we go that way.

>
>> notes.dox ***
>> \section label takes two parameters: an ID then the text title.
>
> Fixed.
>
>> mixing \b and the <b></b> equivalents on alternate lines looks a bit
>> mucky. Can we at least do it consistently within one document?
>
> I have no idea where I copied that ugly mix from :-). Replaced all with
> <b></b>.
>
>
>> Thats it from me. Just ... no more huge project for a few decades, yes? ;)
>
> You call this huge?! I spent more time on StringNg. ;-)
>
>
> Please look at the enhanced logging patch as well. This work depends on
> that patch. If the changes are approved, the enhanced logging stuff must
> go into the trunk first.
>

I'm up to that now. This updated patch is a vote yes from me.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
   Current Beta Squid 3.1.0.9
Received on Sun Jun 28 2009 - 05:41:57 MDT

This archive was generated by hypermail 2.2.0 : Sun Jun 28 2009 - 12:00:06 MDT