Re: [PATCH] ICAP service chains and sets

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 28 Jun 2009 00:28:24 +1200

Alex Rousskov wrote:
> Hello,
>
> Please consider the following changes for Squid3 trunk inclusion.
> They have been been tested in the lab and will be put in production. The
> ICAP chains feature (i.e., a "pipeline" of ICAP services) has been on
> many wish lists. ICAP service sets allow for backup ICAP servers which
> are typical in busy/critical adaptation environments.
>
> I compressed the 90KB patch (am I being too paranoid?), but the attached
> text file shows squid.conf documentation describing the features listed
> below. I have also quoted relevant commit messages that may help with
> code understanding. The patch includes adaptation notes.dox file with
> developer-centric documentation.

Not too paranoid, this one got through to me :)

>
> If approved, I will try to do a "bzr merge" instead of a raw patch to
> preserve commit messages, using a Robert-provided trick (thanks, Robert!).
>
> This work depends on the enhanced logging features submitted previously.
>
> Thank you,
>
> Alex.
> bb:approve
>
> --------------------------------
> Support adaptation sets and chains, including dynamic ICAP chains.
>
> - Support adaptation service sets and chains
> (adaptation_service_set and adaptation_service_chain)
>
> - Dynamically form chains based on ICAP X-Next-Services header
> (icap_service routing=on)
> ------------------------------------
>
>> Support adaptation service sets and chains.
>>
>> An adaptation service set contains similar, interchangeable services. No more
>> than one service is successfully applied. If one service is down or fails,
>> Squid can use another service. Think "hot standby" or "spare" ICAP servers.
>>
>> Sets may seem similar to the existing "service bypass" feature, but they allow
>> the failed adaptation to be retried and succeed if a replacement service is
>> available. The services in a set may be all optional or all essential,
>> depending on whether ignoring the entire set is acceptable. The mixture of
>> optional and essential services in a set is supported, but yields results that
>> may be difficult for a human to anticipate or interpret. Squid warns when it
>> detects such a mixture.
>>
>> When performing adaptations with a set, failures at a service (optional or
>> essential, does not matter) are retried with a different service if possible.
>> If there are no more replacement services left to try, the failure is treated
>> depending on whether the last service tried was optional or essential: Squid
>> either tries to ignore the failure and proceed or terminates the master
>> transaction.
>>
>>
>> An adaptation chain is a list of different services applied one after another,
>> forming an adaptation pipeline. Services in a chain may be optional or
>> essential. When performing adaptations, failures at an optional service are
>> ignored as if the service did not exist in the chain.
>>
>> Request satisfaction terminates the adaptation chain.
>>
>>
>> When forming a set or chain for a given transaction, optional down services
>> are ignored as if they did not exist.
>>
>> ICAP and eCAP services can be mixed and matched in an adaptation set or chain.
>>
>>
>>
>> * Implementation notes
>>
>> The notes below focus on _changes_. Adaptation terminology and current layers
>> are now being documented in src/adaptation/notes.dox
>>
>> Service sets and chains are implemented as ServiceGroup class kids. They are
>> very similar in most code aspects. The primary external difference is that
>> ServiceSet can "replace" a service and ServiceChain can find the "next"
>> service. The internal search code is implemented in ServiceGroup parent and
>> is parametrized by the kids.
>>
>> Before the adaptation starts, Squid calculates the adaptation "plan", which is
>> just an iterator into the ServiceGroup. The client- and server-side adaptation
>> initiators used to deal with Service pointers. They now deal with ServiceGroup
>> pointers. The only interesting difference is that a ServiceGroup does not have
>> a notion of being optional or essential. Thus, if adaptation start fails, we
>> do not know whether the failure can be bypassed. Fortunately, starting an
>> adaptation does not require anything that depends on the adaptation services,
>> so we now simply assert that the start succeeds.
>>
>> If the entire adaptation fails, the callers are notified as before. They are
>> told whether they can ignore the failure as before. No changes there.
>>
>> A new Adaptation::Iterator class has been added to execute the adaptation
>> plan. That class is responsible for iterating the services in a service group
>> until the plan is exhausted or cannot progress due to a final failure.
>
>
>> Dynamically form adaptation chains based on the ICAP X-Next-Services header.
>>
>> If an ICAP service with the routing=1 option in squid.conf returns an ICAP
>> X-Next-Services response header during a successful REQMOD or RESPMOD
>> transaction, Squid abandons the original adaptation plan and forms a new
>> adaptation chain consisting of services identified in the X-Next-Services
>> header value (using a comma-separated list of adaptation service names from
>> squid.conf). The dynamically created chain is destroyed once the new plan is
>> completed or replaced.
>>
>> This feature is useful when a custom adaptation service knows which other
>> services are applicable to the message being adapted.
>>
>> Limit adaptation iterations to adaptation_service_iteration_limit to protect
>> Squid from infinite adaptation loops caused by ICAP services constantly
>> including themselves in the dynamic adaptation chain they request. When the
>> limit is exceeded, the master transaction fails. The default limit of 16
>> should be large enough to not require an explicit configuration in most
>> environments yet may be small enough to limit side-effects of loops.

I see your comment at src/adaptation/AccessCheck.cc: ~156
   TODO: should we shift and checkCandidates if and only if (!g) ?!

If the TODO is about the case of self-reference you mention above then I
would say yes we must. To drop such loops into error condition as
quickly as possible. I understand the limit needs to be kept anyway for
multi-stage loops. But losing the loop early saves a lot of resources.

Also, should we prevent circular loops by way of preventing a filter
being run twice over the same request? There have to be few cases where
a single adaptation step needs to be repeated exactly.

IMO we are finding more and more reasons to add an ERR_*_CONFIG page :)

>>
>> TODO: Add metadata support to eCAP API and honor X-Next-Services there as
>> well. Currently, only ICAP services can form dynamic chains but the formed
>> chains may contain eCAP services.
>>
>>
>> Other improvements:
>>
>> Polished adaptation service configuration in squid.conf. Old format with an
>> anonymous bypass option is deprecated but still supported. Quit with a fatal
>> message if an adaptation service is misconfigured (debugging level-0 messages
>> do not seem to work at that stage, but that is probably another, general bug).

Aye general bug. Pre-configure startup hits it as well :(

>>
>>
>> Polished HttpRequest::adaptHistory() interface so that the code that knows the
>> history is needed can force history creation without complex
>> configuration-time preparations and state. Currently, all adaptation history
>> users but the logging-related ones know runtime whether the history must be
>> created (e.g., when a certain ICAP header is received).
>>
>>
>> Fixed "canonical" Request URL maintenance when ICAP clones requests.
>> TODO: The urlCanonical() must become HttpRequest::canonical(), hiding the
>> often out-of-sync canonical data member.

IMO: TODO: url cleanup bug needs closing to handle canonical and all
other URL display cases properly behind a single URL object. Requires
decent string code though or memory duplicates will blow out of
proportion to usefulness.

>>
>>
>> Fixed ICAP request parsing (for ICAP logging). We used to parse Request-Line
>> as if it were the first header. TODO: optimize by parsing only when needed.

I took a look at the patch, and decided to take your word that it works :)

Please reference and close
http://www.squid-cache.org/bugs/show_bug.cgi?id=2087
in the commit if/when this goes in.

Some stuff I found in the code:

in src/HttpRequest.cc please add brackets to make the code more readable:
  "loggingNeedsHistory = LogfileStatus == LOG_ENABLE &&
alLogformatHasAdaptToken;"

in src/adaptation/AccessCheck.cc you create a ServiceFilter via:
"new AccessCheck( ServiceFilter(method, vp, req, rep), cb, cbdata);"
which takes a copy by & reference. The ServiceFilter locks the given
request and reply but I can't see where the ServiceFilter is deleted (&
references cannot be and the object is not RefCounted.)
  - I expect this to lead to memory leaks. Possibly big ones.

in adaptation/icap/ModXact.cc when testing for HDR_X_NEXT_SERVICES I
think it would be worth warning if a non-routable service attempted to
hand back a routing header. How to do it efficiently may be a problem
though.

This TODO needs to be enacted rather soon. We are aiming for performance
in 3.2...
   // TODO: use HttpMsg::clone()!

On the side; this has brought to my attention the adaptation/forward.h.
Can we not call that adaptation/Adaptation.h instead? the forward.h
breaks CamelCase and will clash with src/forward.h on those recently
vocal compilers.

notes.dox ***
   \section label takes two parameters: an ID then the text title.
   mixing \b and the <b></b> equivalents on alternate lines looks a bit
mucky. Can we at least do it consistently within one document?

Thats it from me. Just ... no more huge project for a few decades, yes? ;)

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
   Current Beta Squid 3.1.0.9
Received on Sat Jun 27 2009 - 12:28:31 MDT

This archive was generated by hypermail 2.2.0 : Sun Jun 28 2009 - 12:00:06 MDT