Re: protocol clarity

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Tue, 10 Oct 2006 22:47:02 -0600

On Wed, 2006-10-11 at 11:54 +1000, Robert Collins wrote:
> > > http://wiki.squid-cache.org/ForwardRework
> I haven't considered icap deeply. In terms of forwarding icap is
> mostly/nearly orthogonal. There are things we handle that are not icap
> interceptable - but perhaps they should be. (For instance, internal
> static urls).

IMHO, an ideal implementation would let us filter through ICAP
everything that internally resembles an HTTP message and goes into or
out of Squid. I believe you have stated a similar ideal. With our very
limited resources, the only way to do that is to provide a single way to
hook up ICAP to an HTTP message processing stream, whatever that stream
is.

Today, adding hookups is too much work: too many "similar but different"
places need to be changed. However, if you are rewriting how messages
are represented and shoveled around, you have a fighting chance of
creating the necessary protocol-, source-, and destination-agnostic
places to hook up ICAP.

[lot of good stuff snipped]

I agree that canonicalizing the HTTP-like message representation is the
right direction, but I am not qualified to comment on how to get there
from what we have now or on the details of that interface.

> In terms of data copying, if the icap adaption says 'no change', you can
> just hand the body object over, or have a decorator object that uses the
> original one with no copying.

FWIW, this approach does not work well in the current code because

(a) There is no "body object" to hand over. Object body is represented
by a few data structures _and_ by the complex stateful code that handles
them (e.g., http.cc). The body state and metainfo are not encapsulated
in one object/place.

(b) The virgin message carries not just the headers and body state, but
also a bunch of protocol- and vectoring-point-specific attributes that
will often become stale or inapplicable after passing through ICAP, even
if ICAP performed no message modification. For example, the metadata
about "connection" will be wrong if the HTTP connection is long gone by
the time the ICAP service decides whether to adapt the response.

I do not know if your changes will help with the above, but it would be
nice if they did :-)
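
To make (a) and (b) concrete, here is a minimal sketch of what a
self-contained "body object" and a no-copy decorator might look like.
It is an illustration only, written in Python for brevity, and is not
Squid code: Body, UnchangedBody, Message, the "transient" field, and
adapted_without_change() are all hypothetical names. The assumption is
that body state lives in one object and that vectoring-point-specific
attributes are kept apart from the headers so they can be dropped after
adaptation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Union


class Body:
    """Hypothetical body object: chunks and body state live in one place."""

    def __init__(self, expected_size: Optional[int] = None) -> None:
        self._chunks: List[memoryview] = []
        self.expected_size = expected_size
        self.complete = False

    def append(self, data: bytes) -> None:
        # memoryview references the buffer; the bytes are not copied.
        self._chunks.append(memoryview(data))

    def finish(self) -> None:
        self.complete = True

    def chunks(self) -> Iterator[memoryview]:
        return iter(self._chunks)


class UnchangedBody:
    """Decorator for the 'no change' case: reads through to the original."""

    def __init__(self, original: Body) -> None:
        self._original = original

    @property
    def complete(self) -> bool:
        return self._original.complete

    def chunks(self) -> Iterator[memoryview]:
        return self._original.chunks()


@dataclass
class Message:
    headers: Dict[str, str]
    body: Union[Body, UnchangedBody]
    # Vectoring-point-specific state (e.g. connection info). Assumed to be
    # unsafe to reuse once the message has passed through adaptation.
    transient: Dict[str, object] = field(default_factory=dict)


def adapted_without_change(virgin: Message) -> Message:
    """Build the adapted message when the service reports 'no change'."""
    return Message(
        headers=dict(virgin.headers),      # headers are small; copy them
        body=UnchangedBody(virgin.body),   # body is shared, not copied
        transient={},                      # stale attributes are dropped
    )
```

The adapted message shares the virgin body's buffers and sees its
completion state, but it does not inherit the "connection"-style
attributes, which is the part that goes wrong in (b).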

> I have to stop here, but I think the shape is clear - what do you
> think,does it sound doable, are there holes ?

It all sounds doable, but I can only comment on ICAP needs (I do not
remember or know enough about the rest of the code). Here are a few
additional ICAP-related things that I hope you would keep in mind when
designing your changes:

1) ICAP request satisfaction: ICAP should be able to take a virgin HTTP
message and return an HTTP response to that message. The code should not
assume that ICAP is only a "pass through" mechanism or that all HTTP
responses originally came from an HTTP server/peer or store.

2) ICAP bypass: When ICAP fails to process a message (e.g., when the
ICAP service is down), it would be wonderful if the
ICAP-transaction-independent code could recover and go on as if ICAP
were not in the loop (if still possible and allowed by Squid
configuration). I am guessing that would be somewhat similar to
recovering from a peer failure.

3) ICAP buffers: Ideally, Squid client and server sides would pump data
through ICAP without copying the data when ICAP is not modifying it. A
logging-only ICAP server that wants to look at most message bodies is a
good example to consider here. HTTP client, server, and ICAP speeds may
all differ, making the above far from trivial.

4) ICAP chains: There may be more than one ICAP service interested in
processing a given message. Ideally, the ICAP hookup interface will take
care of multiple services instead of making an individual ICAP
transaction concerned with ICAP transactions before or after it. This
aspect complicates all of the above items!

5) ICAP encoding: Any HTTP transfer encodings and hop-by-hop headers
must be removed before feeding messages to ICAP servers. Perhaps this
removal should _not_ be done by ICAP because of #4 and, perhaps, HTTP
caching reasons.
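
Items 1, 2, 4, and 5 can be sketched together as one hookup that owns
the chain. The following is a rough illustration under stated
assumptions, not a proposed Squid interface: Message, Service,
ServiceDown, strip_hop_by_hop(), and run_chain() are hypothetical
names, and the services are plain callables rather than real ICAP
transactions. The hop-by-hop list is the one from RFC 2616 section
13.5.1, plus any headers named in Connection. Buffering (item 3) and
actual decoding of transfer-encoded bodies are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hop-by-hop headers listed in RFC 2616 section 13.5.1 (lowercased).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


@dataclass
class Message:
    headers: Dict[str, str]
    body: bytes = b""
    is_response: bool = False


class ServiceDown(Exception):
    """Raised by a (hypothetical) service that cannot process a message."""


@dataclass
class Service:
    name: str
    adapt: Callable[[Message], Message]
    bypass: bool = True  # assumed config knob: may failures be skipped?


def strip_hop_by_hop(msg: Message) -> Message:
    """Drop hop-by-hop headers, including those named in Connection.

    Only the headers are handled; a real implementation would also have
    to decode a chunked body before removing Transfer-Encoding.
    """
    drop = set(HOP_BY_HOP)
    for name, value in msg.headers.items():
        if name.lower() == "connection":
            drop.update(t.strip().lower() for t in value.split(",") if t.strip())
    kept = {n: v for n, v in msg.headers.items() if n.lower() not in drop}
    return Message(kept, msg.body, msg.is_response)


def run_chain(virgin: Message, services: List[Service]) -> Message:
    """Run a message through every interested service, in order."""
    # Item 5: stripping is done once by the hookup, not by each service.
    msg = strip_hop_by_hop(virgin)
    started_as_request = not virgin.is_response
    for svc in services:
        try:
            result = svc.adapt(msg)
        except ServiceDown:
            if svc.bypass:
                continue  # item 2: go on as if this service were absent
            raise
        if started_as_request and result.is_response:
            # Item 1: request satisfaction. Assumption: the remaining
            # request-side services are skipped once a response exists.
            return result
        msg = result
    return msg
```

The caller sees only run_chain(); an individual service never learns
which services ran before or after it, which is the point of item 4.
Whether a satisfied request should skip the rest of the chain is a
design choice made here for simplicity, not something settled above.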

Finally, there are other places in Squid that might have some needs
similar to ICAP because they look at and modify messages on-the-fly.
Proxy-side includes is one example of an "internal ICAP server", I
guess.

HTH,

Alex.
P.S. I wonder if the FreeBSD netgraph API is worth studying in this context!
Received on Tue Oct 10 2006 - 22:47:47 MDT
