Re: protocol clarity

From: Robert Collins <robertc@dont-contact.us>
Date: Wed, 11 Oct 2006 11:54:58 +1000

On Tue, 2006-10-10 at 09:24 -0600, Alex Rousskov wrote:
> On Sun, 2006-09-10 at 21:40 +1000, Robert Collins wrote:
> > So I've finished my analysis of the protocol stuff in squid - protocol_t
> > etc.
> >
> > http://wiki.squid-cache.org/ForwardRework has some thoughts about how we
> > can improve the current api, to make SSL fit in more cleanly, and make
> > doing some nifty things, like forwarding via ssh tunnels, or peers that
> > need a vpn etc, easy to do.
>
> Do you perceive any related significant changes to the ICAP code?

I haven't considered icap deeply. In terms of forwarding icap is
mostly/nearly orthogonal. There are things we handle that are not icap
interceptable - but perhaps they should be. (For instance, internal
static urls).

> Squid implements an ICAP client, but the ICAP implementation differs
> from other protocols because ICAP is not a "forward-and-forget" protocol
> or, at least, it cannot be implemented that way today (perhaps
> unfortunately).
>
> Today, the client- and server-side HTTP code pipes the HTTP message to
> ICAP and expects a new message to be piped back, possibly at the same
> time. Various ICAP aborts are rather difficult to recover from today. It
> would be nice if your scheme would help with that.
>
> Another issue is message data copying. Today, HTTP bodies are copied
> when they go to ICAP and are copied again when they go back to HTTP,
> even though the content is known to be identical (e.g., the ICAP server
> said 204 "No Content"). I am not sure whether this should be optimized
> on the MemBuf level (i.e., optimize copying itself via lazy copying and
> such) or on a protocol-handling level (i.e., remove the need for
> copying). The latter approach may seem better from the design point of
> view, but is far more difficult to implement with the existing code; we
> currently do not have sane means of sharing the same I/O buffer among
> three protocols (HTTP client, HTTP server, and ICAP client).

Typing out loud follows...

So let's start sketching out how these things interact. My scheme
identifies three separate sets of protocols:
 * listening protocols
 * handleable urls
 * forwardable protocols

For instance, we have listening implementations for ICP, HTCP, HTTP,
HTTPS [and arguably one for interception ;)].

We handle urls for http/ftp/wais/gopher/https/connect [yes, it's not
http, kthxbye]/protocolless-aka-internal/urn/...

We can send any request to a http/https peer, and we can send to the
origin requests for ftp/http/https/gopher/wais/... [roughly].

So how/where does icap fit into this?

Ideally icap would be able to subvert any request/response on a
listening protocol, and any request/response made to a peer or origin.
For instance, it would be nice if icap could easily/automatically
intercept urn scheme requests and hand those off to an adaptation
server (which might then give an early response, or leave the request
unmodified). As the response is received, icap could modify that too,
for instance removing undesirable urls from the resolved set, such as
spam or trojan links.

So there are four places to consider if I recall icap correctly: client
side in and out, and server side in and out. I don't /recall/
modification points for 'enter the store', 'leave the store' etc.

Client side in: All listening protocols perform some sort of parsing on
the incoming bytes to create a structured request. So far all our
listening protocols are message based protocols where there is no
chatting back-and-forth within a single request, but we need to fix that
to support 100 Continue properly. Even with 100 Continue though, there
is clearly a stage where we have a logically defined request, which may
or may not have body data associated with it, and that body data is in
some coding such as octets. If all listening protocols were to have a
common callback function they call to get this logically defined request
handled within squid - be it ICP/HTCP/HTTP/HTTPS in origin - then there
would be a single point that ICAP can hook into to adapt the request. To
create this we'll need to rearrange some code ;). Right now I don't
think all requests end up creating HttpRequest objects, and in fact I
think that we should make a more minimal class such as 'ClientRequest'
which the ICPRequest, ClientHttpRequest, ClientHttpsRequest and
HTCPRequest classes would all derive from. That base class would then be
the type passed to this hypothetical common handler function. This would
occur before any of the cache lookups in client side - it would be the
very first thing done after parsing.
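
A rough C++ sketch of the 'common base class plus single entry point'
idea described above. Only the class names ClientRequest,
ClientHttpRequest and ICPRequest come from the text; the members, the
CommonRequestHandler name and the 'seen' log are assumptions made purely
for illustration, not existing squid code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal base class. Only the class names come from the
// mail; every member below is an assumption made for illustration.
class ClientRequest {
public:
    virtual ~ClientRequest() {}
    // Which listening protocol produced this request.
    virtual std::string protocolName() const = 0;
    std::string url;
};

class ClientHttpRequest : public ClientRequest {
public:
    std::string protocolName() const override { return "HTTP"; }
};

class ICPRequest : public ClientRequest {
public:
    std::string protocolName() const override { return "ICP"; }
};

// The single entry point every listening protocol would call right after
// parsing, before any cache lookup. An adaptation hook would attach here.
class CommonRequestHandler {
public:
    void handle(const ClientRequest &request) {
        assert(!request.url.empty()); // parsing must have produced a URL
        seen.push_back(request.protocolName() + " " + request.url);
    }
    std::vector<std::string> seen; // stands in for real request handling
};
```

Because the handler only sees the base class, an ICP query and an HTTP
request pass through the same point, which is where a request
adaptation hook could sit.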

Client side out: I don't recall ICAP's specific needs here: does it want
the bytestream of the native protocol, or does it want a canonical
http-like representation of the response? I'm assuming a canonicalised
response [how would it deal with FTP otherwise?]. Achieving this is
even less clear than client side in: many protocols deliver their data,
including headers, direct as octets from the originating code within
squid. To arrange for all outbound responses to get ICAP modified we
need to ensure that there is a clear step from 'logical response' to 'on
the wire in protocol X' - for instance, an HTTPS response to a client is
unusable for icap once it's in byte form, but should be quite handleable
as an HttpResponse message with plain text spooling out of the store.
Assuming that this is correct, I think arranging for a clear
'serialisation happens here' step in each protocol, and having that be
called back with the logical response by the handler, will allow icap to
work nicely: the same function that diverts if needed to do incoming
adaptation can just register itself as the serialiser for the request if
it is going to adapt the response [or wants the chance], and it can
arrange to forward to the real serialiser if/when it's done.
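
A minimal sketch of the 'register as the serialiser, then forward to the
real one' arrangement, assuming the serialisation step can be modelled
as a callable. LogicalResponse, RequestContext, serialiseHttp and
installResponseAdapter are all hypothetical names invented for this
example; none of them are existing squid APIs.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical logical (pre-wire) response.
struct LogicalResponse {
    int status;
    std::string body;
};

typedef std::function<std::string(const LogicalResponse &)> Serialiser;
typedef std::function<LogicalResponse(const LogicalResponse &)> Adapter;

// Per-request state holding the 'serialisation happens here' step.
struct RequestContext {
    Serialiser serialiser;
};

// Stand-in for a protocol's real serialiser (logical response -> bytes).
std::string serialiseHttp(const LogicalResponse &r) {
    return "HTTP/1.0 " + std::to_string(r.status) + "\r\n\r\n" + r.body;
}

// The adapter takes the serialiser slot, adapts the logical response,
// then forwards the result to the serialiser that was there before.
void installResponseAdapter(RequestContext &ctx, Adapter adapt) {
    assert(static_cast<bool>(ctx.serialiser)); // need a real serialiser
    Serialiser real = ctx.serialiser;
    ctx.serialiser = [real, adapt](const LogicalResponse &r) {
        return real(adapt(r));
    };
}
```

The protocol code keeps calling ctx.serialiser as before; whether an
adapter sits in front of the real serialiser is invisible to it.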

Server side out: Similar to client side out, but I have a patch that
gets us pretty close to having it nice already. AIUI we need to do icap
before the hand off to the actual ftp/http/wais/urn code happens. (if we
need to hand off after the protocol serialisation within squid, then
icap would be acting like a peer and it would be easier to just
configure the icap server as a peer).

And finally, server side in: we'll want to parse the response ourselves,
canonicalise it, and then hand it off to icap before sending data to the
store. So the icap engine should appear to the protocol specific layer
as the client to get the data.

In short, I think we want a structure something like the following:

ListeningProtocol:
 * parses requests to subclasses of ClientRequest. Canonically requests
have body objects to access more data, http-like headers, and something
to represent the 100 continue mechanism.
 * is configured with a protocol specific handler to hand off
ClientRequests to, and however many sockets or other resources it needs
to listen.
 * Can serialise a ClientResponse to bytes, including pulling body data
from the response's body and encoding it on the wire.

ClientRequest:
 * A logical client request
 * configured with a client protocol reference which will handle any
further parsing, and serialisation when a response is being sent (e.g.
errors, upstream content etc).
 * configured with a body object to allow access to body data (and the
body object will understand coding issues such as 'the body is TE
coded')

ClientResponse:
 * A logical client response
 * configured with a client protocol to encode via, and a body object
for getting body response data.

ClientRequestHandler:
 * an object that can take a ClientRequest and 'handle' it.
 * May be protocol specific. For instance, handling of requests on an
RTSP stream may be very different to those on an FTP control socket.
 * When ICAP is enabled, there will be an ICAP handler decorator which
the configuration logic will insert between the protocol and the normal
handler for that protocol. This decorator will divert off to icap as
needed, and if icap allows continued processing, then hand off to the
protocol specific handler.
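
The ICAP handler decorator from the outline above could look roughly
like this. ClientRequest and ClientRequestHandler are names from the
outline; HttpRequestHandler, IcapHandlerDecorator, IcapVerdict and the
'consult' callback (a stand-in for a real ICAP exchange) are assumptions
for the sake of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

struct ClientRequest {
    std::string url;
};

class ClientRequestHandler {
public:
    virtual ~ClientRequestHandler() {}
    virtual std::string handle(const ClientRequest &request) = 0;
};

// Stand-in for the normal protocol specific handler.
class HttpRequestHandler : public ClientRequestHandler {
public:
    std::string handle(const ClientRequest &request) override {
        return "forwarded " + request.url;
    }
};

// Hypothetical outcome of consulting the adaptation server.
enum class IcapVerdict { Continue, EarlyResponse };

// Sits between the listening protocol and the normal handler.
class IcapHandlerDecorator : public ClientRequestHandler {
public:
    typedef std::function<IcapVerdict(const ClientRequest &)> Consult;

    IcapHandlerDecorator(ClientRequestHandler &next, Consult consult)
        : next_(next), consult_(std::move(consult)) {
        assert(static_cast<bool>(consult_));
    }

    std::string handle(const ClientRequest &request) override {
        // Divert to adaptation first; only continue if it allows it.
        if (consult_(request) == IcapVerdict::EarlyResponse)
            return "early response from adaptation";
        return next_.handle(request);
    }

private:
    ClientRequestHandler &next_;
    Consult consult_;
};
```

Since the decorator has the same interface as the handler it wraps, the
configuration logic can insert or omit it without the listening
protocol noticing.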

In terms of data copying, if the icap adaptation says 'no change', you can
just hand the body object over, or have a decorator object that uses the
original one with no copying.
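
To illustrate the no-copy hand-over, here is a small sketch assuming
body objects are shared through reference counting. Body, BufferBody,
PassThroughBody and adaptBody are hypothetical names, and the
"[adapted] " prefix merely stands in for whatever an adaptation server
would really return.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical body object interface.
class Body {
public:
    virtual ~Body() {}
    virtual const std::string &data() const = 0;
};

// A body that owns its bytes.
class BufferBody : public Body {
public:
    explicit BufferBody(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string &data() const override { return bytes_; }

private:
    std::string bytes_;
};

// Decorator that reads through to the original body without copying it.
class PassThroughBody : public Body {
public:
    explicit PassThroughBody(std::shared_ptr<const Body> original)
        : original_(std::move(original)) {
        assert(original_ != nullptr);
    }
    const std::string &data() const override { return original_->data(); }

private:
    std::shared_ptr<const Body> original_;
};

// On 'no change' the very same body object is handed over; only a real
// modification allocates new body data.
std::shared_ptr<const Body> adaptBody(
    const std::shared_ptr<const Body> &original, bool noChange) {
    if (noChange)
        return original;
    return std::make_shared<BufferBody>("[adapted] " + original->data());
}
```

In both the direct hand-over and the decorator case the underlying
bytes are shared, so the 'no change' path costs a pointer copy rather
than a body copy.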

I have to stop here, but I think the shape is clear - what do you
think? Does it sound doable, are there holes?

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Received on Tue Oct 10 2006 - 20:13:59 MDT
