Re: Feedback about the content processing framework

From: Robert Collins <robert.collins@dont-contact.us>
Date: Wed, 31 Jan 2001 20:39:20 +1100

----- Original Message -----
From: "Moez Mahfoudh" <moez.mahfoudh@imag.fr>
To: "Squid Dev" <squid-dev@squid-cache.org>; "Robert Collins" <robert.collins@itdomain.com.au>
Sent: Wednesday, January 31, 2001 8:26 PM
Subject: Feedback about the content processing framework

> Hi Robert,
>
> I tested last night the new content processing framework. It is great
> especially because of the acl and configuration features.

thank you.

> But I was unable to use it because some details are missing, which I'll
> explain below. Fortunately, I guess it will not be much work to reach a
> nice solution.

I'll try :-]

> So let's try to have a concrete content processing module (called
> hi_subs): for some reason, I want to have squid send me all text/plain
> files with the word "Hello" substituted by "Hi".
> So when squid calls the function
> clientFilterHiSubs(*buf,len,*filter_list,filters,flags,data), it'll send
> me in buf, the headers, then the file content and then stops calling
> this function.

correct.

> The first problem is that my filter must not substitute "Hello" by
> "Hi" in the headers. Second, the filter needs to know when it is called
> with the last chunk so it can do some terminating actions (for example
> flushing its buffers in our case).
> So I suggest that the flags parameter contain some information which
> will guide the filter while processing data. When we call it with the
> headers in buf, you can put flags |=CP_HEADERS and when we call it with
> the last chunk, flags |= CP_LAST_CHUNK.

At the moment, the API rule is that the first buffer will always be the headers, with no body.
There is a flag FILTER_EOF which matches your CP_LAST_CHUNK.

The FILTER_ prefix is more appropriate than CP_, because content processing is a special case of filters, which are more generic.
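
To make the headers-first rule and FILTER_EOF concrete, here is a minimal standalone sketch in C of the hi_subs
idea. It is not the Squid API: the HiSubsState struct, the hi_subs_filter signature, the emit helper and the
numeric value of FILTER_EOF are all assumptions for illustration, and output is collected in a fixed buffer
instead of being handed to the next filter. It passes the first (headers) buffer through untouched, rewrites
"Hello" to "Hi" in later buffers, holds back a partial match that ends a chunk, and flushes it on FILTER_EOF.

```c
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration; the real flag is defined in the Squid filter code. */
#define FILTER_EOF 0x01u

/* Hypothetical per-request state for this sketch. */
typedef struct {
    int seen_headers;   /* nonzero once the headers buffer has gone by */
    size_t matched;     /* bytes of "Hello" held back at a chunk boundary */
    char out[256];      /* stand-in for "pass to the next filter" */
    size_t out_len;
} HiSubsState;

static const char PATTERN[] = "Hello";
#define PATTERN_LEN 5

/* Append n bytes to the output buffer, keeping it NUL-terminated. */
static void emit(HiSubsState *s, const char *p, size_t n)
{
    size_t room = sizeof(s->out) - 1 - s->out_len;
    if (n > room)
        n = room;               /* sketch only: silently truncate */
    if (n > 0)
        memcpy(s->out + s->out_len, p, n);
    s->out_len += n;
    s->out[s->out_len] = '\0';
}

void hi_subs_filter(HiSubsState *s, const char *buf, size_t len, unsigned flags)
{
    size_t i;

    if (!s->seen_headers) {
        /* First buffer is always the headers, with no body: pass it through. */
        s->seen_headers = 1;
        emit(s, buf, len);
    } else {
        for (i = 0; i < len; i++) {
            char c = buf[i];
            if (c != PATTERN[s->matched]) {
                /* Partial match failed: release the held-back bytes. */
                emit(s, PATTERN, s->matched);
                s->matched = 0;
            }
            if (c == PATTERN[s->matched]) {
                if (++s->matched == PATTERN_LEN) {
                    emit(s, "Hi", 2);
                    s->matched = 0;
                }
            } else {
                emit(s, &c, 1);
            }
        }
    }
    if (flags & FILTER_EOF) {
        /* Last call: flush whatever partial match is still held back. */
        emit(s, PATTERN, s->matched);
        s->matched = 0;
    }
}
```

The held-back prefix is the reason a body filter needs the EOF signal at all: without it, a body ending in
"Hel" would lose those three bytes.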

> I also recommend ensuring that the headers are always sent separately,
> not mixed with the body of the reply. I mean, there are always at least
> two calls to the filter: one for the headers, and the other for the body.

Already guaranteed. clientSendMoreData, and the equivalent function in http.c, buffer content until the full headers arrive or the
connection aborts.

> This implies some clarification:
> * If the headers size is < MAX_CHUNK_SIZE (I think it is 4096), call the
> filter with the headers and flag = CP_HEADERS

There is no max chunk size. A filter may pass a buffer of any size. The terminating filter has the responsibility of parcelling out
the data in appropriately sized chunks if there is a limit at that point. The reason for this is that, with content generating
filters (e.g. gunzipping), we don't know at the beginning of the chain how much data we will end up with. So the end of the chain
must set any such limits.
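
As an illustration of "the end of the chain sets the limit", here is a small standalone sketch in C of a
terminating step that parcels an arbitrarily large buffer into bounded chunks. The names parcel_out, chunk_sink,
ChunkStats and record_chunk are hypothetical and not part of Squid; the sink callback stands in for whatever
actually writes to the client.

```c
#include <stddef.h>

/* Hypothetical sink: receives one chunk of at most max_chunk bytes. */
typedef void (*chunk_sink)(const char *chunk, size_t len, void *ctx);

/* Split buf into pieces no larger than max_chunk; returns the number of pieces. */
size_t parcel_out(const char *buf, size_t len, size_t max_chunk,
                  chunk_sink sink, void *ctx)
{
    size_t sent = 0, chunks = 0;

    if (max_chunk == 0)
        return 0;               /* no sensible way to make progress */
    while (sent < len) {
        size_t n = len - sent;
        if (n > max_chunk)
            n = max_chunk;
        sink(buf + sent, n, ctx);
        sent += n;
        chunks++;
    }
    return chunks;
}

/* Example sink that only records what it was given. */
typedef struct {
    size_t count;    /* chunks received */
    size_t total;    /* bytes received */
    size_t largest;  /* size of the biggest chunk */
    size_t last;     /* size of the most recent chunk */
} ChunkStats;

void record_chunk(const char *chunk, size_t len, void *ctx)
{
    ChunkStats *st = (ChunkStats *)ctx;
    (void)chunk;
    st->count++;
    st->total += len;
    st->last = len;
    if (len > st->largest)
        st->largest = len;
}
```

With a 10000 byte buffer and a 4096 byte limit this yields pieces of 4096, 4096 and 1808 bytes, but only at the
end of the chain; upstream filters never see the limit.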

> * If the headers size is > MAX_CHUNK_SIZE, call the filter with the
> headers chunks and flag = CP_HEADERS as long as required. (If headers
> size is 10000 for example, call it first with 4096 bytes, then with 4096
> bytes then with 1808 bytes before starting sending the body).

See above: the headers will be the first chunk, and all of the first chunk. In fact, your Add routine is called with the headers
parsed and in metadata format, so you can modify them there if need be.

> * If the body size is 0, do a call to the filter with flag=CP_LAST_CHUNK
> (I mean, always do a call to filters for the body even if it is empty).

Check for FILTER_EOF. It's not 100% used (yet), but getting it placed appropriately is on my todo list. If a filter decides to strip
content from the end of the file, it currently cannot signal backwards that it is aborting/flushing/going_to_throw_away the rest of
the content. At the moment, the filter must send a buff=NULL, len=0, flags |=FILTER_EOF to the rest of the filters. When I have the
code checking for FILTER_EOF everywhere, I'll get onto experimenting with allowing filters to trigger EOF while they are still
receiving data.
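
A rough standalone sketch in C of the current convention, where a filter that drops the tail of the content
tells the rest of the chain by sending buff=NULL, len=0 with FILTER_EOF set. The Filter struct, pass_on,
truncate_filter, counting_sink and the value of FILTER_EOF are assumptions for illustration only; the real
filter list and prototype differ.

```c
#include <stddef.h>

/* Assumed value for illustration; the real flag is defined in the Squid filter code. */
#define FILTER_EOF 0x01u

/* Hypothetical filter node: a callback plus a link to the next filter. */
typedef struct Filter Filter;
struct Filter {
    void (*fn)(Filter *self, const char *buf, size_t len, unsigned flags);
    Filter *next;
    size_t limit;       /* truncate_filter: bytes still allowed through */
    int eof_sent;       /* truncate_filter: EOF already passed downstream */
    size_t bytes_seen;  /* counting_sink: bytes received */
    int eof_seen;       /* counting_sink: number of EOF calls received */
};

static void pass_on(Filter *self, const char *buf, size_t len, unsigned flags)
{
    if (self->next)
        self->next->fn(self->next, buf, len, flags);
}

/* Lets at most `limit` bytes through, then signals EOF downstream. */
void truncate_filter(Filter *self, const char *buf, size_t len, unsigned flags)
{
    if (self->eof_sent)
        return;                         /* throw away late upstream data */
    if (len > self->limit)
        len = self->limit;
    if (len > 0) {
        pass_on(self, buf, len, 0);
        self->limit -= len;
    }
    if (self->limit == 0 || (flags & FILTER_EOF)) {
        pass_on(self, NULL, 0, FILTER_EOF);   /* buff=NULL, len=0, EOF */
        self->eof_sent = 1;
    }
}

/* End of the chain for the sketch: just counts bytes and EOF signals. */
void counting_sink(Filter *self, const char *buf, size_t len, unsigned flags)
{
    (void)buf;
    self->bytes_seen += len;
    if (flags & FILTER_EOF)
        self->eof_seen++;
}
```

Note that the signal only travels forwards: upstream keeps calling the truncating filter, which has to keep
discarding, and that is the gap described above.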

> Last request: I'd prefer to have the reply/request struct in the
> prototype of the filters, read only. The programmer can use it to get
> some information about what is being processed.

Which, reply or request? Add a link in your MakeState function & cbdatalock the request; then your filter can use the content (and
will know which it should be).

> That's all folks...
> Waiting for your comments and patches...

There will be patches :-] - but I think most of what you need is covered already.

Perhaps my first target should be more doco?

Rob
Received on Wed Jan 31 2001 - 02:38:44 MST
