Re: Inline content modification? from Joe Cooper on 2001-01-15 (squid-dev)

From: Joe Cooper <joe@dont-contact.us>
Date: Mon, 15 Jan 2001 17:53:10 -0600

Thanks for the excellent advice Robert. More comments inline below.

Robert Collins wrote:

> I've been working on several things in this area Joe....
>
> From a 'where to make the change' angle you have two choices:
> client_side, where you make the change on every request (read CPU
> hog), or in http.c (aka server_side!) where you can modify the data
> coming into squid.

Right you are. Forgetting client_side.c and deleting that source tree.
All wrong, and I'm glad I asked before going too far down it.

> You can look at the changes to http.c in the te branch on
> squid.sourceforge.net to see how to alter incoming data. The filter
> model that Patrick McManus put together for te codings would also
> make sense for in-squid data modifications (process the data
> received chunk by received chunk). (Although it wouldn't be marked
> as a te coding :-]).
>
> The advantage of altering the incoming data is that a) the
> modifications get cached, and b) after the first retrieval, you can
> recalculate the content-length for future requests, keeping http/1.0
> persistent conns happy. I don't suggest you touch the te code just
> yet, unless this is a medium term project :-]
>
> In the filter code you could use callbacks if you need external
> helpers (I'm already considering the need for that), but you'll have
> to split the http function that calls perform_te (for me - for you
> performurlrewrite/...). If you want to head down that path let me
> know and I'll split it up for you (save duplicate work)..

I'll look at it right now and see if I can figure out what it's doing
and what I would need to do with it. I welcome any guidance/assistance
you care to offer.

>> Am I an idiot? It appears to me that it is possible to read and work on
>> all of the object in client_side.c, and the noanim patch posted here a
>> few weeks ago does just that without problems. But I very well could be
>> missing something.
>
>
> It doesn't cache the results. String matching on blocked data isn't
> the cheapest operation, and doing it n times without caching the
> results seems silly to me.

I /knew/ it! I am an idiot, after all. ;-) So I'll put it on the
server side and cache the results.
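
To make the trade-off concrete, here is a minimal sketch of "modify once on the way in, then serve the stored result". It is not Squid code: the ModifyingCache class, the rewrite function and the in-memory dict are all hypothetical stand-ins, assuming only that the modification happens on a cache miss and that Content-Length is recomputed from the modified body.

```python
def rewrite(body: bytes) -> bytes:
    """Hypothetical stand-in for the expensive content modification."""
    return body.replace(b"jobloggs", b"johnloggs")


class ModifyingCache:
    """Toy cache that modifies an object once, when it is first fetched."""

    def __init__(self, fetch, modify):
        self.fetch = fetch        # callable: url -> original body
        self.modify = modify      # callable: original body -> modified body
        self.store = {}           # url -> (headers, modified body)
        self.modify_calls = 0     # how many times the modification ran

    def get(self, url):
        """Return (headers, body), doing the modification only on a miss."""
        if url not in self.store:
            body = self.modify(self.fetch(url))
            self.modify_calls += 1
            # The length is known once the whole modified body is stored.
            headers = {"Content-Length": str(len(body))}
            self.store[url] = (headers, body)
        return self.store[url]


def origin(url):
    """Hypothetical origin server returning a fixed body."""
    return b"jobloggs is a strange person"


cache = ModifyingCache(origin, rewrite)
for _ in range(3):
    headers, body = cache.get("http://example.com/")
```

Three requests cost one modification here; doing the same work in client_side would run it three times.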

>> Assuming it is possible, can I use the ACL interface to generate the
>> match lists, or do I need to come up with a method to handle the match
>> string and the replacement string? It would be nice to have a named ACL
>> for the match strings, and it seems reasonable that this would work. So
>> can I run /anything/, including whole html pages, through a regex or
>> string matching ACL? Anyone have pointers for how to tackle this one?
>
>
> I think you need a new ACL - you may have data coming in and be
> sitting on the block boundary
> (ie s/jobloggs/johnloggs/ - the first block of data you receive may be
>
> asdasdasdasdasdjob
> and the second block
> loggs is a strange person
>
> you will need to buffer the possible string hit 'job' and not send it
> on until you've seen the second block or hit EOF & flush any
> buffered data.

Ok. I think I can manage that, and it's something I didn't think of at
all. This is going to be a little more complicated than I'd
anticipated. But I think I can manage it.
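
For reference, a minimal sketch of the buffering Robert describes, using his s/jobloggs/johnloggs/ example. This is an illustration in Python rather than Squid code; the StreamingReplacer class and its feed/flush methods are hypothetical names, and it assumes a plain fixed-string match rather than a regex.

```python
class StreamingReplacer:
    """Replace a fixed byte string in data that arrives in arbitrary blocks."""

    def __init__(self, match: bytes, replacement: bytes):
        if not match:
            raise ValueError("match must be non-empty")
        self.match = match
        self.replacement = replacement
        self.pending = b""  # held-back tail that may be the start of a match

    def _partial_suffix_len(self, tail: bytes) -> int:
        """Length of the longest proper prefix of match that ends tail."""
        for k in range(min(len(self.match) - 1, len(tail)), 0, -1):
            if tail.endswith(self.match[:k]):
                return k
        return 0

    def feed(self, block: bytes) -> bytes:
        """Process one block and return the bytes that are safe to send on."""
        data = self.pending + block
        out = []
        pos = 0
        while True:
            hit = data.find(self.match, pos)
            if hit < 0:
                break
            out.append(data[pos:hit])
            out.append(self.replacement)
            pos = hit + len(self.match)
        rest = data[pos:]
        # Hold back only unmodified input that could still become a match.
        keep = self._partial_suffix_len(rest)
        out.append(rest[:len(rest) - keep])
        self.pending = rest[len(rest) - keep:]
        return b"".join(out)

    def flush(self) -> bytes:
        """At EOF, release whatever was held back."""
        tail, self.pending = self.pending, b""
        return tail


r = StreamingReplacer(b"jobloggs", b"johnloggs")
first = r.feed(b"asdasdasdasdasdjob")          # 'job' is held back
second = r.feed(b"loggs is a strange person")  # match completes here
tail = r.flush()                               # nothing left at EOF
```

The hold-back is computed on the raw input after the last match, not on the replaced output, so replacement text is never re-scanned as a possible match.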

I'm sure I'll be back with more questions once I've begun the
implementation.

Thanks.

--
Joe Cooper <joe@swelltech.com>
http://www.swelltech.com

Received on Mon Jan 15 2001 - 16:45:23 MST
