Re: Inline content modification?

From: Robert Collins <robert.collins@dont-contact.us>
Date: Wed, 17 Jan 2001 12:04:07 +1100

Joe,
I think this code may still be slightly broken for what you want to do - however it should make a good reference point. I'm avoiding
dlopened libraries until I have time to learn the peculiarities of portable dlopens across MS Visual C & gcc/cygwin. Also beware the
new authentication code: it won't fit with auth_rewrite (I did look at it).

However I've followed up an idea I had, and built a (rather primitive) data filter that uses a pipe concept rather than a loop of
call-once filters (as both the original TE code and Olaf's filtering code do).

It's somewhat easier to use because you can consider the next filter in the chain to be a write(buf,len) function, allowing a bunch
of mallocs & special cases to be avoided.
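
A minimal sketch of that pipe idea in C, to make the "next filter is just write(buf,len)" point concrete. This is not the code on the
rbcollins_filters tag: the names filter, filter_pass, upcase_write, sink_write and sink_buf are all made up for illustration, and the
upper-casing stage merely stands in for a real transformation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not taken from the Squid source. */
typedef struct filter filter;
typedef void filter_write(filter *self, const char *buf, size_t len);

struct filter {
    filter_write *write;  /* entry point, called by the upstream stage */
    filter *next;         /* downstream stage, NULL for the final sink */
    void *state;          /* per-filter private data */
};

/* From inside a filter, the rest of the chain is just write(buf, len). */
static void filter_pass(filter *self, const char *buf, size_t len)
{
    assert(self->next != NULL);
    self->next->write(self->next, buf, len);
}

/* Example stage: upper-cases the data in fixed-size chunks, so no malloc. */
static void upcase_write(filter *self, const char *buf, size_t len)
{
    char chunk[256];

    while (len > 0) {
        size_t n = len < sizeof(chunk) ? len : sizeof(chunk);
        for (size_t i = 0; i < n; i++)
            chunk[i] = (char)toupper((unsigned char)buf[i]);
        filter_pass(self, chunk, n);
        buf += n;
        len -= n;
    }
}

/* Final stage: appends to a fixed buffer, silently truncating on overflow. */
typedef struct {
    char data[1024];
    size_t used;
} sink_buf;

static void sink_write(filter *self, const char *buf, size_t len)
{
    sink_buf *out = self->state;

    if (len > sizeof(out->data) - out->used)
        len = sizeof(out->data) - out->used;
    memcpy(out->data + out->used, buf, len);
    out->used += len;
}
```

To use it, build the chain back to front (sink first, then each stage pointing at the one after it) and call write on the head of
the chain for every block of data as it arrives. No stage has to buffer the whole object or know how many stages follow it.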

If you want to play with it, it's tagged rbcollins_filters on sourceforge.

BTW: client_side is good for filtering too - I was pointing out that for an accelerator that alters every request with those URLs,
server side is more efficient. For altering based on username/client-agent or a number of other things, client_side is more
efficient. What I'm building hooks filters in at four different locations: incoming data request & response, outgoing data request
& response.

'Course it's 100% up to you whether to modify the response for the whole server or each client :-]

Here are my mental examples of places to hook things:
virus scanning - incoming responses
content translation - outgoing responses
on the fly Content-Encoding - outgoing responses (compression) and incoming responses (store canonical)
pattern blocking - incoming responses & outgoing requests (i.e. block httptunnel & the IRC-via-CONNECT stuff)
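
The mapping above could be written down as a small table, roughly like the sketch below. The enum, the struct and the hooks_for()
helper are hypothetical names chosen for illustration, not identifiers from the branch; only the four hook locations and the four
example uses come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit flags for the four hook locations. */
enum hook_point {
    HOOK_INCOMING_REQUEST  = 1 << 0,
    HOOK_INCOMING_RESPONSE = 1 << 1,
    HOOK_OUTGOING_REQUEST  = 1 << 2,
    HOOK_OUTGOING_RESPONSE = 1 << 3
};

struct hook_use {
    const char *name;   /* what the filter does */
    unsigned hooks;     /* OR of the hook_point values it attaches to */
};

/* The example uses from the list above, one row each. */
static const struct hook_use hook_table[] = {
    { "virus scanning",      HOOK_INCOMING_RESPONSE },
    { "content translation", HOOK_OUTGOING_RESPONSE },
    { "content encoding",    HOOK_OUTGOING_RESPONSE | HOOK_INCOMING_RESPONSE },
    { "pattern blocking",    HOOK_INCOMING_RESPONSE | HOOK_OUTGOING_REQUEST }
};

/* Returns the hook mask for a named use, or 0 if the name is unknown. */
static unsigned hooks_for(const char *name)
{
    size_t count = sizeof(hook_table) / sizeof(hook_table[0]);

    assert(name != NULL);
    for (size_t i = 0; i < count; i++)
        if (strcmp(hook_table[i].name, name) == 0)
            return hook_table[i].hooks;
    return 0;
}
```

A filter would then be attached at a given location only if (hooks_for(name) & location) is non-zero.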

Rob

----- Original Message -----
From: "Joe Cooper" <joe@swelltech.com>
To: "Henrik Nordstrom" <hno@hem.passagen.se>
Cc: "Squid Dev" <squid-dev@squid-cache.org>
Sent: Wednesday, January 17, 2001 10:12 AM
Subject: Re: Inline content modification?

> Ok, now that I've dug in deeper, I see what you're talking about.
>
> I've also realized that someone (Olaf Titz--the link is on the
> squid.sourceforge page) has already done just what I need (and a lot
> more) for Squid 2.3S2. So I'm forward porting it now and converting it
> to the new cbdata stuff, and will start a new branch in CVS once it's
> compilable. I'm going to write to Olaf to see if he has intentions to
> maintain this work, and if not, I'll adopt it. It's much larger than
> anything I had planned to do, but since it's already been done, I'd hate
> to see it disappear in favor of a lesser implementation.
>
> It has some weirdness in it that confuses me, wherein it has 'faked'
> object orientation via preprocessor trickery. Since I haven't the
> foggiest notion of object oriented programming or complex preprocessor
> fun I'll have to wade my way through it to figure out what it's doing
> and probably bring it back to plain ol' C. It also does dynamically
> loadable modules which seems overly complicated and not very cross
> platform (from comments in the code). But I'll leave it as is for now,
> because loadable modules are cool. ;-)
>
> It also operates on the client side rather than server side, which has
> possible performance issues, as Robert pointed out. Then again, now
> that I've actually seen a fully fleshed out design of this I realize
> that client side allows better flexibility, and so I'll probably leave
> it as is for now. The added flexibility is that Squid can decide which
> URLs to modify based on the client as well as the source. Meaning
> clients in the US could get unmodified URLs while Indian clients can get
> modified URLs. Pretty neat and probably worth the CPU hit in some cases.
>
> Thanks for the tips, Henrik. I'll be back with more concrete questions
> as I go along, I'm sure.
>
> Henrik Nordstrom wrote:
>
> > Joe Cooper wrote:
> >
> >
> >> Assuming it is possible, can I use the ACL interface to generate the
> >> match lists, or do I need to come up with a method to handle the match
> >> string and the replacement string? It would be nice to have a named ACL
> >> for the match strings, and it seems reasonable that this would work. So
> >> can I run /anything/, including whole html pages, through a regex or
> >> string matching ACL? Anyone have pointers for how to tackle this one?
> >
> >
> > You need to parse the HTML and only run the links through an ACL list. But
> > you are probably better off defining some new kind of rewrite list
> > using regex pattern and substitution pairs with back references, much
> > like what you do in sed/perl. ACLs cannot rewrite data, only return
> > "true/false".
> >
> > /Henrik
>
>
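
For reference, the pattern and substitution pair idea Henrik describes above could look roughly like the following sketch, using the
POSIX regcomp()/regexec() calls with extended syntax. The helper names append() and rewrite_once() are made up for illustration, it
rewrites only the first match, and it assumes \1 to \9 as the back reference syntax in the substitution string; none of this is
taken from Squid or from Olaf's patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 10   /* whole match plus \1..\9 */

/* Append n bytes to out, keeping it NUL-terminated; -1 if it will not fit. */
static int append(char *out, size_t outsz, size_t *used,
                  const char *src, size_t n)
{
    if (n >= outsz - *used)
        return -1;
    memcpy(out + *used, src, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/*
 * Hypothetical helper: rewrite the first match of pattern in `in` using
 * subst, where \1..\9 refer to parenthesised groups.
 * Returns 0 if rewritten, 1 if there was no match (input copied as is),
 * -1 on a bad pattern or if the output buffer is too small.
 */
static int rewrite_once(const char *pattern, const char *subst,
                        const char *in, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];
    size_t used = 0;
    int matched;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    matched = (regexec(&re, in, MAX_GROUPS, m, 0) == 0);
    regfree(&re);

    if (!matched)
        return append(out, outsz, &used, in, strlen(in)) == 0 ? 1 : -1;

    /* Text before the match is copied unchanged. */
    if (append(out, outsz, &used, in, (size_t)m[0].rm_so) != 0)
        return -1;

    /* Expand the substitution, replacing \N with the text of group N. */
    for (const char *p = subst; *p != '\0'; p++) {
        if (p[0] == '\\' && p[1] >= '1' && p[1] <= '9') {
            const regmatch_t *g = &m[p[1] - '0'];
            p++;                        /* skip the digit as well */
            if (g->rm_so < 0)
                continue;               /* group did not take part */
            if (append(out, outsz, &used, in + g->rm_so,
                       (size_t)(g->rm_eo - g->rm_so)) != 0)
                return -1;
        } else if (append(out, outsz, &used, p, 1) != 0) {
            return -1;
        }
    }

    /* Text after the match is copied unchanged. */
    return append(out, outsz, &used, in + m[0].rm_eo, strlen(in + m[0].rm_eo));
}
```

A rewrite list would then be a sequence of such (pattern, substitution) pairs applied to each link pulled out of the HTML, which
is the part an ACL cannot do since it only answers true or false.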
Received on Tue Jan 16 2001 - 17:52:59 MST
