Inline content modification?

From: Joe Cooper <joe@dont-contact.us>
Date: Mon, 15 Jan 2001 17:10:52 -0600

Hello all,

I have a couple of queries that I hope someone here can answer for me,
since I don't have much familiarity with this part of the code.

First a bit of background. I'd like to add a small hack (and possibly a
completely configurable squid.conf option) to provide content
modification of pages as they pass through the cache. (I know, Red
Flag, "Danger Will Robinson!", Copyright issues!) This is for a website
accelerator that will be located in India and will accelerate several
sites in other countries. Because of the layout of the network and of
this site's partner sites (which serve users all over the world), they
are not able to modify the links on the partner sites' pages. Some of
those links are absolute, so following one bumps the client off the
proxy and onto the much slower and more distant origin server. We can
pretty easily get the entry page through the cache, but from there I
need to be able to modify http links so they point through the cache;
a redirector will then direct the cache back to the origin server.
Complicated, I know.

Here is what I envision doing:

Provide Squid a list of URL ACLs to match and rewrite, like so...

acl_dstdomain somesite http://www.somesite.com/

And in keeping with the current ACL style, a rewrite rule...

url_rewrite somesite http://www.accelhost.com/www.somesite.com/

This URL could then be redirected via Squirm, or similar, to translate
it to the actual origin server, including whatever comes after the
domain name.

So...Now to the questions:

Am I an idiot? It appears to me that it is possible to read and work on
the whole object in client_side.c, and the noanim patch posted here a
few weeks ago does just that without problems. But I could very well be
missing something.

Assuming it is possible, can I use the ACL interface to generate the
match lists, or do I need to come up with a method to handle the match
string and the replacement string? It would be nice to have a named ACL
for the match strings, and it seems reasonable that this would work. So
can I run /anything/, including whole html pages, through a regex or
string matching ACL? Anyone have pointers for how to tackle this one?

Finally, I welcome comments and suggestions for how best to proceed
and/or anything to look out for when implementing this.

Thanks.

--
Joe Cooper <joe@swelltech.com>
http://www.swelltech.com
Received on Mon Jan 15 2001 - 16:03:01 MST
