Re: [squid-users] partial word replacement in HTML data

From: Antony Stone <Antony@dont-contact.us>
Date: Thu, 31 Jul 2003 18:55:32 +0100

On Wednesday 30 July 2003 7:42 am, Shunichi Tabata wrote:

> Hi! I'm Shunichi from Japan.
> I want Squid to change HTML data partially, according to a certain
> pattern, as the data passes through Squid. For example, I want to
> substitute every occurrence of the word "America" in HTML data with "USA". At
> first, I added the data-changing function to "httpReadReply" in "http.c". But I
> found that you might not be able to find the pattern to be substituted
> when the web server sends the reply in several pieces rather than all at once.
> For example, the first piece may contain "Ame" at its end and the
> second piece may contain "rica" at its beginning. So I'm
> looking for the point at which I can handle the whole HTML reply data. Do
> you have any ideas?

There may be another problem which is even worse for you, and unless you can
deal with that one as well, solving the one you've identified won't get you
very far.
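For what it's worth, the chunk-boundary problem on its own can be handled without waiting for the whole reply: hold back the last len(pattern) - 1 bytes of each piece and prepend them to the next one, so a match split as "Ame" + "rica" is still seen whole. The sketch below is Python rather than Squid's C, and StreamReplacer is a hypothetical class for illustration, not anything in the Squid source; it also assumes the body is plain, unencoded bytes.

```python
class StreamReplacer:
    """Replace `old` with `new` in a byte stream that arrives in chunks.

    Hypothetical helper for illustration; not part of Squid.
    """

    def __init__(self, old: bytes, new: bytes):
        if not old:
            raise ValueError("pattern must be non-empty")
        self.old = old
        self.new = new
        self.pending = b""  # unreplaced tail that might begin a match

    def feed(self, chunk: bytes) -> bytes:
        buf = self.pending + chunk
        out = []
        i = 0
        # Replace every match that lies entirely inside the buffer.
        while True:
            j = buf.find(self.old, i)
            if j == -1:
                break
            out.append(buf[i:j])
            out.append(self.new)
            i = j + len(self.old)
        # Keep back up to len(old) - 1 trailing bytes: a match starting
        # there would run past the end of what we have so far.
        keep = max(i, len(buf) - (len(self.old) - 1))
        out.append(buf[i:keep])
        self.pending = buf[keep:]
        return b"".join(out)

    def finish(self) -> bytes:
        """Flush whatever is still held back once the reply is complete."""
        rest, self.pending = self.pending, b""
        return rest


r = StreamReplacer(b"America", b"USA")
pieces = [r.feed(b"Visit Ame"), r.feed(b"rica today. America!"), r.finish()]
print(b"".join(pieces))  # b'Visit USA today. USA!'
```

Note that the rewritten body is a different length from the original, so any Content-Length header has to be corrected or dropped, and the trick only works on the raw bytes of an unencoded body.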

The other problem is that not all HTTP responses are plain text. Some
web servers save bandwidth by sending back a response with the
"Content-Encoding:" header set to "gzip" (Google does this, for example), and
the browser then decompresses the response and interprets the resulting HTML.

That means your replace function may need to uncompress the data stream to
find the "America", replace it with "USA" and then recompress the data stream
again before sending it on to the client.
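As a rough illustration of that decode/replace/re-encode step, here is a Python sketch that works on a complete, already de-chunked body. rewrite_body is a hypothetical name, the headers are assumed to be a dict with lower-case keys, and only gzip (plus the older x-gzip alias) is handled; other codings are passed through untouched.

```python
import gzip


def rewrite_body(headers: dict, body: bytes, old: bytes = b"America",
                 new: bytes = b"USA"):
    """Return (headers, body) with `old` replaced by `new` in the decoded body.

    Hypothetical helper; assumes lower-case header names and a complete,
    de-chunked entity body.
    """
    headers = dict(headers)  # don't modify the caller's dict
    coding = headers.get("content-encoding", "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        # Decompress, substitute, recompress so the browser still gets gzip.
        body = gzip.compress(gzip.decompress(body).replace(old, new))
    elif coding == "identity":
        body = body.replace(old, new)
    else:
        # deflate, compress, stacked codings, ...: leave the body alone.
        return headers, body
    if "content-length" in headers:
        # The body changed size, so the advertised length must follow it.
        headers["content-length"] = str(len(body))
    return headers, body
```

Buffering the whole body keeps the sketch simple. A streaming variant could use zlib.decompressobj(16 + zlib.MAX_WBITS) to inflate gzip data piece by piece, but the text coming out of it is still split at arbitrary points, so the original chunk-boundary problem comes back on the decompressed side.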

Not impossible, admittedly, but probably harder than you'd expected to
encounter...

If you do find the ideal place in the Squid source to do this sort of thing,
however (or if someone else has any bright ideas), I'd like to hear about it:
I'm interested in scanning (not changing) the content returned by servers
before deciding whether to pass it on to the client, cache the response, etc.
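For the scan-then-decide case the awkward part is ordering: if the decision depends on something near the end of the body, nothing can be forwarded until all of it has arrived. A minimal Python sketch under that assumption (scan_response is a hypothetical name, and the body is assumed to be plain, unencoded bytes) holds every chunk, checks each one together with a small overlap from the previous data so split patterns are still caught, and leaves the forward/cache decision to the caller.

```python
def scan_response(chunks, banned):
    """Buffer a response while scanning it for any of the `banned` patterns.

    Hypothetical helper; returns (found, body): the set of patterns seen
    and the full body, which has not yet been released to anyone.
    """
    # A pattern can straddle at most (longest pattern - 1) earlier bytes.
    overlap = max((len(p) for p in banned), default=1) - 1
    held = []      # chunks withheld from the client until we decide
    tail = b""     # last `overlap` bytes seen so far
    found = set()
    for chunk in chunks:
        window = tail + chunk
        for pattern in banned:
            if pattern in window:
                found.add(pattern)
        held.append(chunk)
        tail = window[-overlap:] if overlap else b""
    return found, b"".join(held)


found, body = scan_response([b"hello wor", b"ld"], [b"world", b"xyz"])
print(found)  # {b'world'}
```

Holding the whole response means the client sees nothing until the scan finishes, so a real setup would probably want a size limit beyond which it stops waiting and forwards (or refuses) the response unscanned.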

Regards,

Antony.

-- 
The first ninety percent of an engineering project takes ninety percent
of the time, and the last ten percent takes the remaining ninety percent.
Received on Thu Jul 31 2003 - 11:55:41 MDT