Re: [squid-users] reverse proxy filtering?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 19 Apr 2009 18:46:48 +1200

Jeff Sadowski wrote:
> On Sat, Apr 18, 2009 at 10:24 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> Jeff Sadowski wrote:
>>> On Sat, Apr 18, 2009 at 5:18 PM, Amos Jeffries <squid3_at_treenet.co.nz>
>>> wrote:
>>>> Jeff Sadowski wrote:
>>>>> I'm new to trying to use squid as a reverse proxy.
>>>>>
>>>>> I would like to filter out certain pages and, if possible, certain words.
>>>>> I installed Perl so that I can use it to rebuild pages, if that is
>>>>> possible.
>>>>>
>>>>> My squid.conf looks like so
>>>>> <==== start
>>>>> acl all src all
>>>>> http_port 80 accel defaultsite=outside.com
>>>>> cache_peer inside parent 80 0 no-query originserver name=myAccel
>>>>> acl our_sites dstdomain outside.com
>>>> aha, aha, ..
>>>>
>>>>> http_access allow all
>>>> eeek!!
>>> I want everyone on the outside to see the inside server minus one or
>>> two pages. Is that not what I set up?
>> Only by lucky chance of some background defaults, and assuming that the
>> web server is highly secure on its own.
>>
>> If you have a small set of sites, such as those listed in "our_sites", then
>> it's best to be certain and use that ACL for the allow as well.
>>
>> http_access allow our_sites
>> http_access deny all
>>
>> ... same on the cache_peer_access below.
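>>
>> Concretely, that pair would look like this (a sketch, using the peer
>> name from your config):
>>
>> cache_peer_access myAccel allow our_sites
>> cache_peer_access myAccel deny all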
>>
>>>>> cache_peer_access myAccel allow all
>>>>> <==== end
>>>>>
>>>>> how would I add it so that for example
>>>>>
>>>>> http://inside/protect.html
>>>>>
>>>>> is blocked?
>>>> http://wiki.squid-cache.org/SquidFaq/SquidAcl
>>> so I want redirector_access?
>>> Is there an example line of this in a file?
>>>
>>> I tried using
>>>
>>> url_rewrite_program c:\perl\bin\perl.exe c:\replace.pl
>>>
>>> but I guess that requires more to use it? An ACL?
>>> Should "acl all src all" be "acl all redirect all"?
>> No to all three. The line you mention trying is all that's needed, plus:
>>
>> url_rewrite_access allow all
>>
>> but the above should be the default when a url_rewrite_program is set.
>
> so how do you tell it to use the url_rewrite_program with the inside site?
> Or does it use the script on all pages passing through the proxy?

It changes the request as it is passed on to the web server, in transit. So
the client still sees what they clicked on, but gets content from the other
site. It does not affect links or anything else in the page content.
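
By default (with "url_rewrite_access allow all") the helper is consulted
for every request passing through the proxy. To limit it to the one site,
gate it with an ACL instead, e.g. (a sketch reusing the our_sites ACL
from your config):

url_rewrite_access allow our_sites
url_rewrite_access deny all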

>
> Is this only a rewrite on the requested URL from the web browser?
> Ahh, that might answer some of my earlier questions. I never tried
> clicking on it after implementing the rewrite script. I was only
> hovering over the URL and seeing that it was still the same.
>
>> What is making you think it's not working? And what do the logs say about it?

If you only checked the page's links, they may not change. The logs should
show where the client went and the IP/name of the server the content was
fetched from, which would be the name of the redirected server.

>> Also, what is the c:\replace.pl code?
>>
>
> <=== start
> #!c:\perl\bin\perl.exe
> $| = 1;                  # unbuffered output
> $replace="<a href=http://inside/login.html.*?</a>";
> $with="no login";
> while ($INPUT=<>) {      # reads text line by line on stdin
>     $INPUT=~s/$replace/$with/gi;
>     print $INPUT;
> }
> <=== end
>
> I think I see the problem now. I guess I am looking for something else
> besides url_rewrite, maybe a full text replacement :-/

That's what your code wants, but not what I pointed you to using.
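
For comparison, a url_rewrite_program helper never sees page bodies.
Squid feeds it one request per line, with the URL as the first field,
and expects the possibly-rewritten URL back on stdout. A minimal
sketch, with a hypothetical redirect target:

<==== start
#!c:\perl\bin\perl.exe
# Rewrites request URLs, not page content. Squid sends one request
# per line: "URL client_ip/fqdn user method ...".
$| = 1;                            # unbuffered output, required by Squid
while (my $line = <>) {
    my ($url) = split ' ', $line;  # first field is the URL
    if ($url =~ m{^http://inside/login\.html}i) {
        print "http://inside/denied.html\n";  # hypothetical target page
    } else {
        print "$url\n";            # echo the URL back unchanged
    }
}
<==== end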

You know, I'm thinking you could get away without altering those pages at
all, and instead just block external clients from visiting those URLs. For
example (a sketch; adjust the regex to your real pages):
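
acl protected urlpath_regex ^/protect\.html$
http_access deny protected
http_access allow our_sites
http_access deny all

Note the deny has to come before the allow, since http_access rules are
checked in order and the first match wins.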

>
>>>>> and is it possible to filter/replace certain words on the site
>>>>>
>>>>> like replace "Albuquerque" with "Duke City" for an example on all pages?
>>>> No. no. no. Welcome to copyright violation hell.
>>> This was an example. I have full permission to do the real translations.
>>> I am told to remove certain links/buttons to login pages, thus I
>>> replace "<a href=inside>button</a>" with "". Currently I have a
>>> pathetic perl script that doesn't support cookies and is going through
>>> each set of previous pages to bring up the content. I was hoping squid
>>> would greatly simplify this.
>>> I was using WWW::Mechanize; I know this isn't the best way, but they
>>> just need a fast and dirty way.
>> Ah, okay. Well, the only ways Squid has for doing content alteration are
>> far too heavyweight for that use as well (coding up an ICAP server and
>> processing rules, or a full eCAP adaptor plugin).
>>
>> IMO you need to kick the webapp developers to make their app do the removal
>> under the right conditions. It would solve many more problems than having
>> different copies of a page available with identical identifiers.
>>
>> Amos
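
For the record, wiring in the ICAP route mentioned above looks roughly
like this in 3.0 (a sketch; the service URL is hypothetical, and you
would still have to write the ICAP server that does the actual body
rewriting):

<==== start
icap_enable on
# a RESPMOD service sees response bodies before they are cached
icap_service myfilter respmod_precache 0 icap://127.0.0.1:1344/respmod
icap_class myclass myfilter
icap_access myclass allow all
<==== end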

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE14
   Current Beta Squid 3.1.0.7