Re: [squid-users] url_rewrite_program doesn't seem to work on squid 2.6 STABLE17

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 04 Jul 2008 11:56:43 +1200

Martin Jacobson (Jake) wrote:
> Hi,
>
> I hope that someone on this group can give me some pointers. I have a Squid proxy running version 2.6.STABLE17; I recently upgraded from a very old version, 2.4 something. The proxy sits in front of a search appliance and all search requests go through the proxy.
>
> One of my requirements is to have all search requests for cache:SOMEURL go through a URL rewrite program that compares the requested URL against a list of blacklisted URLs, one per line in a text file. Any line that starts with # or is blank is ignored by the url_rewrite_program. This Perl program seemed to work fine in the old version, but now it doesn't work at all.
>
> Here is the relevant portion of my Squid conf file:
> -------------------------------------------------------------------------------
> http_port 80 defaultsite=linsquid1o.myhost.com accel
>
> url_rewrite_program /webroot/squid/imo/redir.pl
> url_rewrite_children 10
>
>
> cache_peer searchapp3o.myhost.com parent 80 0 no-query originserver name=searchapp proxy-only
> cache_peer linsquid1o.myhost.com parent 9000 0 no-query originserver name=searchproxy proxy-only
> acl bin urlpath_regex ^/cgi-bin/
> cache_peer_access searchproxy allow bin
> cache_peer_access searchapp deny bin
>
> Here is the Perl program
> -------------------------------------------------------------------------------
> #!/usr/bin/perl
>
>
> $| = 1;
>
> my $CACHE_DENIED_URL = "http://www.mysite.com/mypage/pageDenied.intel";
> my $PATTERNS_FILE = "/webroot/squid/blocked.txt";
> my $UPDATE_FREQ_SECONDS = 60;
>
> my $last_update = 0;
> my $last_modified = 0;
> my $match_function;
>
> my ($url, $remote_host, $ident, $method, $urlgroup);
> my $cache_url;
>
> my @patterns;
>
>
> while (<>) {
>     chomp;
>     ($url, $remote_host, $ident, $method, $urlgroup) = split;
>
>     &update_patterns();
>
>     $cache_url = &cache_url($url);
>     if ($cache_url) {
>         &update_patterns();
>         if (&$match_function($cache_url)) {
>             $cache_url = &url_encode($cache_url);
>             print "302:$CACHE_DENIED_URL?URL=$cache_url\n";
>             next;
>         }
>     }
>     print "\n";
> }
>
> sub update_patterns {
>     my $now = time();
>     if ($now > $last_update + $UPDATE_FREQ_SECONDS) {
>         $last_update = $now;   # remember when we last checked the file
>         my @a = stat($PATTERNS_FILE);
>         my $mtime = $a[9];
>         if ($mtime != $last_modified) {
>             @patterns = &get_patterns();
>             $match_function = build_match_function(@patterns);
>             $last_modified = $mtime;
>         }
>     }
> }
>
>
> sub get_patterns {
>     my @p = ();
>     my $p = "";
>     open PATTERNS, "< $PATTERNS_FILE" or die "Unable to open patterns file. $!";
>     while (<PATTERNS>) {
>         chomp;
>         if (!/^\s*#/ && !/^\s*$/) { # disregard comments and empty lines.
>             $p = $_;
>             $p =~ s#\/#\\/#g;
>             $p =~ s/^\s+//g;
>             $p =~ s/\s+$//g;
>             if (&is_valid_pattern($p)) {
>                 push(@p, $p);
>             }
>         }
>     }
>     close PATTERNS;
>     return @p;
> }
>
> sub is_valid_pattern {
>     my $pat = shift;
>     return eval { "" =~ m|$pat|; 1 } || 0;
> }
>
>
> sub build_match_function {
>     my @p = @_;
>     my $expr = join(' || ', map { "\$_[0] =~ m/$p[$_]/io" } (0..$#p));
>     my $mf = eval "sub { $expr }";
>     die "Failed to build match function: $@" if $@;
>     return $mf;
> }
>
> sub cache_url {
>     my $url = $_[0];
>     my ($script, $qs) = split(/\?/, $url);
>     if ($qs) {
>         my ($param, $name, $value);
>         my @params = split(/&/, $qs);
>         foreach $param (@params) {
>             ($name, $value) = split(/=/, $param);
>             $value =~ tr/+/ /;
>             $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
>             if ($value =~ /cache:([A-Za-z0-9]{7,20}:)?([A-Za-z]+:\/\/)?([^ ]+)/) {
>                 if ($2) {
>                     return $2 . $3;
>                 } else {
>                     # return "http://" . $3;
>                     return $3;
>                 }
>             }
>         }
>     }
>     return "";
> }
>
> sub url_encode {
>     my $str = $_[0];
>     $str =~ tr/ /+/;
>     $str =~ s/([\?&=:\/#])/sprintf("%%%02x", ord($1))/eg;
>     return $str;
> }
>
> Below is a sample of the blocked URLs file
> ################################################################################
> #
> # URL Patterns to be Blocked
> #---------------------------
> # This file contains URL patterns which should be blocked
> # in requests to the Google cache.
> #
> # The URL patterns should be entered one per line.
> # Blank lines and lines that begin with a hash mark (#)
> # are ignored.
> #
> # Anything that will work inside a Perl regular expression
> # should work.
> #
> # Examples:
> # http://www.bad.host/bad_directory/
> # ^ftp:
> # bad_file.html$
> ################################################################################
> # Enter URLs below this line
> ################################################################################
>
>
> www.badsite.com/
>
>
> So my question is: is there a better way of doing this?

You would be much better off defining this as an external_acl helper
and using deny_info to perform the 'redirect' when it blocks a request.
That way the ACL lookup results can also be cached inside Squid, which
reduces the load the URL re-writes put on the helper.
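As a rough sketch only (the helper path check_blocked.pl, the acl and
external_acl_type names, and the ttl/children values below are made up;
the denied-page URL is taken from your script), the squid.conf side
might look something like this, with the deny rule placed before the
http_access rule that allows the search traffic. A deny_info pointing
at a URL makes Squid reply with a redirect to that page when the deny
matches:

  external_acl_type blocked_cache ttl=60 children=5 %URI /webroot/squid/imo/check_blocked.pl
  acl blockedcache external blocked_cache
  deny_info http://www.mysite.com/mypage/pageDenied.intel blockedcache
  http_access deny blockedcache

The helper then only has to answer a yes/no question: Squid sends it
one %URI per line and expects "OK" (the acl matches, so the deny and
deny_info kick in) or "ERR" back. Something along these lines, with the
pattern handling cut down for brevity (your cache_url() extraction from
the redirector would still be wanted in front of the match):

  #!/usr/bin/perl
  # Sketch of an external_acl helper, not the original redirector.
  # Reads one URL per line, answers OK if it matches the blacklist
  # (so "http_access deny blockedcache" fires), ERR otherwise.
  use strict;
  use warnings;

  $| = 1;    # helpers must not buffer their output

  my $PATTERNS_FILE = "/webroot/squid/blocked.txt";
  my @patterns = load_patterns($PATTERNS_FILE);

  while (my $url = <STDIN>) {
      chomp $url;
      my $blocked = grep { $url =~ /$_/i } @patterns;
      print $blocked ? "OK\n" : "ERR\n";
  }

  sub load_patterns {
      my ($file) = @_;
      my @p;
      open my $fh, '<', $file or die "Unable to open $file: $!";
      while (<$fh>) {
          chomp;
          next if /^\s*#/ || /^\s*$/;    # skip comments and blank lines
          push @p, $_;
      }
      close $fh;
      return @p;
  }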

> Does someone see anything wrong that is keeping this from working in 2.6?

Your old way should still have worked though, inefficient as it was.
The squid.conf snippet you have shown appears to be correct; there may
be something elsewhere unexpectedly affecting it, though.

Or the Perl program may simply be failing to access its data file
properly (remember Squid drops its permissions to a non-root user, and
that user needs access to all of the helper's resources).
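For example, assuming the cache_effective_user is 'squid' (substitute
whatever your squid.conf actually sets), a quick read test as that user
would be something like:

  sudo -u squid head /webroot/squid/blocked.txt

If that fails with a permission error, you have found the problem.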

Seeing the actual error should help in tracking this down. It may show
up in syslog or in cache.log, or you can reproduce it by running the
Perl program from the command line as the Squid effective user.
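For example (again assuming the effective user is 'squid', and using a
made-up client address and query), feeding the helper one rewrite
request line by hand should print either the 302: redirect or a blank
line:

  echo "http://linsquid1o.myhost.com/cgi-bin/search?q=cache:www.badsite.com/page.html 10.0.0.1/- - GET -" | sudo -u squid /webroot/squid/imo/redir.pl

If instead it dies or prints nothing at all, the error message it
leaves behind is the thing to chase.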

Amos

-- 
Please use Squid 2.7.STABLE3 or 3.0.STABLE7