[squid-users] url_rewrite_program doesn't seem to work on squid 2.6 STABLE17

From: Martin Jacobson (Jake) <jake.jacobson_at_ugov.gov>
Date: Thu, 3 Jul 2008 18:34:59 +0000 (GMT+00:00)

Hi,

I hope that someone on this list can give me some pointers. I am running a Squid proxy, version 2.6.STABLE17, recently upgraded from a very old 2.4 release. The proxy sits in front of a search appliance, and all search requests go through the proxy.

One of my requirements is to have all search requests for cache:SOMEURL go through a URL rewrite program that compares the requested URL against a list of blacklisted URLs. The URLs are listed one per line in a text file; any line that is blank or starts with # is ignored by the url_rewrite_program. This Perl program seemed to work fine under the old version, but now it doesn't work at all.

Here is the relevant portion of my Squid conf file:
-------------------------------------------------------------------------------
http_port 80 defaultsite=linsquid1o.myhost.com accel

url_rewrite_program /webroot/squid/imo/redir.pl
url_rewrite_children 10

cache_peer searchapp3o.myhost.com parent 80 0 no-query originserver name=searchapp proxy-only
cache_peer linsquid1o.myhost.com parent 9000 0 no-query originserver name=searchproxy proxy-only
acl bin urlpath_regex ^/cgi-bin/
cache_peer_access searchproxy allow bin
cache_peer_access searchapp deny bin
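
One way to rule out the Squid side is to exercise the rewriter protocol by hand: Squid 2.x writes one "URL client_ip/fqdn ident method" line per request to the helper's stdin and expects exactly one reply line back (a rewritten URL, a "302:" redirect, or a blank line meaning "leave it alone"). Below is a minimal stand-in filter, NOT my redir.pl -- the hostname and denied-page URL are just placeholders -- that shows the shape of the exchange:

```shell
# Stand-in for a url_rewrite_program: reads one request line, writes one
# reply line. A blank reply means "do not rewrite"; "302:URL" redirects.
reply=$(printf '%s\n' \
  'http://searchapp/search?q=cache:www.badsite.com/x 1.2.3.4/- - GET -' |
  while read -r url rest; do
    case "$url" in
      *badsite*) echo "302:http://www.mysite.com/mypage/pageDenied.intel?URL=$url" ;;
      *)         echo ;;
    esac
  done)
echo "$reply"
```

Feeding the real redir.pl a line like the one above from a terminal (instead of from Squid) is a quick way to see whether the helper itself is misbehaving or whether Squid is never calling it.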

Here is the Perl program:
-------------------------------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;   # unbuffered stdout: Squid expects one reply line per request line

my $CACHE_DENIED_URL = "http://www.mysite.com/mypage/pageDenied.intel";
my $PATTERNS_FILE = "/webroot/squid/blocked.txt";
my $UPDATE_FREQ_SECONDS = 60;

my $last_update = 0;
my $last_modified = 0;
my $match_function;

my ($url, $remote_host, $ident, $method, $urlgroup);
my $cache_url;

my @patterns;

while (<>) {
   chomp;
   ($url, $remote_host, $ident, $method, $urlgroup) = split;
  
   &update_patterns();

   $cache_url = &cache_url($url);
   if ($cache_url) {
      if (&$match_function($cache_url)) {
         $cache_url = &url_encode($cache_url);
         print "302:$CACHE_DENIED_URL?URL=$cache_url\n";
         next;
      }
   }
   print "\n";
}

sub update_patterns {
   my $now = time();
   if ($now > $last_update + $UPDATE_FREQ_SECONDS) {
      $last_update = $now;   # without this, we stat() on every request
      my @a = stat($PATTERNS_FILE);
      my $mtime = $a[9];
      if (defined $mtime && $mtime != $last_modified) {
         @patterns = &get_patterns();
         $match_function = build_match_function(@patterns);
         $last_modified = $mtime;
      }
   }
}

sub get_patterns {
   my @p = ();
   my $p = "";
   open PATTERNS, "< $PATTERNS_FILE" or die "Unable to open patterns file. $!";
   while (<PATTERNS>) {
      chomp;
      if (!/^\s*#/ && !/^\s*$/) { # disregard comments and empty lines.
         $p = $_;
         $p =~ s#\/#\\/#g;
         $p =~ s/^\s+//g;
         $p =~ s/\s+$//g;
         if (&is_valid_pattern($p)) {
            push(@p, $p);
         }
      }
   }
   close PATTERNS;
   return @p;
}

sub is_valid_pattern {
   my $pat = shift;
   return eval { "" =~ m|$pat|; 1 } || 0;
}

sub build_match_function {
   my @p = @_;
   my $expr = join(' || ', map { "\$_[0] =~ m/$p[$_]/io" } (0..$#p));
   my $mf = eval "sub { $expr }";
   die "Failed to build match function: $@" if $@;
   return $mf;
}

sub cache_url {
   my $url = $_[0];
   my ($script, $qs) = split(/\?/, $url, 2);
   if ($qs) {
      my ($param, $name, $value);
      my @params = split(/&/, $qs);
      foreach $param (@params) {
         ($name, $value) = split(/=/, $param, 2);
         next unless defined $value;   # skip parameters with no value
         $value =~ tr/+/ /;
         $value =~ s/%([\dA-Fa-f][\dA-Fa-f])/pack("C", hex($1))/eg;
         if ($value =~ /cache:([A-Za-z0-9]{7,20}:)?([A-Za-z]+:\/\/)?([^ ]+)/) {
            if ($2) {
               return $2 . $3;
            } else {
               # return "http://" . $3;
               return $3;
            }
         }
      }
   }
   return "";
}

sub url_encode {
   my $str = $_[0];
   $str =~ tr/ /+/;
   $str =~ s/([\?&=:\/#])/sprintf("%%%02x", ord($1))/eg;
   return $str;
}
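
When debugging, it also helps to check by hand what cache_url() should be extracting from a live query string. The sketch below walks the same steps -- split off the query string, URL-decode it, pull out the target after "cache:" -- on a made-up sample query (the sed decoding only covers the two escapes used in this example):

```shell
# Made-up search query: "q=cache:www.badsite.com/page.html&num=10", encoded.
q='q=cache%3Awww.badsite.com%2Fpage.html&num=10'

# Decode the two escapes present in the sample (%3A -> ':', %2F -> '/').
decoded=$(printf '%s\n' "$q" | sed -e 's/+/ /g' -e 's/%3A/:/g' -e 's|%2F|/|g')
echo "$decoded"      # q=cache:www.badsite.com/page.html&num=10

# Keep everything after "cache:" and drop any following parameters.
target=${decoded#*cache:}
target=${target%%&*}
echo "$target"       # www.badsite.com/page.html
```

The $target value is what gets handed to the match function and tested against the blacklist patterns.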

Below is a sample of the blocked URLs file
################################################################################
#
# URL Patterns to be Blocked
#---------------------------
# This file contains URL patterns which should be blocked
# in requests to the Google cache.
#
# The URL patterns should be entered one per line.
# Blank lines and lines that begin with a hash mark (#)
# are ignored.
#
# Anything that will work inside a Perl regular expression
# should work.
#
# Examples:
# http://www.bad.host/bad_directory/
# ^ftp:
# bad_file.html$
################################################################################
# Enter URLs below this line
################################################################################

www.badsite.com/
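
Since each line is used as a Perl regular expression and matched case-insensitively against the decoded cache: URL, a pattern like www.badsite.com/ matches any URL containing it (and the unescaped dots match any character, not just a literal dot). A quick way to sanity-check a pattern outside Squid, with grep -i standing in for the helper's m//i match:

```shell
# Two candidate URLs, one per line; grep -i approximates the helper's
# case-insensitive regex match. Only the badsite URL should survive.
printf '%s\n' \
  'http://www.badsite.com/page.html' \
  'http://www.goodsite.com/index.html' |
  grep -i 'www.badsite.com/'
```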

So my question is: is there a better way of doing this?
And does anyone see anything wrong that would keep this from working in 2.6?

Thanks,
Martin C. Jacobson (Jake)
Received on Thu Jul 03 2008 - 18:35:10 MDT

This archive was generated by hypermail 2.2.0 : Fri Jul 04 2008 - 12:00:02 MDT