Re: no_cache directive

From: Jens-S. Voeckler <voeckler@dont-contact.us>
Date: Mon, 16 Nov 1998 15:04:20 +0100

On Mon, 16 Nov 1998, Andrew Smith wrote:

]Actually it worked for Squid 2.0 patch2, but that is because it recursed
]through the cache objects themselves. Yes it took a while, but it didn't
]slow down Squid very much, and we have very busy proxies. I suspect
]strings would be faster than doing it in Perl, but I didn't check.

I wouldn't really wager on that, since 'strings' reads rather largish
chunks of the files, whereas my version only reads the first nnn bytes. But
hey, as long as it gets things done, we are all glad to have alternatives.

First, thanks Mark for improving the script. Andrew's strings idea led me
to discover an error in my part of Mark's and my script; the corrected and
improved version is appended below.

o The bugfix: our first version yielded URLs trailed by characters
  whose ASCII code was below space.
o Version 0.2 speaks HTTP/1.0 over sockets itself instead of going via
  the client program. This should save lots of forks and execs.
o Version 0.2 uses RE anchors while searching the cache directories,
  thus lowering the chances that a misplaced file confuses the script.
o Many options let you configure it to your taste without having to
  modify the source.

While using the improved version, it occurred to me that many files I found
in my spool directory and wanted to purge had already been removed from
Squid - but not from the disk.

This leads me to a question: would it be safe to remove() those files for
which the PURGE query results in a

        HTTP/1.0 404 Not Found
        Server: Squid/2.0.PATCH2
        Mime-Version: 1.0
        ...
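
Just to make the idea concrete, here is an untested sketch of the cleanup I
have in mind - whether the unlink is actually safe (squid might still hold
the object in memory) is exactly what I am asking, so treat the helper as a
hypothetical, not a recommendation:

        # hypothetical helper, not part of the script below:
        # purge $url and remove its swap file $fn iff squid answers 404.
        sub purge_or_unlink ($$) {
            my ($url,$fn) = @_;
            my $result = purge($url);   # see sub purge() in the script below
            unlink($fn) || warn "unlink($fn): $!\n"
                if ( defined $result && $result == 404 );
            $result;
        }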

Also, Alex suggested something using squid-internal queries, which would
yield only those objects that squid actually knows about.
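
I have not tried it, but a first stab could talk to the cache manager
instead (this assumes your manager ACLs allow the "objects" report, and the
exact output format differs between squid versions, so the parsing below is
an assumption):

        # hypothetical sketch: enumerate objects via the cache manager.
        use IO::Socket;
        my $mgr = IO::Socket::INET->new("localhost:3128")
            || die "connect: $!\n";
        print $mgr "GET cache_object://localhost/objects HTTP/1.0\r\n\r\n";
        while ( <$mgr> ) {
            # KEY lines carry the object key - format is version dependent
            print "$1\n" if m(^KEY\s+(\S+));
        }
        close($mgr);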

As a further improvement to the script, I would suggest using persistent
connection(s) for the purge queries. You have to enable the PURGE acl in
your squid, and for real purges the "-f" option has to be specified. The
statistics will tell you what was really purged...
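
To sketch what I mean - untested, and it assumes that squid keeps the
connection open and announces the reply length via Content-Length:

        # hypothetical: reuse one connection for many PURGE requests.
        sub purge_persistent ($$) {
            my ($socket,$url) = @_;
            print $socket "PURGE $url HTTP/1.1\r\n",
                          "Host: $squidhost\r\n\r\n";
            $socket->flush();
            my $status = <$socket>;             # HTTP status line
            my ($clen,$body) = (0,'');
            while ( defined($_ = <$socket>) && ! /^\r?$/ ) {
                $clen = $1 if /^Content-Length:\s*(\d+)/i;
            }
            read( $socket, $body, $clen ) if $clen; # drain the reply body
            $status =~ m(^HTTP/1\.\d (\d{3})) ? $1 : undef;
        }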

With best wishes,
Dipl.-Ing. Jens-S. Vöckler (voeckler@rvs.uni-hannover.de)
Institute for Computer Networks and Distributed Systems
University of Hanover, Germany; +49 511 762 4726

--- &< snip, snip ---
#!/usr/bin/perl
# for use with squid-2
# USE AT YOUR OWN RISK! NO GUARANTEES, WHATSOEVER! DON'T BLAME US!
# Authors: Mark Reynolds <mark@rts.com.au>,
# Jens-S. Vöckler <voeckler@rvs.uni-hannover.de>
#
# By default, the program will do a dry run, that is "print but not purge".
# You have to specify the "-f" commandline option in order to really purge.
# Please note that, even though purged, the file remains on the disk...
#
require 5.003;
use strict;
use IO::Handle;
use IO::Socket;
use Getopt::Std;

my %option;
getopts('c:d:p:hfv?',\%option);

my $VERSION = 0.2; # for the interested reader
my $cachedir = $option{d} || '/var/spool/cache';
my $squidport = $option{p} || '3128';
my $squidhost = $option{c} || 'localhost';
my $regexpfile = shift || usage();
usage() if ( $option{'?'} || $option{h} );

sub usage () {
    warn "Usage: $0 [-d cachedir] [-c host] [-p port] [-f] [-v] RE-file\n\n";
    warn " -f \tdo not dry run, *really* purge the URLs.\n";
    warn " -d cachedir\tLocation of the cache spool base directory\n";
    warn " -c host \tName of the squid host, defaults to localhost.\n";
    warn " -p port \tPort number of squid, defaults to 3128.\n";
    warn " -v \tmore verbose information (NYI).\n";
    die "\n";
}
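
# example invocation (the script name purge.pl is just an example):
#     ./purge.pl -d /var/spool/cache -f patterns.re
# with patterns.re containing one regular expression per line, e.g.
#     ^http://www\.netapp\.com/images/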

# package local variables
my ($top,$sub,$file,$line,@regexps,%statistics);

# check for the existence of the re input file, and read its contents
open( FILE, "<$regexpfile" ) || die "open($regexpfile): $!\n";
chomp(@regexps = <FILE>);
close(FILE);

# catch Ctrl-C, and have it print statistics
$SIG{INT} = sub { exit(1); };

# the glob operator fails with a regular C shell (>8kB args, see perlop),
# thus we have to do it manually - it is faster, anyway.
opendir( TOP, "$cachedir" ) || die "opendir($cachedir): $!\n";
while ( ($top = readdir(TOP)) ) {
    next unless $top =~ /^[0-9A-F]{2,2}$/;
    print STDERR "# processing in $cachedir/$top\n";
    if ( opendir( SUB, "$cachedir/$top" ) ) {
        while ( ($sub = readdir(SUB)) ) {
            next unless $sub =~ /^[0-9A-F]{2,2}$/;
            print STDERR "# processing in $cachedir/$top/$sub\n" if $option{v};
            if ( opendir( FILES, "$cachedir/$top/$sub" ) ) {
                while ( ($file = readdir(FILES)) ) {
                    next unless $file =~ /^[0-9A-F]{8,8}$/;
                    match("$cachedir/$top/$sub/$file");
                }
                closedir(FILES);
            } else {
                warn "opendir($sub): $!\n";
            }
        }
        closedir(SUB);
    } else {
        warn "opendir($top): $!\n";
    }
}
closedir(TOP);
exit(0);

END {
    # print statistics on termination (if there are any).
    if ( defined $statistics{all} ) {
        print STDERR "\nstatistics of return codes:\n";
        foreach $line ( sort keys %statistics ) {
            printf " %5s: %d\n", $line, $statistics{$line};
        }
    }
}

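# match: pull the stored URL out of one cache swap file and, if it
# matches any of the REs, purge it (or just print it in a dry run).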
sub match ($) {
    my $fn = shift;
    my ($regexp,$result);
    if ( open(IN, "<$fn") ) {
        # the first 60 bytes are binary data; for now throw them away
        # (a simple seek( IN, 60, 0 ) would probably do, too)
        if ( (sysread( IN, $line, 60 ) || 0) < 60 ) {
            warn "# $fn is strange...\n";
            close(IN);
            return undef;
        }
        # now read just *one* line, which might look like
        # "http://www.netapp.com/images/logondmp.gif\0\bHTTP/1.1 200 OK\r\n"
        # this is another read() --> slightly inefficient, but it
        # allows for arbitrarily sized URLs.
        chomp($line = <IN>);
        # keep everything up to the first NUL byte (C string termination);
        # the rest is the start of the stored HTTP reply.
        my $nul = index($line,"\0");
        $_ = ($nul < 0) ? $line : substr($line,0,$nul);
        #
        # use the -f commandline option, if you really want to PURGE!
        foreach $regexp (@regexps) {
            if ( /$regexp/i ) {
                $result = $option{f} ? purge($_) : 'dry';
                if ( defined $result ) {
                    print "$_ ($result)\n";
                    $statistics{$result}++;
                    $statistics{all}++;
                }
                last;   # purge each URL at most once
            }
        }
        close(IN);
    } else {
        warn "open($fn): $!\n";
    }
    1;
}

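# purge: send a PURGE request for the given URL over a fresh HTTP/1.0
# connection; returns the HTTP status code, or undef on error.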
sub purge ($) {
    #
    # TODO:
    # 1) convert to HTTP/1.1 and persistent connections
    # of course, that would need a little redesign...
    # 2) implement a timeout while waiting on caches, as many
    # OS don't implement POSIX.1g socket timeouts
    #
    my $url = shift || die "URL missing in sub!\n";
    my $socket = IO::Socket::INET->new("$squidhost:$squidport");
    if ( ! defined $socket ) {
        warn "# unable to open \"$squidhost:$squidport\": $!\n";
        return undef;
    } else {
        print STDERR "# about to purge $url\n" if $option{v};
    }

    # write request
    my $request = "PURGE $url HTTP/1.0\r\nAccept: */*\r\n\r\n";
    print $socket $request;
    $socket->flush();
    print STDERR "# sent $request" if $option{v};

    # read reply
    my $result = -1;
    $line = <$socket>; # HTTP reply status line
    warn "# first reply line: $line" if $option{v};
    if ( $line !~ m(^HTTP/1\.\d (\d{3,3})) ) {
        print STDERR "# unable to PURGE $url, request returned:\n";
        print STDERR "#> $line";
        print STDERR "#> $_" foreach ( <$socket> );
        close($socket);
        return undef;
    } else {
        $result = $1;
        <$socket>;      # read and discard one more line of the reply
    }
    close($socket);
    $result;
}