Re: no_cache directive (purge.pl)

From: Mark Reynolds <mark@dont-contact.us>
Date: Sat, 14 Nov 1998 22:10:01

At 15:07 13/11/98 +0100, you wrote:

>This is a quick hack. Use at your own risk. No guarantees, whatsoever!
>E.g. will break if you happen to have other non-related material named
>'asdf01qwer' or similar underneath your $cachedir. Hopefully, pine's
>quoted printable encoding will not garble my script too much.

Thanks, Jens-S. Vöckler!

I took the liberty of adding the purge bits. I realised I
wanted to purge quite a few objects, so I made a list file
which the script loads when it starts.

You will need to adjust the local file locations, and uncomment the
purge line before any actual purges will happen. Until then it just
prints a list of the objects it would purge.
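
For clarity, the bits to touch are the file-location variables near the
top of the script and this commented-out line inside match():

        #system "$squidclient -h $squidhost -p $squidport -m PURGE $_";  # <-- uncomment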

Please forgive my perl code. It works for me, but I make no
promises regarding speed or functionality.

Looking through what's actually cached, I found lots of things
which I doubt should be cached. I decided Squid's default 'cgi-bin'
pattern was too narrow, so I've changed it to go direct for, and not
cache, anything matching 'cgi', period.
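
The change I mean is roughly this in squid.conf, broadening the stock
QUERY acl (check it against your own config before copying):

        acl QUERY urlpath_regex cgi \?
        no_cache deny QUERY
        always_direct allow QUERY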

I also found lots of apparently cached search engine queries with a
semicolon in them. I'd be interested in what other people see there.
http://ad.doubleclick.net/adl/altavista.digital.com/result_next;kw=goes;ord=1465478654
http://adaver1.altavista.yellowpages.com.au/image.ng;spacedesc=/yp_browse&gmt=1998.11.05.17.42.32

I'd also be interested in why many of the objects reported as cached
aren't purgeable with this code. As per the FAQ
(http://squid.nlanr.net/Squid/FAQ/FAQ-7.html#ss7.5), they come back with
an 'HTTP/1.0 404 Not Found' response rather than the preferred
'HTTP/1.0 200 OK'.
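
If you want to reproduce a single purge by hand, it looks something like
this with the 'client' program the script calls (the URL is just an
example):

        /usr/local/squid/bin/client -h proxy -p 3128 -m PURGE http://www.example.com/somefile.asp

A cached object answers with 'HTTP/1.0 200 OK'; the puzzling ones give
the 404 instead.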

Nearly forgot: you have to add the purge ACLs before this will work as
well. They're detailed further in the FAQ section mentioned above.
But if you bang these in and do a -k reconfigure (full command after the
ACLs below), PURGE will work from localhost.

        acl PURGE method PURGE
        acl localhost src 127.0.0.1
        http_access allow PURGE localhost
        http_access deny PURGE
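
And the reconfigure step, for the record (assuming the usual install
path):

        /usr/local/squid/bin/squid -k reconfigure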

You probably wouldn't want to run this on a busy cache during the day.
On mine it used up 30 - 50% CPU.
You will also need to do each cache drive one at a time with this
current code. Probably not a bad thing really :-)
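
If you do have to run it while the cache is busy, something like this
keeps the load down and saves the candidate list for later (the output
file name is arbitrary):

        nice -n 19 perl purge.pl > /tmp/purge.candidates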

The other thing I learnt from all this is that using always_direct
doesn't mean an object isn't cached; it just means Squid won't ask its
peers for it. You also have to enter the same sites into no_cache if you
want them fetched direct and not cached.
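
In other words, something like this pairing (the dstdomain list is just
my local sites as an example; substitute your own):

        acl localsites dstdomain .iinet.net.au .omen.net.au
        always_direct allow localsites
        no_cache deny localsites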

This code would also be good for looking for certain types of cached
files, like .mp3s, which I recall someone asking about; a quick sketch
of that idea follows. While reports generated by calamaris and pwebstats
show you the changes and volume flows over a period of time, it would
also be useful to know exactly what's in the cache: percentages, volumes
etc. Another day.
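
As that sketch: pipe the URL list this script prints on stdout into
something like the following (entirely hypothetical, it just counts
cached objects per apparent file extension):

        #!/usr/bin/perl
        # hypothetical companion to purge.pl: read the URLs it prints on
        # stdout and count cached objects per apparent file extension
        my %count;
        while (<>) {
            chomp;
            my ($ext) = m{\.(\w{1,4})(?:[?;]|$)};   # crude guess at an extension
            $count{lc($ext || '(none)')}++;
        }
        printf "%8d  %s\n", $count{$_}, $_
            for sort { $count{$b} <=> $count{$a} } keys %count;

Run it as something like: perl purge.pl | perl count-extensions.pl
(that second file name is made up).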

Ran purge.pl on a 2 gig cache drive. It only purged 30 meg,
which is about 1.5%, which hardly makes it worth it really :-)
But that's with all the unpurgeable's still to be explained by someone
who knows more about this than me.

Create a file of the regexps (one per line) matching what you want
purged. I used /usr/local/squid/purge.regexp.list:
;
\?
cgi
\.asp$
:2800/
\.iinet\.net\.au
\.iinet\.com\.au
\.omen\.net\.au
\.omen\.com\.au
\.aamsurveys\.com\.au

#!/usr/bin/perl
# for use with squid-2
# hard bits by "Jens-S. Voeckler" <voeckler@rvs.uni-hannover.de>
# easy bits by "Mark Reynolds" <mark@rts.com.au>

require 5.003;
 
my $cachedir    = '/usr/local/squid/cache1';             # one cache_dir at a time
my $regexpfile  = "/usr/local/squid/purge.regexp.list";  # regexps to purge, one per line
my $squidclient = "/usr/local/squid/bin/client";         # squid's 'client' utility
my $squidhost   = "proxy";                               # host squid listens on
my $squidport   = "3128";                                # squid's http_port
 
my ($top,$sub,$file,$line);
my @regexps;

# load the list of regexps to purge
open (FILE, $regexpfile) || die "can't open file: $regexpfile\n";
chomp(@regexps = <FILE>);
close(FILE);
 
# the glob operator fails, thus we have to do it manually...
opendir( TOP, "$cachedir" ) || die "opendir($cachedir): $!\n";
while ( ($top = readdir(TOP)) ) {
    next unless $top =~ /^[0-9A-F]{2}$/;   # first-level cache dirs only
    print STDERR "# processing in $cachedir/$top\n";
    if ( opendir( SUB, "$cachedir/$top" ) ) {
        while ( ($sub = readdir(SUB)) ) {
            next unless $sub =~ /^[0-9A-F]{2}$/;   # second-level cache dirs only
            if ( opendir( FILES, "$cachedir/$top/$sub" ) ) {
                while ( ($file = readdir(FILES)) ) {
                    next unless $file =~ /^[0-9A-F]{8}$/;   # swap files are 8 hex digits
                    match("$cachedir/$top/$sub/$file");
                }
                closedir(FILES);
            } else {
                warn "opendir($sub): $!\n";
            }
        }
        closedir(SUB);
    } else {
        warn "opendir($top): $!\n";
    }
}
closedir(TOP);
 
sub match ($) {
    my $fn = shift;
    if ( open(IN, "<$fn") ) {
        if ( sysread( IN, $line, 8192 ) > 60 ) {
            # skip the first 60 bytes, keep everything up to the first LF
            $_ = substr($line,60,index($line,"\n",60)-60);
            # throw away the trailing HTTP status, leaving just the URL
            s(HTTP/1\.\d.*)();
            #
 
            foreach my $regexp (@regexps) {
                if ( /$regexp/i ) {
                    print "$_\n";
                    # uncomment the next line to actually purge:
                    #system "$squidclient -h $squidhost -p $squidport -m PURGE $_";
                }
            }
 
            #print "$_\n";
 
        } else {
            warn "# $fn is strange...\n";
        }
        close(IN);
    } else {
        warn "open($fn): $!\n";
    }
}
 
 

.----------------------------------------------.------------------------.
| Mark Reynolds <mark@rts.com.au>              | Phone 08 9474 1211     |
| Network Manager, Reynolds Technology Pty Ltd | Fax 08 9474 4772       |
| 8 Preston Street Como 6152 Western Australia | Pager 08 9480 5884     |
| PO Box 120 Como 6952 Western Australia       | http://www.rts.com.au/ |
`----------------------------------------------^------------------------'