perl function to process squid cache metadata

From: Craig Sanders <cas@dont-contact.us>
Date: Sat, 10 Mar 2001 14:37:16 +1100

while searching the list archives for info on how to do it, i noticed a
few other people wanted information on the cache object file format.

after being pointed in the right direction by Brian Beuning, i wrote
a perl subroutine to do that. i use it in scripts to trawl through my
proxy cache looking for img, script, iframe, and applet tags which are
of suspicious sizes: e.g. 468x60 for banner ads, and 1x1 for web bugs.

btw, speaking of trawling the cache....is there any reason why i
shouldn't modify squid to log the object file name, content-type, and
URL to the store.log? there doesn't seem to be any other use for that
log file, and it would make it possible to write a script which just
monitored the log file (using the File::Tail module) for object files to
investigate rather than thrashing the disks to trawl through the entire
cache.

anyway, here it is. i hope it is of use to people.

i'm interested in receiving comments and suggestions about it. in
particular, whether the array of hashes is the best way to return the
TLVs - aside from the URL i don't have much use for the metadata in
my scripts, i only returned it for "completeness" to make the routine
generically useful.

--cut here---
sub ParseSquidMeta {
    # Copyright Craig Sanders <cas@taz.net.au> 2001
    #
    # This subroutine is free software; you can redistribute it and/or
    # modify it under the terms of the GNU General Public License as
    # published by the Free Software Foundation; either version 2 of the
    # License, or (at your option) any later version.
    #
    # This subroutine was modelled on the storeSwapMetaUnpack function
    # in the squid source file store_swapmeta.c by Kostas Anagnostakis.
    #
    # Synopsis:
    # Parses the metadata at the start of a squid cache file. returns
    # the TLVs in an array of hashes, and also returns $buflen and $url.
    #
    # Call with:
    # ($buflen, $url) = ParseSquidMeta($buf,\@TLV) ;
    #
    # Input:
    # $buf is a scalar containing the first 4K or so of the cache file
    #
    # Output:
    # \@TLV is an array reference to store the TLVs in.
    # $buflen is the real total length of the metadata
    # $url is for convenience -- the most useful TLV for my needs is
    # the object's URL, TLV type 0x04 (^D)
    #
        # Typical Usage:
    # open(FILE,"<$file") || die "couldn't open $file: $!" ;
    # # get the store_swapmeta data
    # # read in a large chunk of the file into $buf
    # read(FILE, $buf, 4096);
    # # then get the real length of the metadata and the URL
    # ($buflen,$url) = ParseSquidMeta($buf,\@TLV) ;
    #
    # # now rewind the file and skip over the metadata so we can
    # # read the headers and the file contents.
    # seek(FILE,0,0) ;
    # read(FILE,$metajunk,$buflen);

    my($buf,$TLV) = @_ ;
    my ($i, $j, $buflen,$sizeofchar,$sizeofint,$sizes,$type,$length,$T,$url) ;
    $j = 0;
    $buflen = 0;

    $sizeofchar = length pack "c", 1;
    $sizeofint = length pack "i", 42 ;
    $sizes = $sizeofchar + $sizeofint ;

    $type = substr($buf,$j,$sizeofchar);
    $j += $sizeofchar ;
    $buflen = substr($buf,$j,$sizeofint);
    $buflen = unpack("i",$buflen);
    $j += $sizeofint ;

    $i = 0;
    while ($buflen - $j > $sizes) {
        $type = substr($buf,$j,$sizeofchar);
        $j += $sizeofchar ;
        $length = substr($buf,$j,$sizeofint);
        $length = unpack("i",$length);
        $j += $sizeofint ;
        $T = substr($buf,$j,$length -1); # -1 to skip NUL char
        $TLV->[$i]->{type} = $type ;
        $TLV->[$i]->{length} = $length ;
        $TLV->[$i]->{value} = $T ;
        $j += $length ;
        #print "type='$type' length='$length' T='$T'\n" ;
        if ($type eq "\cD") { $url = $T } ;
        $i++ ;
    } ;

    return ($buflen,$url) ;
};
--cut here---

--
craig sanders <cas@taz.net.au>
      GnuPG Key: 1024D/CD5626F0 
Key fingerprint: 9674 7EE2 4AC6 F5EF 3C57  52C3 EC32 6810 CD56 26F0
Received on Fri Mar 09 2001 - 20:37:22 MST

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:58:36 MST