Re: Logfile analysis scripts

From: Martin Gleeson <gleeson@dont-contact.us>
Date: Tue, 20 Aug 1996 07:36:33 +1000

>At 10:31 18.08.1996 +0200, you wrote:
>>
>>On Sun, 18 Aug 1996 tomaz@mail.siol.net wrote:
>>
>>> > Are the logfile analysis scripts that I find on www.nlanr.net
>>> > compatible with an access.log made from Squid 1.0.4? I try
>>> > access-extract.pl < access.log > summary
>>> > and the summary contains just a few lines, looks like it
>>> > did not find any entries.
>>> > I'm using Perl 5.003.
>>
>>Look at your squid configuration. Maybe, you have the emulate_httpd_log
>>option enabled (that's the default!). If so, you have to start
>>access-extract.pl and access-extract-urls.pl with the option -h. Look at
>>
>Attention! 'emulate_httpd_log' changed from 'on' in 1.0.x to 'off'
>in 1.1.alphaX. All my statistics (pwebstats!) is broken.

I guess that's my cue (as the author of pwebstats). Pwebstats will be
supporting the squid native log format when I get the time to make the
changes (among many, many other changes). In the meantime, here is a perl
script that will convert native squid logs into cern-style cache hit and
miss logs in common log format.

------------( start native2common.pl )-----------
#!/usr/local/bin/perl
################################################################################
#
# native2common.pl : convert squid native log file into cern-style
# cache hit & cache miss logs (in common log format)
#
# Martin Gleeson <gleeson@unimelb.edu.au>, August 1996
#
# (c) Copyright, The University of Melbourne, 1996
#
################################################################################

#------------ You must set the following variable -----------------------------#

$gmtoffset = "+1000"; ###### Offset from GMT

#------------------------------------------------------------------------------#

# uncomment the following line and the ones marked STOPLIST below if you want
# to ignore particular IP numbers in the log - e.g. ignore neighbours if you
# only want stats for local clients. Variable points to a file consisting of
# a single line with a list of one or more IP numbers, delimited by the 'pipe'
# symbol, e.g.:
# 123.45.67.89|234.56.78.90|234.5.6.78

# $stoplist = "/servers/http/squid/etc/neighbours";

#------------------------------------------------------------------------------#

$address_type=2;

%months = ( '0','Jan', '1','Feb', '2','Mar', '3','Apr', '4','May', '5','Jun',
          '6','Jul', '7','Aug', '8','Sep', '9','Oct', '10','Nov', '11','Dec');

%longmonths = ( '0','January', '1','February', '2','March', '3','April',
                '4','May', '5','June', '6','July', '7','August',
                '8','September', '9','October', '10','November',
                '11','December');

#------------------------------------------------------------------------------#

$usage = "usage: native2common.pl <logfile>\n";

if( ! $ARGV[0] ) { die $usage; }

$logfile = shift ( @ARGV );

#------------------------------------------------------------------------------#

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$year += 1900;
$time_now = "$hour:$min:$sec on $mday $longmonths{$mon} $year";

printf STDERR
"==============================================================\n";
printf STDERR "native2common.pl started on $logfile at $time_now.\n";
printf STDERR
"==============================================================\n";
printf STDERR "\n";

open(COUNT,"/bin/wc -l $logfile |");
while( <COUNT> ){ chop; ($line_count) = /^\s*(\d+)\s+\S+$/; }
close(COUNT);
$inc = sprintf "%d", ( $line_count / 50 );
print STDERR " The logfile has $line_count entries.\n";
print STDERR " Processing...\n";
print STDERR " 0% 50% 100%\n";
print STDERR " |-----------------------|------------------------|\n ";
$counter=0; $hash_counter=0;

$counter=$linecount=0;

# >>STOPLIST<< Uncomment these lines for the stoplist function
# open(STOPLIST,"$stoplist");
# $stops = <STOPLIST>; chop($stops) if($stops =~ /\n/);
# close(STOPLIST);

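open(LOG_FILE,"$logfile") || die "Cannot open $logfile: $!\n";
open(PROXY_FILE,">> proxy.convert") || die "Cannot open proxy.convert: $!\n";
open(CACHE_FILE,">> cache.convert") || die "Cannot open cache.convert: $!\n";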

while(<LOG_FILE>){
        $linecount++;
        $counter++;
        if( $counter >= $inc )
        {
                $counter = 0;
                $hash_counter++;
                printf STDERR '#' if( $hash_counter <= 50 );
        }

# split the input line into its various components

        chop;

        @line = split(/\s+/,$_,7);

        $time = $line[0];
        $elapsed = $line[1];
        $host = $line[2];
        $codes = $line[3];
        $size = $line[4];
        $htype = $line[5];
        $url = $line[6];

        # >>STOPLIST<< Uncomment this line for the stoplist function.
        # next if( $host =~ /$stops/);

        ($seconds,$milliseconds) = split(/\./,$time);
        ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($seconds);
        $year += 1900;
        if($mday < 10){ $mday = "0" . "$mday"; }
        if($hour < 10){ $hour = "0" . "$hour"; }
        if($min < 10){ $min = "0" . "$min"; }
        if($sec < 10){ $sec = "0" . "$sec"; }

        $www_date = "$mday/$months{$mon}/$year:$hour:$min:$sec $gmtoffset";

        ($type,$code,$remotetype) = split(/\//,$codes);

        # convert IP into hostname
        if( $hosts{$host} ) { $name = $hosts{$host};}
        else
        {
                if($host =~ /\d\d\.\d\d/)
                {
                        @address = split(/\./,$host);
                        $addpacked = pack('C4',@address);
                        ($name,$aliases,$addrtype,$length,@addrs)
                                = gethostbyaddr($addpacked,$address_type);
                }
                if ( $name eq "") { $name = $host };
                $name = "\L$name";
                $hosts{$host} = $name;
        }
        if( $type eq "TCP_DENIED" ){ # access denied; logged with status 401
                $line_new = "$hosts{$host} - - [$www_date] \"$htype $url HTTP/1.0\" 401 $size\n";
                print PROXY_FILE "$line_new";
                $sizeproxied += $size;
                $totalproxied += 1;
        }
        if( $type eq "TCP_MISS" || ($type eq "TCP_IFMODSINCE" && $size < 220)
                ||$type eq "TCP_EXPIRED" || $type eq "TCP_REFRESH"
                || $type eq "TCP_SWAPFAIL"){ # fetched from external source
                $line_new = "$hosts{$host} - - [$www_date] \"$htype $url HTTP/1.0\" 200 $size\n";
                print PROXY_FILE "$line_new";
                $sizeproxied += $size;
                $totalproxied += 1;
        }
        elsif( $type eq "TCP_HIT" || ($type eq "TCP_IFMODSINCE" && $size >= 220) ){
                if( $size != 0 ) {
                        $line_new = "$hosts{$host} - - [$www_date] \"$htype $url HTTP/1.0\" 200 $size\n";
                        print CACHE_FILE "$line_new";
                        $sizecached += $size;
                        $totalcached += 1;
                }
        }
}
# pad the progress bar out to its full width of 50 marks
while( $hash_counter < 50 )
{
        $hash_counter++;
        printf STDERR '#';
}
printf STDERR "\n";
printf STDERR "\n";

close(LOG_FILE);
close(PROXY_FILE);
close(CACHE_FILE);

printf STDERR "%s lines processed.\n\nTotal hits: Proxy %s requests, Cache %s requests.\n\n",
               &commas($linecount),&commas($totalproxied),&commas($totalcached);
printf STDERR "Total sizes: Proxy %s bytes, Cache %s bytes\n\n",
                                  &commas($sizeproxied),&commas($sizecached);
if( ($totalproxied != 0 || $totalcached != 0) && ($sizeproxied != 0 || $sizecached != 0) ){
printf STDERR "Hit rates: %s%% requests, %s%% bytes\n\n",
               &commas( ($totalcached/($totalproxied+$totalcached)) * 100),
               &commas( ($sizecached/($sizeproxied+$sizecached)) * 100);
}

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
$year += 1900;
$time_now = "$hour:$min:$sec on $mday $longmonths{$mon} $year";

printf STDERR
"==============================================================\n";
printf STDERR "native2common.pl finished on $logfile at $time_now.\n";
printf STDERR
"==============================================================\n";

exit(0);

sub commas {
        local($_)=@_;
        $_ = sprintf "%ld", $_;
        1 while s/(.*\w)(\w\w\w)/$1,$2/;
        $_;
}
------------( end native2common.pl )-----------
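
For anyone who only wants to see the core of the conversion, here is a
minimal sketch of the per-line step, separate from the script above. It is
an illustration under stated assumptions, not a replacement: the function
name native_to_common and the sample log line are made up for the example,
it only classifies plain TCP_HIT and TCP_MISS entries, it skips the hostname
lookup, and it formats the date with gmtime and a fixed "+0000" offset
rather than localtime and $gmtoffset as the script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Convert one native-format entry into a common-log-format line.
# Returns (kind, line), where kind is 'cache' or 'proxy', or an
# empty list if the entry is malformed or of another type.
# (Hypothetical helper, not part of native2common.pl.)
sub native_to_common {
    my ($entry) = @_;
    my ($time, $elapsed, $host, $codes, $size, $method, $url)
        = split /\s+/, $entry, 7;
    return unless defined $url;
    ($url) = split /\s+/, $url;      # keep the URL, drop anything after it
    return unless defined $url;

    # the first part of the codes field is the result type, e.g. TCP_HIT
    my ($type) = split m{/}, $codes;

    # assumption: UTC with a fixed +0000 offset, so the output does not
    # depend on the time zone of the machine running the example
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime(int $time);
    my $date = sprintf '%02d/%s/%04d:%02d:%02d:%02d +0000',
        $mday, $months[$mon], $year + 1900, $hour, $min, $sec;

    my $kind = $type eq 'TCP_HIT'  ? 'cache'
             : $type eq 'TCP_MISS' ? 'proxy'
             :                       undef;
    return unless defined $kind;

    return ($kind,
            qq{$host - - [$date] "$method $url HTTP/1.0" 200 $size});
}

# made-up sample entry in the seven-field layout the script expects
my $sample =
    '840000000.123 456 10.0.0.1 TCP_HIT/200 1234 GET http://www.example.com/';
my ($kind, $clf) = native_to_common($sample);
print "$kind: $clf\n" if defined $kind;
```

Entries of kind 'cache' correspond to what the script appends to
cache.convert, and 'proxy' to proxy.convert.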

Cheers,
Marty.
-------------------------------------------------------------------------
Martin Gleeson Webmeister | http://www.unimelb.edu.au/%7Egleeson/
Information Technology Services | Email : gleeson@unimelb.edu.au
The University of Melbourne, Oz. | Opinions : Mine, all mine.
      "I hate quotations." -- Ralph Waldo Emerson, Journals (1843)
-------------------------------------------------------------------------
Received on Mon Aug 19 1996 - 14:41:18 MDT
