Re: hostnames in log?

From: Martin Gleeson <gleeson@dont-contact.us>
Date: Mon, 1 Jul 1996 07:24:33 +1000

>> Been using Squid b7 on a trial basis and I'm wondering if there is a code
>> hack to get hostnames into our access log rather than the IP no's. We're
>> only allowing access from a limited range of IP's for our downstream
>> sites.
>> We do a lot of cache logfile processing from our existing CERN files for
>> billing reasons and hostnames would be nice...
>> i.e. foo (not foo.manawatu.gen.nz) instead of 202.36.148.70
>
>To get FQDNs in your logfiles you can define "LOG_FQDN" (stat.c)
>and recompile squid.
>But this slows squid down.

We use the following perl script to process squid's log file into CERN-style
cache hit and miss (cache and proxy) log files. It also converts IP numbers
into hostnames. We roll over the log files each night, convert them, and
consolidate them once a week to run a stats program on them. That program is
pwebstats, <URL:http://www.unimelb.edu.au/pwebstats/pwebstats.html>
(sorry about the shameless plug :), but I don't know of another web stats
program that does proxy cache stats.

Anyhow, here are the details:
Notes: 1) The delay between the USR1 signal and the running of the daily
          script is because it can take a few minutes for the rollover to
          happen, depending on how busy squid is. We are doing about 1.3
          million requests per week, so squid can still be fairly busy at
          midnight.
       2) You'll see a few references to 'pixel' - pixel is the name of the
          machine that this stuff runs on (see 'The cat who walks through
          walls' - Robert A. Heinlein :-).
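
As an illustration of note 1, the fixed half-hour gap could be replaced by a
check that the rotated log has actually appeared. This is only a sketch: the
function name wait_for_log, its arguments, and the "size unchanged between
two polls" test are assumptions made for illustration, not part of the
scripts in this message.

```shell
#!/bin/sh
# Sketch only: wait until the rotated log exists and has stopped growing
# before processing it, rather than relying on a fixed delay.
# Usage: wait_for_log FILE [TRIES] [INTERVAL_SECONDS]
wait_for_log() {
    file=$1
    tries=${2:-30}
    interval=${3:-60}
    last=-1
    while [ "$tries" -gt 0 ]; do
        if [ -f "$file" ]; then
            size=$(wc -c < "$file" | tr -d ' ')
            # Same size on two consecutive polls: assume the rollover is done.
            [ "$size" = "$last" ] && return 0
            last=$size
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1    # gave up: the file never appeared or never settled
}
```

A daily script could then begin with something like
"wait_for_log access.log.0 || exit 1" after changing into the logs directory,
and be scheduled closer to midnight.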

Crontab says:
 0 0 * * * kill -USR1 `cat /servers/http/squid/logs/squid.pid`
30 0 * * * /servers/http/squid/bin/daily-log-rotate
 0 2 * * 0 /servers/http/squid/bin/weekly-log-rotate

--------------------------
/servers/http/squid/bin/daily-log-rotate:

#!/bin/sh

D=`date +%Y-%m-%d`
NEWLOGNAME="pixel-squid-log.$D"
NEWERRORLOGNAME="pixel-squid-errors.$D"
NEWHIERARCHYLOGNAME="pixel-squid-hierarchy.$D"

cd /servers/http/squid/logs

# count the number of ICP_* lines in the logfile
(echo -n $D " " ;fgrep ICP access.log.0 | wc -l ) >> access.log.icp.count

# convert the log file into cache & proxy logs, and append them to
# proxy.convert and cache.convert
/servers/http/squid/bin/squid2common.pl access.log.0 ### <- conversion program

mv access.log.0 $NEWLOGNAME
/usr/local/bin/gzip -9 $NEWLOGNAME

mv cache.log.0 $NEWERRORLOGNAME
/usr/local/bin/gzip -9 $NEWERRORLOGNAME

mv hierarchy.log.0 $NEWHIERARCHYLOGNAME
/usr/local/bin/gzip -9 $NEWHIERARCHYLOGNAME
--------------------------

/servers/http/squid/bin/squid2common.pl:

#!/usr/local/bin/perl
#
# Martin Gleeson, Daniel O'Callaghan, June 1996
#
# (c) Copyright, The University of Melbourne, 1996
#
#
$logfile = shift ( @ARGV );
$address_type=2;
$date_now=`date +"%I:%M %p, %A %B %e %Y"`;chop($date_now);
printf STDERR "convert-harvest-log.pl started on $logfile at $date_now.\n";
printf STDERR "==========================================================\n";

open(COUNT,"/bin/wc -l $logfile |");
# some versions of wc print no leading whitespace before the count
while( <COUNT> ){ chop; ($line_count) = /^\s*(\d+)\s+\S+$/; }
close(COUNT);
printf STDERR "The logfile has $line_count entries.\n";
printf STDERR "Processing... [ \# = 500 log entries ]\n";

$counter=$linecount=0;

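open(LOG_FILE,"$logfile") || die "can't open $logfile: $!\n";
open(PROXY_FILE,">> proxy.convert") || die "can't append to proxy.convert: $!\n";
open(CACHE_FILE,">> cache.convert") || die "can't append to cache.convert: $!\n";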

while(<LOG_FILE>){
        $linecount++;
        $counter++;
        if( $counter >= 500 )
        {
                $counter = 0;
                printf STDERR "\#";
        }
# split the input line into its various components

        $line=$_;
        ( $host, $rfc931, $user, $www_date, $request, $type, $size)
        = /^(\S+) (\S+) (\S+) \[(.+)\] \"(.*)\" (\S+) (\S+)\s/;

        # convert IP into hostname, caching each lookup in %hosts
        if( $hosts{$host} ) { $name = $hosts{$host};}
        else
        {
                # reset so a skipped or failed lookup cannot reuse the
                # name left over from the previous log entry
                $name = "";
                if($host =~ /^\d+\.\d+\.\d+\.\d+$/)
                {
                        @address = split(/\./,$host);
                        $addpacked = pack('C4',@address);
                        ($name,$aliases,$addrtype,$length,@addrs) =
                                gethostbyaddr($addpacked,$address_type);
                }
                if ( $name eq "") { $name = $host };
                $name = "\L$name";
                $hosts{$host} = $name;
        }
        if( $type eq "TCP_DENIED" ){ # access denied: logged as a proxy request with status 401
                $line_new = "$hosts{$host} $rfc931 $user [$www_date] " .
                            "\"$request HTTP/1.0\" 401 $size\n";
                print PROXY_FILE "$line_new";
                $sizeproxied += $size;
                $totalproxied += 1;
        }
        if( $type eq "TCP_MISS" || ($type eq "TCP_IFMODSINCE" && $size < 220)
                || $type eq "TCP_EXPIRED" || $type eq "TCP_REFRESH"
                || $type eq "TCP_SWAPFAIL"){ # fetched from external source
                $line_new = "$hosts{$host} $rfc931 $user [$www_date] " .
                            "\"$request HTTP/1.0\" 200 $size\n";
                print PROXY_FILE "$line_new";
                $sizeproxied += $size;
                $totalproxied += 1;
        }
        elsif( $type eq "TCP_HIT"
                || ($type eq "TCP_IFMODSINCE" && $size >= 220) ){ # served from the cache
                if( $size != 0 ) {
                        $line_new = "$hosts{$host} $rfc931 $user [$www_date] " .
                                    "\"$request HTTP/1.0\" 200 $size\n";
                        print CACHE_FILE "$line_new";
                        $sizecached += $size;
                        $totalcached += 1;
                }
        }
}
close(LOG_FILE);
close(PROXY_FILE);
close(CACHE_FILE);

printf STDERR "\n%s lines processed, finding %s proxy requests " .
              "and %s cache requests.\n\n",
        &commas($linecount),&commas($totalproxied),&commas($totalcached);
printf STDERR "Total sizes: Proxy %s bytes, Cache %s bytes\n\n",
        &commas($sizeproxied),&commas($sizecached);
# only report hit rates when both divisors are non-zero
if( ($totalproxied != 0 || $totalcached != 0)
    && ($sizeproxied != 0 || $sizecached != 0) ){
        printf STDERR "Hit rates: %s%% requests, %s%% bytes\n\n",
                &commas( ($totalcached/($totalproxied+$totalcached)) * 100),
                &commas( ($sizecached/($sizeproxied+$sizecached)) * 100);
}
$date_now=`date +"%I:%M %p, %A %B %e %Y"`;
chop($date_now);
printf STDERR "convert-harvest-log.pl finished on $logfile at $date_now.\n";
printf STDERR "==========================================================\n";

exit(0);

sub commas {
        local($_)=@_;
        $_ = sprintf "%ld", $_;
        1 while s/(.*\w)(\w\w\w)/$1,$2/;
        $_;
}
--------------------------
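
For a quick check of the hit rates without running the full conversion, the
classification used in the perl script can be approximated in sh and awk.
This is a rough sketch, not a replacement for squid2common.pl: the function
name hit_rates is made up here, and it assumes that the last two
whitespace-separated fields of each log line are the squid result code and
the size in bytes, as the regex in the perl script implies.

```shell
#!/bin/sh
# Sketch only: report request and byte hit rates using the same result-code
# rules as squid2common.pl. Reads the named files, or stdin if none given.
hit_rates() {
    awk '
    {
        type = $(NF-1); size = $NF + 0
        if (type == "TCP_HIT" || (type == "TCP_IFMODSINCE" && size >= 220)) {
            # served from the cache; zero-byte hits are ignored, as in perl
            if (size != 0) { hits++; hitbytes += size }
        } else if (type == "TCP_MISS" || type == "TCP_IFMODSINCE" ||
                   type == "TCP_EXPIRED" || type == "TCP_REFRESH" ||
                   type == "TCP_SWAPFAIL" || type == "TCP_DENIED") {
            # fetched from an external source, or denied
            misses++; missbytes += size
        }
    }
    END {
        if (hits + misses > 0 && hitbytes + missbytes > 0)
            printf "%d%% requests, %d%% bytes\n",
                   100 * hits / (hits + misses),
                   100 * hitbytes / (hitbytes + missbytes)
    }' "$@"
}
```

For example, "hit_rates access.log.0" prints one summary line for that file.
Like the perl version, the percentages are truncated to whole numbers.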

/servers/http/squid/bin/weekly-log-rotate:

#!/bin/sh

D=`date +%Y-%m-%d`

cd /servers/http/squid/logs

mv cache.convert pixel-cache-log.$D
chmod 664 pixel-cache-log.$D

mv proxy.convert pixel-proxy-log.$D
chmod 664 pixel-proxy-log.$D

--------------------------

Hope this is of assistance.

Cheers,
Marty.
-------------------------------------------------------------------------
Martin Gleeson Webmeister | http://www.unimelb.edu.au/%7Egleeson/
Information Technology Services | Email : gleeson@unimelb.edu.au
The University of Melbourne, Oz. | Opinions : Mine, all mine.
      "I hate quotations." -- Ralph Waldo Emerson, Journals (1843)
-------------------------------------------------------------------------
Received on Sun Jun 30 1996 - 14:28:24 MDT
