[squid-users] problems writing squid logs into a database

From: Alex Sharaz <A.Sharaz@dont-contact.us>
Date: Mon, 21 Jun 2004 12:28:25 +0100

Hi all,

i've been working on a perl script to write the contents of access.log into
a DB2 database. The script was originally one by Eugene Chernyshev
[
      This is a part of Squid2MySQL accounting system.
      CopyRight 2001 by Eugene V. Chernyshev mailto:bicc@mail.ru
]

but modified a bit.

The problem I've got is that I'm trying to do this in real time, and while
the script certainly works,
it doesn't seem to have as many input lines as I'd expect. I.e. the access
log is filing up faster than the database is. It looks as if the perl
script isn't getting as many lines fed to it as the access.log file.

I'm using the tail command to take access.log output and pipe it into my
prog. The additional options let it track access log file rotation (which
happens every day at midnight)

If anyone has an idea why the perl script isn't processing

the tail command is

#!/bin/sh
#
# $Id: RunCache.in,v 1.9.8.1 2003/01/06 20:52:55 wessels Exp $
. /home/db2inst3/sqllib/db2profile

        echo "Running rtsquid2db2 daemon">>/logs/squid.out 2>&1
        tail --retry --lines=1 --follow=name --max-unchanged-stats=2
/logs/access.log 2>>/logs/squid2db2.log|/usr/local/squid/bin/rtsquid2db2
150.237.47.4 >>/logs/squid2db2.log 2>>/logs/squid2db2.err &

and the appropriate part of the perl program is:-

#!/usr/bin/perl
# Squid2Db2
#
# Version: 0.6
#
#[ Taken from
# This is a part of Squid2MySQL accounting system.
# CopyRight 2001 by Eugene V. Chernyshev mailto:bicc@mail.ru
#]

# Changes
# 10/12/03 - add database field for the IP address of the system
# - don't save head queries either
#
# 12/12/03 - We want to save everything in the database and not just the
# userid authenticated stuff, If weneed to filter out info
# we can do it later on. Removed the chreck to see if the
username
# is "-"
# 15/12/03 - Date is out by 7 hours - the calculations after gmtime affs 7
hours
# onto the time - removed the adjustment
# Added time in milliseconds field to database
# 18/12/03 - open ;ldap conection to radius server at start of prog and not
# in function routine
#
# 19/01/04 - dont update the rdn table if the userid is "-"
# disable the ldap lookup for now
# remove the date and time entries as we have the time in msec
# 26/01/04 - Added finish statement to close down the prepare statement
# 26/01/04 - Added finish statement to close down the prepare statement
# 27/01/04 - Real time version of squid2db2
# setup
# 28/01/04 - Test for null URL string. If present fill with a dummy one and
# output input string components.
# 30/01/04 - Check for null username returned from the radius server
#
# 05/03/04 - Added more comprehensive tests to check for null parameters
# being passed over to the prepared statement.
# remove the billing calculation - set to 0
#
# Remove the die statement after the execute. We want to keep
going
# and output the error.
# Generated a subroutine for the output
# 11/03/04 - Ldap lookup from radius server always assumes that the server
is up
# and only talks to one server. Try and get it to try multiple
# servers before it dies and move the connection to the lookup
phase
# 19/03/04 - really need to close the function as well - in the right place.
# Added input parameter for ip address to use as own.
# tidied up logging and removed old commented code
# 18/06/04 - Changed some of the database names
# 21/06/04 - Added username to logging function
# Use local time instead of gmtime to get date right
# added msec parameter to db2 timestamp
# don't need the billing function
#
$db2user="YYYYY"; # user name
$db2pass="XXXXXX"; # user password
$db2serv="XX.YY.XX.BB"; # server
$db2port="MMMM"; # server port
$db2dbas="sqstats"; # database name
$db2tabl="squid.logger"; # database table
$backuplog="/logs/backup.log"; # Backup log if couldn't connect to
# billing setup
$valuepermeg=4; # four roubles per meg
$valueperhour=0; # none per hour, leased line +2880p
# billing method
# 0 for user accounting, 1 for hosts accounting
$usehosts=1;

use DBI;
use DBD::DB2::Constants;
use DBD::DB2;

use Net::LDAP;

$dbh=DBI->connect("DBI:DB2:sqstats",$db2user,$db2pass)||&errorconn;

$sth=$dbh->prepare("INSERT INTO squid.logger(cacheip,tstamp,elapsed,code,\
status,bytes,url,userident,clientip)\
 VALUES(?,?,?,?,?,?,?,?,?)");

open(FBLOGRET,$backuplog);

close(FBLOGRET);

#
# Get the IP address
#
$ourip=$ARGV[0];
print STDERR "our ip address is $ourip\n";

while($inputLine = <STDIN>) {
#
# 0 timestamp.millisec
# 1 duration
# 2 remotehost
# 3 code/status
# 4 bytes
# 5 method
# 6 URL
# 7 username
# 8 peerstatus/peerhost
# 9 type
#
#
# chop; # here we now get a string!
# splitting string onto subdata
    @lines=split(' ',$inputLine);

# timestamp
    $lines[0]=~tr/./ /;
    @_timestamp=split(' ',$lines[0]);
    $cts=$_timestamp[0];
    $msecs=$_timestamp[1];
# @t = gmtime($cts);

# date=
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime($cts);
    $year=$year+1900;
    $mon=$mon+1;
    if ($mon < 10) {$mon="0$mon";}
    if ($mday < 10) {$mday="0$mday";}
    if ($min < 10) {$min="0$min";}
    if ($hour < 10) {$hour="0$hour";}
    if ($sec < 10) {$sec="0$sec";}
    $db2_ts="$year-$mon-$mday-$hour.$min.$sec.$msecs";
    if ($hour>24) {
        $hour=$hour-24;
        $mday=$mday+1;
    }
    $cdate="$year-$mon-$mday";

# time=
    $ctime="$hour:$min:$sec";

# elapsed=
    $duration=$lines[1];
# host=
    $remotehost=$lines[2];

    @codestatus=split('/',$lines[3]);
    if ($codestatus[0] eq "TCP_HIT") { $_code=0; }
    if ($codestatus[0] eq "TCP_MISS") { $_code=1; }
    if ($codestatus[0] eq "TCP_REFRESH_HIT") { $_code=2; }
    if ($codestatus[0] eq "TCP_REF_FAIL_HIT") { $_code=3; }
    if ($codestatus[0] eq "TCP_REFRESH_MISS") { $_code=4; }
    if ($codestatus[0] eq "TCP_CLIENT_REFRESH_MISS") { $_code=5; }
    if ($codestatus[0] eq "TCP_IMS_HIT") { $_code=6; }
    if ($codestatus[0] eq "TCP_SWAPFILE_MISS") { $_code=7; }
    if ($codestatus[0] eq "TCP_NEGATIVE_HIT") { $_code=8; }
    if ($codestatus[0] eq "TCP_MEM_HIT") { $_code=9; }
    if ($codestatus[0] eq "TCP_DENIED") { $_code=10; }
    if ($codestatus[0] eq "TCP_OFFLINE_HIT") { $_code=11; }
    if ($codestatus[0] eq "UDP_HIT") { $_code=12; }
    if ($codestatus[0] eq "UDP_MISS") { $_code=13; }
    if ($codestatus[0] eq "UDP_DENIED") { $_code=14; }
    if ($codestatus[0] eq "UDP_INVALID") { $_code=15; }
    if ($codestatus[0] eq "UDP_MISS_NOFETCH") { $_code=16; }
    if ($codestatus[0] eq "NONE") { $_code=17; }
    if ($codestatus[0] eq "HEAD") { $_code=18; }
# status=
    $_status=$codestatus[1];
# bytes=
    $objectsize=$lines[4];
# bill=
#
$bill=(($duration/60000)*($valueperhour/60))+(($objectsize/1048576)*$valuepermeg);
# method=
    $fetchmethod=$lines[5];
#
    if (length($lines[6]) eq 0)
    {
       $URLlink="dummy string";
       print "------\n Null URL\n";
       print "@lines\n";
       print "------\n";
    }
    else
    {
#
# here truncate the url back to the last / charaacter
#
       $end=rindex($lines[6],"/");
       $tempUrl=substr($lines[6],0,$end+1);
       if(length($tempUrl) >1022)
       {
          $URLlink=substr($tempUrl,0,1022);
       }
       else {
          $URLlink=$tempUrl;
       }
    }
    $username=$lines[7];

    @peerstatus_host=split('/',$lines[8]);
    $peerstatus=$peerstatus_host[0];
    $peerhost=$peerstatus_host[1];

    $objecttype=$lines[9];

############################################################
##
## If user is on our dialin service get the userid
##
############################################################

if ( (index($remotehost,"150.237.44") ne -1) ||
(index($remotehost,"150.237.19") ne -1) ) {
#
# Get the username logged on via radius server
#
   $username=getUserid($remotehost);
#
# Now check for it being valid
#
  if($username eq "") {
     print "------\n";
     print " null userid returned from radius server, ip=$remotehost\n";
     print "--------\n";
     $usename="-";
  }
}

#
# don;t bother with queries from other web caches on our site they'll log
the queries
#

if (index($remotehost,"150.237.47") eq -1) {
    $sth->execute($ourip,
    $db2_ts,
    $duration,
    $_code,
    $_status,
    $objectsize,
    $URLlink,
    $username,
    $remotehost) || logsquidlog();
}
}
$sth->finish();
$dbh->disconnect();

exit 1;

##############################
sub logsquidlog {

   print STDERR "-------gmtime()------\n";
   print STDERR "Error executing prepared statement\n";
   print STDERR "$sth->errstr()n";
   print STDERR "------\n";
   print STDERR "input line=$inputline\n";
   print STDERR "My IP=$ourip\n";
   print STDERR "Timestamp=$db2_ts\n";
   print STDERR "Duration=$duration\n";
   print STDERR "HTTP code=$_code\n";
   print STDERR "HTTP status=$_status\n";
   print STDERR "Object size=$objectsize\n";
   print STDERR "URL=$URLlink\n";
   print STDERR "client=$remotehost\n";

}
##############################

sub errorconn {
    while(<>) {
        open(FBLOG,">>".$backuplog);
        print FBLOG $_;
        close(FBLOG);
    }
    die "cannot log to DB2 -- data buffered";
}

##############################

-

Sent using Mulberry 3.1.2
Received on Mon Jun 21 2004 - 05:28:54 MDT

This archive was generated by hypermail pre-2.1.9 : Thu Jul 01 2004 - 12:00:03 MDT