Re: UserAgentLog

From: WWW server manager <webadm@dont-contact.us>
Date: Tue, 15 Jul 1997 00:23:13 +0100 (BST)

Martin Svoboda wrote:
> Do you have some script to analyze UserAgentLog?
> Or where can I find them?

I haven't seen any replies, so here's a perl script I threw together recently
(used with perl 5.002) - it's quite short - apologies if including it in the
message annoys anyone.

No fancy output - the aim was to summarise logs while retaining enough detail
that longer-term reports could be generated at a later date. Output lines
starting "#" are notionally comments - the non-comment lines comprise counts
and the full user agent descriptions. Those are followed by two blocks of
comments giving counts and percentages: first by browser type and version,
ignoring platform details (or whatever other miscellaneous information is
included), and then an even more basic summary by browser type alone, without
version. One special case - if the miscellaneous information looks like it is
saying the browser is (for example) MSIE pretending to be Netscape (Mozilla),
that information is retained.
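
To illustrate the three levels of detail, here is a minimal standalone sketch
of the reduction (not the script itself). The sub name summary_keys and the
sample user agent strings are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reduce a full user agent string to the two coarser
# summary keys (browser/version, then browser only).
sub summary_keys
{
    my ($useragent) = @_;
    my ($base, $compat) = ($useragent, '');

    # Split "name/version" from the rest; the lookahead keeps any "(".
    if ($useragent =~ /^(\S+)(?:\s*(?=\()|\s+)(\S.*)/)
    {
        $base = $1;
        my $rest = $2;
        # Keep the first item after "compatible;" (the real browser).
        if ($rest =~ /^\((compatible;\s*[^;)]*)/) { $compat = " ($1)"; }
    }
    (my $nover = $base) =~ s{/.*}{};    # drop "/version"
    my $min = $nover . ($compat ne '' ? ' (compatible)' : '');
    return ("$base$compat", $min);
}

# Made-up sample strings, one per case.
for my $ua ('Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)',
            'Mozilla/3.01 (X11; I; SunOS 5.5 sun4u)',
            'Lynx/2.6 libwww-FM/2.14')
{
    my ($withver, $nover) = summary_keys($ua);
    print "$ua\n    -> $withver\n    -> $nover\n";
}
```

The first sample reduces to "Mozilla/2.0 (compatible; MSIE 3.02)" and then to
"Mozilla (compatible)"; the other two reduce to "Mozilla/3.01" / "Mozilla" and
"Lynx/2.6" / "Lynx".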

Anyway, here it is - nothing special, but it may be useful as-is or as a
starting point. No guarantees, but it seems to work for me.

It's intended to cope sensibly with multiple input files, not necessarily in
date/time order, and will report bad lines to stderr (unless there are a
ridiculous number of them). Input on stdin, summary output to stdout.
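
Input order doesn't matter because each timestamp is turned into a string that
sorts correctly as plain text. A rough standalone sketch of that idea, assuming
timestamps of the form dd/Mon/yyyy:hh:mm:ss (the sub name sort_key and the
sample timestamps are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @MONTH = qw(jan feb mar apr may jun jul aug sep oct nov dec);
my %MONTH;
@MONTH{@MONTH} = (1 .. 12);    # month name => month number

# Hypothetical helper: convert "15/Jul/1997:00:23:13" to
# "1997-07-15:00:23:13"; returns undef for anything unrecognised.
sub sort_key
{
    my ($timestamp) = @_;
    return
        unless $timestamp =~ m{^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d:\d\d:\d\d)$};
    my ($day, $mon, $year, $time) = ($1, lc $2, $3, $4);
    return unless exists $MONTH{$mon};
    return sprintf("%04d-%02d-%02d:%s", $year, $MONTH{$mon}, $day, $time);
}

# Track the range with ordinary string comparison, in any input order.
my ($oldest, $newest) = ('', '');
for my $ts ('15/Jul/1997:00:23:13', '02/Jan/1997:09:00:00',
            '30/Jun/1997:23:59:59')
{
    my $key = sort_key($ts);
    next unless defined $key;
    $oldest = $key if $oldest eq '' || $key lt $oldest;
    $newest = $key if $newest eq '' || $key gt $newest;
}
print "$oldest to $newest\n";
```

With those three samples it prints "1997-01-02:09:00:00 to 1997-07-15:00:23:13".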

                                John Line

===== start of script
#!/bin/perl

$NBADMAX = 100; # don't warn beyond this many errors

@MONTH = ( 'jan', 'feb', 'mar', 'apr', 'may', 'jun',
           'jul', 'aug', 'sep', 'oct', 'nov', 'dec' );
for ($i=0;$i<=$#MONTH;$i++) { $MONTH{$MONTH[$i]} = $i; }

# Warn about bad input line. (Relies on global variables as input.)
# Every bad line is counted; only the first $NBADMAX are reported.
sub WarnBad
{
        $nbad++;
        if ($nbad <= $NBADMAX) { warn "$0: bad input: ",$_; }
        elsif ($nbad == $NBADMAX + 1)
        {
            warn "$0: will not report further bad input.\n";
        }
}

$nbad = 0; # number of bad entry warnings
$ngood = 0; # number of valid log entries

$oldest = ''; # oldest timestamp in input
$newest = ''; # newest timestamp in input

while (<>)
{
    if (/\S+\s+\[(\S+)\s+(\S+)\]\s+"([^"]*)"$/)
    {
        $timestamp = $1;
        $tzoffset = $2;
        $useragent = $3;
        
# Work out if timestamp is newer than newest or older than oldest...
# (Ignore timezone ... aim is to indicate local time range covered by summary)

        if ($timestamp =~ m=^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d:\d\d:\d\d)$=)
        {
            $day = $1; $month = $2; $year = $3; $time = $4;
            $month =~ tr/A-Z/a-z/; # lowercase month for lookup
            $monnum = $MONTH{$month};
            unless (defined($monnum)) { &WarnBad; next; }
 
            $check = sprintf("%04d-%02d-%02d:%s",$year,$monnum+1,$day,$time);
            if ($check lt $oldest || $oldest eq '')
                { $oldest = $check; }
            if ($check gt $newest || $newest eq '')
                { $newest = $check; }
        }
        else { &WarnBad; next; }

        $ua{$useragent}++;
        $ngood++;
    }
    else
    {
        &WarnBad;
    }
}

print "# user agent counts for $oldest to $newest.\n";
print "# $ngood log entries summarised; $nbad bad entries ignored.\n";

unless ($ngood > 0) { exit 1; }

foreach $useragent (sort(keys(%ua)))
{
    printf "%6d %s\n",$ua{$useragent},$useragent;

# Derive summary information for version without platform, and without
# version or platform - *but* retain real browser identity from pseudo-
# Netscape entries.

    $uabase = $useragent;
    $uanover = $uabase;
    $compat = '';

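    # The lookahead (?=\() leaves the opening "(" at the start of $3, so
    # the "(compatible; ...)" test below can match it.
    if ($useragent =~ /^(\S+)(\s*(?=\()|\s+)(\S.*)\s*$/)
    {
        $uabase = $1;
        $uarest = $3;
        if ($uarest =~ /^\((compatible;\s*[^;\)]*).*/) { $compat = " ($1)"; }
    }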
    ($uanover = $uabase) =~ s:/.*$::;
    $uabase{"$uabase$compat"} += $ua{$useragent};
    if ($compat ne '') { $compat = ' (compatible)'; }
    $uamin{"$uanover$compat"} += $ua{$useragent};
}

print "#\n#====================\n# Summary without platform details:\n";
foreach $useragent (sort(keys(%uabase)))
{
    printf("# %6d %6.2f%% %s\n",$uabase{$useragent},
        $uabase{$useragent}/$ngood*100,$useragent);
}

print "#\n#====================\n# Summary without platform or version:\n";
foreach $useragent (sort(keys(%uamin)))
{
    printf("# %6d %6.2f%% %s\n",$uamin{$useragent},
        $uamin{$useragent}/$ngood*100,$useragent);
}
print "#\n#====================\n#\n";
===== end of script
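
For the longer-term reports mentioned earlier, the non-comment lines of several
summaries can simply be added together. A hedged sketch of one way to do that,
assuming the output format produced by the script above; the sub name
merge_summary and the two sample summaries are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fold one summary (as text) into running totals.
# $totals is a hash ref of user agent => count; $range is [oldest, newest].
sub merge_summary
{
    my ($text, $totals, $range) = @_;
    for my $line (split /\n/, $text)
    {
        if ($line =~ /^# user agent counts for (\S+) to (\S+)\.$/)
        {
            $range->[0] = $1 if $range->[0] eq '' || $1 lt $range->[0];
            $range->[1] = $2 if $range->[1] eq '' || $2 gt $range->[1];
        }
        elsif ($line =~ /^\s*(\d+) (.*)$/)    # non-comment: count + agent
        {
            $totals->{$2} += $1;
        }
        # Other "#" lines are derived summaries and are skipped.
    }
}

my %totals;
my @range = ('', '');

# Two made-up weekly summaries.
merge_summary(<<'EOT', \%totals, \@range);
# user agent counts for 1997-07-01:00:00:05 to 1997-07-07:23:59:58.
# 3 log entries summarised; 0 bad entries ignored.
     2 Lynx/2.6 libwww-FM/2.14
     1 Mozilla/3.01 (X11; I; SunOS 5.5 sun4u)
EOT
merge_summary(<<'EOT', \%totals, \@range);
# user agent counts for 1997-07-08:00:01:00 to 1997-07-14:23:50:00.
# 9 log entries summarised; 0 bad entries ignored.
     4 Lynx/2.6 libwww-FM/2.14
     5 Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)
EOT

print "# user agent counts for $range[0] to $range[1].\n";
for my $ua (sort keys %totals)
{
    printf "%6d %s\n", $totals{$ua}, $ua;
}
```

Only the full user agent lines are merged; the coarser summaries would be
recomputed from the merged totals rather than added up from the "#" lines.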

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk
Received on Mon Jul 14 1997 - 16:27:05 MDT
