Squid native log file analyzer (comments and suggestions needed)

From: Lars Slettjord <Lars.Slettjord@dont-contact.us>
Date: Wed, 05 Feb 1997 19:50:48 +0100


This is a beta version of a log file analyzer for Squid 1.1.X. It is
written in perl5. It analyzes about 500 lines/sec on a 166 MHz Pentium
processor running FreeBSD. I have tried to minimize the memory used
by the program, and it also saves its data (after each day) during the
run. This is very handy when analyzing large log files.

This package is written for UNINETT as a part of the Desire
project. It will analyze the log file for parameters we are interested
in. This may very well be what you want to know too, or almost what
you are after. I'd like to have comments about this script, and I will
try to include your suggestions as far as our project permits. I will
be very busy for the next two days, but I'll try to answer any
questions about this package in the few spare moments I hope to get.

The script analyzes the log file on a daily basis. The result is the
same whether you split a log file covering one or more days into
several pieces or run the script on the whole log file.
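To illustrate why per-day accounting makes the result independent of
how the log is split, here is a minimal sketch in Python (not the
package's Perl code). It assumes only that the first whitespace-separated
field of each native log line is a UNIX timestamp, and it assigns lines
to days in UTC, which is an assumption; the sample lines, host names and
the function names day_of and count_per_day are made up for the example.

```python
import time
from collections import Counter


def day_of(line):
    """Return the UTC date (YYYY-MM-DD) of one native-format log line."""
    timestamp = float(line.split()[0])  # first field: UNIX time, fractional
    return time.strftime("%Y-%m-%d", time.gmtime(timestamp))


def count_per_day(lines):
    """Count log lines per day; blank lines are skipped."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[day_of(line)] += 1
    return counts


# Invented sample lines: three requests on one day, one on the next.
SAMPLE = [
    "855100800.500 120 10.0.0.1 TCP_MISS/200 1024 GET http://example.com/a - DIRECT/example.com text/html",
    "855100900.250 15 10.0.0.2 TCP_HIT/200 2048 GET http://example.com/a - NONE/- text/html",
    "855150000.000 300 10.0.0.1 TCP_MISS/200 512 GET http://example.com/b - DIRECT/example.com text/html",
    "855187200.750 40 10.0.0.3 TCP_HIT/200 512 GET http://example.com/b - NONE/- text/html",
]

# Analyzing the whole file, or two pieces and adding the per-day counts,
# gives the same totals even though the first day spans both pieces.
whole = count_per_day(SAMPLE)
merged = count_per_day(SAMPLE[:2]) + count_per_day(SAMPLE[2:])
print(whole == merged)
```

Because every counter is keyed by day, the counts from separate pieces
can simply be added together, which is also what makes saving the data
after each day practical.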

The included archive contains these 5 files:

        Describes how the different data are stored, and what keys I use.
        A small library to load, save and print perl Hashes.
        A small library to load, save and print perl Hashes of Hashes
        of Hashes.
        A very small program for displaying Hashes of Hashes of Hashes.
        The program which analyzes native format access log files from

The h1.pl and h2.pl libraries may be used to easily read the analyzed
data into other applications, and they demonstrate how to traverse the
data structures used. The dispH3.pl program can be used to look at the
extracted data.
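For readers who do not use Perl, a "Hash of Hashes of Hashes" is a
three-level nested lookup table. The following Python sketch shows one
way such a structure could be traversed. The key layout used here
(day, then client, then counter name) and the function name walk_h3 are
hypothetical; the real keys are described in the package's own
documentation file.

```python
def walk_h3(h3):
    """Yield (key1, key2, key3, value) from a dict of dicts of dicts.

    Keys are visited in sorted order at every level so the output is
    stable from run to run.
    """
    for key1 in sorted(h3):
        for key2 in sorted(h3[key1]):
            for key3 in sorted(h3[key1][key2]):
                yield key1, key2, key3, h3[key1][key2][key3]


# Hypothetical layout: day -> client -> counter name -> count.
stats = {
    "1997-02-05": {
        "10.0.0.1": {"hit": 3, "miss": 5},
        "10.0.0.2": {"hit": 1},
    },
    "1997-02-06": {
        "10.0.0.1": {"miss": 2},
    },
}

for day, client, counter, value in walk_h3(stats):
    print(day, client, counter, value)
```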

Please check whether you agree with my way of counting hits, misses,
ims, errors, deny and refresh. The code is the documentation (at the
moment), and the lists near the start of squidstat-1.4b0.pl describe
which logtags I count as what.
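As a rough illustration of what such a logtag list does, the Python
sketch below maps the logtag of a native log line to one of the
categories named above. The mapping shown is an assumption made for
this example and is not the list from squidstat-1.4b0.pl; it also
assumes that the fourth field of a line has the form logtag/status and
that error entries carry a logtag starting with ERR_.

```python
# Assumed mapping for illustration only; the real lists live near the
# start of squidstat-1.4b0.pl and may group these tags differently.
CATEGORY_BY_LOGTAG = {
    "TCP_HIT": "hit",
    "UDP_HIT": "hit",
    "TCP_MISS": "miss",
    "UDP_MISS": "miss",
    "TCP_IMS_HIT": "ims",
    "TCP_IMS_MISS": "ims",
    "TCP_REFRESH_HIT": "refresh",
    "TCP_REFRESH_MISS": "refresh",
    "TCP_DENIED": "deny",
    "UDP_DENIED": "deny",
}


def classify(line):
    """Return the category for one native-format access log line."""
    logtag = line.split()[3].split("/")[0]  # fourth field is logtag/status
    if logtag.startswith("ERR_"):
        return "error"
    return CATEGORY_BY_LOGTAG.get(logtag, "other")


# Invented sample line.
line = ("855100800.500 120 10.0.0.1 TCP_IMS_HIT/304 180 GET "
        "http://example.com/a - NONE/- text/html")
print(classify(line))
```

Returning "other" for unknown logtags keeps unexpected entries visible
in the totals instead of silently counting them as hits or misses.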

This version of the program analyzes Squid native access log files
with no regard for the configuration of the server. While writing this
I realized that I have to read the server's sibling/parent
configuration to get the numbers I need to see how a parent/sibling
performs. The numbers I get now tell me the percentage of hits a
sibling/parent serves compared to the total accesses to the whole
server. What I want is the percentage of hits compared to the accesses
that are actually sent to that sibling/parent. To get this I have to
read the Squid configuration file and check which domains the server
will query a sibling/parent for. These values may change over time,
which would make it impossible to analyze old logs.
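The difference between the two percentages is easy to see with
numbers. The Python sketch below uses invented counts and a
hypothetical function name (hit_percentages); it is not part of the
package. It only shows the arithmetic, not how to find out which
requests were actually sent to a peer.

```python
def hit_percentages(peer_hits, peer_requests, total_requests):
    """Return (percent of all accesses, percent of accesses sent to peer).

    peer_hits      -- hits served by one sibling/parent
    peer_requests  -- accesses actually sent to that sibling/parent
    total_requests -- all accesses to the whole server
    """
    of_total = 100.0 * peer_hits / total_requests if total_requests else 0.0
    of_sent = 100.0 * peer_hits / peer_requests if peer_requests else 0.0
    return of_total, of_sent


# Invented numbers: 1000 accesses in total, 200 of them sent to one
# parent, and 50 of those answered as hits.
of_total, of_sent = hit_percentages(50, 200, 1000)
print(of_total, of_sent)
```

With these numbers the parent accounts for 5% of all accesses but
answers 25% of the requests it is actually asked about, which is the
figure that says how well the parent itself performs.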

I have not made any tools for visualizing these data yet. I have an
unfinished program that makes LARGE HTML tables of these data, but a
graphical presentation would be far superior. If anyone wants to write
something that visualizes these numbers, please go ahead. :-)

| Lars Slettjord             | EMAIL: larss@cc.uit.no
| University Computer Centre | URL  : http://www.cc.uit.no/~larss/
| University of Tromsoe      | PHONE: +47 77644115
| N-9037 Tromsoe             | FAX  : +47 77644100

Received on Wed Feb 05 1997 - 11:32:52 MST
