log analysis scripts

From: Oskar Pearson <oskar@dont-contact.us>
Date: Wed, 14 May 1997 09:00:00 +0200 (GMT)

Hi people

I have been having a problem with the log analysis scripts.

They either:

take too long to run
        or
don't work properly with siblings
        or
don't work with large log files (the awk ones especially: they fail with
"out of RAM" messages)

I wrote some basic ones of my own last night.

We have 3 cache machines, all set up as siblings. One of them has a lot
of disk space, and the others each have only a small amount.

Basically my objectives are:
One page report, including:
        total (megs) downloaded from the cache
        total (megs) that the cache downloaded
        total hits to the cache
        total requests that the cache missed on

Another thing I wanted to do is download the logs every day, so they don't
build up into huge files, and analyse them then. I would then write the
results to a log file, so once a week I could print a report, yet only have
1 day's logs on disk.

ftp://ftp.is.co.za/private/oskar/log-ana-oskar.tar

I am NOT SURE THAT THEY ANALYSE LOGS CORRECTLY! Could someone check this?

Basically my understanding is:
I have cache1, cache2 and cache3.

If I count all "TCP_HIT" entries for each of the caches, and ignore
"SIBLING_HIT" entries, I will count the total that the caches served from disk.

If I count all TCP_MISS messages, I count the stuff that had to be retrieved
from the original site (remember, if it's a SIBLING_HIT, it gets counted
when I analyse the next log, from the other cache machine?)
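
One possible reading of the counting rule above, as a rough sketch in Python
rather than the Perl of the actual scripts. It assumes the native access.log
layout, with the result code (e.g. TCP_HIT/200) in field 4, the byte count in
field 5 and the hierarchy code (e.g. SIBLING_HIT/cache2) in field 9. The
function name tally and the category names are made up for illustration.
Lines fetched from a sibling are kept in their own bucket so they are not
counted as origin fetches here; whether that matches the intended logic is
exactly the open question.

```python
from collections import Counter

def tally(lines):
    """Count requests and bytes per category from access.log lines.

    Works on any iterable of lines (e.g. an open file), so the whole
    log never has to be held in memory at once.
    """
    requests = Counter()
    nbytes = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip malformed or truncated lines
        result = fields[3].split("/")[0]     # assumed: TCP_HIT, TCP_MISS, ...
        hierarchy = fields[8].split("/")[0]  # assumed: SIBLING_HIT, DIRECT, ...
        try:
            size = int(fields[4])
        except ValueError:
            continue  # byte count was not a number
        if result == "TCP_HIT":
            key = "hit"        # served by this cache
        elif hierarchy == "SIBLING_HIT":
            key = "sibling"    # left for the sibling's own log to count
        elif result == "TCP_MISS":
            key = "miss"       # fetched from elsewhere
        else:
            key = "other"
        requests[key] += 1
        nbytes[key] += size
    return requests, nbytes
```

Called as tally(open("access.log")), it reads the file one line at a time.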

The best example is the source, of course.

To run it:

./simple.pl access.log.file title_for_this_cache > log
./report.pl < log

Currently the "title_for_this_cache" argument isn't used, but we can modify
the report script to tell us "give me a report on cache1 only, please".
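
A rough sketch of how a per-cache report over the accumulated daily results
could look, again in Python rather than Perl. The record layout is purely an
assumption (the real output of simple.pl may differ): one line per cache per
day, holding "date title hits misses hit_bytes miss_bytes", with a title that
contains no spaces. The names totals and report are hypothetical.

```python
MEG = 1024 * 1024
KEYS = ("hits", "misses", "hit_bytes", "miss_bytes")

def totals(records, title=None):
    """Sum summary records; restrict to one cache if title is given."""
    out = dict.fromkeys(KEYS, 0)
    for line in records:
        parts = line.split()
        if len(parts) != 6:
            continue  # not a record in the assumed layout
        _date, cache, *counts = parts
        if title is not None and cache != title:
            continue  # record belongs to another cache
        try:
            values = [int(c) for c in counts]
        except ValueError:
            continue  # non-numeric counter, skip the record
        for key, value in zip(KEYS, values):
            out[key] += value
    return out

def report(t):
    """Format the four totals as a short plain-text report."""
    return "\n".join([
        f"megs served from cache:     {t['hit_bytes'] / MEG:.1f}",
        f"megs the cache downloaded:  {t['miss_bytes'] / MEG:.1f}",
        f"requests hit:               {t['hits']}",
        f"requests missed:            {t['misses']}",
    ])
```

For example, print(report(totals(open("log"), title="cache1"))) would cover
cache1 only, and leaving out title would cover all three machines.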

Can someone check the logic in my script? It would be much appreciated.

        Oskar
Received on Tue Jul 29 2003 - 13:15:41 MDT
