Re: [squid-users] Squid 3.0 Stable1 with MySql Logging

From: Marcello Romani <mromani@dont-contact.us>
Date: Mon, 03 Mar 2008 08:50:27 +0100

Marcello Romani ha scritto:

[...]

Ok, here's my first implementation of a mysql logfile daemon in Perl.

Any comments are welcome.

Due to list security policy I cannot attach the files, so I copy-n-paste
them here.

The first file is the actual Perl daemon.
The second file is a sql script that creates the table and some views
that extract some useful data from the table.
The third file is the html documentation of the daemon, produced with
pod2html from the Perl script itself.

############# BEGIN logfile-daemon_mysql.pl ################
#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use English qw( -no_match_vars );

# utility routine to print messages on stderr (so they appear in cache log)
# without using warn, which would clutter the log with source line numbers
sub log_info {
     my $msg = shift;
     print STDERR "$msg\n";
}

# the first argument to this script is the log file path
my $log_file = shift;

# we use the log file path to pass database access information to this script
# sample configuration:
# access_log daemon:/host/database/table/username/password squid
# to leave a parameter unspecified, e.g. the database host, use a double slash:
# access_log daemon://database/table/username/password squid
my ( $host, $database, $table, $user, $pass ) =
     $log_file =~ / \/(.*?)\/(.*?)\/(.*?)\/(.*?)\/(.*?) \z /xms;

if ( !$host ) {
     $host = 'localhost';
     log_info("Database host not specified. Using $host.");
}

if ( !$database ) {
     $database = 'squid_log';
     log_info("Database name not specified. Using $database.");
}

if ( !$table ) {
     $table = 'access_log';
     log_info("Table parameter not specified. Using $table.");
}

if ( !$user ) {
     $user = 'squid';
     log_info("User parameter not specified. Using $user.");
}

if ( !$pass ) {
     log_info('No password specified. Connecting with NO password.');
}

# fields that we should have in the table
my @fields = qw(
     id
     time_since_epoch
     response_time
     client_src_ip_addr
     squid_request_status
     http_status_code
     reply_size
     request_method
     request_url
     username
     squid_hier_status
     server_ip_addr
     mime_type
);

my $dsn;
my $dbh;
my $sth;

# perform db connection
$dsn = "DBI:mysql:database=$database";
eval {
     warn "Connecting... dsn='$dsn', username='$user', password='...'";
     $dbh = DBI->connect($dsn, $user, $pass,
         { AutoCommit => 1, RaiseError => 1, PrintError => 1 });
};
if ($EVAL_ERROR) {
     die "Cannot connect to database: $DBI::errstr";
}

# a simple test to assure the specified table exists
eval {
     my $q = 'SELECT ' . join(',',@fields) . " FROM $table LIMIT 1";
     my $sth = $dbh->prepare($q);
     $sth->execute;
};
if ($EVAL_ERROR) {
     die "Cannot SELECT from $table: $DBI::errstr";
}

# for better performance, prepare the statement at startup
eval {
     my $q = "INSERT INTO $table VALUES(NULL" . ',?' x (scalar(@fields)-1) . ')';
     #$sth = $dbh->prepare("INSERT INTO $table VALUES(NULL,?,?,?,?,?,?,?,?,?,?,?,?)");
     $sth = $dbh->prepare($q);
};
if ($EVAL_ERROR) {
     die "Error while preparing sql statement: $EVAL_ERROR";
}

# main loop
while (my $line = <>) {
     chomp $line;

     my $cmd = substr($line, 0, 1); # extract command byte
     substr($line, 0, 1, ' '); # replace command byte with a blank

     if ( $cmd eq 'L' ) {
         my @values = split / \s+ /xms, $line;
         shift @values; # the first blank generates an empty bind value that has to be removed
         eval { # we catch db errors to avoid crashing squid in case something goes wrong...
             $sth->execute(@values) or die $sth->errstr
         };
         if ( $EVAL_ERROR ) { # leave a trace of the error in the logs
             warn $EVAL_ERROR . " values=(" . join(', ', @values) . ')';
         }
     }

}

$dbh->disconnect();

__END__

=head1 NAME

C<logfile-daemon_mysql.pl> - Write squid access log into a mysql database

=head1 SYNOPSIS

   mysql -u root -p squid_log < logfile-daemon_mysql.sql
   cp logfile-daemon_mysql.pl /path/to/squid/libexec/

then, in squid.conf:

   logformat squid_mysql %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
   access_log daemon:/mysql_host/database/table/username/password
squid_mysql
   logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl

=head1 DESCRIPTION

This module exploits the new logfile daemon support available in squid
2.7 to store the access_log entries in a MySQL database.

=head1 CONFIGURATION

=head2 Squid configuration

=head3 logformat directive

This script expects the following log format (it's the default 'squid'
log format without the two '/' characters):

   logformat squid_mysql %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt

=head3 access_log directive

The path to the access log file is used to provide the database
connection parameters.

   access_log daemon:/mysql_host/database/table/username/password
squid_mysql

The 'daemon' prefix is mandatory and tells squid that the logfile_daemon
is to be used instead of the normal logging routines.

The last parameter, 'squid_mysql' in the example, tells squid which log
format to use when writing lines to the log daemon.

=over 4

=item mysql_host

Host where the mysql server is running. If left empty, 'localhost' is
assumed.

=item database

Name of the database to connect to. If left empty, 'squid_log' is
assumed.

=item table

Name of the database table where log lines are stored. If left empty,
'access_log' is assumed.

=item username

Username to use when connecting to the database. If left empty, 'squid'
is assumed.

=item password

Password to use when connecting to the database. If left empty, no
password is used.

=back

To leave all parameters at their default values, you can use a single slash:

   access_log daemon:/ squid_mysql

To specify only the database password, which by default is empty, leave
all the other parameters unspecified by using empty strings:

   access_log daemon://///password squid_mysql

=head3 logfile_daemon directive

This is the current way of telling squid where the logfile daemon resides.

   logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl

Obviously the script must be copied to the location specified in the
directive.

=head2 Database configuration

Let's call the database 'squid_log' and the log table 'access_log'. The
username and password for the db connection will be both 'squid'.

=head3 Database

Create the database:

   CREATE DATABASE squid_log;

=head3 User

Create the user:

   GRANT INSERT,SELECT ON squid_log.* TO 'squid'@'localhost' IDENTIFIED BY 'squid';
   FLUSH PRIVILEGES;

=head3 Table

Create the table:

   CREATE TABLE access_log(
     id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
     time_since_epoch DECIMAL(15,3),
     response_time INTEGER,
     client_src_ip_addr CHAR(15),
     squid_request_status VARCHAR(20),
     http_status_code VARCHAR(10),
     reply_size INTEGER,
     request_method VARCHAR(20),
     request_url VARCHAR(1000),
     username VARCHAR(20),
     squid_hier_status VARCHAR(20),
     server_ip_addr CHAR(15),
     mime_type VARCHAR(50)
   );

=head1 VERSION INFORMATION

This document refers to logfile-daemon_mysql.pl script version 0.3.

The script has been developed and tested in the following environment:

=over 4

=item squid-2.7.DEVEL0-20080220

=item mysql 5.0.26

=item perl 5.8.8

=item OpenSUSE 10.2

=back

=head1 DATA EXTRACTION

=head2 Sample queries.

=over 4

=item Clients accessing the cache

   SELECT DISTINCT client_src_ip_addr FROM access_log;

=item Number of requests per day

   SELECT
     DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
     COUNT(*) AS num_of_requests
   FROM access_log
   GROUP BY 1
   ORDER BY 1;

=item Request status count

To obtain the raw count of each request status:

   SELECT squid_request_status, COUNT(*) AS n
   FROM access_log
   GROUP BY squid_request_status
   ORDER BY 2 DESC;

To calculate the percentage of each request status:

   SELECT
     squid_request_status,
     (COUNT(*)/(SELECT COUNT(*) FROM access_log)*100) AS percentage
   FROM access_log
   GROUP BY squid_request_status
   ORDER BY 2 DESC;

To distinguish only between HITs and MISSes:

   SELECT
     'hits',
     (SELECT COUNT(*)
     FROM access_log
     WHERE squid_request_status LIKE '%HIT%')
     /
     (SELECT COUNT(*) FROM access_log)*100
     AS percentage
   UNION
   SELECT
     'misses',
     (SELECT COUNT(*)
     FROM access_log
     WHERE squid_request_status LIKE '%MISS%')
     /
     (SELECT COUNT(*) FROM access_log)*100
     AS percentage;

=item Response time ranges

   SELECT
     '0..500',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time >= 0 AND response_time < 500
   UNION
   SELECT
     '500..1000',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time >= 500 AND response_time < 1000
   UNION
   SELECT
     '1000..2000',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time >= 1000 AND response_time < 2000
   UNION
   SELECT
     '>= 2000',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time >= 2000;

=item Traffic by mime type

   SELECT
     mime_type,
     SUM(reply_size) as total_bytes
   FROM access_log
   GROUP BY mime_type
   ORDER BY 2 DESC;

=item Traffic by client

   SELECT
     client_src_ip_addr,
     SUM(reply_size) AS total_bytes
   FROM access_log
   GROUP BY 1
   ORDER BY 2 DESC;

=back

=head2 Speed issues

The MyISAM storage engine is generally faster than InnoDB; although it
doesn't support transactions or referential integrity enforcement, it
may be more appropriate for this write-heavy scenario. You might want
to append "ENGINE=MYISAM" at the end of the table creation code in
the SQL script provided with this program.
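
An existing table can also be converted in place; for example (assuming
the default table name 'access_log'):

   ALTER TABLE access_log ENGINE=MYISAM;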

Indexes should be created according to the queries that are more
frequently run. The DDL script only creates an implicit index for the
primary key column.
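
As a hypothetical starting point, matching the sample queries shown
below, one might add indexes such as these (the column choices are
assumptions to be checked against the actual workload):

   CREATE INDEX idx_time_since_epoch ON access_log (time_since_epoch);
   CREATE INDEX idx_client_src_ip_addr ON access_log (client_src_ip_addr);
   CREATE INDEX idx_squid_request_status ON access_log (squid_request_status);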

=head1 TODO

=head2 Table cleanup

This script currently implements only the C<L> (i.e. "append a line to
the log") command, therefore the log lines are never purged from the
table. This approach has an obvious scalability problem.

One solution would be to implement e.g. the "rotate log" command in a
way that would calculate some summary values, put them in a "summary
table" and then delete the lines used to calculate those values.

Similar cleanup code could obviously be implemented in an external
script and run periodically, independently of squid's log commands.
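
As a sketch, such an external job might summarize per-day request
counts into a summary table and then purge the detail rows (the
'access_log_summary' table and its columns are assumptions, not part
of the provided script):

   CREATE TABLE IF NOT EXISTS access_log_summary (
     date_day DATE PRIMARY KEY,
     num_of_requests INTEGER
   );
   -- summarize full days up to yesterday, then delete the detail rows
   REPLACE INTO access_log_summary
     SELECT DATE(FROM_UNIXTIME(time_since_epoch)), COUNT(*)
     FROM access_log
     WHERE time_since_epoch < UNIX_TIMESTAMP(CURDATE())
     GROUP BY 1;
   DELETE FROM access_log
     WHERE time_since_epoch < UNIX_TIMESTAMP(CURDATE());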

=head2 Testing

This script has only been tested in low-volume scenarios. Tests in high
volume environments could reveal performance bottlenecks and bugs.

=head1 AUTHOR

Marcello Romani, marcello.romani@libero.it

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2008 by Marcello Romani

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.

=cut
############# END logfile-daemon_mysql.pl ################

############# BEGIN logfile-daemon_mysql.sql ################

--
-- setup statements
--
--CREATE DATABASE squid_log;
--GRANT INSERT,SELECT ON squid_log.* TO 'squid'@'localhost' IDENTIFIED BY 'squid';
--
-- DDL statements
--
USE squid_log;
--
-- This table is based on squid's default 'squid' logformat, with minor modifications
-- (the two slashes are removed)
-- original:
-- logformat squid  %ts.%03tu %6tr %>a %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt
-- modified:
-- logformat squid  %ts.%03tu %6tr %>a %Ss %03Hs %<st %rm %ru %un %Sh %<A %mt
-- changes: the '/' separators in %Ss/%03Hs and %Sh/%<A are replaced with spaces
--
CREATE TABLE access_log(
     id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
     time_since_epoch     DECIMAL(15,3),
     response_time        INTEGER,
     client_src_ip_addr   CHAR(15),
     squid_request_status VARCHAR(20),
     http_status_code     VARCHAR(10),
     reply_size           INTEGER,
     request_method       VARCHAR(20),
     request_url          VARCHAR(1000),
     username             VARCHAR(20),
     squid_hier_status    VARCHAR(20),
     server_ip_addr       CHAR(15),
     mime_type            VARCHAR(50)
);
--
-- Sample views
--
-- ip address of hosts accessing the cache
CREATE VIEW cache_clients AS
     SELECT DISTINCT client_src_ip_addr FROM access_log ORDER BY 1;
-- number of requests per day
CREATE VIEW requests_per_day AS
     SELECT
         DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
         COUNT(*) AS num_of_requests
     FROM access_log
     GROUP BY 1
     ORDER BY 1;
-- percentage of each request status
CREATE VIEW requests_status_perc AS
     SELECT
         squid_request_status,
         (COUNT(*)/(SELECT COUNT(*) FROM access_log)*100) AS percentage
     FROM access_log
     GROUP BY squid_request_status
     ORDER BY 2 DESC;
-- request hits and misses, in percentage
CREATE VIEW hits_misses_perc AS
     SELECT
         'hits',
         (SELECT COUNT(*) FROM access_log WHERE squid_request_status LIKE '%HIT%')
         /
         (SELECT COUNT(*) FROM access_log)*100
         AS percentage
UNION
     SELECT
         'misses',
         (SELECT COUNT(*) FROM access_log WHERE squid_request_status LIKE '%MISS%')
         /
         (SELECT COUNT(*) FROM access_log)*100
         AS percentage;
-- response times
CREATE VIEW response_time_ranges AS
     SELECT
         '0..500',
         COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
     FROM access_log
     WHERE response_time >= 0 AND response_time < 500
UNION
     SELECT
         '500..1000',
         COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
     FROM access_log
     WHERE response_time >= 500 AND response_time < 1000
UNION
     SELECT
         '1000..2000',
         COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
     FROM access_log
     WHERE response_time >= 1000 AND response_time < 2000
UNION
     SELECT
         '>= 2000',
         COUNT(*) / (SELECT COUNT(*) FROM access_log)*100 AS percentage
     FROM access_log
     WHERE response_time >= 2000;
-- traffic by mime type
CREATE VIEW traffic_by_mime_type AS
     SELECT
         mime_type,
         SUM(reply_size) as total_bytes
     FROM access_log
     GROUP BY mime_type
     ORDER BY 2 DESC;
############# END logfile-daemon_mysql.sql ################
############# BEGIN logfile-daemon_mysql.html ################
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>logfile-daemon_mysql.pl - Write squid access log into a 
mysql database</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:feedback@suse.de" />
</head>
<body style="background-color: white">
<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->
<ul>
	<li>NAME</li>
	<li>SYNOPSIS</li>
	<li>DESCRIPTION</li>
	<li>CONFIGURATION</li>
	<ul>
		<li>Squid configuration</li>
		<ul>
			<li>logformat directive</li>
			<li>access_log directive</li>
			<li>logfile_daemon directive</li>
		</ul>
		<li>Database configuration</li>
		<ul>
			<li>Database</li>
			<li>User</li>
			<li>Table</li>
		</ul>
	</ul>
	<li>VERSION INFORMATION</li>
	<li>DATA EXTRACTION</li>
	<ul>
		<li>Sample queries.</li>
		<li>Speed issues</li>
	</ul>
	<li>TODO</li>
	<ul>
		<li>Table cleanup</li>
		<li>Testing</li>
	</ul>
	<li>AUTHOR</li>
	<li>COPYRIGHT AND LICENSE</li>
</ul>
<!-- INDEX END -->
<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p><code>logfile-daemon_mysql.pl</code> - Write squid access log into a 
mysql database</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
   mysql -u root -p squid_log &lt; logfile-daemon_mysql.sql
   cp logfile-daemon_mysql.pl /path/to/squid/libexec/</pre>
<p>then, in squid.conf:</p>
<pre>
   logformat squid_mysql  %ts.%03tu %6tr %&gt;a %Ss %03Hs %&lt;st %rm %ru %un %Sh %&lt;A %mt
   access_log daemon:/mysql_host/database/table/username/password 
squid_mysql
   logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>This module exploits the new logfile daemon support available in 
squid 2.7 to store the access_log entries in a MySQL database.</p>
<p>
</p>
<hr />
<h1><a name="configuration">CONFIGURATION</a></h1>
<p>
</p>
<h2><a name="squid_configuration">Squid configuration</a></h2>
<p>
</p>
<h3><a name="logformat_directive">logformat directive</a></h3>
<p>This script expects the following log format (it's the default 
'squid' log format without the two '/' characters):</p>
<pre>
   logformat squid_mysql  %ts.%03tu %6tr %&gt;a %Ss %03Hs %&lt;st %rm %ru %un %Sh %&lt;A %mt
<p>
</p>
<h3><a name="access_log_directive">access_log directive</a></h3>
<p>The path to the access log file is used to provide the database 
connection parameters.</p>
<pre>
   access_log daemon:/mysql_host/database/table/username/password 
squid_mysql</pre>
<p>The 'daemon' prefix is mandatory and tells squid that the 
logfile_daemon is to be used instead of the normal logging routines.</p>
<p>The last parameter, 'squid_mysql' in the example, tells squid which 
log format to use when writing lines to the log daemon.</p>
<dl>
<dt><strong><a name="item_mysql_host">mysql_host</a></strong>
<dd>
<p>Host where the mysql server is running. If left empty, 'localhost' is 
assumed.</p>
</dd>
<dt><strong><a name="item_database">database</a></strong>
<dd>
<p>Name of the database to connect to. If left empty, 'squid_log' 
is assumed.</p>
</dd>
<dt><strong><a name="item_table">table</a></strong>
<dd>
<p>Name of the database table where log lines are stored. If left empty, 
'access_log' is assumed.</p>
</dd>
<dt><strong><a name="item_username">username</a></strong>
<dd>
<p>Username to use when connecting to the database. If left empty, 
'squid' is assumed.</p>
</dd>
<dt><strong><a name="item_password">password</a></strong>
<dd>
<p>Password to use when connecting to the database. If left empty, no 
password is used.</p>
</dd>
</dl>
<p>To leave all parameters at their default values, you can use a single 
slash:</p>
<pre>
   access_log daemon:/ squid_mysql</pre>
<p>To specify only the database password, which by default is empty, 
leave all the other parameters unspecified by using empty strings:</p>
<pre>
   access_log daemon://///password squid_mysql</pre>
<p>
</p>
<h3><a name="logfile_daemon_directive">logfile_daemon directive</a></h3>
<p>This is the current way of telling squid where the logfile daemon 
resides.</p>
<pre>
   logfile_daemon /path/to/squid/libexec/logfile-daemon_mysql.pl</pre>
<p>Obviously the script must be copied to the location specified in the 
directive.</p>
<p>
</p>
<h2><a name="database_configuration">Database configuration</a></h2>
<p>Let's call the database 'squid_log' and the log table 'access_log'. 
The username and password for the db connection will be both 'squid'.</p>
<p>
</p>
<h3><a name="database">Database</a></h3>
<p>Create the database:</p>
<pre>
   CREATE DATABASE squid_log;</pre>
<p>
</p>
<h3><a name="user">User</a></h3>
<p>Create the user:</p>
<pre>
   GRANT INSERT,SELECT ON squid_log.* TO 'squid'@'localhost' IDENTIFIED 
BY 'squid';
   FLUSH PRIVILEGES;</pre>
<p>
</p>
<h3><a name="table">Table</a></h3>
<p>Create the table:</p>
<pre>
   CREATE TABLE access_log(
     id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
     time_since_epoch     DECIMAL(15,3),
     response_time        INTEGER,
     client_src_ip_addr   CHAR(15),
     squid_request_status VARCHAR(20),
     http_status_code     VARCHAR(10),
     reply_size           INTEGER,
     request_method       VARCHAR(20),
     request_url          VARCHAR(1000),
     username             VARCHAR(20),
     squid_hier_status    VARCHAR(20),
     server_ip_addr       CHAR(15),
     mime_type            VARCHAR(50)
   );</pre>
<p>
</p>
<hr />
<h1><a name="version_information">VERSION INFORMATION</a></h1>
<p>This document refers to logfile-daemon_mysql.pl script version 0.3.</p>
<p>The script has been developed and tested in the following 
environment:</p>
<dl>
<dt><strong><a 
name="item_squid_2d2_2e7_2edevel0_2d20080220">squid-2.7.DEVEL0-20080220</a></strong>
<dt><strong><a name="item_mysql_5_2e0_2e26">mysql 5.0.26</a></strong>
<dt><strong><a name="item_perl_5_2e8_2e8">perl 5.8.8</a></strong>
<dt><strong><a name="item_opensuse_10_2e2">OpenSUSE 10.2</a></strong>
</dl>
<p>
</p>
<hr />
<h1><a name="data_extraction">DATA EXTRACTION</a></h1>
<p>
</p>
<h2><a name="sample_queries_">Sample queries.</a></h2>
<dl>
<dt><strong><a name="item_clients_accessing_the_cache">Clients accessing 
the cache</a></strong>
<dd>
<pre>
   SELECT DISTINCT client_src_ip_addr FROM access_log;</pre>
</dd>
<dt><strong><a name="item_number_of_request_per_day">Number of requests 
per day</a></strong>
<dd>
<pre>
   SELECT
     DATE(FROM_UNIXTIME(time_since_epoch)) AS date_day,
     COUNT(*) AS num_of_requests
   FROM access_log
   GROUP BY 1
   ORDER BY 1;</pre>
</dd>
<dt><strong><a name="item_request_status_count">Request status 
count</a></strong>
<dd>
<p>To obtain the raw count of each request status:</p>
</dd>
<dd>
<pre>
   SELECT squid_request_status, COUNT(*) AS n
   FROM access_log
   GROUP BY squid_request_status
   ORDER BY 2 DESC;</pre>
</dd>
<dd>
<p>To calculate the percentage of each request status:</p>
</dd>
<dd>
<pre>
   SELECT
     squid_request_status,
     (COUNT(*)/(SELECT COUNT(*) FROM access_log)*100) AS percentage
   FROM access_log
   GROUP BY squid_request_status
   ORDER BY 2 DESC;</pre>
</dd>
<dd>
<p>To distinguish only between HITs and MISSes:</p>
</dd>
<dd>
<pre>
   SELECT
     'hits',
     (SELECT COUNT(*)
     FROM access_log
     WHERE squid_request_status LIKE '%HIT%')
     /
     (SELECT COUNT(*) FROM access_log)*100
     AS percentage
   UNION
   SELECT
     'misses',
     (SELECT COUNT(*)
     FROM access_log
     WHERE squid_request_status LIKE '%MISS%')
     /
     (SELECT COUNT(*) FROM access_log)*100
     AS percentage;</pre>
</dd>
<dt><strong><a name="item_response_time_ranges">Response time 
ranges</a></strong>
<dd>
<pre>
   SELECT
     '0..500',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time &gt;= 0 AND response_time &lt; 500
   UNION
   SELECT
     '500..1000',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time &gt;= 500 AND response_time &lt; 1000
   UNION
   SELECT
     '1000..2000',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time &gt;= 1000 AND response_time &lt; 2000
   UNION
   SELECT
     '&gt;= 2000',
     COUNT(*)/(SELECT COUNT(*) FROM access_log)*100 AS percentage
   FROM access_log
   WHERE response_time &gt;= 2000;</pre>
</dd>
<dt><strong><a name="item_traffic_by_mime_type">Traffic by mime 
type</a></strong>
<dd>
<pre>
   SELECT
     mime_type,
     SUM(reply_size) as total_bytes
   FROM access_log
   GROUP BY mime_type
   ORDER BY 2 DESC;</pre>
</dd>
<dt><strong><a name="item_traffic_by_client">Traffic by client</a></strong>
<dd>
<pre>
   SELECT
     client_src_ip_addr,
     SUM(reply_size) AS total_bytes
   FROM access_log
   GROUP BY 1
   ORDER BY 2 DESC;</pre>
</dd>
</dl>
<p>
</p>
<h2><a name="speed_issues">Speed issues</a></h2>
<p>The MyISAM storage engine is generally faster than InnoDB; although 
it doesn't support transactions or referential integrity enforcement, 
it may be more appropriate for this write-heavy scenario. You might 
want to append ``ENGINE=MYISAM'' at the end of the table creation code 
in the SQL script provided with this program.</p>
<p>Indexes should be created according to the queries that are more 
frequently run. The DDL script only creates an implicit index for the 
primary key column.</p>
<p>
</p>
<hr />
<h1><a name="todo">TODO</a></h1>
<p>
</p>
<h2><a name="table_cleanup">Table cleanup</a></h2>
<p>This script currently implements only the <code>L</code> (i.e. 
``append a line to the log'') command, therefore the log lines are never 
purged from the table. This approach has an obvious scalability problem.</p>
<p>One solution would be to implement e.g. the ``rotate log'' command in 
a way that would calculate some summary values, put them in a ``summary 
table'' and then delete the lines used to calculate those values.</p>
<p>Similar cleanup code could obviously be implemented in an external 
script and run periodically independently from squid log commands.</p>
<p>
</p>
<h2><a name="testing">Testing</a></h2>
<p>This script has only been tested in low-volume scenarios. Tests in 
high volume environments could reveal performance bottlenecks and bugs.</p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Marcello Romani, <a 
href="mailto:marcello.romani@libero.it">marcello.romani@libero.it</a></p>
<p>
</p>
<hr />
<h1><a name="copyright_and_license">COPYRIGHT AND LICENSE</a></h1>
<p>Copyright (C) 2008 by Marcello Romani</p>
<p>This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.8 or,
at your option, any later version of Perl 5 you may have available.</p>
</body>
</html>
############# END logfile-daemon_mysql.html ################
-- 
Marcello Romani
Responsabile IT
Ottotecnica s.r.l.
http://www.ottotecnica.com
Received on Mon Mar 03 2008 - 00:50:39 MST