Re: [squid-users] Calamaris

From: Kirk Schneider <kschneider@dont-contact.us>
Date: Mon, 01 Mar 2004 12:14:50 -0600

Endre,

You also might try this filter script I've written to cleanup
the logs before passing to calamaris, it removes some characters
that cause calamaris problems.

#!/usr/local/bin/gawk -f
{
         if ( $0 ~ /[^\x20-\x7E]/ ) {
                 gsub( /\x20\x0C/, "" )
                 gsub( /[\x00-\x1F]/, "" )
                 gsub( /[\x7F-\xFF]/, "" )
         }

         if ( $5 < 0 ) {
                 $5 *= -1
         }

         if ( $2 !~ /[^0-9]/ && $5 !~ /[^0-9]/ && ( $11 == "ALLOW" || $11 == "DENY" ) ) {
                 print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10
         }
         else {
                 next
         }
}

-- 
Kirk Schneider                          972-952-4645 (work)
Raytheon Corporate IT Security          214-912-8679 (cell)
kschneider@raytheon.com                 888-431-7621 (pager)
"If you think the problem is bad now just wait until we've solved it."
-------- Original Message --------
Subject: [squid-users] Calamaris
Date: Mon, 1 Mar 2004 17:43:52 +0100
From: Endre Szekely-Bencedi <Endre.Szekely-Bencedi@hu-tcs.com>
To: squid-users@squid-cache.org
Hello List,
I have a problem with Calamaris (v2.58).
I am using squid 2.5stable3, compiled from sources, with SmartFilter
plugin.
As far as I know, I have to use the squid-extended input type for this. But
this will give some errors:
[root@localhost logs]# date;cat test.log | /usr/local/squid/bin/calamaris
-f squid-extended -F html > /var/www/html/calamaris2.html;date
Mon Mar  1 17:44:08 CET 2004
Malformed UTF-8 character (unexpected non-continuation byte 0x31,
immediately after start byte 0xf3) in split at (eval 1) line 20, <> line
369578.
Malformed UTF-8 character (unexpected non-continuation byte 0x31,
immediately after start byte 0xf3) in split at (eval 1) line 20, <> line
369578.
Split loop at (eval 1) line 20, <> line 369578.
Mon Mar  1 17:48:05 CET 2004
[root@localhost logs]#
Generated log shows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html;
charset=iso-8859-1"></HEAD>
<BODY></BODY></HTML>
Which is an empty page.
A sample from the logfile:
1077780471.441     93 3.227.65.74 TCP_MISS/302 476 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780471.466     64 3.227.65.74 TCP_MISS/200 1722 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Port
al Sites
1077780471.479     72 3.227.65.74 TCP_MISS/302 477 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780471.508     59 3.227.65.74 TCP_MISS/302 477 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780471.699     73 3.227.65.74 TCP_MISS/200 1585 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Port
al Sites
1077780471.713     83 3.227.65.74 TCP_MISS/200 1607 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Port
al Sites
1077780471.726     86 3.227.65.74 TCP_MISS/200 1589 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Port
al Sites
1077780471.885    256 3.227.65.74 TCP_MISS/200 726 GET
http://as.fotexnet.hu/adserver.ads/153/0///937480 -
DEFAULT_PARENT/10.20.20.254 text/ht
ml text/html ALLOW
1077780473.212    229 3.227.65.74 TCP_MISS/200 23713 GET
http://index.hu/ad/lipton/banner1_120x240.swf? -
DEFAULT_PARENT/10.20.20.254 applicat
ion/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.298     72 3.227.65.74 TCP_MISS/302 477 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780473.388    279 3.227.65.74 TCP_MISS/200 17697 GET
http://index.hu/ad/microsoft_wss.swf? - DEFAULT_PARENT/10.20.20.254
application/x-sho
ckwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.439    106 3.227.65.74 TCP_MISS/302 476 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780473.458     47 3.227.65.74 TCP_MISS/302 476 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780473.480    368 3.227.65.74 TCP_MISS/200 4292 GET
http://as.fotexnet.hu/adserver.ads/196/0///27236 -
DEFAULT_PARENT/10.20.20.254 text/ht
ml text/html ALLOW
1077780473.643    162 3.227.65.74 TCP_MISS/302 477 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780473.646    144 3.227.65.74 TCP_MISS/302 477 GET
http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Portal
  Sites
1077780473.673    487 3.227.65.74 TCP_MISS/200 10319 GET
http://as.fotexnet.hu/adserver.ads/200/0///378158 -
DEFAULT_PARENT/10.20.20.254 text/
html text/html ALLOW
1077780473.799    280 3.227.65.74 TCP_MISS/200 26216 GET
http://index.hu/ad/teluzoallo_120x240.swf? - DEFAULT_PARENT/10.20.20.254
application/
x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.819    122 3.227.65.74 TCP_MISS/200 216 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Porta
l Sites
1077780473.824    124 3.227.65.74 TCP_MISS/200 355 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Porta
l Sites
1077780473.842    136 3.227.65.74 TCP_MISS/200 1603 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Port
al Sites
1077780473.846     47 3.227.65.74 TCP_MISS/200 353 GET
http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html
ALLOW Porta
l Sites
Am I doing something wrong?
Thanks,
Endre.
Received on Mon Mar 01 2004 - 11:48:28 MST

This archive was generated by hypermail pre-2.1.9 : Thu Apr 01 2004 - 12:00:01 MST