Re: [squid-users] Calamaris

From: Endre Szekely-Bencedi <Endre.Szekely-Bencedi@dont-contact.us>
Date: Tue, 2 Mar 2004 11:28:18 +0100

Erm, sorry, I did something wrong, which is why it ran for so long.
Actually the script given by Kirk is great, and fast too. :p
Sorry for any inconvenience.
(It was checking the wrong fields. I am not sure why that made it run for
so long, but after correcting it, it finished the 60 MB file in about
10-20 seconds.)

"Endre Szekely-Bencedi" <Endre.Szekely-Bencedi@hu-tcs.com>
03/02/2004 10:41 AM
To: Kirk Schneider <kschneider@raytheon.com>
cc: squid-users@squid-cache.org
Subject: Re: [squid-users] Calamaris

Great script and advice indeed.
The other problem now is that running this 'cleaning' script takes an
estimated 40 minutes on my proxy machine (60 MB of logs, the result of
2 very active days; normally that is a week's worth).
Should I perhaps rotate the Squid logs, then, and run this script daily
from a crontab on the freshly rotated log only? I think this would be a
solution. Any other ideas?
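
A daily run along those lines could look roughly like the sketch below. It is only an outline under assumptions: the function name `daily_report`, the `CALAMARIS` variable and the paths in the usage comment are invented for illustration. The field filter is the awk one-liner Kirk posted (quoted further down), and `-F html` is the output option from the original command. After `squid -k rotate`, Squid renames the current access.log to access.log.0, provided `logfile_rotate` in squid.conf is not 0.

```shell
#!/bin/sh
# Sketch only: names and paths are assumptions, adjust them to your setup.
# CALAMARIS may point at the real binary; it defaults to "calamaris" on PATH.
CALAMARIS=${CALAMARIS:-calamaris}

# daily_report LOGFILE OUTFILE
# Trim every line to its first 10 fields, feed the result to Calamaris
# and write the HTML report to OUTFILE.
daily_report() {
    log=$1
    out=$2
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' "$log" \
        | "$CALAMARIS" -F html > "$out"
}

# Example use from a nightly cron job (paths are guesses for a source install):
#   /usr/local/squid/sbin/squid -k rotate
#   sleep 60   # -k rotate only signals the running Squid, so give it a moment
#   daily_report /usr/local/squid/var/logs/access.log.0 /var/www/html/calamaris.html
```

Working on the rotated file only means each run sees one day of traffic instead of the whole accumulated log.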

Thanks,
Endre.

Kirk Schneider <kschneider@raytheon.com>
03/01/2004 07:11 PM
To: Endre Szekely-Bencedi <Endre.Szekely-Bencedi@hu-tcs.com>
cc: squid-users@squid-cache.org
Subject: Re: [squid-users] Calamaris

Endre,

I contacted the Calamaris author about this before, and he
suggested filtering out the extra fields that SmartFilter adds at
the end of each line.

Now I run this on all my logs before piping them to Calamaris:

awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' access.log |calamaris
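
To illustrate what the filter does, here is a minimal sketch that wraps it in a shell function and runs it on one entry from the sample log in the original post. The function name `strip_extra_fields` is made up for this example. The first 10 fields are the columns of Squid's native access.log format; everything after field 10 is dropped.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): keep only the
# first 10 whitespace-separated fields of each line read from stdin.
strip_extra_fields() {
    awk '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}'
}

# One entry from the sample log in the original post.
sample='1077780471.441     93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites'

# Prints the entry without the trailing "text/html ALLOW Portal Sites".
printf '%s\n' "$sample" | strip_extra_fields
```

Note that awk joins the printed fields with single spaces, so the column padding after the timestamp is collapsed as well.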

--
Kirk Schneider                          972-952-4645 (work)
Raytheon Corporate IT Security          214-912-8679 (cell)
kschneider@raytheon.com                 888-431-7621 (pager)
"If you think the problem is bad now just wait until we've solved it."

-------- Original Message --------
Subject: [squid-users] Calamaris
Date: Mon, 1 Mar 2004 17:43:52 +0100
From: Endre Szekely-Bencedi <Endre.Szekely-Bencedi@hu-tcs.com>
To: squid-users@squid-cache.org

Hello List,

I have a problem with Calamaris (v2.58).
I am using Squid 2.5.STABLE3, compiled from source, with the SmartFilter
plugin.
As far as I know, I have to use the squid-extended input type for this, but
it gives some errors:

[root@localhost logs]# date;cat test.log | /usr/local/squid/bin/calamaris -f squid-extended -F html > /var/www/html/calamaris2.html;date
Mon Mar  1 17:44:08 CET 2004
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Malformed UTF-8 character (unexpected non-continuation byte 0x31, immediately after start byte 0xf3) in split at (eval 1) line 20, <> line 369578.
Split loop at (eval 1) line 20, <> line 369578.
Mon Mar  1 17:48:05 CET 2004
[root@localhost logs]#

The generated report contains:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1"></HEAD>
<BODY></BODY></HTML>

Which is an empty page.
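
One way to see what Perl is choking on is to look at the input line named in the error (369578) and to search the log for bytes outside plain ASCII. A rough sketch follows; the function names are made up for this example, and the `-a` option and byte-class pattern assume a grep that supports them (GNU grep does). For reference, 0xf3 is the letter "ó" in ISO-8859-1 and ISO-8859-2, and 0x31 is the digit "1".

```shell
#!/bin/sh
# print_log_line FILE N: print only line N of FILE, then stop reading.
print_log_line() {
    sed -n "${2}{p;q;}" "$1"
}

# find_non_ascii FILE: list "lineno:line" for every line containing a byte
# that is neither printable ASCII nor whitespace. LC_ALL=C makes grep work
# on raw bytes instead of decoding UTF-8; -a forces text mode.
find_non_ascii() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}

# Usage against the log from the post (file name taken from the command above):
#   print_log_line test.log 369578 | od -c
#   find_non_ascii test.log | head
```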
A sample from the logfile:

1077780471.441     93 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.466     64 3.227.65.74 TCP_MISS/200 1722 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.479     72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.508     59 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.699     73 3.227.65.74 TCP_MISS/200 1585 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.713     83 3.227.65.74 TCP_MISS/200 1607 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.726     86 3.227.65.74 TCP_MISS/200 1589 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780471.885    256 3.227.65.74 TCP_MISS/200 726 GET http://as.fotexnet.hu/adserver.ads/153/0///937480 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.212    229 3.227.65.74 TCP_MISS/200 23713 GET http://index.hu/ad/lipton/banner1_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.298     72 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.388    279 3.227.65.74 TCP_MISS/200 17697 GET http://index.hu/ad/microsoft_wss.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.439    106 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.458     47 3.227.65.74 TCP_MISS/302 476 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.480    368 3.227.65.74 TCP_MISS/200 4292 GET http://as.fotexnet.hu/adserver.ads/196/0///27236 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.643    162 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.646    144 3.227.65.74 TCP_MISS/302 477 GET http://sher.index.hu/ad? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.673    487 3.227.65.74 TCP_MISS/200 10319 GET http://as.fotexnet.hu/adserver.ads/200/0///378158 - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW
1077780473.799    280 3.227.65.74 TCP_MISS/200 26216 GET http://index.hu/ad/teluzoallo_120x240.swf? - DEFAULT_PARENT/10.20.20.254 application/x-shockwave-flash application/x-shockwave-flash ALLOW Portal Sites
1077780473.819    122 3.227.65.74 TCP_MISS/200 216 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.824    124 3.227.65.74 TCP_MISS/200 355 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.842    136 3.227.65.74 TCP_MISS/200 1603 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites
1077780473.846     47 3.227.65.74 TCP_MISS/200 353 GET http://sher.index.hu/get? - DEFAULT_PARENT/10.20.20.254 text/html text/html ALLOW Portal Sites

Am I doing something wrong?
Thanks,
Endre.
"THIS E-MAIL MESSAGE ALONG WITH ANY ATTACHMENTS IS INTENDED ONLY FOR THE
ADDRESSEE and may contain confidential and privileged information. If the
reader of this message is not the intended recipient, you are notified that
any dissemination, distribution or copy of this communication is strictly
prohibited. If you have received this message by error, please notify us
immediately, return the original mail to the sender and delete the message
from your system."
Received on Tue Mar 02 2004 - 04:29:05 MST

This archive was generated by hypermail pre-2.1.9 : Thu Apr 01 2004 - 12:00:01 MST