[squid-users] Force ASCII encoding for access.log fields?

From: Mark DeCheser <lists_at_decheserstudios.com>
Date: Thu, 26 Jun 2014 23:25:54 -0000

Hi everyone --

I recently ran into a strange condition in my Squid access logs that
makes importing the events into a database more difficult. Note that I
am not logging directly to a database, but rather parsing events into a
centralized database via batch/cron.

Some fields in the access log, mainly the Content-Type (%mt) field as far
as I can see, are being recorded with non-ASCII characters. When I attempt
to import the log into PostgreSQL, psql barfs.

The logformat in our Squid config looks like this:

logformat my-custom %la,%>a,%10tr,%>st,%<st,%rm,%03>Hs,%mt,%[un,%tg
access_log /var/log/squid/access.log my-custom

Some examples of the events look like this:

[serverIP],[clientIP],
4012,692,498,GET,200,º^_x°*,username,20/Jun/2014:00:06:36
[serverIP],[clientIP],
4012,564,795,GET,200,text/css,username,20/Jun/2014:00:06:36
[serverIP],[clientIP],,
4191,681,528,GET,200,application/javascript,username,20/Jun/2014:00:06:36

[serverIP],[clientIP],
4322,457,25813,GET,200,application/javascript,eadqnfkx,20/Jun/2014:00:07:21
[serverIP],[clientIP],
627,907,499,GET,200,°Z<90><8f>^X+,username,20/Jun/2014:00:07:21
[serverIP],[clientIP],
627,912,499,GET,200,@Ì^Px°*,username,20/Jun/2014:00:07:21
[serverIP],[clientIP],
627,898,499,GET,200,<90>KPñx+,username,20/Jun/2014:00:07:21
[serverIP],[clientIP],
627,907,497,GET,200,p<91><96>^U,username,20/Jun/2014:00:07:21

I'm running Squid instances on VPSes in a number of different countries.
This particular Squid instance is in Norway, and coincidentally enough
happens to be the only VPS delivered to my organization that wasn't
already set to en_US.UTF-8.

# cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"
# echo $LANG
en_US.UTF-8

It could be a coincidence, but given that I have instances all over the
world and only this one is giving me trouble, it seems worth mentioning.

Ideally, Squid would be able to force some kind of hex encoding for this
Content-Type field (or really, for any field that receives non-ASCII
characters). There are downstream alternatives, such as finding and
replacing non-ASCII chars in a preparation script. There's also the option
to change the charset of the database itself so that it doesn't complain,
but these alternatives seem a little reactive.
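
For reference, the preparation-script alternative could look roughly like
the sketch below. It is only an illustration, not something I have in
production: the function names escape_non_ascii and clean_log are made up
here, and the choice of \xNN as the escape style is an assumption. It reads
the log as raw bytes and hex-escapes every byte outside printable ASCII
(plus the backslash itself, so the escaping stays unambiguous).

```python
def escape_non_ascii(raw: bytes) -> str:
    """Return raw as ASCII text, hex-escaping bytes outside 0x20-0x7E.

    Hypothetical helper: the \\xNN escape style is an assumption,
    not anything Squid itself produces.
    """
    out = []
    for b in raw:
        # Keep printable ASCII as-is, except the backslash, which is
        # escaped too so that the output can be decoded unambiguously.
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


def clean_log(src_path: str, dst_path: str) -> None:
    """Copy a log file line by line, escaping each line (hypothetical)."""
    with open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="ascii", newline="\n") as dst:
        for line in src:
            dst.write(escape_non_ascii(line.rstrip(b"\r\n")) + "\n")
```

One caveat: if the garbage bytes happen to include a newline, the record
is already split across lines before this runs, so this does not fix
every case.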

I've reviewed: http://www.squid-cache.org/Doc/config/logformat/
I also tried using iconv unsuccessfully:
http://stackoverflow.com/questions/12999651/how-to-remove-non-utf-8-characters-from-text-file

That approach essentially leaves me with offset fields/columns in the logfile.
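
One way the columns could end up offset is if the garbage in the
Content-Type field itself contains a comma. A minimal sketch of a split
that tolerates this, assuming the ten-field logformat above and assuming
the username and timestamp fields never contain commas (the field names
and the split_record function are hypothetical, chosen only for this
example):

```python
from typing import List

# Hypothetical names for the ten fields of the my-custom logformat:
# %la,%>a,%10tr,%>st,%<st,%rm,%03>Hs,%mt,%[un,%tg
FIELDS = ["local_ip", "client_ip", "elapsed_ms", "request_bytes",
          "reply_bytes", "method", "status", "mime_type",
          "username", "timestamp"]


def split_record(line: bytes) -> List[bytes]:
    """Split one raw log line into ten fields.

    The first seven fields are taken from the left and the last two
    from the right, so any commas inside the mime_type field stay in
    that field. Assumes username and timestamp contain no commas.
    """
    head = line.rstrip(b"\r\n").split(b",", 7)
    if len(head) != 8:
        raise ValueError("too few fields in record")
    tail = head[7].rsplit(b",", 2)
    if len(tail) != 3:
        raise ValueError("too few fields in record")
    return head[:7] + tail
```

Working on bytes rather than decoded text avoids any charset error before
the fields are separated; the mime_type field can then be escaped or
discarded on its own.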

I also reviewed Amos' comment here:
http://www.squid-cache.org/mail-archive/squid-users/201109/0343.html

The difference in my case is that I'm dealing with Content-Type, not URL.
The same condition could show up in other fields (username, for example),
but that is not an immediate concern.

The community's advice would be greatly appreciated.

Thanks,
Mark DeCheser
Received on Thu Jun 26 2014 - 23:25:21 MDT
