Re: [squid-users] Force ASCII encoding for access.log fields?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 27 Jun 2014 17:55:43 +1200

On 27/06/2014 11:25 a.m., Mark DeCheser wrote:
> Hi everyone --
>
> I recently ran into a strange condition within my Squid access logs which
> is making importing the events into a database a bit more difficult.
> Note, I am not logging directly to a database, but rather parsing events
> into a centralized database via batch/cron.
>
> Some fields in the access log, mainly the Content-Type field from what I
> have seen, are being recorded with non-ASCII characters. When I attempt to
> import the log into PostgreSQL, psql barfs.
>
> Our logfile format in our Squid config looks like this:
>
> logformat my-custom %la,%>a,%10tr,%>st,%<st,%rm,%03>Hs,%mt,%[un,%tg
> access_log /var/log/squid/access.log my-custom
>
> Some examples of the events look like this:
>
> [serverIP],[clientIP],
> 4012,692,498,GET,200,º^_x°*,username,20/Jun/2014:00:06:36

The log format you used does not match this log line: the first field,
%la, is the IP Squid is listening on, not a server IP. The format produces:

[squid-listening-IP],[clientIP],
4012,692,498,GET,200,º^_x°*,username,20/Jun/2014:00:06:36

>
> I'm running Squid instances on VPSes in a number of different countries.
> This particular Squid instance is in Norway, and coincidentally enough
> happens to be the only VPS delivered to my organization that wasn't
> already set to en_US.UTF-8.
>
> # cat /etc/sysconfig/i18n
> LANG="en_US.UTF-8"
> SYSFONT="latarcyrheb-sun16"
> # echo $LANG
> en_US.UTF-8
>
> It could be a coincidence, but based on the fact that I have instances all
> over the world, and only this instance is giving me trouble ... I found it
> to be an odd coincidence.
>
> Ideally, if it's possible for Squid to force some kind of hex encoding for
> this Content-Type (or really, for any field that receives non ASCII
> characters), that would be optimal. There are downstream alternatives
> which include finding / replacing non-ASCII chars in a preparation script.
> There's also the option to change the charset of the database itself so
> that it doesn't complain about the charset, but these alternatives seem a
> little reactionary.
>
> I've reviewed: http://www.squid-cache.org/Doc/config/logformat/
> I also tried using iconv unsuccessfully:
> http://stackoverflow.com/questions/12999651/how-to-remove-non-utf-8-characters-from-text-file
>
> It essentially leaves me with offset fields/columns in the logfile.
>
> I also reviewed Amos' comment here:
> http://www.squid-cache.org/mail-archive/squid-users/201109/0343.html
>
> The difference in my case is that I'm dealing with Content-Type, not URL.

URL-encoding is the %xx character encoding. It can be (and is) applied
to anything which can legitimately contain non-ASCII characters or ASCII
special characters. The Content-Type header is not one of those places.

You can use the '#' format modifier to URL-encode that %mt field
explicitly. Like so: %#mt
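
As a sketch, assuming the rest of your format stays exactly as you posted
it, the two config lines would become something like this (the only change
is %mt to %#mt):

```
# Same custom format as before, with the MIME type field URL-encoded.
logformat my-custom %la,%>a,%10tr,%>st,%<st,%rm,%03>Hs,%#mt,%[un,%tg
access_log /var/log/squid/access.log my-custom
```

After a reconfigure, non-ASCII bytes in that field should show up as %xx
sequences instead of raw characters, which should keep the file importable
as plain ASCII.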

If you share the exact Squid version you are using, I would also like to
check the code to see whether the %mt value is being set up correctly.
That log entry looks a bit like random memory being displayed as if it
were text.

Amos
Received on Fri Jun 27 2014 - 05:55:50 MDT
