Re: [squid-users] Character encoding in access.log

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 22 Sep 2011 23:53:12 +1200

On 22/09/11 23:23, Javier Amor garcia wrote:
> Hello,
> I am working on an access.log parser for Squid and I have trouble with
> some URLs that contain non-US characters, like Spanish accents.
>
> To fix the issues with the parser, I need to know the following:
>
> Is the character encoding used for the log files always the same, or is
> it system-dependent?

Neither. It is configuration-dependent.

See http://www.squid-cache.org/Doc/config/logformat/

i.e. the per-field format modifiers:
   " output in quoted string format
   [ output in squid text log format as used by log_mime_hdrs
   # output in URL quoted format
   ' output as-is
   - left aligned
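For a parser, the practical difference between the "#" and "'" modifiers can be sketched in Python. This is an approximation only: urllib.parse.quote() stands in for Squid's own escaping, and the safe-character set chosen here is an assumption for illustration, not taken from Squid's source. as_url_quoted is a hypothetical helper name.

```python
from urllib.parse import quote, unquote

# Hypothetical approximation of the "#" (URL-quoted) modifier: every
# character outside the safe set is percent-encoded as UTF-8 bytes.
# Squid's real safe set may differ; this one is an assumption.
def as_url_quoted(value: str) -> str:
    return quote(value, safe="/:?=&")

raw = "http://example.com/año/niño?q=acción"

print(raw)                 # as-is ('): http://example.com/año/niño?q=acción
print(as_url_quoted(raw))  # quoted (#): http://example.com/a%C3%B1o/ni%C3%B1o?q=acci%C3%B3n

# A parser reading "#"-formatted fields reverses the quoting with unquote().
assert unquote(as_url_quoted(raw)) == raw
```

The point for a parser is that the same URL can reach the log either with raw UTF-8 characters or with %XX escapes, depending on the logformat in use, so the parser should know which one the admin configured.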

The default for URI fields should be URL-encoding according to the URI
specifications. That means RFC 1738 %-encoding of all non-ASCII
characters in the path and query sections, and Punycode for characters
in the host (authority) section. The Punycode conversion is done by the
browser; Squid is agnostic to it.
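On the parser side, a logged URI can be turned back into readable text roughly as follows. A minimal sketch, assuming the client percent-encoded UTF-8 bytes (common for paths; some legacy clients send other charsets such as Latin-1, which this sketch turns into U+FFFD instead of failing). decode_logged_url is a hypothetical helper name, and Python's built-in "idna" codec implements IDNA 2003, which may not match every modern host name.

```python
from urllib.parse import urlsplit, unquote

def decode_logged_url(url: str) -> dict:
    """Split a logged URL into human-readable host, path and query.

    Assumption: %XX escapes in path and query encode UTF-8 bytes;
    invalid sequences become U+FFFD rather than raising.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        # The browser sends the host in Punycode ("xn--..."); the "idna"
        # codec converts it back to Unicode.
        host = host.encode("ascii").decode("idna")
    except UnicodeError:
        pass  # leave unusual host names untouched
    return {
        "host": host,
        # unquote() leaves "+" alone; use unquote_plus() if the query
        # came from an HTML form and "+" should mean a space.
        "path": unquote(parts.path, encoding="utf-8", errors="replace"),
        "query": unquote(parts.query, encoding="utf-8", errors="replace"),
    }

info = decode_logged_url("http://xn--espaa-rta.com/a%C3%B1o?q=acci%C3%B3n")
print(info)  # {'host': 'españa.com', 'path': '/año', 'query': 'q=acción'}
```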

> Is there some way to explicitly force Squid to use a given charset (or
> UTF-8) in its log files?

All Squid log files are UTF-8. Some specific characters are URL-encoded
to keep each log entry on a single line; all other characters are written
as-is.
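Given that, a parser can read access.log as UTF-8 and undo the percent-escapes field by field. A minimal sketch, assuming the default native "squid" logformat, in which the URL is the seventh whitespace-separated field; logged_urls, the sample line and the file path in the comment are hypothetical.

```python
from typing import Iterable, Iterator
from urllib.parse import unquote

def logged_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield decoded URLs from native-format access.log lines.

    Assumption: the native "squid" logformat, where the URL is
    field 7 (index 6) after splitting on whitespace.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        yield unquote(fields[6], encoding="utf-8", errors="replace")

sample = [
    "1316692392.123    150 192.168.1.10 TCP_MISS/200 4512 GET "
    "http://example.com/a%C3%B1o - DIRECT/192.0.2.1 text/html\n"
]
print(list(logged_urls(sample)))  # ['http://example.com/año']

# With a real file (example path; match your access_log setting), open it
# as UTF-8 so one stray byte cannot abort the whole parse:
# with open("/var/log/squid/access.log", encoding="utf-8", errors="replace") as log:
#     for url in logged_urls(log):
#         ...
```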

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.15
   Beta testers wanted for 3.2.0.12
Received on Thu Sep 22 2011 - 11:53:20 MDT
