On 01/04/2011 10:38 AM, Alex Rousskov wrote:
> Hello,
> 
>     By default, Squid logs request URIs without any escaping. This works
> OK in most cases because uri_whitespace defaults to "strip". Even when
> the URI has a space, the logged value does not have it, which keeps log
> parsing scripts happy.
> 
> However, when Squid detects a malformed request (e.g., the URI scheme is
> "rtsp"), Squid may log what it thinks is the raw URI, including any
> spaces. This results in malformed access log entries such as:
> 
> [02/Jan/2011:20:55:15 +0100] 10.3.75.185
>   xml:lang="*" version="1.0" xmlns:stream="http://jabber.org/streams"
>   NONE/400 HTTP/0.0 <stream:stream ...
> 
> [02/Jan/2011:21:03:54 +0100] 10.19.66.249
>   sip:10.38.26.67:80;transport=tcp SIP/2.0
>   NONE/400 HTTP/0.9 REGISTER ...
> 
> [02/Jan/2011:21:05:47 +0100] 10.228.123.186
>   rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0
>   NONE/400 HTTP/0.9 DESCRIBE ...
> 
> I split the logged lines above into three lines each for readability,
> with the second line always being the request URI (%ru format code).
> 
> As you can see, such log entries are malformed and would be rather
> difficult to interpret correctly due to spaces in URIs and to tokens that
> look like separate protocol-version fields but are actually part of the
> %ru output.
> 
> While the above real-world examples use a custom access log format, the
> default format behaves the same way.
> 
> 
> We could (and possibly will) improve request parsing so that common
> cases like RTSP and SIP requests do not get interpreted as malformed
> HTTP/0.9 requests. However, that does not solve the more general case of
> a truly malformed request like the very first example pasted above.
> 
> 
> Our options include:
> 
> 1) Apply uri_whitespace before logging malformed requests. This will
> result in spaces stripped by default. The uri_whitespace option
> description should probably be adjusted to recommend a different %ru
> encoding for those who do not want to remove spaces from logged URLs.
> 
> 2a) Strip spaces when logging %ru unless an explicit encoding is
> specified for that option. To implement this, we would add
> LOG_QUOTE_STRIP_SPACE log_quote value.
> 
> 2b) Chop spaces when logging %ru unless an explicit encoding is
> specified for that option. To implement this, we would add
> LOG_QUOTE_CHOP_SPACE log_quote value.
> 
> 2c) Replace spaces with %20 when logging %ru unless an explicit encoding
> is specified for that option. To implement this, we would add
> LOG_QUOTE_ENCODE_SPACE log_quote value. This is a little different from
> encoding the entire URL because it would apply to spaces (and '%') only.
> 
> 3) Add a new log_whitespace squid.conf option to allow the admin to
> strip, chop, or encode spaces in all transaction log fields that do not
> have an explicit setting. Default setting could be
> LOG_QUOTE_ENCODE_SPACE, I guess. This will help avoid similar problems
> in fields other than %ru.
> 
> 
> My preference is (1), followed by (3), but I am not sure and may have
> missed better options. What do you think?
Any objections to option #1 or better ideas?
Thank you,
Alex.
> P.S. One could argue that logging URIs with stripped or chopped spaces
> is wrong because it hides potentially critical information, but that is
> a different question that I do not want to discuss in this particular
> thread.
Received on Thu Jan 13 2011 - 22:46:03 MST