Request URI logging for malformed requests

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 04 Jan 2011 10:38:58 -0700

Hello,

    By default, Squid logs request URIs without any escaping. This works
OK in most cases because uri_whitespace defaults to "strip". Even when
a URI has a space, the logged value does not have it, which keeps log
parsing scripts happy.

However, when Squid detects a malformed request (e.g., the URI scheme is
"rtsp"), Squid may log what it thinks is the raw URI, including any
spaces. This results in malformed access log entries such as:

[02/Jan/2011:20:55:15 +0100] 10.3.75.185
  xml:lang="*" version="1.0" xmlns:stream="http://jabber.org/streams"
  NONE/400 HTTP/0.0 <stream:stream ...

[02/Jan/2011:21:03:54 +0100] 10.19.66.249
  sip:10.38.26.67:80;transport=tcp SIP/2.0
  NONE/400 HTTP/0.9 REGISTER ...

[02/Jan/2011:21:05:47 +0100] 10.228.123.186
  rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0
  NONE/400 HTTP/0.9 DESCRIBE ...

I split the logged lines above into three lines each for readability,
with the second line always being the request URI (%ru format code).

As you can see, such log entries are malformed and would be rather
difficult to interpret correctly due to spaces in URIs and field-looking
protocol versions that are actually a part of %ru output.
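
To illustrate the parsing problem, here is a minimal Python sketch of
what a naive whitespace-splitting log script would do with the third
entry above. The field names and their order are my assumption, inferred
from the sample entries (the actual custom logformat is not shown here),
and the trailing fields elided with "..." are left out.

```python
# Hypothetical field layout inferred from the sample entries; the real
# custom logformat is not given in this message.
FIELDS = ["time", "tz", "client", "uri", "status", "version", "method"]

def naive_parse(line):
    """Split on whitespace, as a simple log-parsing script might."""
    return dict(zip(FIELDS, line.split()))

# Third sample entry, joined back into one line (trailing fields omitted).
bad = ("[02/Jan/2011:21:05:47 +0100] 10.228.123.186 "
       "rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0 "
       "NONE/400 HTTP/0.9 DESCRIBE")

entry = naive_parse(bad)
print(entry["status"])   # the tail of %ru, not the real status
print(entry["version"])  # the real status, shifted by one field
```

The extra token from %ru shifts every later field by one position, so
the script reads "RTSP/1.0" where it expects the result code.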

While the above real-world examples use a custom access log format, the
default behavior is the same.

We could (and possibly will) improve request parsing so that common
cases like RTSP and SIP requests do not get interpreted as malformed
HTTP/0.9 requests. However, that does not solve the more general case of
a truly malformed request like the very first example pasted above.

Our options include:

1) Apply uri_whitespace before logging malformed requests. This will
result in spaces stripped by default. The uri_whitespace option
description should probably be adjusted to recommend a different %ru
encoding for the values that do not remove spaces from URLs.

2a) Strip spaces when logging %ru unless an explicit encoding is
specified for that option. To implement this, we would add a
LOG_QUOTE_STRIP_SPACE log_quote value.

2b) Chop spaces when logging %ru unless an explicit encoding is
specified for that option. To implement this, we would add a
LOG_QUOTE_CHOP_SPACE log_quote value.

2c) Replace spaces with %20 when logging %ru unless an explicit encoding
is specified for that option. To implement this, we would add a
LOG_QUOTE_ENCODE_SPACE log_quote value. This is a little different from
encoding the entire URL because it would apply to spaces (and '%') only.

3) Add a new log_whitespace squid.conf option to allow the admin to
strip, chop, or encode spaces in all transaction log fields that do not
have an explicit setting. The default setting could be
LOG_QUOTE_ENCODE_SPACE, I guess. This would help avoid similar problems
in fields other than %ru.
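
For comparison, here is a rough Python sketch of what the strip, chop,
and encode treatments would do to the %ru value from the SIP example
above. The function names are made up for illustration and this is not
Squid code; I am assuming "strip" removes all whitespace, "chop"
truncates at the first whitespace character, and "encode" touches only
' ' and '%', as described in (2c).

```python
import re

def strip_space(uri):
    """Remove every whitespace character from the value."""
    return "".join(uri.split())

def chop_space(uri):
    """Truncate the value at the first whitespace character."""
    return re.split(r"\s", uri, maxsplit=1)[0]

def encode_space(uri):
    """Encode '%' first, then ' ', so the original can be recovered."""
    return uri.replace("%", "%25").replace(" ", "%20")

raw = "sip:10.38.26.67:80;transport=tcp SIP/2.0"

print(strip_space(raw))
print(chop_space(raw))
print(encode_space(raw))
```

All three produce a single token, so the field count stays right. Only
the encoded form lets a reader recover the original bytes, which is why
'%' has to be encoded along with the space.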

My preference is (1), followed by (3), but I am not sure and may have
missed better options. What do you think?

Thank you,

Alex.
P.S. One could argue that logging URIs with stripped or chopped spaces
is wrong because it hides potentially critical information, but that is
a different question that I do not want to discuss in this particular
thread.
Received on Tue Jan 04 2011 - 17:39:06 MST
