Re: Request URI logging for malformed requests

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 07 Jan 2011 16:58:50 +1300

On 05/01/11 06:38, Alex Rousskov wrote:
> Hello,
>
> By default, Squid logs request URIs without any escaping. This works
> OK in most cases because uri_whitespace defaults to "strip". Even when
> URI has a space, the logged value does not have it, which keeps log
> parsing scripts happy.
>
> However, when Squid detects a malformed request (e.g., the URI scheme is
> "rtsp"), Squid may log what it thinks is the raw URI, including any
> spaces. This results in malformed access log entries such as:
>
> [02/Jan/2011:20:55:15 +0100] 10.3.75.185
> xml:lang="*" version="1.0" xmlns:stream="http://jabber.org/streams"
> NONE/400 HTTP/0.0<stream:stream ...
>
> [02/Jan/2011:21:03:54 +0100] 10.19.66.249
> sip:10.38.26.67:80;transport=tcp SIP/2.0
> NONE/400 HTTP/0.9 REGISTER ...
>
> [02/Jan/2011:21:05:47 +0100] 10.228.123.186
> rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0
> NONE/400 HTTP/0.9 DESCRIBE ...
>
> I split the logged lines above into three lines each for readability,
> with the second line always being the request URI (%ru format code).
>
> As you can see, such log entries are malformed and would be rather
> difficult to interpret correctly due to spaces in URIs and field-looking
> protocol versions that are actually a part of %ru output.
>
> While the above real-world examples use a custom access log format, the
> default behavior is the same.
>
>
> We could (and possibly will) improve request parsing so that common
> cases like RTSP and SIP requests do not get interpreted as malformed
> HTTP/0.9 requests. However, that does not solve the more general case of
> a truly malformed request like the very first example pasted above.
>
>
> Our options include:
>
> 1) Apply uri_whitespace before logging malformed requests. This will
> result in spaces stripped by default. The uri_whitespace option
> description should probably be adjusted to recommend a different %ru
> encoding for the values that do not remove spaces from URLs.
>
> 2a) Strip spaces when logging %ru unless an explicit encoding is
> specified for that option. To implement this, we would add a
> LOG_QUOTE_STRIP_SPACE log_quote value.
>
> 2b) Chop spaces when logging %ru unless an explicit encoding is
> specified for that option. To implement this, we would add a
> LOG_QUOTE_CHOP_SPACE log_quote value.
>
> 2c) Replace spaces with %20 when logging %ru unless an explicit encoding
> is specified for that option. To implement this, we would add a
> LOG_QUOTE_ENCODE_SPACE log_quote value. This is a little different from
> encoding the entire URL because it would apply to spaces (and '%') only.
>
> 3) Add a new log_whitespace squid.conf option to allow the admin to
> strip, chop, or encode space in all transaction log fields that do not
> have an explicit setting. Default setting could be
> LOG_QUOTE_ENCODE_SPACE, I guess. This will help avoid similar problems
> in fields other than %ru.
>
>
> My preference is (1), followed by (3), but I am not sure and may have
> missed better options. What do you think?

Definitely (1).

(3) seems like a good idea as a separate feature.
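
To make the strip/chop/encode choices concrete, here is a rough sketch
in Python (not Squid code). The function names are made up for
illustration, and "chop" is assumed to mean cutting the value at the
first whitespace character, as the uri_whitespace option does:

```python
import re

_WS = re.compile(r"\s")
_WS_OR_PERCENT = re.compile(r"[\s%]")


def strip_space(value: str) -> str:
    """Remove every whitespace character from the logged value."""
    return _WS.sub("", value)


def chop_space(value: str) -> str:
    """Cut the value at the first whitespace character (assumed meaning)."""
    match = _WS.search(value)
    return value[:match.start()] if match else value


def encode_space(value: str) -> str:
    """Percent-encode whitespace and '%' only, leaving the rest untouched."""
    def _escape(match: re.Match) -> str:
        return "".join(f"%{byte:02X}" for byte in match.group().encode("utf-8"))
    return _WS_OR_PERCENT.sub(_escape, value)


# The raw %ru value from the RTSP example quoted above.
raw = "rtsp://youtube.com/DjgMDA==video.3gp RTSP/1.0"
for policy in (strip_space, chop_space, encode_space):
    print(policy.__name__, policy(raw))
```

Only the encode variant is reversible, which is why '%' has to be
escaped along with the spaces.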

Along with (1) I think adding rtsp: and sip: as known protocols which
get rejected nicely until handled would be a good idea. The 3.2 parser
is ready now for handling unknown schemes as an error case.
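
Roughly what "known but rejected nicely" could look like, as a Python
sketch rather than anything from the Squid parser. The two scheme sets
and the classify_scheme() name are assumptions for illustration only;
the scheme syntax follows RFC 3986:

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")

# Illustrative assumptions, not Squid's actual protocol tables.
HANDLED = {"http", "https", "ftp"}
KNOWN_UNHANDLED = {"rtsp", "sip"}


def classify_scheme(uri: str) -> str:
    """Return how a request URI's scheme would be treated in this sketch."""
    match = _SCHEME.match(uri)
    if not match:
        return "no-scheme"
    scheme = match.group(1).lower()
    if scheme in HANDLED:
        return "handled"
    if scheme in KNOWN_UNHANDLED:
        # Recognised, so it can get a clean error instead of being
        # treated as a malformed HTTP/0.9 request.
        return "known-unhandled"
    return "unknown"


for uri in ("rtsp://youtube.com/DjgMDA==video.3gp",
            "sip:10.38.26.67:80;transport=tcp",
            'xml:lang="*"'):
    print(uri, "->", classify_scheme(uri))
```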

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.10
   Beta testers wanted for 3.2.0.4
Received on Fri Jan 07 2011 - 03:58:55 MST