Re: [PATCH] Request URI logging for malformed requests from Tsantilas Christos on 2011-03-03 (squid-dev)

From: Tsantilas Christos <chtsanti_at_users.sourceforge.net>
Date: Thu, 03 Mar 2011 18:23:49 +0200

On 03/02/2011 03:00 AM, Amos Jeffries wrote:
> On Tue, 01 Mar 2011 20:29:51 +0200, Tsantilas Christos wrote:
>> This patch is forgotten too.
>> Any comments?
>>
>
> Just doing a quick audit now...
>
>......
> "
>
> src/client_side.cc:
> - the logic is wrong
> - in setLogUri() you do not need to call stringHasCntl(uri). The
> unescape call will do a one-pass scan and check these as it goes.
> also an old bug is now visible, stringHasCntl() does not check the full
> set of characters that need encoding :(

Well, does not check for rfc1738_unsafe_chars, is not difficult to fix
it. I can include a fix in this patch if required.

>
> With that fixed its clear that stringHasWhitespace() needs to be
> performed *before* any escaping and duplicating is done. Some shuffling
> can avoid doing that scan pass entirely on most config setups.

This is true, the rfc1738_escape_unscaped, escapes the whitespace so
user whitespace policy are not applied.

Your patch is not correct too, because the uris must always escaped if
they contain non printable characters, before logged. Your code only
take cares for whitespaces...

A better solution requires 2 strdup/malloc if there are both whitespaces
and ctl characters in the uri. I do not know how often are URIs with
whitespaces and ctl characters but I do not like ...

The problem was to find a way to handle the cases you have unescaped
URIs with whitespaces which causes problem on log analysis.
In my original patch if there are cntl characters escape them and escape
the whitespaces too. In the case there are not cntl characters apply
user defined whitespace policy....
Yes the above is not the best, but I think it is fair for most cases.

The overall source code which is related with http client request
parsing is a little confused, and I believe that we are spending a lot
of CPU cycles checking and parsing again and again the same strings.

My opinion is that a better solution is to rewrite or add a new
rfc1738_escape_unescaped library function as follows:

rfc1738_escape_unescaped_uri(const char *uri,
const char *escaped_uri, size_t escaped_uri_len,
int whitespace_policy = URI_WHITESPACE_ENCODE)

uri: the original URI
escaped_uri: the buffer to copy the URI
escaped_uri_len: the length of the buffer
whitespace_policy: method to handle the whitespaces
(URI_WHITESPACE_ALLOW, URI_WHITESPACE_ENCODE etc.... )

The above will save at list 2 full string passes (stringHasWhitespace()
and stringHasCntl() functions) for every URI to be logged and in worst
case 4 string passes and 2 strdup/mallocs

Regards,
Christos

>
> What I think is needed is:
>
> if (!cleanUrl)
> ...
> else {
> switch (Config.uri_whitespace) {
> case URI_WHITESPACE_ALLOW:
> http->log_uri = xstrndup(uri, MAX_URL);
> break;
>
> case URI_WHITESPACE_ENCODE:
> http->log_uri = xstrndup(rfc1738_escape_unescaped(uri), MAX_URL);
> break;
>
> case URI_WHITESPACE_CHOP:
> http->log_uri = xstrndup(uri, min(MAX_URL,strcspn(http->log_uri,
> w_space)));
> break;
>
> case URI_WHITESPACE_DENY:
> case URI_WHITESPACE_STRIP:
> default:
> {
> char *q = static_cast<char*>(xmalloc(strlen(uri)*3 + 1));
> const char *t = uri;
> while (*t) {
> if (!xisspace(*t))
> *q++ = *t;
> t++;
> }
> *q = '\0';
> http->log_uri = xstrndup(rfc1738_escape_unescaped(q), MAX_URL);
> xfree(q);
> }
> }
>
>
>
> Amos
>
Received on Thu Mar 03 2011 - 16:24:11 MST

This archive was generated by hypermail 2.2.0 : Fri Mar 04 2011 - 12:00:02 MST