Re: [PATCH] Request URI logging for malformed requests

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 05 Mar 2011 02:28:54 +1300

On 04/03/11 05:23, Tsantilas Christos wrote:
> On 03/02/2011 03:00 AM, Amos Jeffries wrote:
>> On Tue, 01 Mar 2011 20:29:51 +0200, Tsantilas Christos wrote:
>>> This patch is forgotten too.
>>> Any comments?
>>>
>>
>> Just doing a quick audit now...
>>
>> ......
>> "
>>
>> src/client_side.cc:
>> - the logic is wrong
>> - in setLogUri() you do not need to call stringHasCntl(uri). The
>> unescape call will do a one-pass scan and check these as it goes.
>> also an old bug is now visible, stringHasCntl() does not check the full
>> set of characters that need encoding :(
>
> Well, does not check for rfc1738_unsafe_chars, is not difficult to fix
> it. I can include a fix in this patch if required.
>
>>
>> With that fixed its clear that stringHasWhitespace() needs to be
>> performed *before* any escaping and duplicating is done. Some shuffling
>> can avoid doing that scan pass entirely on most config setups.
>
> This is true, the rfc1738_escape_unscaped, escapes the whitespace so
> user whitespace policy are not applied.
>
> Your patch is not correct too, because the uris must always escaped if
> they contain non printable characters, before logged. Your code only
> take cares for whitespaces...

Arg. Right.

>
> A better solution requires 2 strdup/malloc if there are both whitespaces
> and ctl characters in the uri. I do not know how often are URIs with
> whitespaces and ctl characters but I do not like ...
>
> The problem was to find a way to handle the cases you have unescaped
> URIs with whitespaces which causes problem on log analysis.
> In my original patch if there are cntl characters escape them and escape
> the whitespaces too. In the case there are not cntl characters apply
> user defined whitespace policy....
> Yes the above is not the best, but I think it is fair for most cases.
>
>
> The overall source code which is related with http client request
> parsing is a little confused, and I believe that we are spending a lot
> of CPU cycles checking and parsing again and again the same strings.
>
> My opinion is that a better solution is to rewrite or add a new
> rfc1738_escape_unescaped library function as follows:
>
> rfc1738_escape_unescaped_uri(const char *uri,
> const char *escaped_uri, size_t escaped_uri_len,
> int whitespace_policy = URI_WHITESPACE_ENCODE)
>
> uri: the original URI
> escaped_uri: the buffer to copy the URI
> escaped_uri_len: the length of the buffer
> whitespace_policy: method to handle the whitespaces
> (URI_WHITESPACE_ALLOW, URI_WHITESPACE_ENCODE etc.... )

No. our local whitespace policy has no place in the encoding algorithm.
It is used by many places for many non-URL things.

The current separation of a whitespace policy function setLogUri() which
happens to use rfc1738 escaping is good. The problem is limits on the
algorithm API.

Attached is a patch for the rfc1738 API which fixes its flags to enable
omitting the character groups separately.

I noticed there was no short-circuit to speed up the scan over the very
common alpha-numeric chars. Added. That should speed up the encoding a lot.

There are two extra performance boosts possible here but outside the
scope of any of this work,
  * using malloc instead of calloc. If necessary at all memset() is only
needed on the final unfilled part of the array.
  * update the API to receive an optional buffer to be filled, OR to
pass out the alloc'd buffer for the caller to deal with.

>
> The above will save at list 2 full string passes (stringHasWhitespace()
> and stringHasCntl() functions) for every URI to be logged and in worst
> case 4 string passes and 2 strdup/mallocs

Yes, neither of those are needed at all.

>
> Regards,
> Christos
>
>>
>> What I think is needed is:
>>
>> if (!cleanUrl)
>> ...
>> else {
>> switch (Config.uri_whitespace) {

After the rfc1738 API is fixed this can become:

  int flags = 0;
  switch (Config.uri_whitespace) {
  case URI_WHITESPACE_ALLOW:
    flags |= RFC1738_ESCAPE_NOSPACE;

  case URI_WHITESPACE_ENCODE:
   flags |= RFC1738_ESCAPE_UNESCAPED;
   http->log_uri = xstrndup(rfc1738_do_escape(uri, flags), MAX_URL);
   break;

>>
>> case URI_WHITESPACE_CHOP:
>> http->log_uri = xstrndup(uri, min(MAX_URL,strcspn(http->log_uri,
>> w_space)));
>> break;
>>
>> case URI_WHITESPACE_DENY:
>> case URI_WHITESPACE_STRIP:
>> default:
>> {
>> char *q = static_cast<char*>(xmalloc(strlen(uri)*3 + 1));
>> const char *t = uri;
>> while (*t) {
>> if (!xisspace(*t))
>> *q++ = *t;
>> t++;
>> }
>> *q = '\0';
>> http->log_uri = xstrndup(rfc1738_escape_unescaped(q), MAX_URL);
>> xfree(q);
>> }
>> }

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.11
   Beta testers wanted for 3.2.0.5

Received on Fri Mar 04 2011 - 13:29:04 MST

This archive was generated by hypermail 2.2.0 : Fri Mar 04 2011 - 12:00:02 MST