Re: parsing quoted-string HTTP header fields

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 27 May 2011 04:20:01 +1200

On 27/05/11 03:01, Alex Rousskov wrote:
> On 05/26/2011 06:14 AM, Amos Jeffries wrote:
>> On 26/05/11 19:02, Tsantilas Christos wrote:
>>> On 05/26/2011 07:42 AM, Amos Jeffries wrote:
>>>> On Wed, 25 May 2011 20:07:13 +0300, Tsantilas Christos wrote:
>>>>> The second patch does not look slower, but maybe we do not want to be
>>>>> strict for many reasons. One reason is that this function used to
>>>>> "Configurable SSL error details messages" patch, to parse the "descr:"
>>>>> and "detail:" fields :-)
>>>>> The other reason is that it is usual to find web clients or servers
>>>>> which uses a simple "\n" instead of the "\r\n"...
>
>
>>>>> On 05/25/2011 02:51 PM, Amos Jeffries wrote:
>>>>>> On 25/05/11 22:44, Tsantilas Christos wrote:
>>>>>>> The quoted string fields parsing done in squid3 using the
>>>>>>> httpHeaderParseQuotedString function (HttpHeaderTools.cc file)
>>>>>>>
>>>>>>> I found that we do not support multiline quoted string fields, but
>>>>>>> according the rfc2616 multiline quoted string fields should
>>>>>>> supported:
>>>>>>>
>>>>>>> quoted-string = (<"> *(qdtext | quoted-pair )<"> )
>>>>>>> qdtext =<any TEXT except<">>
>>>>>>> TEXT =<any OCTET except CTLs,
>>>>>>> but including LWS>
>>>>>>> LWS = [CRLF] 1*( SP | HT
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am planning to fix the httpHeaderParseQuotedString function, using
>>>>>>> one
>>>>>>> of the following rules:
>>>>>>>
>>>>>>> 1) Just ignore any "\r" or "\n" character. This is the fastest and
>>>>>>> simpler approach
>>>>>>> 3) Require "\r\n " or "\r\n\t" as line separator and just remove the
>>>>>>> "\r\n"
>>>>>>> 3) Require "\r\n " or "\r\n\t" as line separator and replace it
>>>>>>> with a
>>>>>>> space
>>>>>>> 4) Also support "\n\t" or "\n\n\t" as line separators.
>>>>>>
>>>>>> (4) case #2 "\n\n\t" is also non-compliant.
>>>>>>
>>>>>>>
>>>>>>> Any comments or suggestions?
>>>>>>
>>>>>> RFC states "A recipient MAY replace any linear white space with a
>>>>>> single
>>>>>> SP".
>>>>>>
>>>>>> So (1) and the second (3) {replace with 0x20} are the valid options.
>>>>>>
>>>>>> It may be worth having two quoted-string parsers. One for each option.
>>>>>> The preference being (1) for its higher performance.
>>>>>>
>>>>>> Amos
>>
>>
>> Ah, okay. I forgot the quoted string un-escapes as it goes. So there is
>> no extra penalty from (3). In which case I prefer (3) over (1), it has a
>> lot more error-checking for free.
>>
>>
>> Patch for case (3) looks great and is RFC compliant. Do we want to be
>> lenient and allow the absence of \r ?
>
> Sorry, I really do not like both the policing intent of #3 and #3
> implementation.
>
> Policing intent: RFC 2616 recommends that we ignore [the lack of] CR
> when interpreting CRLF in HTTP headers:
>
>> The line terminator for message-header fields is the sequence CRLF.
>> However, we recommend that applications, when parsing such headers,
>> recognize a single LF as a line terminator and ignore the leading CR.
>
> Granted, the above can be strictly interpreted to apply to the final
> header terminator and not LWS, but I think it is reasonable to extend
> the above logic to header continuation (and to LWS in general).

So one vote for leniency (alex). One don't care (amos).

>
> Implementation:
>
>> while (*pos != '"'&& len> (pos-start)) {
>> +
>> + if (*pos =='\r') {
>> + pos ++;
>> + if (*(pos++) != '\n' || (*pos != ' '&& *pos != '\t')) {
>
> Can the above double increment lead to *pos pointing beyond the string
> boundaries?

Yes. A dangerous only to the debugs(). Which needs a --pos--

>
> Will the above incorrectly accept CR x HT sequence, where "x" is any
> character other than LF?

Exactly how dangerous are solo-CR floating around in quoted-string?

>
> Finally, if we are going to replace LWS with a single SP, we should
> replace the whole LWS and not just the initial CRLF (SP|HT) prefix.
>

Well, we may have already skipped over and accepted the prefix of it.
If we ignore the CR, then that too is likely to also be appended to the
output as a solo.

>
> Overall, I think we should go for #1 for now because it is likely to be
> more compatible with real traffic and to have fewer bugs.
>
> Eventually, we should probably do something close to #3 but without bugs
> and without rejecting the whole header when a field has a quoted string
> with malformed LWS. We should be careful about enabling message
> smuggling attacks when policing messages so it may be tricky to get this
> right for all cases.

I wanted to consult with you on the smuggling cases again. To check if
we need to drop all CR here or just the ones in LWS.

This function is restricted to a single quoted-string token. The prior
pieces of header are not touched here. The latter I'm not sure about,
that is a caller issue.

>> + bool parse_error = (*end<= 0x1F&& *end != '\r'&& *end != '\n') || *end == 0x7F;
>
> This variable can be declared const.
>
> I do not fully understand what the above code and the while loop above
> is really doing. If you know what it does, please add a comment.

FYI:
  The while loop skips over as many safe octets as possible (excl. \")
to optimize calls to append().

The bool checks whether the octet which stopped that scan was important
enough to abort.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.12
   Beta testers wanted for 3.2.0.7 and 3.1.12.1
Received on Thu May 26 2011 - 16:20:11 MDT

This archive was generated by hypermail 2.2.0 : Fri May 27 2011 - 12:00:05 MDT