Re: parsing quoted-string HTTP header fields

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 27 May 2011 00:14:46 +1200

On 26/05/11 19:02, Tsantilas Christos wrote:
> On 05/26/2011 07:42 AM, Amos Jeffries wrote:
>> On Wed, 25 May 2011 20:07:13 +0300, Tsantilas Christos wrote:
>>> Here are two patches the first implements the (1) and the second the
>>> (3).
>>
>> Attach?
> oops!
> Now attsached.
>
>>
>>> The second patch does not look slower, but maybe we do not want to be
>>> strict for many reasons. One reason is that this function used to
>>> "Configurable SSL error details messages" patch, to parse the "descr:"
>>> and "detail:" fields :-)
>>> The other reason is that it is usual to find web clients or servers
>>> which uses a simple "\n" instead of the "\r\n"...
>>>
>>
>> I'm thinking a copy process may be slower than a basic scan, due to new
>> memory allocations etc.
>> Not necessarily true, waiting to see those patches. :)
>>
>> Amos
>>
>>>
>>> On 05/25/2011 02:51 PM, Amos Jeffries wrote:
>>>> On 25/05/11 22:44, Tsantilas Christos wrote:
>>>>> The quoted string fields parsing done in squid3 using the
>>>>> httpHeaderParseQuotedString function (HttpHeaderTools.cc file)
>>>>>
>>>>> I found that we do not support multiline quoted string fields, but
>>>>> according the rfc2616 multiline quoted string fields should supported:
>>>>>
>>>>> quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
>>>>> qdtext = <any TEXT except <">>
>>>>> TEXT = <any OCTET except CTLs,
>>>>> but including LWS>
>>>>> LWS = [CRLF] 1*( SP | HT
>>>>>
>>>>>
>>>>>
>>>>> I am planning to fix the httpHeaderParseQuotedString function, using
>>>>> one
>>>>> of the following rules:
>>>>>
>>>>> 1) Just ignore any "\r" or "\n" character. This is the fastest and
>>>>> simpler approach
>>>>> 3) Require "\r\n " or "\r\n\t" as line separator and just remove the
>>>>> "\r\n"
>>>>> 3) Require "\r\n " or "\r\n\t" as line separator and replace it with a
>>>>> space
>>>>> 4) Also support "\n\t" or "\n\n\t" as line separators.
>>>>
>>>> (4) case #2 "\n\n\t" is also non-compliant.
>>>>
>>>>>
>>>>> Any comments or suggestions?
>>>>
>>>> RFC states "A recipient MAY replace any linear white space with a
>>>> single
>>>> SP".
>>>>
>>>> So (1) and the second (3) {replace with 0x20} are the valid options.
>>>>
>>>> It may be worth having two quoted-string parsers. One for each option.
>>>> The preference being (1) for its higher performance.
>>>>
>>>> Amos

Ah, okay. I forgot the quoted string un-escapes as it goes. So there is
no extra penalty from (3). In which case I prefer (3) over (1), it has a
lot more error-checking for free.

Patch for case (3) looks great and is RFC compliant. Do we want to be
lenient and allow the absence of \r ?

If no, please commit (3) asap :)

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.12
   Beta testers wanted for 3.2.0.7 and 3.1.12.1
Received on Thu May 26 2011 - 12:14:53 MDT

This archive was generated by hypermail 2.2.0 : Thu May 26 2011 - 12:00:06 MDT