Re: [PATCH] new request-line parser and unit tests

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 01 Sep 2010 00:21:48 +0000

On Tue, 31 Aug 2010 16:59:05 -0600, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 08/31/2010 04:33 PM, Amos Jeffries wrote:
>> On Tue, 31 Aug 2010 09:47:18 -0600, Alex Rousskov
>> <rousskov_at_measurement-factory.com> wrote:
>
>>> * "// for now this hack retains that behaviour" code is actually
>>> disabled with "#if 0&& ". Please adjust the guard or the comment.
>>
>> Okay comment changing.
>>
>>>
>>> * CR+CRLF should be allowed because those CRs are a part of the final
>>> CRLF sequence. IIRC, RFC 2616 recommends allowing CR*LF whenever CRLF
is
>>
>>> required.
>>
>> They are allowed *if* followed by a LF which marks the line terminal.
>
> Exactly.
>
>> There was no mention I could see anywhere about sequences of CR*. LWS
>> sequences for whitespace wrapping which permits middle-line CR only
>> applies
>> to headers which are outside this line.
>
> Yes. I am only after allowing CR*LF, not implied LWS or bare CRs.

CR*LF == (bare-CR)* CR LF.

While marshalling my arguments against this I find it at odds with the
rest of the parser. You seem never to have committed
http://www.mail-archive.com/squid-dev@squid-cache.org/msg13384.html

In the request line they are definitely *not* valid, by explicit statement
in no less than six RFCs (1945, 2048, 2616, 2326, 822, 821) as I quote
below. Due to being ignored in the outputs I take you point on them
possibly being benign. This being the front line of security in Squid I'm
still suspicious about that.

>
>> These two and several other protocols using this type of request line
do
>> explicitly state "No CR or LF is allowed except in the final CRLF
>> sequence".
>>
>> The noted exception that bare-CR are permitted in the HTTP mime-headers
>> section implies that CR*CRLF would be possible and valid there.
However,
>> that section and its quirks are not relevant to this parser.
>
> Sure.
>
> Perhaps I did not grok the code right: I was under impression you only
> allow at most one CR before LF because you follow CR checks with a i+1
> LF checks:
>
> hmsg->buf[i + 1] == '\n'
>
> I could miss a different case. If I am wrong, please ignore this bullet
> with my apologies.

You are right. Only the CR followed by LF is permitted in the submitted
code.
The prefix sequence of CR is treated as (bare-CR)* CR LF.

All three of the HTTP RFCs have explicit octet-resolution BNF patterns for
what these upper-cased tags mean. CR is defined explicitly as a single
carriage-return octet value 10. CRLF is explicitly specified as two octets.
and explicitly forbidden to have CR outside the CRLF pairing. Explicit
mention that LWS (with explicit redefinition to be SP or HT only!) may be
substituted for SP around the URI as long as at least one SP is retained.

CR CR LF are therefore undefined and in light of the detail this
definition goes to strongly implicitly invalid.

In light of your analysis from that earlier double-CR patch and the fact
we effectively drop them from use immediately I'll accept it as benign to
Squid itself and add more leniency.

>
>>> > * TODO: making this part of HttpParser
>>>
>>> Since the current diff is pretty much useless anyway, I would
recommend
>>> doing the above change now. It will help with goto-avoidance as well.
I
>>> agree that you do not have to do it now though.
>>
>> That involves changes the callers of this code and its API. I'd rather
do
>> that as part of the StringNG change so the API only changes once.
>
>
> You can keep the callers intact:
>
> HttpParserParseReqLine(HttpParser *hmsg) {
> PROF_start(HttpParserParseReqLine);
> const int result = hmsg->parseReqLine();
> PROF_stop(HttpParserParseReqLine);
> return result;
> }

Far too smart for my lazy bones. :( Okay, doing.

Amos
Received on Wed Sep 01 2010 - 00:21:52 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 01 2010 - 12:00:06 MDT