Re: [PATCH] new request-line parser and unit tests

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 31 Aug 2010 18:59:55 -0600

On 08/31/2010 06:21 PM, Amos Jeffries wrote:

>>>> * CR+CRLF should be allowed because those CRs are a part of the final
>>>> CRLF sequence. IIRC, RFC 2616 recommends allowing CR*LF whenever CRLF
>>>> is required.
>>>
>>> They are allowed *if* followed by a LF which marks the line terminal.
>>
>> Exactly.
>>
>>> There was no mention I could see anywhere about sequences of CR*. LWS
>>> sequences for whitespace wrapping which permits middle-line CR only
>>> applies to headers which are outside this line.
>>
>> Yes. I am only after allowing CR*LF, not implied LWS or bare CRs.
>
> CR*LF == (bare-CR)* CR LF.
>
> While marshalling my arguments against this I find it at odds with the
> rest of the parser. You seem never to have committed
> http://www.mail-archive.com/squid-dev@squid-cache.org/msg13384.html

Just did, before receiving your email. The patch is related but does not
modify request line parsing.

> In the request line they are definitely *not* valid, by explicit statement
> in no less than six RFCs (1945, 2048, 2616, 2326, 822, 821) as I quote
> below. Due to being ignored in the outputs I take you point on them
> possibly being benign. This being the front line of security in Squid I'm
> still suspicious about that.

I think your RFC interpretation is more accurate than mine.

However, I believe I have seen HTTP clients sending them (I have seen
benign HTTP clients send them somewhere in the header). The smuggling
request paper does not object to them either, AFAICT.

If the consensus is that CRs before CRLF must be banned, let's ban them.
Compliance tests, especially those with questionable interpretation are
not the final criteria, of course.

> You are right. Only the CR followed by LF is permitted in the submitted
> code. The prefix sequence of CR is treated as (bare-CR)* CR LF.
>
> All three of the HTTP RFCs have explicit octet-resolution BNF patterns for
> what these upper-cased tags mean. CR is defined explicitly as a single
> carriage-return octet value 10. CRLF is explicitly specified as two octets.
> and explicitly forbidden to have CR outside the CRLF pairing. Explicit
> mention that LWS (with explicit redefinition to be SP or HT only!) may be
> substituted for SP around the URI as long as at least one SP is retained.

And then, after all that stuff, they go and say:

    However, we recommend that applications, when parsing such headers,
    recognize a single LF as a line terminator and ignore the leading CR.

So much for explicitness and strictness. If one CR is ignored, why not
two? And I suspect the motivation behind all those strict rules was to
optimize/simplify parsers, not prevent message smuggling.

The bottom line though is that there is a high, IMO, chance that your
strict parser will deny benign requests and we will have to relax it.

To summarize, our choices are:

1) Allow CR+CRLF: Your patch needs to be adjusted. Wait for a request
smuggling exploit (unlikely but possible). Code a workaround or goto #2.

2) Deny CR+CRLF: Your patch is fine. Wait for benign requests being
blocked (likely but not guaranteed). Goto #1.

> CR CR LF are therefore undefined and in light of the detail this
> definition goes to strongly implicitly invalid.
>
> In light of your analysis from that earlier double-CR patch and the fact
> we effectively drop them from use immediately I'll accept it as benign to
> Squid itself and add more leniency.

Perhaps we should hear what others have to say. I am happy with either
solution if others are behind it.

Thank you,

Alex.
Received on Wed Sep 01 2010 - 01:00:06 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 01 2010 - 12:00:06 MDT