Re: [PATCH] new request-line parser and unit tests

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 31 Aug 2010 09:47:18 -0600

On 08/31/2010 09:20 AM, Amos Jeffries wrote:
> Split into two patches.
>
> The parser_unittests.patch adds tests for all the request line
> permutations of content and garbage I could think of over the last few
> days with their RFC 1945 and RFC 2616 expected outcomes. On the cases
> which were not mentioned, ie binary crap, I've erred to leniency
> accepting as 'unknown' content in the field with relevant cases assumed
> to be HTTP/0.9.
>
>
> The request-line parser existing in squid failed on several types of
> test, mostly the ones centered around missing or invalid URIs.
>
> So, parser_changes.patch contains an upgrade/fix to
> HttpParserParseReqLine() handling HTTP request-lines as specific by
> section 5.1 of the above RFCs. Specifically to handle a sequence of
> unknown bytes up to a terminating LF (\n) octet. Nothing outside that is
> relevant to this parser.
>
> Changes:
> * The semantics as previously documented are taken on. No changes there,
> but documentation clarified a bit. Some things previously not erroring
> are now doing so. External code impact is in the nature of reduced
> special cases to be handled. Specifically raw-CR weirdness in the
> request line fields. This occuring in URL was a vulnerability at least
> once.
>
> * Prior updates to HttpParser object for other parse stages opens the
> possibility of this parse action returning HTTP status code directly.
> Additions are done to make use of this (with the existing status codes
> only).
>
> * Input permutations where the unit-tests showed the old parser was
> violating its own documentation have been fixed to produce expected
> outputs.
>
> * Old parser operated three distinct potentially long parse loops. Added
> several local variables to remember various octets seen while searching
> for the terminal LF. This removed the need for two of the parse re-scans
> (length of method, length of URI).

* "// for now this hack retains that behaviour" code is actually
disabled with "#if 0 && ". Please adjust the guard or the comment.

* CR+CRLF should be allowed because those CRs are a part of the final
CRLF sequence. IIRC, RFC 2616 recommends allowing CR*LF whenever CRLF is
required.

* The old "This is the only spot in this function where (0) is valid
result" comment is misleading because it is not clear what "valid
result" is and zero is always a non-error result. Consider rephrasing to
something like "This is the only spot in this function where we request
more data" if that is what the comment wanted to express. Or just delete
the comment.

* Since you are rewriting the whole thing, please get rid of gotos.
Besides making the code less safe, they prevent variable localization
and are probably preventing some compiler optimizations. One simple
solution would be to just return from the function and let a simple
wrapper do PROF_*() and debugging.

* Once the gotos are gone, please declare locally used variables locally.

> * TODO: making this part of HttpParser

Since the current diff is pretty much useless anyway, I would recommend
doing the above change now. It will help with goto-avoidance as well. I
agree that you do not have to do it now though.

This will be rewritten all over again when Kinkie's string changes are
completed, right?

If there are no objections, I would like the effects of this change on
compliance to be tested before the patch is committed. If you want me to
run the tests, please let me know when you do not intend to make any
more changes to the patch (because you are done addressing review
comments or do not consider them valid).

Thank you,

Alex.

>
> Future work TODO but outside the scope of this update and not going to
> be added:
>
> * making HttpParser a trickle-scanner to speed up large request line
> parsing.
> * adding 6xx codes for internal use producing better errors on several
> interesting error cases identified by this parser.
> * using HttpParser fields output from this to seed the following parser
> stages and prevent re-scanning this line to find bits. Particularly the
> HttpRequest sanity checks.
>
>
> Amos
Received on Tue Aug 31 2010 - 15:47:29 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 01 2010 - 12:00:05 MDT