Re: [PATCH] HTTP Parser upgrade

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 03 Jan 2014 09:38:58 -0700

On 01/03/2014 05:28 AM, Amos Jeffries wrote:
>>> 1) avoid the current design of 2-pass parsing: a) generic-parse just to
>>> determine it is a request-line, then b) re-parse with HttpRequest to get
>>> the fields already found by HttpParser.

>> I do not think we need two passes. The caller knows what it is parsing
>> and will use the right parser. That is, the caller will do either
>>
>> Http1::RequestParser parser(...)
>> if (parser.parse(...))
>>
>> or
>>
>> Http1::ResponseParser parser(...)
>> if (parser.parse(...))
>>
>> The above sketch omits many details, of course; for example, the
>> incremental parsers will be created once per message rather than once
>> per parse call. However, there is just one pass in the above sketch.

> "how" is the problem. The first pass I'm talking of is the discovery of
> that information used to run the pass these parsers will do. It may or
> may not be a complete pass or just several bytes preview-style into the
> data. You may recall we had to add a separate layer of memcmp() to
> distinguish between HTTP/1 and HTTP/2 alone.

Since there is not much value in making two passes over the same data,
we will eliminate them. Let's not assume that they have to be there and
designing to accommodate that wrong assumption! Your initial patch may
not eliminate the two passes, but your design does not have to suffer
from them from the beginning.

The initial patch you posted already creates the parser once, when
needed. That is the right way to do it. Eventually, you may use approach
#1 (below) and give that parser a request message, OR you can use
approach #2 (below) and make that parser create the request message.
AFAICT, you do not have to make that decision now because you are not
modifying/removing the "second pass" code yet.

>>> 3) avoid having to allocate a HttpRequest/HttpResponse object just to
>>> test parsing for invalid request/responses.
>>
>> I think there are two valid approaches here:
>>
>> 1) Always create the right empty HttpR* message object. Always create
>> the right parser. Give the message object to the parser so that the
>> parser can use it for storage during parsing.
>>
>> 2) Always create the right parser. Allow the parser to create the right
>> empty HttpR* message object if/when needed, so that the parser can use
>> the message object for storage during parsing.
>>
>> Approach (1) is probably simpler: No logic to decide when the message
>> object should be created. No code to check whether the message object
>> has been created already. It can be modified so that the parser creates
>> the right message object unconditionally, to simplify the caller's code.
>>
>> Approach (2) is more efficient if and only if there are a lot of cases
>> where the parser determines that no message object is needed to handle a
>> parse() call. Those are the cases where message object creation will be
>> delayed or even avoided.
>>
>> I suspect that there are not enough such cases to warrant the complexity
>> of (2), but you probably know this area better than I do.

> We have enough abort cases (Request parser generates Response object) to
> make me pick (2) ealier as the end goal.

The generated response objects are irrelevant IMO. Parsers in both
approaches can generate them, whether you supply the request message to
the request parser or let the parser create that request message as needed.

> Avoids replicating validation
> outside the parser which has already been done within and has less info
> outside.

I do not understand what duplicated validation (of the request being
parsed?) you are talking about, but I do not foresee any duplication in
either design. Perhaps you are talking about duplicated
parsing/validation in the current code, but I do not think it is
relevant to this discussion because it will be gone by the time we are done.

Overall, given vanishing correlation between our emails, I suspect we
are simply talking past each other: discussing different matters while
thinking that we are discussing the same thing. To make progress, I
suggest the following specific changes as the next step:

* Switch to the new namespace scheme. If I were you, I would not bother
with adding or renaming any files at this point unless absolutely
necessary. That should be done as the last step, to minimize
intermediate patch sizes and rererenaming overheads. For the same
reason, I would avoid updating test cases at least until the major
design issues are settled.

* Rename your Http::Http1Parser class to Http::One::Parser class.

* Add empty Http::One::RequestParser class, a child of
Http::One::Parser. This class will be referred by callers as
Http1::RequestParser. Again, to minimize overheads, ease review, and
focus on what it important, this class should be initially located in
the same header and source files as the Http::Http1Parser (we will split
sources later).

* Move code (including methods and data members) specific to request
parsing from Http::One::Parser to Http::One::RequestParser. Finalize
moved methods/data descriptions. Any generic code, such as code
isolating the first HTTP message line (if any), can stay in
Http::One::Parser, but please do not make any existing code generic even
if it is possible to do so (that should be done later if needed).

* Please do not add any code (including methods and data members)
specific to response parsing at this time.

* Address other, non-controversial review issues as usual.

* Re-post the patch.

I think the above plan will help us focus on what is critical now and
minimize iterations.

> /* NP: don't be tempted to move this down or remove again.
> * It's the only DDoS protection old-String has against long URL */
> if ( hp->bufsiz <= 0) {
> debugs(33, 5, "Incomplete request, waiting for end of request line");
> return NULL;
> } else if ( (size_t)hp->bufsiz >= Config.maxRequestHeaderSize && headersEnd(hp->buf, Config.maxRequestHeaderSize) == 0) {
> debugs(33, 5, "parseHttpRequest: Too large request");
> hp->request_parse_status = Http::scHeaderTooLarge;
> return parseHttpRequestAbort(csd, "error:request-too-large");
> }

BTW, the above should be moved into the RequestParser code, with an
adjusted comment, and headersEnd() call moved/merged into the overall
parsing logic.

HTH,

Alex.
Received on Fri Jan 03 2014 - 16:39:05 MST

This archive was generated by hypermail 2.2.0 : Mon Jan 06 2014 - 12:00:10 MST