Re: [PATCH] HTTP Parser upgrade

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 17 Feb 2014 23:06:26 -0700

Here are a few more notes, to add to the unfinished review posted
earlier. The first and the last one are minor, but the rest is
serious/important.

On 01/05/2014 08:21 PM, Amos Jeffries wrote:

> + /// the protocol label for this message
> + const AnyP::ProtocolVersion & messageProtocol() const {return msgProtocol_;}

s/this message/parsed message/ or something like that to clarify what
"this message" we are talking about in a context of a parser class. You
may also want to add "unset until ..." note.

> /** Initialize a new parser.
> * Presenting it a buffer to work on and the current length of available data.
> * NOTE: This is *not* the buffer size, just the parse-able data length.
> * The parse routines may be called again later with more data.
> */
> Parser(const char *aBuf, int len) { reset(aBuf,len); }

This new interface implies that the Parser stores/uses raw input buffer
between parse() calls. I do not think this is a good idea. This approach
already leads to time bombs where the patched code adjusts the
size/offsets but assumes there was no buf pointer change (which will not
be true if we start using more complex buffering approaches after
migrating to SBuf and such):

> +void
> +Http1::RequestParser::noteBufferShift(int64_t n)
> +{
> + bufsiz -= n;
...
> + // shift the parser resume point to match buffer content
> + parseOffset_ -= n;
...
> +}

With the old non-incremental parsers, the buffer pointer and various
offsets were reset before every parse call:

> /* Begin the parsing */
> PROF_start(parseHttpRequest);
> HttpParserInit(&parser_, in.buf, in.notYetUsed);
>
> /* Process request */
> Http::ProtocolVersion http_ver;
> ClientSocketContext *context = parseHttpRequest(this, &parser_, &method, &http_ver);

As we migrate to incremental parsers, we have a choice to make:

* Either the parser and the caller are going to maintain their own raw
buffers, with the caller appending to the parser's buffer when new bytes
arrive and the parser consuming whatever buffer [prefix] it no longer needs.

* Or the parser is going to rely on the callers buffer. Storing a
reference to that buffer during parse() calls is a good idea, but the
buffer is going to be presumed invalid and unused between those parse()
calls. We should either not store any offsets between calls at all (if
they are not needed) OR require the callers to consume exactly what the
parser tells them to consume before giving the parser a new buffer to
parse. In the latter case, the stored offsets will always be based on
the first buffer byte that the parser has seen and will not depend on
any buffer shifts.

For efficiency reasons, I am pretty sure we want the latter, which is
what the existing/unpatched code essentially gives us, ironically. We
need to document and continue to make use of it correctly (making any
necessary adjustments, of course).

If the latter approach is what you are trying to implement,

* we need neither the noteBufferShift() nor Parser constructor parameters,

* we need to pass the buffer to the parse() call, and

* the parseOffset_ either needs to be maintained differently or removed
from Parser completely.

These are important API changes that we need to get right now, before we
start making the parser more complex and truly incremental.

Note that the new RequestParser appears to be using parseOffset_ but it
is just [poorly] shadowing the old/existing req.start and req.end
members. We need to document what those old offsets mean and use them as
needed (or remove them if they are not needed!).

> request_parse_status = Http::scOkay; // HTTP/0.9
> + completedState_ = HTTP_PARSE_FIRST;
> + parseOffset_ = line_end;

> request_parse_status = Http::scOkay; // treat as HTTP/0.9
> + completedState_ = HTTP_PARSE_FIRST;
> + parseOffset_ = req.end;

> request_parse_status = Http::scOkay;
> + parseOffset_ = req.end+1; // req.end is the \n byte. Next parse step needs to start *after* that byte.
> + completedState_ = HTTP_PARSE_FIRST;

The above is one example where parseOffset_ is maintained
inconsistently. Removing parseOffset_ (until it is actually needed)
would fix that.

> + * \return true if garbage whitespace was found
> + */
> +bool
> +Http1::RequestParser::skipGarbageLines()

The current code returns true even if the garbage was not found during
the call. Please clarify whether the return result applies to the call
(naturally) or to the history of parsing this request (unnaturally).
However, I would just make this method return void. I do not think the
[fixed] caller really needs to know what happened here.

With this email, I am done reviewing this iteration of the parser
upgrade patch.

Thank you,

Alex.
Received on Tue Feb 18 2014 - 06:06:37 MST

This archive was generated by hypermail 2.2.0 : Tue Feb 18 2014 - 12:00:13 MST