Re: [PATCH] HTTP Parser upgrade

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 17 Feb 2014 18:03:40 -0700

On 01/05/2014 08:21 PM, Amos Jeffries wrote:
> Updated patch attached. It mostly follows the plan you laid out (I was
> partially through an almost identical plan already, I merged the two
> from where I was up to).
>
> We now have a two-class HTTP Parser hierarchy:
> * base class Http1::Parser
> - which contains the generic (but HTTP/1.x specific) parse members and
> accessors
>
> * child class Http1::RequestParser
> - which contains the reuqest-specific state members and parse logics
> necessary to extend Http1::Parser into a HTTP Request message parser.

> -HttpParser::clear()
> +Http1::Parser::clear()

> +Http1::RequestParser::clear()
> +{
> + Http1::Parser::clear();

etc.

Please use Http::One (the real namespace) instead of Http1 (an alias)
when defining methods. Some compilers may be picky about how things are
defined. Using a short alias when referring to already declared and
defined names is fine, of course.

> +void
> +Http1::RequestParser::noteBufferShift(int64_t n)
> +{

Please make n constant if possible.

> + bufsiz -= n;
> +
> + // if parsing done, ignore buffer changes.
> + if (completedState_ == HTTP_PARSE_DONE)
> + return;
> +
> + // shift the parser resume point to match buffer content
> + parseOffset_ -= n;

The comment seems to contradict the code. We are decreasing bufsiz even
when parsing is done. This seems especially inconsistent given that we
do not adjust the offset in that case.

How about removing that if-statement for now? I am not sure we should be
called after we are done, but the current -=n adjustments seem benign
enough to let them be, at least for now.

> +
> +#if WHEN_INCREMENTAL_PARSING
> +
> + // if have not yet finished request-line
> + if (completedState_ == HTTP_PARSE_NEW) {
> + // check for and adjust the known request-line offsets.
> +
> + /* TODO: when the first-line is parsed incrementally we
> + * will need to recalculate the offsets for req.*
> + * For now, they are all re-calculated based on parserOffset_
> + * with each parse attempt.
> + */
> + }
> +
> + // if finished request-line but not mime header
> + // adjust the mime header states
> + if (completedState_ == HTTP_PARSE_FIRST) {
> + /* TODO: when the mime-header is parsed incrementally we
> + * will need to store the initial offset of mime-header block
> + * instead of locatign it from req.end or parseOffset_.
> + * Since req.end may no longer be valid, and parseOffset_ may
> + * have moved into the mime-block interior.
> + */
> + }
> +#endif

I am not sure those comments accurately reflect the best path forward. I
suggest removing them for now so that we do not have to argue about it.

> + // stage 1: locate the request-line
> + if (completedState_ == HTTP_PARSE_NEW) {
> + if (skipGarbageLines() && (size_t)bufsiz < parseOffset_)
> + return false;
> + }
> +
> + // stage 2: parse the request-line
> + if (completedState_ == HTTP_PARSE_NEW) {

There are two ways to implement the above correctly IMO:

A) Use one HTTP_PARSE_NEW state. Always try to skip garbage.
B) Use two states. Skip garbage in the first state only.

You appear to be doing (A) so I will focus on that one. In that
approach, we should not try to optimize the second call away (it
probably wastes a couple of CPU cycles now and is likely to lead to
buggy code eventually). The code should look like this instead:

     // stage 1: parse optional whitespace followed by request-line
     if (completedState_ == HTTP_PARSE_NEW) {
         _start(HttpParserParseReqLine);
         skipGarbageLines();
         int retcode = parseRequestFirstLine();
         ...
     }

If you disagree, I will write another email suggesting that the first
if-statement is buggy, but I would rather not spend time on that now.

> + int retcode = parseRequestFirstLine();

Please make const if possible.

> + /// attempt to parse a message from the buffer
> + virtual bool parse() = 0;

Please document what either "true" or "false" return value means. Parsed
everything? Keep going? Etc. This is important for code correctness.

> === renamed file 'src/HttpParser.h' => 'src/http/Http1Parser.h'

If you are going to rename files, should we put HTTP/1 parser into
src/http/one/Parser.h? Doing so will be more consistent with our naming
policy and avoid creating a lot of files with Http1 prefix (that does
not even correspond to the real namespace). Alternatively, avoid
renaming for now.

> // Parser states
> #define HTTP_PARSE_NONE 0 // nothing. completely unset state.
> #define HTTP_PARSE_NEW 1 // initialized, but nothing usefully parsed yet.
> +#define HTTP_PARSE_FIRST 2 // have parsed request first line
> +#define HTTP_PARSE_MIME 3 // have located end of mime header block
> +#define HTTP_PARSE_DONE 99 // have done with parsing so far

I do not think we should be extending the use of these macros. Please
replace with an enum (but also see immediately below):

> + /// what stage the parser is currently up to
> + uint8_t completedState_;

Actually, it is what stage the parser has completed and that is rather
awkward and error-prone because the code has to refer to the previous
stage to understand what the current stage is. Please replace with the
current state instead. I can suggest specific names if needed.

That is all I had time for right now; will have to come back to this
later. My apologies if any of the above is half-cooked, but you wanted
me to send this before I break...

Cheers,

Alex.

> The parser still only parses the request-line and greps the mime headers
> as a single token/blob into an SBuf for later old-parser stages to work
> with.
>
>
>
> On 4/01/2014 5:38 a.m., Alex Rousskov wrote:
>>
>>> /* NP: don't be tempted to move this down or remove again.
>>> * It's the only DDoS protection old-String has against long URL */
>>> if ( hp->bufsiz <= 0) {
>>> debugs(33, 5, "Incomplete request, waiting for end of request line");
>>> return NULL;
>>> } else if ( (size_t)hp->bufsiz >= Config.maxRequestHeaderSize && headersEnd(hp->buf, Config.maxRequestHeaderSize) == 0) {
>>> debugs(33, 5, "parseHttpRequest: Too large request");
>>> hp->request_parse_status = Http::scHeaderTooLarge;
>>> return parseHttpRequestAbort(csd, "error:request-too-large");
>>> }
>>
>> BTW, the above should be moved into the RequestParser code, with an
>> adjusted comment, and headersEnd() call moved/merged into the overall
>> parsing logic.
>
> I have done this in the patch as well while making the garbage prefix
> whitespace incrementally parse+discard.
>
> Although I am not yet 100% certain that I got all the places in the code
> shuffling the I/O buffer based on ClientHttpRequest::req_sz (which now
> omits any prefix garbage bytes and has already been consumed). Testing
> for the last day in regular traffic seems to show things fully working.
>
> Amos
>
Received on Tue Feb 18 2014 - 01:03:51 MST

This archive was generated by hypermail 2.2.0 : Tue Feb 18 2014 - 12:00:13 MST