Re: [PREVIEW] HTTP Parser upgrade

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 19 May 2014 10:28:22 -0600

On 05/19/2014 12:29 AM, Amos Jeffries wrote:

> Updated patch attached. PREVIEW since the file renaming has been
> omitted, as have incomplete Tokenizer related changes.

I have not reviewed most of the details, but the preview looks good to
me overall.

New temporary SBuf allocations for message parts will be removed as you
polish the parsing code, right? Or are you proposing to commit those
performance regressions first?

At some point, please check for The Big Three rule violations in new or
rewritten classes. Removing _unused_ copying is fine if it is easier
than correctly supporting all three.

Please add "explicit" to any new/modified single-required-parameter
constructor or document why we want to support silent type conversion.
Parser(SBuf) is the one I noticed but there may be others.

> /** HTTP protocol parser.
> *
> * Works on a raw character I/O buffer and tokenizes the content into
> + * either an error state or HTTP procotol major sections:

It is not possible to tokenize something into an error state (unless you
are parsing errors), and major protocol sections are not exactly
"tokens" either. Suggestion:

/** Parses raw buffer to extract major HTTP message sections: ...

(the existence of an error state is obvious for any parser and does not
need to be documented in the class description IMO, but you can add it
if you prefer, of course).

Please consider s/isDone()/done()/ -- see the last blob in "Member
naming" item #3 at http://wiki.squid-cache.org/Squid3CodingGuidelines

Consider s/doneBytes/parsedBytes/ although I think that method is not
needed right now, as discussed further below.

> + /** Whether the parser is already done processing the buffer.
> + * Use to determine between incomplete data and errors results
> + * from the parse.
> + */
> + bool isDone() const {return parsingStage_==HTTP_PARSE_DONE;}

s/determine/distinguish/

However, it is not clear how and why isDone() should be used for
distinguishing errors from "need more data". I suspect that this needs
to be renamed or redescribed but I am not sure about your intent here.
Perhaps this would work?

  /// Whether we finished parsing (successfully or otherwise).

Personally, I find the "reverse" methods like "needsMoreData()" easier
to work with, but that is a matter of taste:

    if (parser->needsMoreData())
       return; // wait for more bytes (clear and simple)

    if (!parsedAll)
       ... does not need more and failed to parse: must be an error ...

versus:

    if (!parser->isDone())
       What does isDone() return on errors? And we are negating, so ...

    if (!parsedOk)
       ... handle error or does it need more data? ...

> + if (hp.doneBytes()) {
> + // we are done with some of the buffer. update the ConnStateData copy now.
> + csd->in.buf = hp.buf;
> + }

I recommend removing the doneBytes() condition to simplify code and
minimize the number of different paths we need to worry about:

  // the parser may have consumed some data, keep us synced
  csd->in.buf = hp.buf;

If our code is any good, the removal of the guard will not introduce any
significant performance penalty but will simplify the code logic
significantly: The buffers are always the same after parsing.

> +namespace Http {
> +namespace One {
...
> /** HTTP protocol parser.

Placement inside Http::One suggests that this is an HTTP/1 protocol
parser, not a generic HTTP protocol parser. Please document as such if
the placement is correct.

> + Parser(const SBuf &aBuf) { reset(aBuf); }

> - void reset(const char *aBuf, int len);
> + void reset(const SBuf &aBuf);

Have you tried removing the above constructor and reset() method while
supplying the buffer directly to the parse method as a parameter
instead? I suspect it may be better than doing this direct manipulation
of the parser buffer:

> + if (!parser_ || parser_->isDone())
> + parser_ = new Http1::RequestParser(in.buf);
> + else // update the buffer space being parsed
> + parser_->buf = in.buf;

Thank you,

Alex.
Received on Mon May 19 2014 - 16:28:41 MDT

This archive was generated by hypermail 2.2.0 : Tue May 20 2014 - 12:00:14 MDT