Re: [PATCH] HTTP Parser upgrade from Amos Jeffries on 2014-02-18 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 19 Feb 2014 03:17:23 +1300

On 18/02/2014 2:03 p.m., Alex Rousskov wrote:
> On 01/05/2014 08:21 PM, Amos Jeffries wrote:
>> Updated patch attached. It mostly follows the plan you laid out (I was
>> partially through an almost identical plan already, I merged the two
>> from where I was up to).
>>
>> We now have a two-class HTTP Parser hierarchy:
>> * base class Http1::Parser
>> - which contains the generic (but HTTP/1.x specific) parse members and
>> accessors
>>
>> * child class Http1::RequestParser
>> - which contains the reuqest-specific state members and parse logics
>> necessary to extend Http1::Parser into a HTTP Request message parser.
>
>
>> -HttpParser::clear()
>> +Http1::Parser::clear()
>
>> +Http1::RequestParser::clear()
>> +{
>> + Http1::Parser::clear();
>
> etc.
>
> Please use Http::One (the real namespace) instead of Http1 (an alias)
> when defining methods. Some compilers may be picky about how things are
> defined. Using a short alias when referring to already declared and
> defined names is fine, of course.
>
>

Okay. Done.

>> +void
>> +Http1::RequestParser::noteBufferShift(int64_t n)
>> +{
>
> Please make n constant if possible.
>

Done.

>
>> + bufsiz -= n;
>> +
>> + // if parsing done, ignore buffer changes.
>> + if (completedState_ == HTTP_PARSE_DONE)
>> + return;
>> +
>> + // shift the parser resume point to match buffer content
>> + parseOffset_ -= n;
>
> The comment seems to contradict the code. We are decreasing bufsiz even
> when parsing is done. This seems especially inconsistent given that we
> do not adjust the offset in that case.
>
> How about removing that if-statement for now? I am not sure we should be
> called after we are done, but the current -=n adjustments seem benign
> enough to let them be, at least for now.
>

It may be called after any of the calls to parse(), when the buffer has
removed the completed and no longer wanted prefix bytes.

Moving the if-statement to the top is fine. Removing it lets the code
possibly try to decrement the unsigned offset past 0.

>
>> +
>> +#if WHEN_INCREMENTAL_PARSING
>> +
>> + // if have not yet finished request-line
>> + if (completedState_ == HTTP_PARSE_NEW) {
>> + // check for and adjust the known request-line offsets.
>> +
>> + /* TODO: when the first-line is parsed incrementally we
>> + * will need to recalculate the offsets for req.*
>> + * For now, they are all re-calculated based on parserOffset_
>> + * with each parse attempt.
>> + */
>> + }
>> +
>> + // if finished request-line but not mime header
>> + // adjust the mime header states
>> + if (completedState_ == HTTP_PARSE_FIRST) {
>> + /* TODO: when the mime-header is parsed incrementally we
>> + * will need to store the initial offset of mime-header block
>> + * instead of locatign it from req.end or parseOffset_.
>> + * Since req.end may no longer be valid, and parseOffset_ may
>> + * have moved into the mime-block interior.
>> + */
>> + }
>> +#endif
>
> I am not sure those comments accurately reflect the best path forward. I
> suggest removing them for now so that we do not have to argue about it.
>

Okay. Removed.

>
>> + // stage 1: locate the request-line
>> + if (completedState_ == HTTP_PARSE_NEW) {
>> + if (skipGarbageLines() && (size_t)bufsiz < parseOffset_)
>> + return false;
>> + }
>> +
>> + // stage 2: parse the request-line
>> + if (completedState_ == HTTP_PARSE_NEW) {
>
> There are two ways to implement the above correctly IMO:
>
> A) Use one HTTP_PARSE_NEW state. Always try to skip garbage.
> B) Use two states. Skip garbage in the first state only.
>
> You appear to be doing (A) so I will focus on that one. In that
> approach, we should not try to optimize the second call away (it
> probably wastes a couple of CPU cycles now and is likely to lead to
> buggy code eventually). The code should look like this instead:
>
> // stage 1: parse optional whitespace followed by request-line
> if (completedState_ == HTTP_PARSE_NEW) {
> _start(HttpParserParseReqLine);
> skipGarbageLines();
> int retcode = parseRequestFirstLine();
> ...
> }
>
> If you disagree, I will write another email suggesting that the first
> if-statement is buggy, but I would rather not spend time on that now.

Doing the completedState_ change below this had to change to (B) with
NEW state being the garbage-collection time.

>
>> + int retcode = parseRequestFirstLine();
>
> Please make const if possible.
>

Done.

>
>> + /// attempt to parse a message from the buffer
>> + virtual bool parse() = 0;
>
> Please document what either "true" or "false" return value means. Parsed
> everything? Keep going? Etc. This is important for code correctness.
>

Done.

>
>> === renamed file 'src/HttpParser.h' => 'src/http/Http1Parser.h'
>
>
> If you are going to rename files, should we put HTTP/1 parser into
> src/http/one/Parser.h? Doing so will be more consistent with our naming
> policy and avoid creating a lot of files with Http1 prefix (that does
> not even correspond to the real namespace). Alternatively, avoid
> renaming for now.
>

You requested no more file renames until after auditing had finished. So
that is leftover from previous rounds. The files are planned to be
renamed and classes split according to source layout guidelines when the
code is approved.

If you are happy with the split now I will do the rename before sending
the next patch. That may be a day or two as I have to re-do the traffic
testing.

>
>> // Parser states
>> #define HTTP_PARSE_NONE 0 // nothing. completely unset state.
>> #define HTTP_PARSE_NEW 1 // initialized, but nothing usefully parsed yet.
>> +#define HTTP_PARSE_FIRST 2 // have parsed request first line
>> +#define HTTP_PARSE_MIME 3 // have located end of mime header block
>> +#define HTTP_PARSE_DONE 99 // have done with parsing so far
>
> I do not think we should be extending the use of these macros. Please
> replace with an enum (but also see immediately below):
>

I was hoping not to touch that yet. But Okay. Done.

>
>> + /// what stage the parser is currently up to
>> + uint8_t completedState_;
>
> Actually, it is what stage the parser has completed and that is rather
> awkward and error-prone because the code has to refer to the previous
> stage to understand what the current stage is. Please replace with the
> current state instead. I can suggest specific names if needed.
>

Done. Fixing this highlighted all the spots parseOffset_ needed updating
and what it should be set to. I think both are correct now.

Thank you
Amos
Received on Tue Feb 18 2014 - 14:17:37 MST

This archive was generated by hypermail 2.2.0 : Tue Feb 18 2014 - 12:00:13 MST