Re: [PATCH] HTTP Parser upgrade

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 03 Jun 2014 03:45:00 +1200

On 21/05/2014 4:04 a.m., Alex Rousskov wrote:
> On 05/20/2014 05:17 AM, Amos Jeffries wrote:
>> On 20/05/2014 4:28 a.m., Alex Rousskov wrote:
>>> New temporary SBuf allocations for message parts will be removed as you
>>> polish the parsing code, right? Or are you proposing to commit those
>>> performance regressions first?
>
>> The temporary allocations for message parts should already be gone.
>
>
> I found these six (one of which was marked) and stopped looking for more:
>
> 1> + const char *url = SBuf(hp.requestUri()).c_str();
>
> AFAICT, this should be removed or marked as a performance regression.

This is replacing a xmalloc() + memcpy(). So at worst case it is a
no-change. Alternatively the c_str() might not reallocate and we
actually have a performance gain.

>
> 2> + if ((req.v_end - req.v_start +1) < 5 || buf.substr(req.v_start,
> 5).caseCmp(SBuf("HTTP/")) != 0) {
>
> This regression should be removed IMO. Use a static SBuf (slightly
> faster) or a c-string comparison method (slightly less code).
>

Replaced it with Http1magic parse test similar to the one you worked out
for HTTP/2 now that HTTPbis Part1 has been assigned as RFC 7230. That
also helps drop half the version parsing code.

>
> 3> + const char *url = SBuf(hp.requestUri()).c_str(); // XXX:
> performance regression. convert to SBuf parse
>
> This one is properly marked, thank you.
>

Now replaced with that Tokenizer/Sbuf parse.

>
> 4> + SBuf m(start, start-t);
>

Marked now. That method should not be used though. It wrongly replicates
part of the HttpParser code.

> 5> + bool result = header.parse(hp.mimeHeader().c_str(),
> hp.headerBlockSize());

Okay, Marked.

>
> 6> + for (p = mimeHeader().c_str(); *p; p += strcspn(p, "\n\r")) {
>
> AFAICT, these should be removed or marked as performance regressions.
>
> There are probably more suspects -- I did not check the whole patch.
> Some of them may be debatable, but are you sure all these potential (at
> least) performance regressions are worth officially adding now? If you
> are sure, please do indicate that this change contains performance
> regressions in your commit message.

Now that we have Tokenizer available this whole method has been
re-written for better performance and one less bug.

>
>> Now
>> that trunk I/O buffer is an SBuf the message parts should be just
>> buf.substr() references to that I/O MemBlob backing store.
>
> Just to minimize misunderstanding and future surprises: Please note that
> I cannot, in good conscious, support code that uses the SBuf Comm I/O
> you added to trunk. Fortunately, this patch does not use that code
> directly (AFAICT), so we do not have to fight over it right now, but it
> does add to the code that [indirectly] relies on that broken API so it
> aggravates what I perceive as a problem.
>

Sure.

We can revert the buffer and re-SBuf it on every parse attempt if you
are that set against the existing SBuf I/O design. Not sure you would
like the resulting performance regression though.

Or we can try to redesign the comm_read I/O for better filling an SBuf.

Further discussion ongoing in the other "tunnel.cc I/O" thread.

>
>> + explicit Parser(const Parser&); // do not implement
>> + Parser& operator =(const Parser&); // do not implement
>
> Why add these two? The Parser class appears to be perfectly copyable "as
> is" to me (which is a good thing!). Did I miss any non-copyable fields?

Those are there because there should only ever be one owner of a Parser
which also feeds it with bytes to process - the ConnStateData currently.
It gets passed around via & params or a ref-counted Pointer. If copies
of the parser itself are being grabbed by other classes something has
changed with the design and we need to check the code using it.

>> + explicit RequestParser(const RequestParser&); // do not implement
>> + RequestParser& operator =(const RequestParser&); // do not implement
>
> Why add these two? The RequestParser class appears to be perfectly
> copyable "as is" to me (which is a good thing!).

As above.

>> + virtual ~RequestParser() {}
>
> If you remove the RequestParser copying restrictions, please also remove
> this destructor as unneeded (provided by the compiler) and violating the
> Big Three rule (if copying restrictions are removed).
>
>
>> + RequestParser() : Parser() {}
>
> I would also remove the above as unneeded (provided by the compiler).
>

Removing results in several compile errors "no matching function for
call to Http::One::RequestParser::RequestParser()"

I've added unit tests now to check the state on construction is correct
and found that RequestParser() needs to call clear() itself as the
parent does not call the right virtual clear().

>
>> + bool result = header.parse(hp.mimeHeader().c_str(), hp.headerBlockSize());
>
> Please make that variable constant.
>

Done.

>
>> + SBuf tmp = buf.substr(req.m_start, req.m_end - req.m_start + 1);
>> + method_ = HttpRequestMethod(tmp);
>
> Please make "tmp" constant if possible.

Done.

>
>> + virtual int64_t firstLineSize() const = 0;
>
>> + int64_t headerBlockSize() const {return mimeHeaderBlock_.length();}
>
>> + int64_t messageHeaderSize() const {return firstLineSize() + headerBlockSize();}
>
> Please typdef these and other identical size-related return types.
> Reusing SBuf::size_type in that typedef may be a good idea.

Done.

>> /** HTTP/1.x protocol parser
>> *
>> * Works on a raw character I/O buffer and tokenizes the content into
>> * the major CRLF delimited segments of an HTTP/1 procotol message:
>> *
>> * \item first-line (request-line / simple-request / status-line)
>> * \item mime-header 0*( header-name ':' SP field-value CRLF)
>> */
>>
>> and
>> /** HTTP/1.x protocol request parser
>> *
>> * Works on a raw character I/O buffer and tokenizes the content into
>> * the major CRLF delimited segments of an HTTP/1 request message:
>> *
>> * \item request-line (method, URL, protocol, version)
>> * \item mime-header (set of RFC2616 syntax header fields)
>> */
>
> Ideally, the child class description should not repeat what the parent
> class says but highlight what the child adds to the parent class. That
> approach makes it easier to understand the class hierarchy as a whole.
> However, this and other wording problems are minor and not worth arguing
> about IMO.
>

Okay. Not changed at this point.

>
>> + SBuf buf;
>
> The above key (and public!) Parser data member is missing a description.
> When you add it, please mention that the buffer contains yet-unparsed
> data only and should be treated as read-only outside the Parser class.
> It may even be a good idea to convert that member to buf_ and provide a
> const buf() accessor. Your call.
>

Done. It is now documented and private with remaining() const accessor.

>
> I suggest that the next patch revision is tested for performance and
> HTTP/1.1 compliance regressions before it is committed. Do you agree?
>

Underway, new patch when the results are in.

FYI: I am attempting to make this RFC 7230 compliant (the HTTPbis
document on HTTP/1 message syntax). Is coadvisor able to test that
latest criteria yet?

Amos
Received on Mon Jun 02 2014 - 15:45:20 MDT

This archive was generated by hypermail 2.2.0 : Tue Jun 03 2014 - 12:00:17 MDT