Re: [PATCH] HTTP Parser upgrade from Alex Rousskov on 2014-05-20 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 20 May 2014 10:04:47 -0600

On 05/20/2014 05:17 AM, Amos Jeffries wrote:
> On 20/05/2014 4:28 a.m., Alex Rousskov wrote:
>> New temporary SBuf allocations for message parts will be removed as you
>> polish the parsing code, right? Or are you proposing to commit those
>> performance regressions first?

> The temporary allocations for message parts should already be gone.

I found these six (one of which was marked) and stopped looking for more:

1> + const char *url = SBuf(hp.requestUri()).c_str();

AFAICT, this should be removed or marked as a performance regression.

2> + if ((req.v_end - req.v_start +1) < 5 || buf.substr(req.v_start,
5).caseCmp(SBuf("HTTP/")) != 0) {

This regression should be removed IMO. Use a static SBuf (slightly
faster) or a c-string comparison method (slightly less code).

3> + const char *url = SBuf(hp.requestUri()).c_str(); // XXX:
performance regression. convert to SBuf parse

This one is properly marked, thank you.

4> + SBuf m(start, start-t);

5> + bool result = header.parse(hp.mimeHeader().c_str(),
hp.headerBlockSize());

6> + for (p = mimeHeader().c_str(); *p; p += strcspn(p, "\n\r")) {

AFAICT, these should be removed or marked as performance regressions.

There are probably more suspects -- I did not check the whole patch.
Some of them may be debatable, but are you sure all these potential (at
least) performance regressions are worth officially adding now? If you
are sure, please do indicate that this change contains performance
regressions in your commit message.

> Now
> that trunk I/O buffer is an SBuf the message parts should be just
> buf.substr() references to that I/O MemBlob backing store.

Just to minimize misunderstanding and future surprises: Please note that
I cannot, in good conscious, support code that uses the SBuf Comm I/O
you added to trunk. Fortunately, this patch does not use that code
directly (AFAICT), so we do not have to fight over it right now, but it
does add to the code that [indirectly] relies on that broken API so it
aggravates what I perceive as a problem.

> + explicit Parser(const Parser&); // do not implement
> + Parser& operator =(const Parser&); // do not implement

Why add these two? The Parser class appears to be perfectly copyable "as
is" to me (which is a good thing!). Did I miss any non-copyable fields?

And in case the above removal triggers more Big Three doubts: If you
remove copying restrictions, please note that it is OK to leave the
virtual destructor (as long as it does not do anything!). A lonely
do-nothing virtual destructor in the base class does not violate the Big
Three rule because it simply adds a virtual destruction API, not a
specific way to clean up the class it belongs to.

> + explicit RequestParser(const RequestParser&); // do not implement
> + RequestParser& operator =(const RequestParser&); // do not implement

Why add these two? The RequestParser class appears to be perfectly
copyable "as is" to me (which is a good thing!).

> + virtual ~RequestParser() {}

If you remove the RequestParser copying restrictions, please also remove
this destructor as unneeded (provided by the compiler) and violating the
Big Three rule (if copying restrictions are removed).

> + RequestParser() : Parser() {}

I would also remove the above as unneeded (provided by the compiler).

> + bool result = header.parse(hp.mimeHeader().c_str(), hp.headerBlockSize());

Please make that variable constant.

> + SBuf tmp = buf.substr(req.m_start, req.m_end - req.m_start + 1);
> + method_ = HttpRequestMethod(tmp);

Please make "tmp" constant if possible.

> + virtual int64_t firstLineSize() const = 0;

> + int64_t headerBlockSize() const {return mimeHeaderBlock_.length();}

> + int64_t messageHeaderSize() const {return firstLineSize() + headerBlockSize();}

Please typdef these and other identical size-related return types.
Reusing SBuf::size_type in that typedef may be a good idea.

> /** HTTP/1.x protocol parser
> *
> * Works on a raw character I/O buffer and tokenizes the content into
> * the major CRLF delimited segments of an HTTP/1 procotol message:
> *
> * \item first-line (request-line / simple-request / status-line)
> * \item mime-header 0*( header-name ':' SP field-value CRLF)
> */
>
> and
> /** HTTP/1.x protocol request parser
> *
> * Works on a raw character I/O buffer and tokenizes the content into
> * the major CRLF delimited segments of an HTTP/1 request message:
> *
> * \item request-line (method, URL, protocol, version)
> * \item mime-header (set of RFC2616 syntax header fields)
> */

Ideally, the child class description should not repeat what the parent
class says but highlight what the child adds to the parent class. That
approach makes it easier to understand the class hierarchy as a whole.
However, this and other wording problems are minor and not worth arguing
about IMO.

> + SBuf buf;

The above key (and public!) Parser data member is missing a description.
When you add it, please mention that the buffer contains yet-unparsed
data only and should be treated as read-only outside the Parser class.
It may even be a good idea to convert that member to buf_ and provide a
const buf() accessor. Your call.

I suggest that the next patch revision is tested for performance and
HTTP/1.1 compliance regressions before it is committed. Do you agree?

Thank you,

Alex.
Received on Tue May 20 2014 - 16:05:08 MDT

This archive was generated by hypermail 2.2.0 : Fri May 30 2014 - 12:00:14 MDT