Re: [PATCH] HTTP Parser upgrade

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 03 Jan 2014 02:45:23 +1300

On 2/01/2014 10:08 a.m., Alex Rousskov wrote:
> On 01/01/2014 05:15 AM, Amos Jeffries wrote:
>> This patch renames the HttpParser class as Http1Parser (to differentiate
>> from the future planned Http2Parser) and moves it into the Http namespace.
> ...
>> Also, the HttpRequestMethod class is moved into the Http namespace and
>> library to reduce dependencies on the parser class outside the library.
>
>
>> Overall this does not seem to affect performance much.
>
> Any performance testing performed on this patch to support the above
> estimation?

Semi-casual runs of apachebench on trunk rev.13197 compared with this
branch as of the patch submitted for audit. Using a HIT server object
with warm cache from http://example.com/. Difference was consistently
within +/- 1 req/sec of trunk, with variation decreasing as concurrency
increased.

>> Http::Http1ParserPointer
>
> The proposed naming approach results in awkward names with too many
> "HTTP"s in them. I suggest a different arrangement:
>
> * namespace Http // stuff common to HTTP/1 and HTTP/2 handling
> * namespace Http::One // stuff specific to HTTP/1 handling
> * namespace Http::Two // stuff specific to future HTTP/2 handling
> * No Http prefixes for names inside the above three namespaces.
>
> This alternative approach would result in
>
> Http::One::Parser
> Http::Two::ParserPointer
>
> instead of
>
> Http::Http1Parser
> Http::Http2ParserPointer
>

I was wondering about Parser1 / Parser1Pointer , Parser2 /
Parser2Pointer as an alternative?

Only the parser class seems to be needing the version difference.
Everything else seems to be generic or relevant only to one of the versions.

> Same length, but better looking names without repetition of information.
> Moreover, after adding these global namespace aliases (to http/forward.h):
>
> namespace Http1 = Http::One;
> namespace Http2 = Http::Two;
>
> we can use
>
> Http1::Parser
> Http2::ParserPointer
>
> which are even shorter that the proposed names.
>
> Note that declaring Http1 and Http2 namespaces outside of the Http
> namespace (rather than namespace aliases) is probably a bad idea because
>
> * it will force us to use Http:: prefixes inside Http1 and Http2
> namespaces, which is kind of wrong, OR
>
> * force us to add blank "use Http" statements, which opens another can
> of warms, especially if used in header files.
>

Um..

>
> The final sketch, putting all the naming-related pieces together:
>
>> namespace Http {
>> namespace One {
>> class Parser ...
>> };
>> namespace Two {
>> class Parser ...
>> };
>> };
>> namespace Http1 = Http::One;
>> namespace Http2 = Http::Two;
>

You just said that was a bad idea?

> You can even place all HTTP/N-specific protocol code into src/http/N/
> directories, but that may be an overkill today.
>
>
>> === modified file 'src/client_side.cc'
>
>> +prepareAcceleratedURL(ConnStateData * conn, ClientHttpRequest *http, const Http::Http1ParserPointer &hp)
> ...
>> +prepareTransparentURL(ConnStateData * conn, ClientHttpRequest *http, const Http::Http1ParserPointer &hp)
>
> If possible, pass references to the parser instead of pointers. All
> other factors being equal, references are cheaper and safer than
> pointers, so code that does not store an object (or pass an object for
> storage by others) should accept a reference, especially when the object
> is guaranteed to exist in the callers.
>

Done. De-referenced the parser pointer in ConnStateData before passing
to the entire call chain as regular non-const object reference.

In order to make the parser incremental and not be reset on every input
byte (worst case slow-loris attack) the Parser object is allocated
dynamically by ConnStateData when a new request is expected.

>
>> + const char *url = hp->requestUri().c_str();
>> if (*url != '/') {
>
> Declare the "url" pointer itself as constant, to expose "url is no
> longer the same as requestUri()" problems discussed further below.
>
>
>> + const char *url = hp->requestUri().c_str();
>> if (*url != '/') {
>
> Please add an "XXX: performance regression" comment for the above if you
> do not want to convert the rest of url-using code to use the new SBuf API.
>

Yes that was the intention. I am proposing this patch as an initial
merge before we start implementing anything on top of the incomplete
Tokeniser.

Comment added for now.

>
>> - int url_sz = strlen(url) + 32 + Config.appendDomainLen +
>> + int url_sz = hp->requestUri().length() + 32 + Config.appendDomainLen +
>> ...
>> - int url_sz = strlen(url) + 32 + Config.appendDomainLen;
>> + int url_sz = hp->requestUri().length() + 32 + Config.appendDomainLen;
>
> AFAICT, there are cases where url is no longer the same as requestUri()
> by the time you get to this code in prepareAcceleratedURL().

Yes there are cases when it is shortened. I do not see any cases where
it is extended, so this is adding a small over-allocation of memory by
url_sz until we fully convert to SBuf.
NP: The magic +32 is already over-allocating for the scheme name and
delimiters.

>
> Please make url_sz constant if possible since you are modifying its
> declaration. Same for url_sz in prepareTransparentURL().
>

Done.

>
>> - if (*url != '/')
>> + if (hp->requestUri()[0] != '/')
>> return; /* already in good shape */
>
> The old code "worked" for empty URLs (""). The new does not. I do not
> know whether empty URLs are possible at this point. If they are not
> possible, perhaps add a comment or even a Must()?
>

The parseFirstLine logics ensure there is always a URL. Either by
treating the version tag as the URL, or aborting with a parse error.

Do you really think we need to add an unused check for length() ?

>
>> + bool parsedOk = hp->parseRequest();
>
> Please declare as constant if possible.

Done.

>
>> === renamed file 'src/HttpParser.h' => 'src/http/Http1Parser.h'
>
>> + /// the request-line URI if this is a request, or an empty string.
>> + SBuf requestUri() const {return uri_;}
>
> Consider returning "const SBuf &" to avoid refcounting overheads. Why is

const SBuf &uri() const {retun uri_;}

const SBuf uri() const {retun uri_;}

SBuf &uri() {retun uri_;}

==> error: passing âconst SBufâ as âthisâ argument of âconst char*
SBuf::c_str()â discards qualifiers

SBuf &uri() const {retun uri_;}

==> invalid initialization of reference of type 'SBuf&' from expression
of type 'const SBuf'

Whereas return via copy-construct provides a non-const SBuf using the
non-allocating SBuf copy against the shared MemBlob.

> this method in the general message parser? Should not it (and the
> corresponding data member) be placed in HTTP *request* parser?
>

The request parsing is in the generic HttpParser. The response parsing
is in the generic HttpMsg class but the intention is to move both into
here and:
 1) avoid the current design of 2-pass parsing: a) generic-parse just to
determine it is a request-line, then b) re-parse with HttpRequest to get
the fields already found by HttpParser.
 2) avoid adding parser overheads to the HttpRequest and HttpResponse
classes.
 3) avoid having to allocate a HttpRequest/HttpResponse object just to
test parsing for invalid request/responses.
 4) make HttpRequest and HttpResponse generic enough for use in both
protocols and be spawned by the parser(s).

This accessor is the one to retrieve the (1a) request-line data field
for use in (1b) creating the HttpRequest object without re-parsing. It
can also be used anywhere the raw-URI is needed before HttpRequest
exists - which is what you see in the current patch.

(Same goes for the method accessor.)

We also need to work around two other traffic flow properties:

 * in HTTP/2 the response parse needs to handle PUSH_PROMISE "requests",
so the API separation becomes more con than pro.

 * request parsing may result in a response error object being
instantiated. And thus a status code results from the request-line.

I figure its better at this stage to have a set of accessors for each of
the possible fields (request-line, status-line and mime-header areas)
and two parse methods to be called as appropriate from the client-side
or server-side as to which message type is to be interpreted.

The final intended result after everything is finished is for HttpParser
to spawn Http::Request and/or Http::Reply filled with the basic details
the Parser detects. So most of the uses you see in this patch will
eventually become things using the HttpRequest object and its accessors.
But that is some ways down the track still.

>
>> + /// the HTTP method if this is a request method
>> + const HttpRequestMethodPointer & method() const {return method_;}
>
> Similar concerns here. This should be in a request parser IMO.
>

Same results as uri() accessor. Although this one works with const
return value because assignment is a data copy of the image String.

> Also, s/a request method/a request message/ if method() stays in the
> general message parser.

Done.

>
> Finally, either this method should return a method object (rather than a
> pointer to it) OR we should get rid of METHOD_NONE. Having two ways to
> signal a missing method is too much IMO. The former is probably easier
> and requires fewer code changes.

Tryimg. I think the Pointer was there to avoid data copies. The method
class is relatively heavy to copy/assign with its String internals.

NOTE: New patch to follow once the remaining questions above are agreed
and branch re-tested.

Amos
Received on Thu Jan 02 2014 - 13:45:33 MST

This archive was generated by hypermail 2.2.0 : Fri Jan 03 2014 - 12:00:10 MST