Re: [PATCH] refactor ConnStateData pipeline handling

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 12 May 2014 01:57:08 +1200

On 10/05/2014 8:56 a.m., Alex Rousskov wrote:
> On 05/08/2014 11:17 AM, Amos Jeffries wrote:
>> This refactors the request pipeline management API to use std::queue
>> instead of a custom linked-list with accessors spread over both
>> ConnStateData and ClientSocketContext.
>>
>> To do this a new class Pipeline is created with methods wrapping
>> std::queue API and extending it slightly to meet the HTTP/1.1 pipeline
>> behaviours and perform basic stats gathering. The pipeline management
>> methods and state variables are moved inside this class.
>
>> +/// a pipeline of received HTTP/1 request transactions
>> +class Pipeline {
>
> Please avoid mixing "request" and "transaction" together. In most
> contexts, "transaction" is a (request, response) pair. IIRC,
> ClientSocketContext represents just a request, not transaction as a
> whole. At some point, we even had active responses without
> ClientSocketContext (I do not know whether that is still true).
>
> I consider the scope of ClientSocketContext and, hence, Pipeline
> important for this project. For example, if ClientSocketContext is about
> a request, not a response, then we should get rid of it when we switch
> to the next request, but if ClientSocketContext is about transaction,
> then we should keep it until response processing is complete. I think it
> is the former.
>
> If you agree, please avoid using "transaction" throughout new comments.

I don't. The queue is for full transactions in HTTP semantics, as per
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-26
section 6.3.2, paragraph 1, sentence 2.

 * ClientSocketContext appears to be performing the "MAY process ... in
parallel" part by registering itself with the Pipeline then continuing
processing for a while.

 * ConnStateData is using this Pipeline class to ensure the "MUST send
... in the same order that the requests were received" part.

ClientSocketContext represents a whole transaction: the request, how far
its processing has progressed, the reply headers, and how much of the
reply body has been delivered. It is "active" from the end of parsing the
request headers until after delivering the last byte of the response body
into the client socket.
 - prior to this patch it was also responsible for the initial
read(2) and parsing of the next N transactions after itself. That is now
in kick().

>
>> + /// Transactions parsed from the connection but possibly incomplete.
>> + /// For HTTP/1 these are handled as a FIFO queue of request messages.
>> + std::queue<ClientSocketContextPointer> Q;
>
> Non-static class data members should start with a lower case letter.
> Please avoid single-letter member names. "requests" would work OK here,
> I guess.
>

Sorry, being funny with phonetics.

Making it "requests" is fine for the current meaning.

FYI: with a small change from ClientSocketContext to something a little
less request-oriented (say std::queue<MasterXaction>) we can use this
Pipeline class for both client-side and server-side pipeline management
in HTTP/1. And with a further change to std::deque or std::list internal
storage we can use it for HTTP/2 stream management as well.
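
To make that concrete, a rough sketch of the generalized shape (purely
illustrative, not part of the attached patch; the element type name and
the stats counter are my assumptions):

#include <cassert>
#include <cstdint>
#include <queue>

/// illustrative generalized FIFO wrapper in the spirit of Pipeline,
/// parameterized on the stored pointer type so the same class could
/// serve client-side and server-side queues
template <class XactionPointer>
class GenericPipeline
{
public:
    /// register a new transaction at the back of the queue
    void add(const XactionPointer &x) { requests.push(x); ++nrequests; }

    /// the oldest queued transaction, whether or not it is complete
    const XactionPointer &front() const { return requests.front(); }

    bool empty() const { return requests.empty(); }

    /// remove the front transaction; only the front may deregister itself
    void pop() { assert(!requests.empty()); requests.pop(); }

private:
    std::queue<XactionPointer> requests; ///< transactions awaiting their reply slot
    uint64_t nrequests = 0;              ///< lifetime total, for stats
};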

>
>> + /// get the first transaction context in the pipeline
>> + /// regardless of whether it is completed or not
>> + ClientSocketContextPointer get() const;
>
> "get" is a rather generic name for a "pipeline" object. I suggest to use
> "front" instead.
>

Done.

>
>> + /// Tell everybody about the err, and abort all waiting transactions.
>> + /// A value of 0 just aborts.
>> + void notifyAll(const int xerrno);
>
> Note that this is not what the old notifyAllContexts() did. You are
> converting a logging-related method into request termination method. You
> can do that, of course, but renaming the method would be wise.
> terminateAll() may work better.
>

Done. Dropped the second half of the comment as well, since it was wrong
to begin with (it was "notify unless completed") and without doneAll() the
notification always has to happen. Which is okay, because anything still
queued definitely needs to be notified that it is being dropped.

>
>> +/// propagates abort event to all contexts
>> +/// while clearing the queue
>> +void
>> +Pipeline::notifyAll(int xerrno)
>
> I am guessing a description in .cc file is not needed because we
> describe this public method in .h file, as we should. The "to all
> contexts" part of the .cc description is lying because the event is not
> propagated to "finished" contexts.

Right. Dropping.

>
> Please note that the old method was used to propagate two different
> events: timeout and connection abort. In your comments, it is better to
> talk about "error event" propagation rather than "abort event"
> propagation because the former includes both aborts and timeouts.
>
>
>> + while (!Q.empty()) {
>> + ClientSocketContextPointer context = Q.front();
>> + Q.pop();
>> + // if the transaction is already complete, just drop it.
>> + if (!context->doneAll()) {
>
> Do we have to keep "completed" contexts stored in the pipeline? Can we
> simplify things by equating pipeline membership with "completeness"? In
> other words, whenever a context object becomes "complete" or "finished",
> we remove it from the pipeline. This will simplify and clarify many
> things, including avoid concerns that the new Pipeline::empty() method
> returns false for an essentially empty pipeline that does not have any
> requests waiting.

With a little more shuffling we can avoid having doneAll(). I have had
to add a pop() method though, which seems risky: if it gets called by
code other than the ClientSocketContext at the queue front, things will
go badly wrong.
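
As a sketch (simplified; the notification hook name is an assumption
based on what the old notifyAllContexts() called, not a quote from the
patch):

void
Pipeline::terminateAll(const int xerrno)
{
    // without doneAll(), everything still queued is notified and dropped
    while (!requests.empty()) {
        ClientSocketContextPointer context = requests.front();
        requests.pop();
        debugs(33, 3, "Pipeline " << (void*)this << " terminate " << context);
        context->noteIoError(xerrno); // assumed hook, as in the old notifyAllContexts()
    }
}

void
Pipeline::pop()
{
    // must only be called by the context currently at the queue front,
    // otherwise reply ordering is broken
    assert(!requests.empty());
    requests.pop();
}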

When a transaction becomes completed, ClientSocketContext calls
connIsFinished() which does the kick() and de-registration (in whichever
order is necessary so that the clientStreams pipes are detached *before*
deregistration).
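
Roughly (method names approximate; deRegisterWithConn() stands in for
whatever ends up doing the pipeline.pop()):

void
ClientSocketContext::connIsFinished()
{
    clientStreamDetach(getTail(), http); // stop anything writing into this context
    deRegisterWithConn();                // leave the pipeline only after detaching
    getConn()->kick();                   // start (or read/parse) the next transaction
}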

If empty() returns false with a completed front transaction it means the
front request has "completed" without calling its connIsFinished()
method, and kick() has not occurred to begin the transition to the next
transaction. It may be technically completed, but the clientStreams
pipes are still attached to something that could write to them. So
pipeline.front() still has something to do to trigger detaching.

A further refactor in ClientSocketContext (outside this patch scope) is
still required to fix problems like:
"
 * We are done with the response, and we are either still receiving request
 * body (early response!) or have already stopped receiving anything.
 *
 * If we are still receiving, then clientParseRequest() below will fail.
 * (XXX: but then we will call readNextRequest() which may succeed and
 * execute a smuggled request as we are not done with the current request).
 *
 * If we stopped because we got everything, then try the next request.
 *
 * If we stopped receiving because of an error, then close now to avoid
 * getting stuck and to prevent accidental request smuggling.
"

>
> Is removing connRegistered_ (and using pipeline registration instead)
> doable?

I am not confident enough yet that the ClientSocketContext logic around
transaction completion is right and will stay that way through all the
future client-side refactoring. Those comments and checks about
early-response hint at bugs hiding.
FWIW the flag is only used now to assert() unique registrations and safe
cleanup inside ClientSocketContext.

>
>> +void
>> +Pipeline::clearCompleted()
>> +{
>> + // find next incomplete request in the queue (if any)
>> + while (!Q.empty()) {
>> + if (!Q.front()->doneAll())
>> + break;
>> + debugs(33, 3, "Pipeline " << (void*)this << " drop " << Q.front());
>> + Q.pop();
>> + }
>> +}
>
>> + // find next incomplete request in the queue (if any)
>> + pipeline.clearCompleted();
>
> This method does not find anything from the caller point of view. It
> removes leading "completed" contexts.
>

Dropped the whole method. With ClientSocketContext once again
self-deregistering this would be an invalid action and there is no way
to check completion of in-queue contexts from here.

>
>> - freeAllContexts();
> ...
>> + if (!pipeline.empty()) {
>> + // BUG? starting SSL with a pipeline queue of requests still waiting for non-SSL replies.
>> + pipeline.notifyAll(0); // abort the lot.
>> + }
>
> Please remove the emptiness check. The notifyAll() call should handle
> empty queue correctly itself, and the caller code should not imply
> otherwise (because it gets copied/misinterpreted/etc). If you remove the
> check, please update the "BUG?" description to be specific to a
> non-empty pipeline

Done.
I have added an assert for good measure to ensure that only one request
exists, assuming it's the CONNECT for now.

The proper thing to do is have the CONNECT request use
ClientSocketContext::connIsFinished() and untie itself from any context
state. Someone with better ssl-bump knowledge needs to determine where
the best place to drop that CONNECT actually is. This current location
seems to be wrong.
 For now I am retaining the status quo.
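
The status-quo handling now amounts to roughly this (illustrative only;
count() is an assumed accessor name):

// XXX: starting SSL with requests still queued for plain-HTTP replies would be wrong
assert(pipeline.count() == 1); // only the already-handled CONNECT should remain
pipeline.terminateAll(0);      // drop it, retaining the existing behaviour for now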

>
> FWIW, I suggest using XXX instead of BUG to minimize the number of
> various internal markers with very similar meanings:

Done.

>> + /// whether this context can be discarded
>> + bool doneAll() const { return !connRegistered_;}
>> +
>
> I do not think we should conflate the new method with
> AsyncJob-specific/internal doneAll(), to avoid the wrong implication
> that ClientSocketContext is a job. How about finished()?

Dropped entirely for now. With a self-deregistering ClientSocketContext
it's not useful.

FWIW ClientSocketContext is one of those "hairy" objects halfway between
a state engine and AsyncJob. I have a feeling it should be an AsyncJob
and offload some/most transaction state into MasterXaction with the
pipeline holding a reference to the MasterXaction.
 That will probably be the next thing I look at doing in this direction.

>
>> This refactor reveals a potential bug in ssl-bump. For now I have marked
>> and retained the buggy behaviour to limit scope of this change.
>>
>> What should be happening when bumping:
>>
>> * when a CONNECT is encountered the read(2) operations should "pause"
>> until the pipeline is drained up to the CONNECT.
>> - I believe the request processing code already does this in a
>> roundabout way.
>
> And if it does not, this is not an SslBump bug. Regular CONNECT tunnels
> ought to do the same.

Yes. I am not saying it is an SslBump bug; this CONNECT behaviour is a
base fact in the logic chain that follows. Half the code in this whole
area assumes that it is happening, and half takes (incorrect?) protective
measures to destroy or clean up things which should not exist.

>
>> * the actions of bumping should be to emit the 200 OK response, do the
>> bump, then adjust the CONNECT request context such that doneAll()
>> returns true, then kick() the ConnStateData to start processing the
>> HTTPS traffic.
>> - this sequence also hints that the bumping operations should be
>> driven/performed by ClientSocktContext, rather than ConnStateData where
>> they are now. Both have direct access to the client socket so that is
>> not a problem, and the context has access to getConn() to change its
>> state details.
>
> I am not sure ClientSocketContext is the right place for bumping
> operations because we are not talking about a temporary detour to
> process a single HTTP request. In fact, the last request processing is
> over when we start bumping. Instead, we are switching into a special
> tunnel mode. The best place is probably neither ConnStateData nor
> ClientSocketContext but an SslBump-equivalent of TunnelStateData.

You may be right. However IMO it needs to at least be driven/started by
a call made from ClientSocketContext as it processes the CONNECT.
(tunnelStart should also be triggered in the same sort of way.)

The warning note at the start of ConnStateData::getSslContextStart() is
a huge hint that CONNECT is being "finished" too early:

"
 careful: pipeline.terminateAll(0) above frees request, host, etc.
"

The current state of detachment by tunnel seems to be a big cause of
readMore-related bugs.

>
> I do not volunteer to change this now, especially since Peek and Splice
> is fiddling with the same code, but we should keep this problem in mind.
>
>
>> NP: at the point when the CONNECT is bumped if the pipeline holds
>> *anything* other than a single CONNECT request context which is already
>> doneAll()==true, the ssl-bumping MUST NOT be started. Wait until that
>> condition is met before switching to HTTPS.
>
> It may be even more complicated/messy. IIRC, ClientSocketContext objects
> [in the pipeline] represent client requests and it may be possible that
> the "pipeline" is empty, but some active response(s?) are still going
> on. Code comments imply this is possible, but I am not sure whether
> those comments are still valid.

They had better not be active responses, because when
keepAliveNextRequest (now ConnStateData::kick()) is called the
clientStreams pipes have just been detached.
 Active request+response transactions are still a possibility, and the
list of them is the contents of the Pipeline queue.

Pipeline should only ever be empty when tunnelling or on an idle client
connection, because that is not HTTP traffic. Even for CONNECT tunnels,
IMO the CONNECT should be left in the pipeline until the tunnel context
is ended.
 Today that would be the connection being closed by tunnel.cc code, or
SSL-bump completing the switch to HTTPS. In either case the removal
should not be a freeAllContexts/pipeline.terminateAll but a
pipeline.pop() of just the CONNECT.
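
i.e. at that point the cleanup would be just (sketch; connectContext is a
hypothetical handle to the CONNECT's context kept by the tunnel/bump code):

assert(pipeline.front() == connectContext); // the CONNECT must still be at the front
pipeline.pop(); // remove only the CONNECT, not the whole pipeline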

The attached patch has now had several days usage and several thousand
requests with no visible problems.

Amos
