Re: [PATCH] refactor ConnStateData pipeline handling

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 14 May 2014 15:30:42 -0600

On 05/11/2014 07:57 AM, Amos Jeffries wrote:

> ClientSocketContext represents a whole transaction: the request, where
> it's up to in processing, the reply headers, and where the reply body is
> up to in delivery. It is "active" from the end of parsing request headers
> to after delivering the last byte of the response body into the client socket.

OK, I will assume the above description going forward.

That description implies that we may have two or more "active"
ClientSocketContext objects at the same time because the context reading the
request is not necessarily the context writing the response (when
pipelining is enabled). If there is a reading context, it has to be the
one in front of the pipeline queue. If there is a writing context, it
has to be the one at the back of the pipeline queue.

I recommend recording this assumption in the ClientSocketContext class
description.

>> Please note that the old method was used to propagate two different
>> events: timeout and connection abort. In your comments, it is better to
>> talk about "error event" propagation rather than "abort event"
>> propagation because the former includes both aborts and timeouts.

>>> + while (!Q.empty()) {
>>> + ClientSocketContextPointer context = Q.front();
>>> + Q.pop();
>>> + // if the transaction is already complete, just drop it.
>>> + if (!context->doneAll()) {
>>
>> Do we have to keep "completed" contexts stored in the pipeline? Can we
>> simplify things by equating pipeline membership with "completeness"? In
>> other words, whenever a context object becomes "complete" or "finished",
>> we remove it from the pipeline. This will simplify and clarify many
>> things, including avoiding concerns that the new Pipeline::empty() method
>> returns false for an essentially empty pipeline that does not have any
>> requests waiting.

> With a little more shuffling we can avoid having doneAll(). I have had
> to add a pop() method, though, which seems risky: if it gets called by
> code other than the ClientSocketContext at the queue front, things will
> go badly wrong.

If you are worried about that misuse, consider adding a "caller"
parameter to pop() and have pop() assert that the caller is the front
context. You may rename the method to popMe() and explain why this
safety precaution is needed.
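Something along these lines (just a sketch; the exact name and details are
up to you):

  /// deregister the front context; the "caller" parameter exists only to
  /// catch misuse (i.e., callers other than the front context) at run time
  void
  Pipeline::popMe(const ClientSocketContextPointer &caller)
  {
      assert(!requests.empty());
      assert(requests.front() == caller);
      requests.pop();
  }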

> + connRegistered_ = false;
> /* we can't handle any more stream data - detach */
> clientStreamDetach(getTail(), http);
> +
> + assert(conn->pipeline.front() == this);
> + conn->pipeline.pop();

If possible, please set connRegistered_ to false immediately before or
after pop(). I am worried about clientStreamDetach() side-effects
reaching a ClientSocketContext object that is no longer marked as
registered but is still in the pipeline queue, breaking the
"connRegistered_ is pipeline membership" invariant.

> If empty() returns false with a completed front transaction it means the
> front request has "completed" without calling its connIsFinished()

I thought "completed" meant that doneAll() is true, which meant
connRegistered_ was false, which meant connIsFinished() was called.
Thus, it was impossible to be "completed" without calling
connIsFinished(). What meaning do you attach to the "completed" term now?

> FWIW ClientSocketContext is one of those "hairy" objects halfway between
> a state engine and AsyncJob. I have a feeling it should be an AsyncJob
> and offload some/most transaction state into MasterXaction with the
> pipeline holding a reference to the MasterXaction.

Please do not put ClientSocketContext transaction state (or pointers to
ClientSocketContext jobs) into MasterXaction. See the MasterXaction
class description for documentation of what should [not] go there.

>> I am not sure ClientSocketContext is the right place for bumping
>> operations because we are not talking about a temporary detour to
>> process a single HTTP request. In fact, the last request processing is
>> over when we start bumping. Instead, we are switching into a special
>> tunnel mode. The best place is probably neither ConnStateData nor
>> ClientSocketContext but an SslBump-equivalent of TunnelStateData.
>
> You may be right. However IMO it needs to at least be driven/started by
> a call done from ClientSocketContext as it processes the CONNECT.
> (tunnelStart should also be triggered in the same sort of way.)

We are probably digressing here, but I think the key to this decision is
ConnStateData and its scope:

* If the ConnStateData object remains while we tunnel, then
ConnStateData should probably initiate that tunnel after the last
ClientSocketContext is gone and get notified when the tunnel is over.
The kick() method might be the right place for that initiation.

* If ConnStateData should be destroyed during the handover to the
tunneling code, then initiating that tunnel may be the last thing a
ClientSocketContext does before deregistering (with the intent to
destroy ConnStateData as well) and self-destructing. This "child
destroying its parent" approach feels messy, though.

Whoever works on this code needs to start by getting a clear
understanding (or at least documenting the assumed definition) of the
intent and scope of ConnStateData, ClientSocketContext, and similar
client_side* classes. These difficult problems are nearly impossible to
solve correctly while looking at a single class, not to mention a single
class member.

I cannot volunteer for that work right now because I have already
promised to fix Store API, but I would discourage attempts to fix
*inter-class* problems by nibbling at individual class details.

Intra-class nibbling is very useful on its own (and your patch is a good
example of what can be done at that level!). We should just be careful
not to overreach without developing a good understanding/plan for the
client_side* hierarchy as a whole.

> The warning note at the start of ConnStateData::getSslContextStart() is
> a huge hint that CONNECT is being "finished" too early:
>
> "
> careful: pipeline.terminateAll(0) above frees request, host, etc.
> "

> The current state of detachment by tunnel seems to be a big cause of
> readMore-related bugs.

Yes, there are several problems in that area. IMO, most of them are
inter-class problems mentioned above.

> + /// pipeline/set of transactions waiting to be serviced
> + Pipeline pipeline;

I suggest a more precise/detailed version, given your definition at the
beginning of your email:

  /// Zero or more transactions being (or waiting to be) serviced:
  /// The front transaction is the request reading transaction;
  /// The back transaction is the response writing transaction.

> +ClientSocketContextPointer
> +Pipeline::front() const
> +{
> + if (requests.empty()) {
> + debugs(33, 3, "Pipeline " << (void*)this << " empty");
> + return ClientSocketContextPointer();
> + }

I recommend asserting on an empty queue here. Most of the caller code
already checks for that, and the few remaining places can be easily
adjusted to do so (they have to be modified anyway). Your call, though.

Same for pop().
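For example, front() could become something like this (a sketch, assuming
the remaining callers are adjusted to check emptiness first):

  ClientSocketContextPointer
  Pipeline::front() const
  {
      assert(!requests.empty()); // callers must check before calling us
      return requests.front();
  }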

> + Pipeline& operator =(const Pipeline&); // do not implement

And do not make public.

There is also an implicit copy constructor. If you want to disable
copying, both should be declared and left undefined.
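For example, using the usual declare-but-do-not-define idiom:

  private:
      // not implemented; copying a pipeline of live transactions makes no sense
      Pipeline(const Pipeline &);
      Pipeline &operator =(const Pipeline &);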

> +
> + // mark ourselves as completed
> connIsFinished();

This call may destroy "this" object, right? I suggest mentioning that
fact in the above comment if it is true.
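If so, something like:

  // mark ourselves as completed; WARNING: this may destroy "this"
  connIsFinished();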

> + ~Pipeline() {terminateAll(0);}

What is the point of the terminateAll() call here? A destructing
pipeline implies a half-destroyed ConnStateData, its owner. There is very
little we can safely do at that stage! Calling connIsFinished() and,
hence, ConnStateData::kick() is not safe.

The only thing you can do in the destructor is assert that the pipeline
is empty, I think.
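In other words, roughly:

  ~Pipeline() { assert(requests.empty()); } // the owner must drain us first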

> void
> +Pipeline::terminateAll(int xerrno)
> +{
> + while (!requests.empty()) {
> + ClientSocketContextPointer context = requests.front();
> + debugs(33, 3, "Pipeline " << (void*)this << " notify(" << xerrno << ") " << context);
> + context->noteIoError(xerrno);
> + context->connIsFinished(); // cleanup the context state and self-deregister
> + assert(context != requests.front());
> + }
> +}

This loop seems very unsafe in the patched code when pipelining is
enabled. It essentially calls ConnStateData::kick() multiple times,
without any regard for the current ConnStateData state (or existence).
This must be redesigned IMO.

Suggestion: Please see if you can move kick() away from
connIsFinished(). Let ClientSocketContext deregister itself but force
the caller to kick() ConnStateData explicitly when needed. The caller
that calls terminateAll() will not call kick(), of course.

The above approach will also remove concerns that kick() may
[eventually] destroy the ConnStateData object, making any Pipeline or
ConnStateData use after connIsFinished() call unsafe.
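Very roughly, the split could look like this (just a sketch; I am assuming
the finishing code can reach its ConnStateData, e.g., via http->getConn()
as client_side code does elsewhere):

  // normal single-transaction completion path:
  ConnStateData *conn = http->getConn(); // grab the owner before we may be destroyed
  connIsFinished(); // deregister from conn->pipeline; no kick() inside anymore
  conn->kick();     // kicking becomes an explicit caller decision

  // Pipeline::terminateAll() loop body: deregister without kicking
  context->noteIoError(xerrno);
  context->connIsFinished();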

> + assert(pipeline.size() < 2); // the CONNECT being there us okay. Anything else is a bug.

s/us/is/

> // ClientSocketContext::keepaliveNextRequest() checks pinning.serverConnection and will close.

Needs syncing. It is kick() now. Please double-check for other such cases.

> + /// try to complete a transaction or read more I/O
> + void kick();

The kick() method is called from connIsFinished(), which is the last thing
the transaction does, so "complete" is a little awkward here. Polishing
suggestion:

/// unblock pending response, if any, or try reading the next request

In summary, all the comments except those about terminateAll()
design/safety are of a polishing nature. Unfortunately, due to the
terminateAll-related concerns, I would like to ask for another review round.

Thank you,

Alex.