Re: [PATCH] bug 3081: comm layer cleanups for TCP acceptor

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 18 Jan 2011 18:03:00 +1300

On 18/01/11 10:08, Alex Rousskov wrote:
> On 01/14/2011 09:11 PM, Amos Jeffries wrote:
>> On 10/01/11 14:02, Amos Jeffries wrote:
>>> Attached patch contains the cleanup-comm branch changes to
>>> ListenStateData which make it an AsyncJob (called Comm::TcpAcceptor)
>>> using Subscriptions to generate calls.
>>>
>>> This fixes the call re-use problem of bug 3081 and a related FTP data
>>> connection bug.
>>>
>>> Alex:
>>> please test that this does not break with SMP support. I've run it with
>>> 2 workers on a non-SMP machine, but have not checked for anything more
>>> complex.
>>>
>>
>> Have you had any time to double-check me on this Alex?
>>
>> I don't want to rely on the silence clause for merging this one.
>
> Sorry I could not review this earlier.
>
>
>> NP: attached updated version against trunk r11159.
>
>> + CommAcceptCbPtrFun(const CommAcceptCbPtrFun&o);
>
> Are you sure this is needed? It seems like its implementation is what
> the default copy constructor does. If there is a reason to add it, you
> should add a destructor as well, but I think everything should work
> without these additional explicit members.

Since we have AsyncCall blocking the default constructor from allowing
null calls to be passed around we need to add an explicit constructor in
the particular children which subscription requires copying for.

>
>> inline CommCbFunPtrCallT(int debugSection, int debugLevel,
>> const char *callName, const Dialer&aDialer);
>>
>> + inline CommCbFunPtrCallT(const Pointer&p) :
>
> This additions seems wrong on two counts:
>
> a) You should not add a conversion from a pointer to CommCbFunPtrCallT
> to CommCbFunPtrCallT itself.
>
> b) You could add an explicit copy constructor, but the default copy
> constructor should work for copying CommCbFunPtrCallT objects already, no?

Without them ... this:

   // setup the subscriptions such that new connections accepted by
listenConn are handled by HTTP
   typedef CommCbFunPtrCallT<CommAcceptCbPtrFun> AcceptCall;
   RefCount<AcceptCall> subCall = commCbCall(5, 5, "httpAccept",
CommAcceptCbPtrFun(httpAccept, s));
   Subscription::Pointer sub = new CallSubscription<AcceptCall>(subCall);

Generates this:

In file included from ../../trunk/src/comm/TcpAcceptor.h:6,
                  from ../../trunk/src/client_side.cc:103:
../../trunk/src/base/Subscription.h: In member function
‘RefCount<AsyncCall> CallSubscription<Call_>::callback() const [with
Call_ = CommCbFunPtrCallT<CommAcceptCbPtrFun>]’:
../../trunk/src/client_side.cc:4092: instantiated from here
../../trunk/src/base/Subscription.h:45: error: no matching function for
call to ‘CommCbFunPtrCallT<CommAcceptCbPtrFun>::CommCbFunPtrCallT(const
RefCount<CommCbFunPtrCallT<CommAcceptCbPtrFun> >&)’
../../trunk/src/CommCalls.h:302: note: candidates are:
CommCbFunPtrCallT<Dialer>::CommCbFunPtrCallT(int, int, const char*,
const Dialer&) [with Dialer = CommAcceptCbPtrFun]
../../trunk/src/CommCalls.h:263: note:
CommCbFunPtrCallT<CommAcceptCbPtrFun>::CommCbFunPtrCallT(const
CommCbFunPtrCallT<CommAcceptCbPtrFun>&)

>
>> + AsyncCall();
>> + AsyncCall(const AsyncCall&);
>
> I could not find the implementation for these two, but I probably just
> missed it. I assume the default/generated constructors did not work.
> Where do we use these two explicit ones?

We don't. These were what you asked me to add when we were creating
Subscription, to prevent default constructors being (ab)used on any old
call. This non-definition enforces the AsyncCall and children are not
copied around and double-scheduled.

>
>> + AsyncCallT(const RefCount<AsyncCallT<Dialer> > &o):
>> + AsyncCall(o->debugSection, o->debugLevel, o->name),
>> + dialer(o->dialer) {}
>> +
>
> Similar to the earlier comments, it seems wrong to provide a
> pointer-to-T to T conversion, especially an implicit one where the
> default copy constructor should work fine.

Parallel of CommCbFunPtrCallT(const Pointer&p). This is required for the
matching case in FTP which uses AsyncCallT for a generic Uniary member
callback.

>
>> if (flag != COMM_OK) {
>> - debugs(33, 1, "httpAccept: FD "<< sock<< ": accept failure: "<< xstrerr(xerrno));
>> + // This should not occur with TcpAcceptor.
>> + // However its possible the call was still queued when the client disconnected
>> + debugs(33, 1, "httpAccept: FD "<< s->listenFd<< ": accept failure: "<< xstrerr(xerrno));
>
> This new comment confuses me. Is it possible or not? Sounds like it is
> possible and is not a bug. Why complain at level 1 then? Overall,
> httpAccept() and httpsAccept() changes seem either unnecessary or out of
> the patch scope.

Sorry. I removed the flag handling then had second thoughts about what
would happen under load or bad lag.

What this covers is:
  client connects
  ... unrelated calls are scheduled
  ... some time later TcpAcceptor picks it up
  TcpAcceptor schedules the subscribed call to happen
  ... previously queued unrelated calls are actioned, but take a long time
  ... meanwhile client disconnects and FD closing is flagged
  ... some time later the call occurs. The Dialer identifies the dead FD
and updates the flag.

I've only actually seen it when bugs were biting (tested with ab up to
1000 concurrent connections), but theoretically its possible under heavy
load.

Long-Term the plan is to have the dialer drop the accept calls via
syncWithComm instead of this handler check.
For now the FTP logics require the callback to happen in order to
failover quickly.

>
>> + debugs(5, 5, HERE<< "FD "<< fd<< ", "<< local_addr<< " AsyncCall Subscription: "<< aSub);
>
> Consider moving the part after HERE to TcpAcceptor::status() method so
> that you do not have to repeat it. It will be used whenever the acceptor
> is called asynchronously (and you can add explicit debugs calls too, of
> course).

Done. Thanks for that. It ties the job ID in nicely with the port details.

>
>> + } catch(const TextException&e) {
>> + fatalf("FATAL: error while accepting new client connection: %s\n", e.message);
>
> It is better to catch std::expression and print its message using
> e.what(). This way, you will catch and report details of both standard
> exceptions and our TextExceptions because TextException is a child of
> std::exception.
>

Fixed.

>> + comm_err_t status = oldAccept(newConnDetails);
>
> Avoid "status" because AsyncJob has a virtual method called status().
> Consider "err" or "result".
>
> Make it const unless we do change it.
>

Fixed.

>
>> + /// temporary holder for newely accepted client FD
>> + int newFd_;
> ...
>> + // drop the temporary recent accepted socket FD details.
>> + // this prevents information crossover on calls.
>> + newFd_ = -1;
>
> Why not keep it as a local variable and pass it to notify(), like the
> old code did? That feels like a simpler/safer approach to me than
> storing this temporary information in a data member.

Done.

>
>> + /** Subscribe a handler to receive calls back about new connections.
>> + * Replaces any existing subscribed handler.
>> + */
>> + void subscribe(const Subscription::Pointer&aSub);
>
> Consider s/Replaces/Unsubscribes/ to indirectly document that pending
> calls remain scheduled.

Fixed.

>
>> + * Pending calls will remain scheduled.
>
> Consider "Already scheduled callbacks remain scheduled".
>

Fixed,

>> +private:
>> + friend class AcceptLimiter;
>
> This looks weird because friends cannot access private members. They can
> only access protected ones and you do not have any protected members.
> Either this friendship is not needed at all or you can move some public
> members to become protected.

Fixed.

The AcceptLimiter class plays with the private field isLimiter right
below the friend definition.

The docs indicate "private or protected" members are accessible to
friend classes (as opposed to functions). I've not had any problems with
this being private. Changed now anyway in case the stricter compilers
complain.

>
>> + Comm::TcpAcceptor *tmp = new Comm::TcpAcceptor(fd, data.local, data.flags, note, sub);
>> +
>> + // Ensure we have a copy of the FD opened for listening and a close handler on it.
>> + assert(data.fd == -1 || data.fd == tmp->fd);
>> + data.fd = -1;
>> + data.opened(tmp->fd, dataCloser());
>> +
>> + AsyncJob::Start(tmp);
>
> This sequence is leak-prone. Either check everything before you create
> an acceptor job pointer or use a refcounted pointer to that job, in case
> you never reach AsyncJob::Start().
>

Fixed.

>> /// Close old data channels, if any. We may open a new one below.
>> - ftpState->data.close();
>> + if (!(ftpState->data.flags& COMM_REUSEADDR))
>> + ftpState->data.close();
>
> Perhaps change the comment to say "Close old exclusive data channels" or
> something like that, to hint at why COMM_REUSEADDR is important here?
>

Hmm, okay. Improving that a bit.

>
>> /// comm_open_listener() parameters holder
>> class OpenListenerParams
>> {
> ...
>> +
>> + /// handler to subscribe to Comm::TcpAcceptor
>> + Subscription::Pointer handlerSubscription;
>> };
>
> I am not sure handlerSubscription addition makes sense. We cannot ask
> Coordinator to subscribe our listening handler. We can only subscribe
> after we get a listening socket from Coordinator.
>
> Moreover, Subscription::Pointer is not a comm_open_listener() parameter.
>
> The UDP-versus-TCP check in Ipc::StartListening also looks out of place.

moved it to ListeningStartedDialer instead. Which seems like it would
work as well without triplicating the opener code.

>
> More related comments immediately below:
>
>> enter_suid();
>> - const int sock = comm_open_listener(p.sock_type, p.proto, p.addr, p.flags,
>> - FdNote(p.fdNote));
>> - const int errNo = (sock>= 0) ? 0 : errno;
>> + if (sock_type == SOCK_STREAM) {
>> + // TCP: setup a job to handle accept() with subscribed handler
>> + Comm::TcpAcceptor *tmp = new Comm::TcpAcceptor(cbd->fd, addr, flags, FdNote(fdNote), sub);
>> + cbd->fd = tmp->fd;
>> + AsyncJob::Start(tmp);
>> + } else if (sock_type == SOCK_DGRAM) {
>> + // UDP: setup the listener socket, but do not set a subscriber
>> + // TODO: create a UDP sbscription so packet event calls get scheduled and queued Async.
>> + cbd->fd = comm_open_listener(sock_type, proto, addr, flags, FdNote(fdNote));
>> + } else {
>> + fatalf("Invalid Socket Type (%d)",sock_type);
>> + }
>> + cbd->errNo = cbd->fd>= 0 ? 0 : errno;
>> leave_suid();
>
>
> AsyncJob::Start() running inside enter_suid() is scary!
>
> StartListeningCb now has to carry handlerSubscription back to the
> caller, even though this subscription was created by the calling code
> and was not used by IPC.

It may not matter now with this new patch. But what are you calling the
"caller" here?
   The config parser which allocates handlerSubscription?
   Or the IPC object doing the FD opening?
   Or the comm_openex() generating the FD?
   Or the TcpAcceptor doing the FD state setup?
   Or the ListeningStartedDialer() reporting the successful startup?
   Or client-side ConnStateData creator calls which is subscribed?

>
> I think this high-level TCP listening complexity should be handled in
> the Ipc::StartListening callback (e.g., clientHttpConnectionOpened) or
> similar code instead, without exposing the low-level Ipc file descriptor
> sharing code to how we listen on some descriptors.
>
> If you want to minimize changes to your patch, Comm::TcpAcceptor can be
> created later and the Comm::TcpAcceptor constructor should take the
> opened descriptor as a parameter instead of calling comm_open_listener
> itself, I guess, but it is probably not the best solution, IMO. Job
> constructors should avoid complex actions such as opening listening
> connections. Ideally, this should happen either before the job is
> created or while the job starts.
>
> If Comm::TcpAcceptor class has to call comm_open_listener, then it
> should really call Ipc::StartListening() instead, like everybody else
> and, again, ideally when starting, and not in the constructor.
>

I think I get what you were saying. In a fuzzy way. Please check my
re-try patch attached.

This version passes the subscription for *_ports through their
ListeningStartedDialer instead of IPC.

NP: I've left some changes in src/ipc/StartListening.cc for you to look
at. They are no longer part of this acceptor update, but are a small
memory optimization I think is worth doing. If you okay that I'll commit
it separately.

Amos

Received on Tue Jan 18 2011 - 05:03:13 MST

This archive was generated by hypermail 2.2.0 : Thu Jan 20 2011 - 12:00:04 MST