Re: [PATCH] bug 3081: comm layer cleanups for TCP acceptor

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 19 Jan 2011 15:50:06 -0700

On 01/17/2011 10:03 PM, Amos Jeffries wrote:
> On 18/01/11 10:08, Alex Rousskov wrote:
>> On 01/14/2011 09:11 PM, Amos Jeffries wrote:
>>> On 10/01/11 14:02, Amos Jeffries wrote:
>>>> Attached patch contains the cleanup-comm branch changes to
>>>> ListenStateData which make it an AsyncJob (called Comm::TcpAcceptor)
>>>> using Subscriptions to generate calls.
>>>>
>>>> This fixes the call re-use problem of bug 3081 and a related FTP data
>>>> connection bug.
>>>>
>>>> Alex:
>>>> please test that this does not break with SMP support. I've run it with
>>>> 2 workers on a non-SMP machine, but have not checked for anything more
>>>> complex.
>>>>
>>>
>>> Have you had any time to double-check me on this Alex?
>>>
>>> I don't want to rely on the silence clause for merging this one.
>>
>> Sorry I could not review this earlier.
>>
>>
>>> NP: attached updated version against trunk r11159.
>>
>>> + CommAcceptCbPtrFun(const CommAcceptCbPtrFun&o);
>>
>> Are you sure this is needed? It seems like its implementation is what
>> the default copy constructor does. If there is a reason to add it, you
>> should add a destructor as well, but I think everything should work
>> without these additional explicit members.
>
> Since we have AsyncCall blocking the default constructor from allowing
> null calls to be passed around we need to add an explicit constructor in
> the particular children which subscription requires copying for.

This does not compute for me. A default (parameter-less) constructor,
blocked or not, should not affect copy constructors. Can you rephrase
what you are trying to avoid or achieve? Or perhaps quote the errors you
get when the above explicit copy constructor is removed.

>>> inline CommCbFunPtrCallT(int debugSection, int debugLevel,
>>> const char *callName, const
>>> Dialer&aDialer);
>>>
>>> + inline CommCbFunPtrCallT(const Pointer&p) :
>>
>> This additions seems wrong on two counts:
>>
>> a) You should not add a conversion from a pointer to CommCbFunPtrCallT
>> to CommCbFunPtrCallT itself.
>>
>> b) You could add an explicit copy constructor, but the default copy
>> constructor should work for copying CommCbFunPtrCallT objects already,
>> no?
>
> Without them ... this:
>
> // setup the subscriptions such that new connections accepted by
> listenConn are handled by HTTP
> typedef CommCbFunPtrCallT<CommAcceptCbPtrFun> AcceptCall;
> RefCount<AcceptCall> subCall = commCbCall(5, 5, "httpAccept",
> CommAcceptCbPtrFun(httpAccept, s));
> Subscription::Pointer sub = new CallSubscription<AcceptCall>(subCall);
>
>
> Generates this:
>
> In file included from ../../trunk/src/comm/TcpAcceptor.h:6,
> from ../../trunk/src/client_side.cc:103:
> ../../trunk/src/base/Subscription.h: In member function
> ‘RefCount<AsyncCall> CallSubscription<Call_>::callback() const [with
> Call_ = CommCbFunPtrCallT<CommAcceptCbPtrFun>]’:
> ../../trunk/src/client_side.cc:4092: instantiated from here
> ../../trunk/src/base/Subscription.h:45: error: no matching function for
> call to ‘CommCbFunPtrCallT<CommAcceptCbPtrFun>::CommCbFunPtrCallT(const
> RefCount<CommCbFunPtrCallT<CommAcceptCbPtrFun> >&)’
> ../../trunk/src/CommCalls.h:302: note: candidates are:
> CommCbFunPtrCallT<Dialer>::CommCbFunPtrCallT(int, int, const char*,
> const Dialer&) [with Dialer = CommAcceptCbPtrFun]
> ../../trunk/src/CommCalls.h:263: note:
> CommCbFunPtrCallT<CommAcceptCbPtrFun>::CommCbFunPtrCallT(const
> CommCbFunPtrCallT<CommAcceptCbPtrFun>&)

If I parse this correctly, you are, essentially, passing a pointer-to-X
where reference-to-X is expected. The solution is _not_ to add a
conversion from pointer-to-X to X, but to pass the X object itself.

In your case, X is CommCbFunPtrCallT<CommAcceptCbPtrFun>.

Note that fixing this may expose other problems.

>>> + AsyncCall();
>>> + AsyncCall(const AsyncCall&);
>>
>> I could not find the implementation for these two, but I probably just
>> missed it. I assume the default/generated constructors did not work.
>> Where do we use these two explicit ones?
>
> We don't. These were what you asked me to add when we were creating
> Subscription, to prevent default constructors being (ab)used on any old
> call. This non-definition enforces the AsyncCall and children are not
> copied around and double-scheduled.

Ah, please move these two declarations to the "private:" section and add
a comment saying that they are not implemented to prevent nil calls from
being passed around and unknowingly scheduled, for now.

>>> + AsyncCallT(const RefCount<AsyncCallT<Dialer> > &o):
>>> + AsyncCall(o->debugSection, o->debugLevel, o->name),
>>> + dialer(o->dialer) {}
>>> +
>>
>> Similar to the earlier comments, it seems wrong to provide a
>> pointer-to-T to T conversion, especially an implicit one where the
>> default copy constructor should work fine.
>
> Parallel of CommCbFunPtrCallT(const Pointer&p). This is required for the
> matching case in FTP which uses AsyncCallT for a generic Uniary member
> callback.

Same comment. A different solution is needed here unless I am missing
something. We should not add pointer-to-T to T conversions.

>>> if (flag != COMM_OK) {
>>> - debugs(33, 1, "httpAccept: FD "<< sock<< ": accept
>>> failure: "<< xstrerr(xerrno));
>>> + // This should not occur with TcpAcceptor.
>>> + // However its possible the call was still queued when the
>>> client disconnected
>>> + debugs(33, 1, "httpAccept: FD "<< s->listenFd<< ": accept
>>> failure: "<< xstrerr(xerrno));
>>
>> This new comment confuses me. Is it possible or not? Sounds like it is
>> possible and is not a bug. Why complain at level 1 then? Overall,
>> httpAccept() and httpsAccept() changes seem either unnecessary or out of
>> the patch scope.
>
> Sorry. I removed the flag handling then had second thoughts about what
> would happen under load or bad lag.
>
> What this covers is:
> client connects
> ... unrelated calls are scheduled
> ... some time later TcpAcceptor picks it up
> TcpAcceptor schedules the subscribed call to happen
> ... previously queued unrelated calls are actioned, but take a long time
> ... meanwhile client disconnects and FD closing is flagged
> ... some time later the call occurs. The Dialer identifies the dead FD
> and updates the flag.

> I've only actually seen it when bugs were biting (tested with ab up to
> 1000 concurrent connections), but theoretically its possible under heavy
> load.

Yes, I agree that this might be possible. I suggest polishing the
comment to remove "should not occur" and raising the debugging level to
2. This is not a bug or an unhandled condition that we should inform the
admin about. Messages like that really do hide important warnings on
busy caches.

> Long-Term the plan is to have the dialer drop the accept calls via
> syncWithComm instead of this handler check.
> For now the FTP logics require the callback to happen in order to
> failover quickly.

I do not know enough about FTP needs to suggest a better implementation,
but I agree that such checks go in a different direction from the
previous cancel-I/O-and-let-closing-handler-do-its-job way. Let's hope
they are temporary.

>>> +private:
>>> + friend class AcceptLimiter;
>>
>> This looks weird because friends cannot access private members. They can
>> only access protected ones and you do not have any protected members.
>> Either this friendship is not needed at all or you can move some public
>> members to become protected.
>
> Fixed.
>
> The AcceptLimiter class plays with the private field isLimiter right
> below the friend definition.
>
> The docs indicate "private or protected" members are accessible to
> friend classes (as opposed to functions). I've not had any problems with
> this being private. Changed now anyway in case the stricter compilers
> complain.

I was apparently wrong about friend access to private members, sorry.

>>> /// comm_open_listener() parameters holder
>>> class OpenListenerParams
>>> {
>> ...
>>> +
>>> + /// handler to subscribe to Comm::TcpAcceptor
>>> + Subscription::Pointer handlerSubscription;
>>> };
>>
>> I am not sure handlerSubscription addition makes sense. We cannot ask
>> Coordinator to subscribe our listening handler. We can only subscribe
>> after we get a listening socket from Coordinator.
>>
>> Moreover, Subscription::Pointer is not a comm_open_listener() parameter.
>>
>> The UDP-versus-TCP check in Ipc::StartListening also looks out of place.
>
> moved it to ListeningStartedDialer instead. Which seems like it would
> work as well without triplicating the opener code.

Looks much better, IMO. I do not even see the UDP check anymore!

>> More related comments immediately below:
>>
>>> enter_suid();
>>> - const int sock = comm_open_listener(p.sock_type, p.proto,
>>> p.addr, p.flags,
>>> - FdNote(p.fdNote));
>>> - const int errNo = (sock>= 0) ? 0 : errno;
>>> + if (sock_type == SOCK_STREAM) {
>>> + // TCP: setup a job to handle accept() with subscribed handler
>>> + Comm::TcpAcceptor *tmp = new Comm::TcpAcceptor(cbd->fd,
>>> addr, flags, FdNote(fdNote), sub);
>>> + cbd->fd = tmp->fd;
>>> + AsyncJob::Start(tmp);
>>> + } else if (sock_type == SOCK_DGRAM) {
>>> + // UDP: setup the listener socket, but do not set a subscriber
>>> + // TODO: create a UDP sbscription so packet event calls get
>>> scheduled and queued Async.
>>> + cbd->fd = comm_open_listener(sock_type, proto, addr, flags,
>>> FdNote(fdNote));
>>> + } else {
>>> + fatalf("Invalid Socket Type (%d)",sock_type);
>>> + }
>>> + cbd->errNo = cbd->fd>= 0 ? 0 : errno;
>>> leave_suid();
>>
>>
>> AsyncJob::Start() running inside enter_suid() is scary!
>>
>> StartListeningCb now has to carry handlerSubscription back to the
>> caller, even though this subscription was created by the calling code
>> and was not used by IPC.
>
> It may not matter now with this new patch.

Yes, I think you fixed the primary problem.

> But what are you calling the
> "caller" here?
> The config parser which allocates handlerSubscription?
> Or the IPC object doing the FD opening?
> Or the comm_openex() generating the FD?
> Or the TcpAcceptor doing the FD state setup?
> Or the ListeningStartedDialer() reporting the successful startup?
> Or client-side ConnStateData creator calls which is subscribed?

I think by "caller", I meant whoever created StartListeningCb and/or was
the recipient of that callback. Bad terminology indeed.

>> I think this high-level TCP listening complexity should be handled in
>> the Ipc::StartListening callback (e.g., clientHttpConnectionOpened) or
>> similar code instead, without exposing the low-level Ipc file descriptor
>> sharing code to how we listen on some descriptors.
>>
>> If you want to minimize changes to your patch, Comm::TcpAcceptor can be
>> created later and the Comm::TcpAcceptor constructor should take the
>> opened descriptor as a parameter instead of calling comm_open_listener
>> itself, I guess, but it is probably not the best solution, IMO. Job
>> constructors should avoid complex actions such as opening listening
>> connections. Ideally, this should happen either before the job is
>> created or while the job starts.
>>
>> If Comm::TcpAcceptor class has to call comm_open_listener, then it
>> should really call Ipc::StartListening() instead, like everybody else
>> and, again, ideally when starting, and not in the constructor.
>>
>
> I think I get what you were saying. In a fuzzy way. Please check my
> re-try patch attached.

You got it right!

> This version passes the subscription for *_ports through their
> ListeningStartedDialer instead of IPC.
>
> NP: I've left some changes in src/ipc/StartListening.cc for you to look
> at. They are no longer part of this acceptor update, but are a small
> memory optimization I think is worth doing. If you okay that I'll commit
> it separately.

Sure, especially if they are committed separately.

Thank you,

Alex.
Received on Wed Jan 19 2011 - 22:50:28 MST

This archive was generated by hypermail 2.2.0 : Fri Jan 21 2011 - 12:00:20 MST