Re: [PATCH] bug 3081: comm layer cleanups for TCP acceptor

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 22 Jan 2011 04:55:22 +1300

On 20/01/11 11:50, Alex Rousskov wrote:
> On 01/17/2011 10:03 PM, Amos Jeffries wrote:
>> On 18/01/11 10:08, Alex Rousskov wrote:
>>> On 01/14/2011 09:11 PM, Amos Jeffries wrote:
>>>> On 10/01/11 14:02, Amos Jeffries wrote:
>>>>> Attached patch contains the cleanup-comm branch changes to
>>>>> ListenStateData which make it an AsyncJob (called Comm::TcpAcceptor)
>>>>> using Subscriptions to generate calls.
>>>>>
>>>>> This fixes the call re-use problem of bug 3081 and a related FTP data
>>>>> connection bug.
>>>>>
>>>>> Alex:
>>>>> please test that this does not break with SMP support. I've run it with
>>>>> 2 workers on a non-SMP machine, but have not checked for anything more
>>>>> complex.
>>>>>
>>>>
>>>> Have you had any time to double-check me on this Alex?
>>>>
>>>> I don't want to rely on the silence clause for merging this one.
>>>
>>> Sorry I could not review this earlier.
>>>
>>>
>>>> NP: attached updated version against trunk r11159.
>>>
>>>> + CommAcceptCbPtrFun(const CommAcceptCbPtrFun&o);
>>>
>>> Are you sure this is needed? It seems like its implementation is what
>>> the default copy constructor does. If there is a reason to add it, you
>>> should add a destructor as well, but I think everything should work
>>> without these additional explicit members.
>>
>> Since we have AsyncCall blocking the default constructor from allowing
>> null calls to be passed around we need to add an explicit constructor in
>> the particular children which subscription requires copying for.
>
>
> This does not compute for me. A default (parameter-less) constructor,
> blocked or not, should not affect copy constructors. Can you rephrase
> what you are trying to avoid or achieve? Or perhaps quote the errors you
> get when the above explicit copy constructor is removed.
>

I was thinking AsyncCall(const AsyncCall &); with no definition blocked
the children default constructors too. Due to the parent not having one
available to call from the default child constructor.

Doing a test build now to verify. If it passes will drop.

>
>>>> inline CommCbFunPtrCallT(int debugSection, int debugLevel,
>>>> const char *callName, const
>>>> Dialer&aDialer);
>>>>
>>>> + inline CommCbFunPtrCallT(const Pointer&p) :
>>>
>>> This additions seems wrong on two counts:
>>>
>>> a) You should not add a conversion from a pointer to CommCbFunPtrCallT
>>> to CommCbFunPtrCallT itself.
>>>
>>> b) You could add an explicit copy constructor, but the default copy
>>> constructor should work for copying CommCbFunPtrCallT objects already,
>>> no?
>>
>> Without them ... this:
>>
>> // setup the subscriptions such that new connections accepted by
>> listenConn are handled by HTTP
>> typedef CommCbFunPtrCallT<CommAcceptCbPtrFun> AcceptCall;
>> RefCount<AcceptCall> subCall = commCbCall(5, 5, "httpAccept",
>> CommAcceptCbPtrFun(httpAccept, s));
>> Subscription::Pointer sub = new CallSubscription<AcceptCall>(subCall);
>>
>>
>> Generates this:
>>
>> In file included from ../../trunk/src/comm/TcpAcceptor.h:6,
>> from ../../trunk/src/client_side.cc:103:
>> ../../trunk/src/base/Subscription.h: In member function
>> ‘RefCount<AsyncCall> CallSubscription<Call_>::callback() const [with
>> Call_ = CommCbFunPtrCallT<CommAcceptCbPtrFun>]’:
>> ../../trunk/src/client_side.cc:4092: instantiated from here
>> ../../trunk/src/base/Subscription.h:45: error: no matching function for
>> call to ‘CommCbFunPtrCallT<CommAcceptCbPtrFun>::CommCbFunPtrCallT(const
>> RefCount<CommCbFunPtrCallT<CommAcceptCbPtrFun> >&)’
>> ../../trunk/src/CommCalls.h:302: note: candidates are:
>> CommCbFunPtrCallT<Dialer>::CommCbFunPtrCallT(int, int, const char*,
>> const Dialer&) [with Dialer = CommAcceptCbPtrFun]
>> ../../trunk/src/CommCalls.h:263: note:
>> CommCbFunPtrCallT<CommAcceptCbPtrFun>::CommCbFunPtrCallT(const
>> CommCbFunPtrCallT<CommAcceptCbPtrFun>&)
>
> If I parse this correctly, you are, essentially, passing a pointer-to-X
> where reference-to-X is expected. The solution is _not_ to add a
> conversion from pointer-to-X to X, but to pass the X object itself.
>
> In your case, X is CommCbFunPtrCallT<CommAcceptCbPtrFun>.
>
> Note that fixing this may expose other problems.
>

Hmmm, bug in Subscription then.

=== modified file 'src/base/Subscription.h'
--- src/base/Subscription.h 2010-10-07 07:53:45 +0000

+++ src/base/Subscription.h 2011-01-21 14:42:52 +0000
@@ -42,7 +42,7 @@
  public:
      /// Must be passed an object. nil pointers are not permitted.
      explicit CallSubscription(const RefCount<Call_> &aCall) :
call(aCall) { assert(aCall != NULL); }
- virtual AsyncCall::Pointer callback() const { return new Call_(call); }
+ virtual AsyncCall::Pointer callback() const { return new
Call_(call.getRaw()); }

  private:
      const RefCount<Call_> call; ///< gets copied to create callback calls

Build testing this now. With the two related constructor pointer-to
changes. Should have a new patch for you tomorrow (mutual time).

>
>
>>>> + AsyncCall();
>>>> + AsyncCall(const AsyncCall&);
>>>
>>> I could not find the implementation for these two, but I probably just
>>> missed it. I assume the default/generated constructors did not work.
>>> Where do we use these two explicit ones?
>>
>> We don't. These were what you asked me to add when we were creating
>> Subscription, to prevent default constructors being (ab)used on any old
>> call. This non-definition enforces the AsyncCall and children are not
>> copied around and double-scheduled.
>
>
> Ah, please move these two declarations to the "private:" section and add
> a comment saying that they are not implemented to prevent nil calls from
> being passed around and unknowingly scheduled, for now.

Done.

>>>> + AsyncCallT(const RefCount<AsyncCallT<Dialer> > &o):
>>>> + AsyncCall(o->debugSection, o->debugLevel, o->name),
>>>> + dialer(o->dialer) {}
>>>> +
>>>
>>> Similar to the earlier comments, it seems wrong to provide a
>>> pointer-to-T to T conversion, especially an implicit one where the
>>> default copy constructor should work fine.
>>
>> Parallel of CommCbFunPtrCallT(const Pointer&p). This is required for the
>> matching case in FTP which uses AsyncCallT for a generic Uniary member
>> callback.
>
> Same comment. A different solution is needed here unless I am missing
> something. We should not add pointer-to-T to T conversions.
>
>
>>>> if (flag != COMM_OK) {
>>>> - debugs(33, 1, "httpAccept: FD "<< sock<< ": accept
>>>> failure: "<< xstrerr(xerrno));
>>>> + // This should not occur with TcpAcceptor.
>>>> + // However its possible the call was still queued when the
>>>> client disconnected
>>>> + debugs(33, 1, "httpAccept: FD "<< s->listenFd<< ": accept
>>>> failure: "<< xstrerr(xerrno));
>>>
>>> This new comment confuses me. Is it possible or not? Sounds like it is
>>> possible and is not a bug. Why complain at level 1 then? Overall,
>>> httpAccept() and httpsAccept() changes seem either unnecessary or out of
>>> the patch scope.
>>
>> Sorry. I removed the flag handling then had second thoughts about what
>> would happen under load or bad lag.
>>
>> What this covers is:
>> client connects
>> ... unrelated calls are scheduled
>> ... some time later TcpAcceptor picks it up
>> TcpAcceptor schedules the subscribed call to happen
>> ... previously queued unrelated calls are actioned, but take a long time
>> ... meanwhile client disconnects and FD closing is flagged
>> ... some time later the call occurs. The Dialer identifies the dead FD
>> and updates the flag.
>
>> I've only actually seen it when bugs were biting (tested with ab up to
>> 1000 concurrent connections), but theoretically its possible under heavy
>> load.
>
> Yes, I agree that this might be possible. I suggest polishing the
> comment to remove "should not occur" and raising the debugging level to
> 2. This is not a bug or an unhandled condition that we should inform the
> admin about. Messages like that really do hide important warnings on
> busy caches.

Done.

>
>
>> Long-Term the plan is to have the dialer drop the accept calls via
>> syncWithComm instead of this handler check.
>> For now the FTP logics require the callback to happen in order to
>> failover quickly.
>
> I do not know enough about FTP needs to suggest a better implementation,
> but I agree that such checks go in a different direction from the
> previous cancel-I/O-and-let-closing-handler-do-its-job way. Let's hope
> they are temporary.
>

>> This version passes the subscription for *_ports through their
>> ListeningStartedDialer instead of IPC.
>>
>> NP: I've left some changes in src/ipc/StartListening.cc for you to look
>> at. They are no longer part of this acceptor update, but are a small
>> memory optimization I think is worth doing. If you okay that I'll commit
>> it separately.
>
> Sure, especially if they are committed separately.
>

Done.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.10
   Beta testers wanted for 3.2.0.4
Received on Fri Jan 21 2011 - 15:55:35 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 22 2011 - 12:00:05 MST