Re: [PATCH] Comm::TcpAcceptor::doAccept fd limit handling is broken

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 17 Jul 2014 14:32:44 -0600

On 07/16/2014 06:46 PM, Amos Jeffries wrote:
> On 17/07/2014 11:10 a.m., Alex Rousskov wrote:
>> On 07/16/2014 02:38 PM, Rainer Weikusat wrote:
>>> Alex Rousskov <rousskov_at_measurement-factory.com> writes:
>>>> On 07/16/2014 11:11 AM, Rainer Weikusat wrote:
>>>>> This is broken because it re-enables readable notifications even if no
>>>>> connection was accepted.
>>
>>>> In other words, it re-enables a check for new connections when
>>>>
>>>> * we know that we currently cannot accept any new connections,
>>>> * Squid is under stress already, and
>>>> * our code does not handle multiple deferences of the same TcpAcceptor
>>>> object well.
>>
>>
>>> There's an important 4th point: The check is re-enabled when it is known
>>> that it will immediately return true again because the connection which
>>> was not accepted is still pending (hence, the socket is still readable),
>>> more details below.
>>
>> Agreed!
>>
>>
>>> I changed the TcpAcceptor/ AcceptLimiter code to act as if
>>> only a single file descriptor was available for client connections and
>>> created two connections via netcat. What it should do in this case is to
>>> ignore the second connection until the first one was closed. Instead, it
>>> went into an endless 'defer this connection' loop, as mentioned above,
>>> because the pending 2nd connection meant the socket remained readable.
>>
>> OK, I think I now better understand what you meant by "one connection at
>> the same time". And even if I do not, we seem to agree on the fix to
>> solve the problem, which is even more important :-).
>>
>>
>>> updated patch:
>>>
>>> --- mad-squid/src/comm/TcpAcceptor.cc 8 Jan 2013 17:36:44 -0000 1.1.1.2
>>> +++ mad-squid/src/comm/TcpAcceptor.cc 16 Jul 2014 20:36:35 -0000
>>> @@ -204,7 +204,6 @@
>>> } else {
>>> afd->acceptNext();
>>> }
>>> - SetSelect(fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, afd, 0);
>>>
>>> } catch (const std::exception &e) {
>>> fatalf("FATAL: error while accepting new client connection: %s\n", e.what());
>>> @@ -262,6 +261,8 @@
>>> debugs(5, 5, HERE << "Listener: " << conn <<
>>> " accepted new connection " << newConnDetails <<
>>> " handler Subscription: " << theCallSub);
>>> +
>>> + if (!isLimited) SetSelect(conn->fd, COMM_SELECT_READ, Comm::TcpAcceptor::doAccept, this, 0);
>>> notify(flag, newConnDetails);
>>> }
>>
>>
>> I am happy to commit the above (after splitting the new if-statement
>> into two lines), but since this is not an emergency, I would prefer to
>> wait a little for any objections or second opinions.
>
> Looks like a reasonable temporary solution to me for the stable
> releases. I have no problems with it going in if we continue to work on
> the queue fixes for trunk.

Hi Amos,

    Since I cannot promise to satisfy your "work on the queue fixes for
trunk" precondition, committing this fix to trunk (or removing the
precondition) becomes your call.

> Regarding that queue I think it can be cleaned up once and for all by
> making doAccept() a proper Call and using the new comm/Read.h API.

Yes, but with some adjustments:

Calling acceptOne() or a similar method asynchronously is indeed the
right way to do this.

However, the method that AcceptLimiter will call should not use the
comm/Read API unless AcceptLimiter is willing to provide the same
arguments that a regular read callback gets. I do not think this
complication (and a slight lie that the socket was just selected for
reading) are worth it. We should just call a parameterless TcpAcceptor
method. acceptOne() may be directly usable for this purpose.

doAccept() provides okToAccept() checks that are harmful in this
we-have-already-decided-to-resume-listening context and replacements for
job protections already offered by async calls API. If doAccept()
remains as it is, we should copy its fatal()-related stuff into
TcpAcceptor::callException() though.

> The queue becomes a std::queue<AsyncCall::Pointer>

Yes, or, if one worries about efficiency in this context, this can be
optimized further by reusing (and adjusting) AsyncCallQueue instead. I
probably would not worry about efficiency in this context, but let me
know if I should detail that optimized solution.

> and the nasty
> removeDead() drops in favour of a TcpAcceptor internal Call::cancel().

Remembering the call for the sole purpose of canceling it is harmful
when such cancellations are rare: The async calls API already provides
protection from calling dead/gone jobs, and we should just use that in
this case.

Another fix if somebody is going to work on queue optimization is
removal of isLimited: AFAICT, after all the changes, a "limited"
acceptor cannot have a read scheduled, and an acceptor with a read
scheduled cannot be limited.

HTH,

Alex.
Received on Thu Jul 17 2014 - 20:33:07 MDT

This archive was generated by hypermail 2.2.0 : Fri Jul 18 2014 - 12:00:11 MDT