Re: [PREVIEW] Retry requests that failed due to a pconn race, take 1 from Amos Jeffries on 2012-02-14 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 15 Feb 2012 12:49:32 +1300

On 15.02.2012 12:10, Alex Rousskov wrote:
> On 02/14/2012 02:45 PM, Amos Jeffries wrote:
>>> 1. Do not set wasIdle in pinConnection(). Yes, the pinned
>>> connection
>>> does become idle, but we cannot retry pinned pconn races so there
>>> is
>>> little point in detecting and then doing extra work to ignore them.
>>>
>>> 2. Set wasIdle in pinConnection() (already done in the attached
>>> patch).
>>> In forward.cc, add code that will avoid retrying pinned pconn
>>> races.
>>> This avoids the lie in pinConnection() but makes the forward.cc
>>> code a
>>> little more complex.
>>>
>>> 3. Set wasIdle in pinConnection() (already done in the attached
>>> patch).
>>> Leave forward.cc as is because my assertion analysis above is wrong
>>> and
>>> current code can retry pinned pconn races correctly.
>>>
>>> I am not an expert on current pinned connection use. Which option
>>> is the
>>> best?
>>
>>
>> I think (2). Turning that assert into:
>> serverConn = (pinned_connection ?
>> pinned_connection->validatePinnedConnection(request,
>> serverDestinations[0]->getPeer()) : NULL);
>
> Done in "take 2" of the patch (attached). I could not use the ?:
> operator because of the type mismatch, but the code is essentially
> the same.
>
> I also improved the "retry the same destination" condition so that we
> preserve the old behavior if serverConn is nil and, hence, we cannot
> detect a pconn race.
>
>
> Do you agree with FTP/Gopher changes? Should forward.cc make the
> retry
> decision itself instead, as in the sketch below?
>
> if (!serverConn || !serverConn->wasIdle || protoHasNoPconnRaces())
> serverDestinations.shift(); // last one failed. try another.
> else
> debugs(17, 4, HERE << "retrying the same destination");
>
>
> Finally, I am considering moving wasIdle flag from Connection to
> FwdState after all. While it is true that FwdState cannot know
> whether
> the pinned connection was an idle persistent connection, it just
> occurred to me that ALL pinned connections were idle and persistent
> when
> they enter FwdState!

Of course. That was the point behind flagging them in pinConnection().

>
> Thus, FwdState can maintain the wasIdle flag on its own. Doing so
> will
> require calls from Server kids to FwdState to clear the flag when we
> read something from the server, but it may be simpler (more
> localized)
> than the current Connection::wasIdle solution. What do you think?
>

Up to you.

Recall that tunnel.cc also needs to be considered whenever extra state
management is pushed into the ServerData child components and does use
pconn.

>
>>> As discussed earlier, we cannot bypass the pop() call because we
>>> want to
>>> close an old idle connection if we have to create a new one.
>
>> In this case an existing one has just closed/failed. So killing one
>> will be a net negative on the connection count. Seems to just be a
>> waste of one of those N other connections (without even checking its
>> usability).
>
>> I'm not arguing for/against this, just pointing out the case here is
>> not as bad as the expanding connection count we argued out earlier.
>
> Agreed. I will probably remove that extra pop() condition added in
> take1
> if nobody else finds a reason to add it.
>
> However, we will then need to add code to require a new connection
> when
> retrying after a pconn race because we do not want a single user to
> keep
> testing all those "N other connections" for validity. We will not
> close
> any of them, but we will not use them either.

You seem to misunderstand me here. I'm not talking about scanning those
N on pconn errors in one client. Just that killing an extra one is not
necessary because we *already* just popped off and closed the one being
retried.

Essentially out of N pconn entries:
   - pop() #1. It self-closes (race lost).
   - pop() #2. Close and drop it.
   - re-open #1.

Why bother with that second pop() at all?

For the next client to need one of these pconn set, #2 has a better
chance of success than #3 does (LIFO remember) regardless of whether it
failed or not. If it failed the chance is good that will fail too, but
less chance than #3 has, etc.

Amos
Received on Tue Feb 14 2012 - 23:49:36 MST

This archive was generated by hypermail 2.2.0 : Fri Feb 17 2012 - 12:00:09 MST