Re: [PREVIEW] Retry requests that failed due to a pconn race

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 14 Feb 2012 09:24:37 -0700

On 02/14/2012 02:09 AM, Amos Jeffries wrote:
> On 14/02/2012 7:03 p.m., Alex Rousskov wrote:
>> Hello,
>>
>> This has been discussed before (see the old thread quoted below),
>> and the bump-ssl-server-first feature is being bitten by this especially
>> badly.
>>
>> The attached patch may not apply to trunk "as is" right now, but the
>> changes should be clearly visible, and I would like to port it there and
>> get it committed if there are no better ideas for the fix.
>>
>> Technical summary:
>>
>> The ERR_ZERO_SIZE_OBJECT errors were visible to the client when the
>> destination had only one address because serverDestinations.shift()
>> made the list of destination empty and startConnectionOrFail() failed.
>>
>> The new "wasIdle" state is remembered in the Connection object instead
>> of FwdState itself because we need to maintain that information for a
>> pinned connection as well (and FwdState is not preserved between a
>> request that pins a connection and a request that uses the pinned
>> connection).
>>
>> This may need more work. I am not sure the FTP and Gopher cases are
>> correct. The current code assumes that those protocols do not have pconn
>> [races].
>>
>
> Um. This patch is not doing what I thought you were arguing for it to do...

The implementation is different because my earlier idea of tying the
"avoidPconn" state to serverDestinations manipulations does not work for
pinned connections. However, I am still solving the same problem, using
essentially the same flag. It is just placed differently so that both
FwdState and pinned connections can share it.

My bugs that you have found below (thank you!) may have clouded the
intent though.

> * You set it on IdleConnList::push, but missed its twin
> ConnStateData::pinConnection() in client_side.cc

Will fix.

> * FwdState::startConnectionOrFail() pops one of those N pconn off the
> idle list instead of opening a new connection.

Yes, this is a bug. Will fix by changing the last pop() argument to the
equivalent of:

    !wasUsed && checkRetriable()

> So you implemented what I was ask
> You need to wrap that pop() call using
> if(serverDestinations[0]->wasIdle) in order to get a brand new
> connection on retries.

As discussed earlier, we cannot bypass the pop() call because we want to
close an old idle connection if we have to create a new one.

> Also, I think this flag would work better if it were a generic flag to
> say the FD was a pconn/pinned in the past. Making TcpOpener the only
> place setting it to false, as a brand new connection is attached to its
> FD. The ServerData child components dont need to even know about it or
> assume anything about races.

I do not think a "used to be a pconn" flag will work here because we
need to detect pconn race conditions and not just any connection reuse.
Pconn race conditions become impossible (and, hence, failures should not
be retried using the same destination address) after we receive
something from the server and before the connection becomes idle again.
Server kids are the only ones that know when we have received something.

In other words, we need a flag that helps us detect failed connections
that went through the following sequence of "states":

    * idle
    * eof

The new wasIdle flag keeps the first state, and Server kids clear that
flag if they receive something from the server because it means the eof
will not be reached immediately after the idle state. I do not see how
we can avoid Server kid modifications, but it would be great if we can,
of course.

I do not like the wasIdle name because, technically, the connection was
not idle after we started writing the request to the server, but I
failed to find a better name. Suggestions welcomed!

Thank you,

Alex.
Received on Tue Feb 14 2012 - 16:25:10 MST

This archive was generated by hypermail 2.2.0 : Wed Feb 15 2012 - 12:00:08 MST