Re: [PREVIEW] Retry requests that failed due to a pconn race

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 15 Feb 2012 11:46:04 +1300

On 15.02.2012 05:24, Alex Rousskov wrote:
> On 02/14/2012 02:09 AM, Amos Jeffries wrote:
>> On 14/02/2012 7:03 p.m., Alex Rousskov wrote:
>>> Hello,
>>>
>>> This has been discussed before (see the old thread quoted
>>> below),
>>> and the bump-ssl-server-first feature is being bitten by this
>>> especially
>>> badly.
>>>
>>> The attached patch may not apply to trunk "as is" right now, but
>>> the
>>> changes should be clearly visible, and I would like to port it
>>> there and
>>> get it committed if there are no better ideas for the fix.
>>>
>>> Technical summary:
>>>
>>> The ERR_ZERO_SIZE_OBJECT errors were visible to the client when the
>>> destination had only one address because serverDestinations.shift()
>>> made the list of destination empty and startConnectionOrFail()
>>> failed.
>>>
>>> The new "wasIdle" state is remembered in the Connection object
>>> instead
>>> of FwdState itself because we need to maintain that information for
>>> a
>>> pinned connection as well (and FwdState is not preserved between a
>>> request that pins a connection and a request that uses the pinned
>>> connection).
>>>
>>> This may need more work. I am not sure the FTP and Gopher cases are
>>> correct. The current code assumes that those protocols do not have
>>> pconn
>>> [races].
>>>
>>
>> Um. This patch is not doing what I thought you were arguing for it
>> to do...
>
> The implementation is different because my earlier idea of tying the
> "avoidPconn" state to serverDestinations manipulations does not work
> for
> pinned connections. However, I am still solving the same problem,
> using
> essentially the same flag. It is just placed differently so that both
> FwdState and pinned connections can share it.
>
> My bugs that you have found below (thank you!) may have clouded the
> intent though.
>

Thought so :P. Audit assumed that was the aim.

>
>> * FwdState::startConnectionOrFail() pops one of those N pconn off
>> the
>> idle list instead of opening a new connection.
>
> Yes, this is a bug. Will fix by changing the last pop() argument to
> the
> equivalent of:
>
> !wasUsed && checkRetriable()
>
>
>> So you implemented what I was ask
>> You need to wrap that pop() call using
>> if(serverDestinations[0]->wasIdle) in order to get a brand new
>> connection on retries.
>
> As discussed earlier, we cannot bypass the pop() call because we want
> to
> close an old idle connection if we have to create a new one.
>

In this case an existing one has just closed/failed. So killing one
will be a net negative on the connection count. Seems to just be a waste
of one of those N other connections (without even checking its
usability).

I'm not arguing for/against this, just pointing out the case here is
not as bad as the expanding connection count we argued out earlier.

>
>> Also, I think this flag would work better if it were a generic flag
>> to
>> say the FD was a pconn/pinned in the past. Making TcpOpener the only
>> place setting it to false, as a brand new connection is attached to
>> its
>> FD. The ServerData child components dont need to even know about it
>> or
>> assume anything about races.
>
> I do not think a "used to be a pconn" flag will work here because we
> need to detect pconn race conditions and not just any connection
> reuse.
> Pconn race conditions become impossible (and, hence, failures should
> not
> be retried using the same destination address) after we receive
> something from the server and before the connection becomes idle
> again.
> Server kids are the only ones that know when we have received
> something.
>
> In other words, we need a flag that helps us detect failed
> connections
> that went through the following sequence of "states":
>
> * idle
> * eof
>
> The new wasIdle flag keeps the first state, and Server kids clear
> that
> flag if they receive something from the server because it means the
> eof
> will not be reached immediately after the idle state. I do not see
> how
> we can avoid Server kid modifications, but it would be great if we
> can,
> of course.

Okay. Leaves us a bit more complicated state to remember updating when
making new server-facing components. But not a major issue.

>
> I do not like the wasIdle name because, technically, the connection
> was
> not idle after we started writing the request to the server, but I
> failed to find a better name. Suggestions welcomed!

will have to ponder that. Something about the read being idle maybe?

Amos
Received on Tue Feb 14 2012 - 22:46:07 MST

This archive was generated by hypermail 2.2.0 : Wed Feb 15 2012 - 12:00:08 MST