Re: [PATCH] Retry requests that failed due to a pconn race

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 20 Feb 2012 13:35:17 -0700

On 02/18/2012 12:26 AM, Amos Jeffries wrote:
> On 18/02/2012 12:36 p.m., Alex Rousskov wrote:
>> On 02/17/2012 01:59 AM, Amos Jeffries wrote:
>>
>>> I'm not sure about:
>>>
>>> + // XXX: stale serverConn may remain set even though we are opening
>>> new conn
>>>
>>> ** This should not matter, the next stage will either retry failed
>>> connect using it yet again, or reset serverConn to the newely opened
>>> Comm::Connection (itself). If you want to be certain about that you can
>>> move the "serverConn = temp;" line up out of the if() condition.
>> I hope it does not matter but it feels very wrong to keep stale
>> information around. Since you think it should work fine, I will set
>> serverConn to temp unconditionally and remove the XXX. serverConn will
>> be nill where that XXX is.
>>
>>
>>> We are more likely to have problems from the local port number of the
>>> serverConn being non-0 on the retries. Since the IP is set Squid will
>>> use bind() to ensure the outgoing IP is retained and bind() does not
>>> like ports which are in TIME_WAIT from previous use. Sorry I missed this
>>> on the last audit.
>>> The place to add serverConn->local.SetPort(0) is retryOrBail() alongside
>>> the new debugs()
>> I do not see port number being reused in basic tests. Perhaps it is
>> reused in some complex configurations, but you do raise a valid concern
>> even if the port number is never reused. Let's clarify what needs to be
>> done here.
>>
>> We do not use serverConn pointer when retrying a failed connection. And
>> after the above modification serverConn will be NULL during retries
>> anyway. I can set serverConn->local port to zero just before setting the
>> serverConn pointer to NULL, but it seems pointless or at least very
>> confusing because it implies we actually want to modify something other
>> than serverConn.
>>
>> We do reuse serverDestinations[0]. Until I read your comment above, I
>> assumed that address remains constant after serverDestinations is built.
>> Now I see that it is not a constant address but a half-baked,
>> fully-baked, or closed connection, depending on the ConnOpener and
>> FwdState state. This implies we may not reuse it safely.
>
> If it has closed already FD value is invalid we can re-use most of the
> other settings inside it. Whether we want to re-use the exact conn
> object or a clone is up to the context. Re-using the same one means we
> have to reset and loose some details logging may have wanted to keep for
> the connection attempt history, FwdState does not keep such a history so
> no problem there at this point.
>
>>
>> I see several options:
>>
>> a) Change FwdState to zero serverDestinations[0].local port before
>> calling Comm::ConnOpener(). This should work if we never pre-set the
>> port elsewhere due to some configuration options/etc. (if we do, we
>> should not zero it). Are there any other fields we should reset??
>
> We *can't* pre-set the port for any outgoing connections, due to the
> same bind() problem. Even TPROXY outgoings only copy the address over.
> The only addressing detail we need to reset is the local outgoing port.
>
> I'm not sure if its relevant outside of FTP Data connections but
> potentially the flags bitmap member might need the COMM_REUSEADDR bit
> un-set.
>
>>
>> b) Change Comm::ConnOpener to always zero the local port. This will work
>> only if we never preset the local port for any connection we are about
>> to open. Some ConnOpener code appears to imply that the connection may
>> already be opened before ConnOpener was called, so this may not work at
>> all. Are there any other fields we should reset??
>
> If it is done by ConnOpener then no. Only that port is relevant.
>
>
>>
>> c) Change Comm::ConnOpener to preserve serverDestinations[0] passed to
>> it. This might break some other new code that assumes that ConnOpener
>> modifies its first constructor parameter, I guess. We do not need to
>> guess which fields require a reset (now or in the future).
>
> Um. Possible, but inefficient. The current code receives back "a" open
> Comm::Connection for use. Any checking that it was the same one passed
> in is arbitrary. But this does mean adding extra allocation/deallocation
> around the connection setup. Which this design was seeking to avoid.
>
>>
>> d) Change FwdState to copyDetails() of the serverDestinations[0]
>> "connection" and give Comm::ConnOpener that copy so that ConnOpener can
>> modify the copy as it pleases. Upon call back, serverConn will have the
>> modified value but serverDestinations[0] will be preserved. We do not
>> need to guess which fields require a reset (now or in the future).
>
> Also, possible. And better than (c), since it only allocates on these
> race cases.
>
>>
>>
>> I guess I would favor (d), but I suspect it goes against your overall
>> Connection design. Do you want me to do (a)?
>
> (a) or (b) seem right without re-designing things.
>
> (b) would appear to be the right thing to do for the future. But affects
> a wider range of code than (a) and needs equivaently more testing. So
> its up to you which you want to go for now.

Hi Amos,

    The patch and port reset code using plan (a) above committed to
trunk as r12050.

Thank you,

Alex.
Received on Mon Feb 20 2012 - 20:35:27 MST

This archive was generated by hypermail 2.2.0 : Tue Feb 21 2012 - 12:00:09 MST