Re: [PATCH] Move timeout handling from ConnOpener::connect to ConnOpener::timeout from Amos Jeffries on 2013-01-18 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 19 Jan 2013 17:00:22 +1300

On 19/01/2013 12:19 p.m., Rainer Weikusat wrote:
> Amos Jeffries <squid3_at_treenet.co.nz> writes:
>>> On 01/17/2013 02:53 PM, Rainer Weikusat wrote:
>>>> Alex Rousskov <rousskov_at_measurement-factory.com> writes:
>>>>> On 01/17/2013 10:56 AM, Rainer Weikusat wrote:
>>>> Then, the ::connect
>>>> method (or some code running on behalf of that) would close
>>>> temporaryFd_ before scheduling a 'delayed connect retry' which would
>>>> open a new socket. A ::timeout running then won't need to free
>>>> write_data and won't have to cancel the early_abort call.
>>> Agreed.
>>>
>>>> If the write handler was scheduled
>>>> by the time timout runs, this happened because 'something definite' is
>>>> known about the state of the connection attempt: Either, it suceeded
>>>> or the OS returned an error. Throwing this 'real information' away in
>>>> order to save a line of code seems not right to me even if it is a
>>>> fringe case of a fringe case.
>>> As I said, one would not be just saving lines of code. One would be most
>>> likely introducing bugs (that is exactly what happened in the past with
>>> this code). We have at most two "real information" pieces: a timeout and
>>> I/O status. The first to get to ConnOpener wins.
>> We have the problem that we do not know whether the future scheduled
>> ::connect was scheduled with a success (usually is) or a IN_PROGRESS
>> which would re-start another in-progress event cycle. The Job won't
>> know that datum until that scheduled Call is run.
> Presently, that's not much of a problem because the retry code is
> broken since it doesn't really retry the connect but just re-schedules
> the ::connect method call which will call comm_connect_addr with an
> unchanged state which should (I haven't tested this) either return the
> previous error or (more likely) return another 'false positive'
> 'connection succeeded' result which should ultimatively trigger the
> same assert I wrote about in my first mail to this list.
>
> After fixing this, the ::connect code could look at the calls_.timeout_
> member to determine if it is still supposed to retry or if a timeout
> happened in between and retries should stop.

If the timeout already happened then doneConnecting() has already
stopped the Job and the connect() Call will not be run as Alex pointed out.

calls_.* are local pointers held by ConnOpener for the sole purpose of
calling cancel() when necessary. They are *not* unset when the event
system of Squid schedules the Call. There is a separate pointer held in
the FD system (fd_table[].*Handler) which is cleared when the Call is
scheduled by event system or unset via Comm::SetSelect. This
Comm::ConnOpener layer is not exactly a good place to be playing with
those directly (the write_data is a problem hack and needs to die ASAP,
but not yet).

NP: We have already agreed to throw away the later arriving or bogus
'connection succeeded' information if timeout() wins the Call queue race
and uses doneConnecting().

Amos
Received on Sat Jan 19 2013 - 04:00:30 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 19 2013 - 12:00:09 MST