Re: [PATCH] schedule connect timeouts via eventAdd

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 24 Jan 2013 16:45:45 +1300

On 24/01/2013 1:50 p.m., Alex Rousskov wrote:
> On 01/23/2013 04:39 PM, Rainer Weikusat wrote:
>> Rainer Weikusat <rw_at_sapphire.mobileactivedefense.com> writes:
>>
>> [...]
>>
>>> The downside of this approach is that this means adding
>>> future connect timeouts will become proportionally more expensive as
>>> more and more connections are being established
>> This is an understatement. Based on a somewhat simplified model of the
>> event list (initially empty, only connect timeouts), adding n timeouts
>> in a row would have a cost of 2 * (n - 1) + 1 (assuming an insertion
>> cost of 1 for an empty list, same for the removal cost) when adding and removing
>> the events but would be (n * (n + 1)) / 2 for the other case (sum of
>> 1 + 2 + ... + n).
> Side note: I am very glad you noticed that problem with event-based
> timeouts. This tells me somebody is thinking about the code beyond the
> itch to squash a known bug. :-)
>
>
> To simplify your analysis and make it more meaningful, please consider a
> typical steady state, where you have R new connections per second, t
> second connection establishment delay, T second timeout, and negligible
> number of actual timeouts. In that state, one event-based approach gives
> you R*t timeouts and the second event-based approach gives you R*T
> timeouts registered with Squid at any time.
>
> What are the costs of handling one event-based timeout in the first (add
> and forget) and second (add and remove) event-based approaches? Since we
> start search from the oldest event and all timeouts will go to the very
> end of the list, I think the costs are:
>
> add-and-forget: R*T
> add-and-remove: 2*R*t
>
> (*) The 2 multiplier for add-and-remove is kind of the worst case -- in
> general, the event we want to cancel will be in the beginning of the
> queue, not the end of it!

I think there is a growth term missing in the add-and-forget formula.
Because that list keeps growing, the add() cost itself rises with the
queue length for each additional connection.
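
As a quick illustration (this is throwaway modelling code, not Squid's
event.cc), the toy program below builds a head-scanned, time-ordered
queue at the two steady-state sizes from the numbers quoted below
(R = 1000, T = 60s, t = 0.1s) and counts how far a new insert has to walk:

// Toy model of a head-scanned, time-ordered event queue. This is a
// simplified stand-in for the real eventAdd() list, used only to count
// how many nodes an insert walks past at a given queue size.
#include <cstdio>
#include <list>

// Insert 'when' keeping the list sorted, scanning from the head, and
// return the number of comparisons performed.
static size_t insertFromHead(std::list<double> &q, double when)
{
    size_t walked = 0;
    auto it = q.begin();
    while (it != q.end() && *it <= when) {
        ++it;
        ++walked;
    }
    q.insert(it, when);
    return walked;
}

int main()
{
    const int R = 1000;    // new connections per second
    const int T = 60;      // connect timeout; add-and-forget keeps ~R*T entries
    const double t = 0.1;  // typical connect time; add-and-remove keeps ~R*t entries

    std::list<double> forgetQ, removeQ;
    for (int i = 0; i < R * T; ++i)
        forgetQ.push_back(i);
    for (int i = 0; i < static_cast<int>(R * t); ++i)
        removeQ.push_back(i);

    // A fresh connect timeout has the latest deadline, so the head scan
    // walks the entire queue in both cases.
    printf("add-and-forget scan length: %zu\n", insertFromHead(forgetQ, R * T + 1));
    printf("add-and-remove scan length: %zu\n", insertFromHead(removeQ, R * T + 1));
    return 0;
}

The scan lengths it reports (60000 vs 100) line up with the operation
counts in the quote below.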

> If the above model is correct, the best choice should be clear by now
> because a typical 2*t is a lot less than a typical T, but let's try R =
> 1000 new connections per second, T = 60 seconds (default), and t = 0.1
> second (conservative!).
>
> add-and-forget: 1000 * 60 = 60'000 operations
> add-and-remove: 2 * 1000 * 0.1 = 200 operations
>
> Still "200" or even "100" (*) is a significant cost for this
> more-or-less realistic example. We can reduce that cost if we add an
> "add by searching from the end of the queue" eventAdd() optimization
> (either automated or explicit). In that case, the costs will become:
>
> add-and-forget-o: 1
> add-and-remove-o: R*t
>
> Or we could go back to fd-based timeouts, but we would need to be extra
> careful with maintaining them using the same raw Comm manipulations that
> have screwed us in the past. That would be especially difficult across
> changing temporary FDs...
>
> Cost summary:
>
>                     CPU    RAM
> current fd-based:   1      R*t
> add-and-forget-o:   1      R*T
> add-and-remove-o:   R*t    R*t
>
>
> Are my estimates correct?
>
> Is storing 60'000 events instead of 100 acceptable? I kind of doubt it
> is... :-(

I agree.
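
Here is a rough sketch of the "add by searching from the end of the
queue" optimisation mentioned above, assuming a doubly linked,
time-ordered queue (std::list and the names here are mine, standing in
for the real structures):

#include <iterator>
#include <list>

struct FakeEvent {
    double when;    // absolute deadline, in seconds
};

// Walk backwards from the tail until an entry scheduled no later than
// the new one is found, then insert after it. An event that belongs at
// the very end costs O(1) instead of a scan of the whole queue.
void eventAddFromTail(std::list<FakeEvent> &queue, double when)
{
    auto it = queue.end();
    while (it != queue.begin()) {
        auto prev = std::prev(it);
        if (prev->when <= when)
            break;          // the new event goes right after 'prev'
        it = prev;
    }
    queue.insert(it, FakeEvent{when});
}

With nothing but connect timeouts in the queue the backward scan stops
immediately, but as noted below that "adds go to the end" assumption
does not hold for the rest of the queue.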

Particularly since there are ~10 other _types_ of event sharing the
queue, each with its own timeout length. The add/remove operations those
users already do on their own (currently add-and-remove models) will get
proportionally more costly as the queue size increases.

NP: converting all the users to add-and-forget would consume much more
memory, AND would break the assumption that "generally add is to the end
of the queue" - which is not true now anyway. Generally an add lands at
position N-6 of the current queue, in front of the MemPools garbage
collection, disk swap.state cleanup, DelayPools garbage collection,
store Digest rebuild, store Digest fetch, and peer ICP ping events.
Maybe in front of a large set of IdlePool timeouts as well.
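
To make that insert position concrete, here is a rough illustration;
the periods are placeholder guesses, not the real Squid schedule, and
only the relative ordering matters:

#include <cstdio>
#include <map>

int main()
{
    // deadline (seconds from now) -> event label; periods are invented
    // for illustration only.
    std::multimap<double, const char *> queue = {
        {   1.0, "short I/O callout" },
        {   5.0, "short I/O callout" },
        { 300.0, "MemPools garbage collection" },
        { 300.0, "DelayPools garbage collection" },
        { 300.0, "peer ICP ping" },
        { 900.0, "disk swap.state cleanup" },
        {3600.0, "store Digest rebuild" },
        {3600.0, "store Digest fetch" },
    };

    // A 60s connect timeout lands in front of the six long-period
    // housekeeping entries, i.e. around position N-6, not at the tail.
    queue.emplace(60.0, "connect timeout");

    for (const auto &e : queue)
        printf("%8.1fs  %s\n", e.first, e.second);
    return 0;
}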

The different timer lengths on each event type above mean we cannot
easily convert the list type to dlink_list. That would need at least a
full analysis of the queue usage, and possibly an implementation of both
front and back push ops, switched by a check for which end is better to
inject from (more cycles spent doing the check, etc.).
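
If we did go that way, the dual-ended insert might look something like
this sketch (again only illustrative: std::list stands in for
dlink_list, and the midpoint check is just one possible heuristic):

#include <iterator>
#include <list>

struct TimedEvent {
    double when;    // absolute deadline, in seconds
};

// Insert into a time-ordered, doubly linked queue, picking the scan
// direction with a cheap midpoint check on the head/tail deadlines.
void eventAddEitherEnd(std::list<TimedEvent> &queue, double when)
{
    if (queue.empty() || when < queue.front().when) {
        queue.push_front(TimedEvent{when});
        return;
    }
    const double mid = (queue.front().when + queue.back().when) / 2;
    if (when >= mid) {
        // Late deadline: scan backwards from the tail.
        auto it = queue.end();
        while (it != queue.begin() && std::prev(it)->when > when)
            --it;
        queue.insert(it, TimedEvent{when});
    } else {
        // Early deadline: scan forwards from the head.
        auto it = queue.begin();
        while (it != queue.end() && it->when <= when)
            ++it;
        queue.insert(it, TimedEvent{when});
    }
}

Whether the extra check pays off depends on the queue mix above, which
is why the usage analysis would have to come first.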

Amos
Received on Thu Jan 24 2013 - 03:45:56 MST
