Re: ICP timeout calculation / parent selection

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Tue, 11 May 1999 03:22:00 +0200

Duane Wessels wrote:

> >squid-2.2.STABLE2.icp_timeout_select_rtt_parent-2.patch: Select a parent
> >based on statistical RTT when ICP times out (TIMEOUT_FIRST_PARENT_MISS)
>
> I feel like this complicates the algorithm more than it already is.
> Is it really necessary?

It depends on how sensitive the parent selection should be to network
congestion / lost ICP packets.

> Presumably the ICP query timed-out for a reason (i.e. congestion).
> Otherwise, the parent with the best RTT would already be selected
> as FIRST_PARENT_MISS. If we add this patch in, then we sort of
> weaken Squid's ability to recover (by going direct) in the face
> of network congestion or failure.

Both yes and no.

Yes, the query may have been lost due to network congestion related to
that peer.

No, the network congestion may as well be on the local connection,
hitting all peers good or bad.

I agree that the patch (rtt select parent) is far from perfect. Ideally
it would be weighted to dislike parents with a high packet loss ratio or
failed TCP connects.

Also, network failure / congestion recovery can be done quite nicely
without the help of ICP. See my peer connect timeout patch
(squid-2.2.STABLE2.peer_connect_timeout.patch) for one half (falling
back on another path when the selected one does not work well), and
statistical weighting can provide the other half (not selecting paths
which have recently proven to be bad).
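
To make the weighting idea concrete, here is a rough sketch (not what
the patch does) of picking a parent after an ICP timeout by RTT, with a
penalty for ICP packet loss and failed TCP connects. The ParentStats
fields, the cost formula and both weights are assumptions made up for
illustration; none of them are Squid names or tuned values.

```python
from dataclasses import dataclass


@dataclass
class ParentStats:
    name: str
    rtt_ms: float                 # statistical (smoothed) RTT estimate
    icp_loss_ratio: float         # fraction of ICP queries unanswered, 0.0..1.0
    recent_connect_failures: int  # failed TCP connects in the recent window


def select_parent_on_timeout(parents, loss_weight=4.0, connect_fail_weight=1.0):
    """Return the parent with the lowest penalized RTT, or None if there is none.

    Hypothetical cost: the RTT is inflated by the loss ratio and by the
    number of recent connect failures, so a nearby but lossy parent can
    lose to a slower but reliable one.
    """
    def cost(p):
        penalty = (1.0
                   + loss_weight * p.icp_loss_ratio
                   + connect_fail_weight * p.recent_connect_failures)
        return p.rtt_ms * penalty

    return min(parents, key=cost) if parents else None


if __name__ == "__main__":
    candidates = [
        ParentStats("near-but-lossy", 20.0, 0.5, 0),
        ParentStats("far-but-clean", 40.0, 0.0, 0),
    ]
    print(select_parent_on_timeout(candidates).name)
```

A None result would correspond to having no usable parent at all, in
which case going direct is the only option left.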

> >squid-2.2.STABLE2.icp_timeout_selection-2.patch: Calculate dynamic ICP
> >timeout based on only parents estimated RTT (or siblings if there is no
> >alive parents).
>
> I like what this tries to fix (when your siblings are much closer than
> your parents), except it seems like we're losing some valuable
> information in the averaging. I believe that an average of four data
> points is better (in terms of stability and accuracy) than an average
> of two data points.

Not when the four data points are of two separate classes and
priorities, which is the case for parents vs siblings. In most cases
where there are both parents and siblings, the parent peerings are the
important ones and the sibling peerings are a nice extra that improves
the local hit ratio.

> What happens if the opposite situation exists? close parents
> and far siblings? Doesn't it add some unnecessary delays?

The siblings get ignored in favor of the parents, and there would be a
lot of TIMEOUT_ entries in access.log since Squid does not bother to
wait for the slower siblings (more than 2 x average parent RTT).
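
The parent/sibling separation can be sketched roughly as below. This is
an illustration of the idea only, not the patch code: the Peer class,
the function name and the 2000 ms fallback are assumptions. The timeout
is taken as 2 x the average RTT of the alive parents, and siblings are
only considered when no parent is alive.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Peer:
    name: str
    kind: str           # "parent" or "sibling"
    rtt_ms: float       # estimated round-trip time for ICP replies
    alive: bool = True


def dynamic_icp_timeout(peers, fallback_ms=2000.0):
    """Timeout in ms: 2 x average RTT of alive parents, else of alive siblings.

    fallback_ms is a hypothetical value used when no peer is alive.
    """
    alive = [p for p in peers if p.alive]
    parents = [p.rtt_ms for p in alive if p.kind == "parent"]
    siblings = [p.rtt_ms for p in alive if p.kind == "sibling"]
    pool = parents or siblings      # siblings only count when no parent is alive
    if not pool:
        return fallback_ms
    return 2.0 * mean(pool)


if __name__ == "__main__":
    peers = [
        Peer("parent-a", "parent", 20.0),
        Peer("parent-b", "parent", 40.0),
        Peer("far-sibling", "sibling", 200.0),
    ]
    timeout = dynamic_icp_timeout(peers)
    # Siblings slower than the timeout are the ones that would show up
    # as TIMEOUT_ entries in access.log.
    late = [p.name for p in peers if p.kind == "sibling" and p.rtt_ms > timeout]
    print(timeout, late)
```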

> I wonder if things would stabilize if we weighted based on the percent
> of requests forwarded to each neighbor. Presumably we'd have more
> requests forwarded to parents than siblings, so they should
> automatically receive a higher weight. Under some threshold we could
> use hard-coded weights to make sure the first few timeouts are long
> enough so we get some parent replies. It could be computationally
> expensive, I suppose, to do ~100 of these calculations per second.

Probably yes, and it need not be very computationally expensive. It is
however an order of magnitude more complex than the parent/sibling
separation approach, and will in the end achieve nearly the same thing
(siblings mostly ignored in RTT timeout selection), with the addition
that it may make a better guess when there are big differences between
the parents' RTTs (only wait for the faster ones when things are normal;
when things get bad at those, wait for the slower ones as well).

The timeout calculation can look something like

   timeout = 1.5 / (sum(peer_request_ratio) * max(peer_request_ratio))
             * sum(peer_rtt * peer_request_ratio)
   timeout = min(timeout, 2 * max(peer_rtt))
   timeout = max(timeout, avg(peer_rtt))

Where peer_request_ratio is an estimated ratio of requests sent to each
peer, calculated with the help of a simple 5 min delta counter.
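
As a rough sketch of how this calculation might be coded, assuming the
last two steps are meant to clamp the result between avg(peer_rtt) and
2 * max(peer_rtt). The function name, the (rtt, ratio) tuple layout and
the behaviour when no requests have been counted yet are assumptions for
illustration, and the 5 min delta counter is not modelled; the ratios
are simply passed in.

```python
def weighted_icp_timeout(peers):
    """peers: list of (rtt_ms, request_ratio) tuples for the alive peers.

    request_ratio is the estimated share of requests sent to that peer.
    Returns the ICP query timeout in milliseconds.
    """
    if not peers:
        raise ValueError("need at least one peer")

    rtts = [rtt for rtt, _ in peers]
    ratios = [ratio for _, ratio in peers]

    lower = sum(rtts) / len(rtts)   # avg(peer_rtt)
    upper = 2.0 * max(rtts)         # 2 * max(peer_rtt)

    ratio_sum = sum(ratios)
    ratio_max = max(ratios)
    if ratio_sum <= 0 or ratio_max <= 0:
        # Assumption: with no request statistics yet, wait as long as allowed.
        return upper

    weighted_rtt = sum(rtt * ratio for rtt, ratio in peers)
    timeout = 1.5 / (ratio_sum * ratio_max) * weighted_rtt

    # Clamp into [avg(peer_rtt), 2 * max(peer_rtt)].
    timeout = min(timeout, upper)
    timeout = max(timeout, lower)
    return timeout


if __name__ == "__main__":
    # Two parents sharing the load equally, RTTs of 10 ms and 30 ms.
    print(weighted_icp_timeout([(10.0, 0.5), (30.0, 0.5)]))
```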

But it is only worth diving into coding this if such flexibility in RTT
timeout selection actually is needed. How many locations have a broad
RTT distribution for parents (or siblings if there are only siblings)
and have a high enough packet loss that a high fixed icp_query_timeout
is not acceptable?

/Henrik
Received on Tue Jul 29 2003 - 13:15:58 MDT
