Re: [squid-users] CARP Failover behavior - multiple parents chosen for URL

From: Chris Woodfield <rekoil_at_semihuman.com>
Date: Wed, 6 May 2009 20:29:38 -0400

On May 6, 2009, at 8:14 PM, Amos Jeffries wrote:

>> Hi,
>>
>> I've noticed a behavior in CARP failover (on 2.7) that I was wondering
>> if someone could explain.
>>
>> In my test environment, I have a non-caching squid configured with
>> multiple CARP parent caches - two servers, three squid instances per
>> box (listening on ports 1080/1081/1082, respectively), for a total of
>> six parents.
>>
>> When I fail a squid instance and immediately afterwards run GETs to
>> URLs that were previously directed to that instance, I notice that the
>> request goes to a different squid, as expected, and I see the
>> following in the log for each request:
>>
>> May 6 11:43:28 cdce-den002-001 squid[1557]: TCP connection to
>> http-cache-1c.den002 (http-cache-1c.den002:1082) failed
>>
>> And I notice that the request is being forwarded to a different, but
>> consistent, parent.
>>
>> After ten of the above requests, I see this:
>>
>> May 6 11:43:41 cdce-den002-001.den002 squid[1557]: Detected DEAD
>> Parent: http-cache-1c.den002
>>
>> So, I'm presuming that after ten failed requests, the peer is
>> considered DEAD. So far, so good.
>>
>> The problem is this: During my test GETs, I noticed that immediately
>> after the "Detected DEAD Parent" message was generated, the parent
>> server that the request was being forwarded to changed - as if there's
>> an "interim" decision made until the peer is officially declared DEAD,
>> and then another hash decision made afterwards. So while consistent
>> afterwards, it's apparent that during the failover, the parent server
>> for the test URL changed twice, not once.
>>
>> Can someone explain this behavior?
>
> Do you have 'default' set on any of the parents?
> It is entirely possible that multiple paths are selected as usable and
> only the first taken.
>

No, my cache_peer config options are

cache_peer http-cache-1a.den002 parent 1080 0 carp http11 idle=10
<repeat for each hostname>

> During the period between death and detection the dead peer will still
> be attempted but failover happens to send the request to another
> location.
> When death is detected the hashes are actually re-calculated.
>

OK, correct me if I misread, but my understanding of the spec is that
each parent cache gets its own hash value, each of which is then
combined with the URL's hash to come up with a set of values. The
parent cache corresponding with the highest result is the cache
chosen. If that peer is unavailable, the next-best peer is selected,
then the next, etc etc.
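
To make sure we're talking about the same thing, here is a minimal Python
sketch of how I read the selection step. It is not Squid's code and not the
hash function from the CARP draft: MD5 is only a stand-in for the combining
function, the peer names are made up, and load factors/weights are left out.

```python
import hashlib

def score(peer: str, url: str) -> int:
    """Combine a peer name and a URL into one score (MD5 is a stand-in)."""
    digest = hashlib.md5(f"{peer}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def ranked_peers(peers, url):
    """Return the peers ordered best-first for this URL."""
    return sorted(peers, key=lambda p: score(p, url), reverse=True)

def choose(peers, url, dead=frozenset()):
    """Pick the highest-scoring peer that is not marked dead, or None."""
    for peer in ranked_peers(peers, url):
        if peer not in dead:
            return peer
    return None

# Hypothetical peer names, for illustration only.
PEERS = [f"parent-{i}" for i in range(6)]
URL = "http://example.com/some/object"

first = choose(PEERS, URL)
fallback = choose(PEERS, URL, dead={first})
print(first, "->", fallback)
```

In this reading, the fallback for a URL is simply the second entry in that
URL's own ranking, so it should be the same before and after the peer is
formally declared dead.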

If that is correct, what hashes are re-calculated when a dead peer is
detected? And why would those hashes produce different results than
the run of the algorithm before the peer was declared dead?

And more importantly, will that recalculation result in URLs being
re-mapped that weren't originally pointed to the failed parent? I thought
avoiding such an arbitrary re-mapping was the whole point of the CARP
algorithm.
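
Here is the property I'm expecting, as a small Python check. Again this is
only a sketch under my own assumptions (SHA-1 as a stand-in combining
function, invented peer names and URLs, no weights), not what Squid does
internally: if the dead peer is simply dropped from the candidate set, the
only URLs that move are the ones that were mapped to it.

```python
import hashlib

def score(peer: str, url: str) -> int:
    """Stand-in combining function; the real CARP hash is different."""
    digest = hashlib.sha1(f"{peer}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def choose(peers, url):
    """Pick the peer with the highest combined score for this URL."""
    return max(peers, key=lambda p: score(p, url))

def remapped_urls(peers, dead_peer, urls):
    """Return the URLs whose chosen peer changes once dead_peer is dropped."""
    survivors = [p for p in peers if p != dead_peer]
    return [u for u in urls if choose(peers, u) != choose(survivors, u)]

# Hypothetical peers and URLs, for illustration only.
peers = [f"parent-{i}" for i in range(6)]
urls = [f"http://example.com/object/{n}" for n in range(1000)]

moved = remapped_urls(peers, "parent-2", urls)
owned = [u for u in urls if choose(peers, u) == "parent-2"]
print(len(moved), "URLs moved;", len(owned), "were owned by the dead peer")
```

If the recalculation changes anything else (load factors, for instance),
I'd expect URLs outside that set to move as well, which is what I'm trying
to find out.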

-C

> If anyone wants a task it may be useful to see whether leaving dead
> peers in the existing hash and omitting the dead peers at the selection
> time instead of connection time is more responsive like this while
> reducing the double-change.
>

Again, I'm not clear on the difference between the two - educate me
please :)

> Amos
>
>
Received on Thu May 07 2009 - 00:29:42 MDT
