Re: multiple A records

From: Michael Pelletier <mikep@dont-contact.us>
Date: Tue, 1 Jul 1997 10:59:49 -0400 (EDT)

On Tue, 1 Jul 1997, Oskar Pearson wrote:
> The biggest complaint that I get with squid is www.microsoft.com being
> broken...

I had exactly the same problem here...

> As we know, if one of the IP addresses is down for microsoft.com, squid
> returns an error message and does something like "marks it down"...
>
> The people at microsoft seem to be relying on "if an A record is down
> try the next A record that was returned in the DNS"... this doesn't
> happen with squid, but does in things like netscape (and telnet)...

The HTTP RFC indicates that a server can simply refuse connections if it's
too busy, and that's what's happening here.

> Anyone have any idea how to patch squid so that it doesn't do this...
> from what I can tell the correct procedure is to "try the next
> IP" if you get connection refused (though for things like "connection
> reset by peer" during the middle of the download it's to give up)
>
> I haven't had a look at the code but I would guess that it's
> not easy?

I wrote a patch for 1.1.10 that's working swimmingly here. It wasn't
particularly easy until I got some fd-reset code from Duane, and got the
hang of the "handler"-based threading system. Squid code is a little hard
to follow until you get your mind around it.

Instead of removing a bad IP address from the cache when a connection
fails, the patched Squid marks the address "bad" and tries the next one.
Once it finds a good address, it fetches the page, and future requests
cycle only through the known-good addresses. The connect timeout is
lowered based on the number of IP addresses available, down to a minimum
value of 7 seconds, I think. When the last IP address is marked bad, it
goes through the list one more time and retries the previously "bad"
addresses. If one of them works, it marks that address "good" and fetches
the page; otherwise it returns an error to the user.
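
The multi-address behaviour described above can be sketched roughly as
follows. This is an illustration in Python, not the patch itself (which is
C inside Squid 1.1.10). The names Addr, IpCacheEntry, connect_timeout and
connect_any are hypothetical, the 120-second base timeout and the
divide-by-address-count formula are assumptions, and try_connect stands in
for the real non-blocking connect. For simplicity the sketch always starts
from the first known-good address rather than rotating between them.

```python
MIN_CONNECT_TIMEOUT = 7.0     # seconds; the post says "7 seconds, I think"
BASE_CONNECT_TIMEOUT = 120.0  # assumed starting value, not from the post


class Addr:
    """One A record for a host, with the bad/ok flag the patch tracks."""
    def __init__(self, ip):
        self.ip = ip
        self.bad = False


class IpCacheEntry:
    """Hypothetical stand-in for an IPcache entry: a host and its addresses."""
    def __init__(self, host, ips):
        self.host = host
        self.addrs = [Addr(ip) for ip in ips]


def connect_timeout(entry):
    # Assumed formula: split the base timeout across the addresses,
    # but never go below the minimum.
    return max(MIN_CONNECT_TIMEOUT,
               BASE_CONNECT_TIMEOUT / max(1, len(entry.addrs)))


def connect_any(entry, try_connect):
    """Return the IP that accepted the connection, or None if all failed.

    try_connect(ip, timeout) is supplied by the caller and returns True
    on success.
    """
    timeout = connect_timeout(entry)
    # First pass: skip addresses already marked bad.
    for addr in entry.addrs:
        if addr.bad:
            continue
        if try_connect(addr.ip, timeout):
            return addr.ip
        addr.bad = True
    # Every address is now marked bad: go through the list one more time.
    for addr in entry.addrs:
        if try_connect(addr.ip, timeout):
            addr.bad = False  # it works again, so mark it good
            return addr.ip
    return None  # caller returns an error to the user
```

With a four-address host whose first two addresses refuse connections,
connect_any marks those two bad and returns the third, and later calls go
straight to the third address without touching the bad ones.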

For single-address hosts, it tries the host three times, and only returns
a failure if all three attempts fail, or if there's a timeout on the first
attempt. I also modified the cache manager IPcache display to show bad/ok
addresses and counts.
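
The single-address rule can be sketched the same way. Again this is only
an illustration in Python under stated assumptions: Result, connect_single
and SINGLE_ADDR_TRIES are hypothetical names, and try_connect stands in
for the real connect. The post only says what happens on a timeout during
the first attempt; treating a timeout on a later attempt as an ordinary
failure is my assumption.

```python
import enum


class Result(enum.Enum):
    OK = "ok"
    REFUSED = "refused"
    TIMEOUT = "timeout"


SINGLE_ADDR_TRIES = 3  # "it tries the host three times"


def connect_single(ip, try_connect):
    """Return the 1-based attempt number that succeeded, or None on failure.

    try_connect(ip) is supplied by the caller and returns a Result.
    """
    for attempt in range(1, SINGLE_ADDR_TRIES + 1):
        result = try_connect(ip)
        if result is Result.OK:
            return attempt
        if result is Result.TIMEOUT and attempt == 1:
            # A timeout on the very first attempt fails immediately.
            return None
        # Any other failure: fall through and retry.
    return None  # all three attempts failed
```

A refused connection followed by a success returns 2 here, which
corresponds to the "conn succeeded (try 2)" lines in the logs below.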

A single-address retry with a numeric URL host (PointCast Network), with
no IPcache entry to mark bad:
---------
97/07/01 09:56:13| commConnectHandle(22): 207.88.210.16 conn fail, retrying
97/07/01 09:56:13| commConnectHandle(22): 207.88.210.16 conn succeeded (try 2)
---------

A single-address retry with www.yahoo.com:
---------
97/06/24 16:39:56| www.yahoo.com(204.71.177.71) marked bad
97/06/24 16:39:56| commConnectHandle(20): 204.71.177.71[www.yahoo.com] conn fail, retrying
97/06/24 16:39:57| www.yahoo.com(204.71.177.71) marked good
97/06/24 16:39:57| commConnectHandle(20): www.yahoo.com conn succeeded (try 2)
---------

And www.pcquote.com has four IP addresses, two of which were either down
or too busy to respond:
---------
97/06/24 10:56:50| www.pcquote.com(206.64.123.139) marked bad
97/06/24 10:56:50| commConnectHandle(21): 206.64.123.139[www.pcquote.com] conn fail, retrying
97/06/24 10:56:50| www.pcquote.com(206.64.123.136) marked bad
97/06/24 10:56:50| commConnectHandle(21): 206.64.123.136[www.pcquote.com] conn fail, retrying
97/06/24 10:56:50| commConnectHandle(21): www.pcquote.com conn succeeded (try 3)
---------

I've been meaning to change the IP[host] to host[ip] -- I transposed two
lines in that particular module -- but I haven't gotten around to it yet.

I've submitted this patch to Duane Wessels. He said he's not comfortable
putting a change this significant into a minor release, and has postponed
it to version 1.2.

If you'd like a copy of the patch in the meantime, feel free to drop me an
e-mail and I'll send it out to you. I haven't had a chance to do a 1.1.11
version yet, but I may get to it some evening this week if someone asks. The
differences in the patch probably wouldn't be too significant. 1.1.10's
working well enough for me as it is, though.

I was also thinking of adding a couple extra config file parameters to
control the minimum connection timeout and the number of retries for a
single-address host, but I haven't gotten around to that either. :-)

        -Mike Pelletier.
Received on Tue Jul 29 2003 - 13:15:41 MDT
