Re: [squid-users] Unneeded DNS lookups for cache_peer selection

From: Amon Ott <lists_at_compuniverse.de>
Date: Thu, 30 Jan 2014 13:25:24 +0100

On 30.01.2014 12:24, Amos Jeffries wrote:
> On 30/01/2014 11:52 p.m., Amon Ott wrote:
>> On 30.01.2014 10:45, Amos Jeffries wrote:
>>> On 30/01/2014 9:25 p.m., Amon Ott wrote:
>>>> On 29.01.2014 21:51, Amos Jeffries wrote:
>>>>> On 2014-01-30 03:48, Amon Ott wrote:
>>>>>> We have a setup with a single parent proxy that shall be used for all
>>>>>> requests. "never_direct allow all" ensures that the parent is not to be
>>>>>> bypassed. Still, every request leads to an extra unnecessary (and
>>>>>> failing) DNS lookup. We enjoyed several complaints because of unwanted
>>>>>> DNS traffic inside a firewalled area. As all results are negative, even
>>>>>> the DNS caching does not really help.
>>>>>>
>>>>>> I have traced the problem into peerSelectDnsPaths() in
>>>>>> src/peer_select.cc. It seems that despite the clear fact that only one
>>>>>> cache_peer can ever be selected (and has to be selected), the code still
>>>>>> makes a DNS request for the hostname in the requested URL to find the
>>>>>> best proxy. It would be greatly appreciated if we could suppress these
>>>>>> lookups without losing DNS access for parent proxy lookup by DNS name.
>>>>>> The lookup does not make any sense to me even with multiple parents, if
>>>>>> the parent selection is never based on the request target host.
>
> Please enable debug_options 44,2 to find evidence of that peer selection.
>
> In Squid-3.4, cache.log shows "Find IP destination for:" with the URL when
> the lookup starts; the "via XX" part is the DNS hostname about to be looked up.
>
> The results when all (your 1 peer) are done will be shown following a
> line saying "Failed to select source for" or "Found sources for" with
> the same URL followed by some lines saying what those sources were.
>
>
> <snip>
>>
>>>> Since the DNS lookup fails anyway, the solution could be as simple as
>>>> having a simple switch to forbid URL target host DNS lookups completely.
>>>> Parent proxy selection could still continue like after failed DNS lookup.
>>>
>>> If the DNS lookup fails there Squid has no idea where the cache_peer has
>>> been moved to when it changes. And never_direct prevents alternative IPs
>>> being looked for.
>>
>> I seem to have been unclear here: I want to get rid of DNS lookups of
>> request URL hosts, not of DNS lookups for the parent proxy. ATM, every
>> single Web request over the proxy seems to trigger a DNS lookup for that
>> host. tcpdump clearly shows tons of requests for external DNS names to
>> our DNS server.
>>
>> It seems to me that the parent selection algorithm always looks up the
>> host in the user request URL at the place mentioned above, even though
>> it is only needed with certain per-parent ACLs or if the proxies might
>> get bypassed for speed. Maybe the lookup could be delayed until it is
>> really needed for a decision. AFAICS, the new "-n" parameter to ACLs in
>> 3.4 has a similar idea, it avoids DNS traffic.
>
> The config file you sent has nothing in it that would *ever* cause Squid
> to perform DNS lookup on the URL domain.

This is what I thought.

> There are ACLs checking various details of the client IP:port, one
> lookup of the DNS name of the cache_peer in order to connect to it. That
> is all.
>
> peerSelectDnsPaths() is handling the DNS lookup for what's shown in the
> config file as "my.uplink.proxy".
>
> never_direct prevents the request host being selected at all. No DNS is
> involved with that "allow all" check and once that has been done the
> request URL is ignored.
>
> So I am very interested in what your cache.log results from above will show.

Attached is a cache.log from a test system with the above debug settings
when accessing www.m-privacy.de, www.rsbac.org and www.google.de. Please
note that in this test network, external DNS names can be resolved. If
needed, I can also rearrange it to fail for external DNS.

Also attached is a tcpdump -n "port 53 and host 192.168.200.106" on the
LAN, starting when requesting www.rsbac.org. Subsequent reloads do not
send new DNS requests, so the ipcache seems to work for these positive
results.

Lines 264ff in peer_select.cc look as if they look up the IP of the
request host, not of the peer:

        const char *host = fs->_peer ? fs->_peer->host :
                           psstate->request->GetHost();
        debugs(44, 2, "Find IP destination for: " << psstate->entry->url()
               << "' via " << host);
        ipcache_nbgethostbyname(host, peerSelectDnsResults, psstate);

This is why I suspected that code to be the culprit.
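
To spell out what that ternary does, here is a minimal standalone sketch.
It is not Squid code: Peer, FwdServer and lookupHost are hypothetical
stand-ins made up for illustration, and only the ternary itself mirrors
the quoted lines.

```cpp
#include <string>

// Hypothetical stand-ins for Squid's peer and forward-server types;
// these are NOT the real Squid definitions.
struct Peer {
    std::string host;   // DNS name of the cache_peer
};

struct FwdServer {
    const Peer *peer;   // nullptr means no peer is attached (direct path)
};

// Mirrors the ternary quoted above: use the peer's hostname when a peer
// is set, otherwise fall back to the host taken from the request URL.
std::string lookupHost(const FwdServer &fs, const std::string &requestHost)
{
    return fs.peer ? fs.peer->host : requestHost;
}
```

Calling lookupHost() with a FwdServer that points at a Peer named
"my.uplink.proxy" returns that peer name, whatever the request host is.
With a null peer it returns the request host, e.g. "www.rsbac.org". So
the request host would only reach the resolver through this code when
fs->_peer is not set at that point.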

Amon.

Received on Thu Jan 30 2014 - 12:25:38 MST
