On 2014-02-25 02:12, Simon Beale wrote:
>> On 2014-02-21 06:10, Simon Beale wrote:
>>> I've got a problem at the moment with our general squid proxies where
>>> occasionally requests take a long time that shouldn't (i.e. 5+
>>> seconds or a timeout, instead of milliseconds).
>>> 
>>> This is most common on our proxies doing 100 reqs/sec, but it happens
>>> overnight too when they're running at 10 reqs/sec. I've got this
>>> happening with both v3.4.2 and also with a box I've downgraded back
>>> to v3.1.10. For v3.4.2, it's happening in both multiple-worker and
>>> single-worker modes.
> 
> As a follow-up, we've narrowed this down to the internal DNS resolver.
> When I deploy a 3.4.2 (which is what we're running elsewhere) that's
> been recompiled with "--disable-internal-dns", the problem completely
> goes away.
> 
>> What sort of CPU loading do you have at ~100 req/sec?
>>   Is that at or near your local installation's req/sec capacity?
> 
> For the box running with a single worker, it consumes 50% of one core
> at 100 req/sec.
> For the boxes running with 9 workers, each worker consumes 5% of a
> core at the same rate.
> 
>>> The test is not reproducible, sadly, but I've got a cronjob running
>>> on localhost on these boxes testing access times to various URLs
>>> covering: HTTPS, non-HTTPS static content, using an IP rather than a
>>> hostname over both HTTP and HTTPS, and a URL on the same VLAN as the
>>> proxies. All of these test cases have it happen occasionally, but
>>> not repeatedly/reliably.
>> 
>> Some ideas:
>>   * DNS lookup delays?
> 
> Yeah, when I enabled the DNS resolution time logging in squid, that
> became apparent.
> 
> Quite why the internal DNS resolver shows this, but the external one
> doesn't, I don't know. The DNS server query logs show both DNS servers
> in /etc/resolv.conf getting the request in turn and answering it
> (though 5 seconds apart). It's happening for us in multiple
> datacentres, so it is unlikely to be port errors or internal packet
> loss.
The dnsserver helper used when internal DNS is disabled uses 
gethostbyname()/getaddrinfo() and thus the local machine's resolver. It 
has a limit of ~250 req/sec on most systems, and usually does not 
support IPv6 DNS resolvers configured through squid.conf (it does 
support them if configured through resolv.conf).
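To illustrate that bottleneck (this is not Squid's actual helper code): each helper process makes one blocking system-resolver call at a time, so total throughput is capped at roughly helper count divided by per-lookup latency. Python's socket.getaddrinfo() goes through the same libc resolver path, so a rough sketch looks like:

```python
import socket

def lookup(hostname):
    # One blocking system-resolver call, as a dnsserver helper would make.
    # While this call waits on the network, the helper can do nothing else.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Collect the unique addresses out of the (family, type, proto,
    # canonname, sockaddr) tuples returned.
    return sorted({info[4][0] for info in infos})

print(lookup("localhost"))  # typically includes 127.0.0.1 and/or ::1
```

With, say, 5 helpers and 20 ms per lookup, that yields on the order of 250 lookups/sec, which matches the limit mentioned above.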
The internal DNS client uses a form of happy-eyeballs scheduling: it 
sends the A and AAAA queries in parallel but waits for *both* responses 
before continuing (unlike full-blown happy eyeballs, which goes with 
the first response regardless of missing IPs). It should only be 
contacting one of the resolvers at a time.
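A minimal sketch of that "send both, wait for both" behaviour, using threads and stand-in query functions rather than Squid's event loop (the hostnames, addresses, and delays are made up for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def query(qtype, delay, answer):
    # Stand-in for sending one DNS packet and awaiting its response.
    time.sleep(delay)
    return (qtype, answer)

def resolve_wait_for_both(host):
    # Like Squid's internal client: fire A and AAAA together, but do not
    # continue until *both* answers (or timeouts) are in. A slow AAAA
    # therefore delays the whole lookup, unlike true happy eyeballs,
    # which proceeds with whichever answer arrives first.
    with ThreadPoolExecutor(max_workers=2) as pool:
        a = pool.submit(query, "A", 0.01, ["192.0.2.1"])
        aaaa = pool.submit(query, "AAAA", 0.05, ["2001:db8::1"])
        return dict([a.result(), aaaa.result()])

start = time.monotonic()
answers = resolve_wait_for_both("example.net")
elapsed = time.monotonic() - start
print(answers)  # both record types present; total time is set by the slower reply
```

The point of the sketch: if either the A or the AAAA response is delayed (or lost and retransmitted), the whole lookup is delayed by that amount.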
From your description above it sounds like the first resolver 
configured is occasionally "failing" after 5 seconds and Squid is 
moving on to the second, which works.
  Do you have "dns_timeout 5 seconds" configured?
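For reference, the internal resolver takes its server list from /etc/resolv.conf, so a two-server setup like the one described would look something like this (addresses are placeholders):

```
# /etc/resolv.conf (placeholder addresses)
nameserver 192.0.2.53
nameserver 198.51.100.53
```

If I remember the defaults correctly, Squid's internal client retries a dropped query (moving on to the next listed server) after dns_retransmit_interval, which defaults to 5 seconds; that would match the 5-second gap between the two servers in your query logs.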
With internal DNS enabled your cachemgr "idns" report has a lot of 
detail on the particular errors and actions happening.
NP: with Squid-3.4 the DNS lookup timeout has been detached from the 
TCP connect_timeout and is applied only once per connection 
destination. So you can set each as short as you wish without affecting 
the other connection setup steps.
Amos
> 
> It's only (or mostly?) apparent on our squid servers that do desktop
> proxying, and so make lots of DNS requests to everywhere; the squid
> servers that handle just our datacentre servers don't show this
> problem, but only really go to about 40 hosts in total.
> 
> Thanks
> 
> Simon
Received on Mon Feb 24 2014 - 23:49:11 MST