dnsserver and always_keepalive.

From: Scott Hess <scott@dont-contact.us>
Date: Thu, 5 Aug 1999 10:20:34 -0700

Over time, our http_accel squid (2.1.PATCH2 on FreeBSD 3.1 w/kva patches)
accumulates a lot of sockets in the CLOSING state. Eventually, the box
starts getting wedged (requests start taking longer and longer to service),
and only a reboot makes things smooth again. [Being honest, the tie between
CLOSING sockets and a wedged box is tenuous, at best. It's happened three
times, and those sockets are the only thing odd I've been able to find.]

As a fix for the CLOSING sockets, we've been experimenting with setting
net.inet.tcp.always_keepalive to 1. This causes the kernel to send
keepalive packets once a connection has been idle for a couple of hours. If
the remote end of the connection still exists, the kernel gets a response;
if not, it times out the connection. So the CLOSING sockets go away.
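
For comparison, the sysctl makes the kernel treat every TCP socket as if the application had asked for keepalives itself. The per-socket request looks roughly like the sketch below. The helper name enable_keepalive is made up for illustration; this is not code from squid, only the standard SO_KEEPALIVE socket option.

```python
import socket

def enable_keepalive(sock):
    """Ask the kernel to probe this one connection when it sits idle.

    Hypothetical helper for illustration; always_keepalive=1 has the
    same effect on every TCP socket without the application asking.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read the option back so the caller can confirm it took effect.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
keepalive_on = enable_keepalive(sock)
sock.close()
```

The point of the comparison is that the sysctl also reaches sockets whose owners never expected to be probed, which may include the ones squid uses to talk to its helpers.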

A couple hours after turning this on, squid reported in the error logs that
it got a timeout reading from a dnsserver, and that the dnsserver in
question had exited. For the next fifteen minutes, it served up pages with
status 504, 503, and 000. 000! The error logs look like:

Aug 4 15:26:52 www squid[219]: helperHandleRead: FD 6 read: (60) Operation timed out
Aug 4 15:26:52 www squid[219]: WARNING: dnsserver #1 (FD 6) exited
Aug 4 15:41:54 www squid[219]: ipcache_nbgethostbyname: 'ui.avantgo.com' PENDING for 902 seconds, aborting
Aug 4 15:41:54 www squid[219]: ipcacheChangeKey: from 'ui.avantgo.com' to '1/ui.avantgo.com'

Also, I notice that as of right now, it's only got a single dnsserver
process, though it's configured for 5. There are no messages regarding
dnsservers dying in the squid error logs. That's why I suspect that it's
the keepalive thing - only the latest-used dnsserver is alive, and all the
others exited due to their TCP connection dying. But squid hasn't hit any
of them, so it doesn't realize that they're dead.
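
To make the theory concrete: a parent only learns that an idle helper's socket is dead when it looks at it. Below is a rough sketch of a non-blocking check a parent process could run against idle helper sockets. The name peer_has_closed and the socketpair standing in for the squid-to-dnsserver connection are assumptions for illustration; I don't know whether squid 2.1 does anything like this.

```python
import select
import socket

def peer_has_closed(sock):
    """Return True if the other end is gone, without blocking.

    Hypothetical helper, not squid code. A closed peer makes the socket
    readable with zero bytes (EOF); a connection killed by a timeout
    surfaces as an error on the read instead.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending, so no evidence the peer is gone
    try:
        # MSG_PEEK leaves any real reply in the buffer for the normal reader.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except OSError:
        return True

# A socketpair stands in for the parent/helper connection.
parent, helper = socket.socketpair()
closed_before = peer_has_closed(parent)
helper.close()  # simulate the helper exiting
closed_after = peer_has_closed(parent)
parent.close()
```

Without a check of this kind, the dead helper is only discovered when the next request is handed to it, which would match only one "exited" warning showing up in the logs.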

First question - do my theories sound rational?

Second question - why doesn't squid catch the dead dnsservers earlier?
Obviously a number of them have died, but squid hasn't logged the fact.

Third question - why, when it saw that the dnsserver had died, did squid
lose its mind?

[I know, I know, upgrade to a later version. But the current version has
run fine for a long time, and until now we've had zero problems attributed
to squid (I attribute the CLOSING sockets to the OS). Besides, I don't want
to upgrade unless this is a problem likely to be fixed by the upgrade.]

Thanks,
scott
Received on Thu Aug 05 1999 - 11:08:26 MDT
