RE: [squid-users] SQUID2.6STABLE12 - Assertion failure in hash.c related to DNS cache.

From: Pulu Anau <panau@dont-contact.us>
Date: Thu, 19 Apr 2007 14:10:31 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I've taken all the changes to the source I made out, and got a stack trace from the original error (see below). In addition I turned all the related debugging from squid (not much, but also below).

It appears that the problem occurs when a domain is released from the ipcache at the moment it's getting queried.

Other related settings: I have 3 search domains that get checked, and two nameservers. Ipcache_size, _low, and _high are at the default settings.

If anyone has any idea I would love to hear it, I really don't want to go back to 2.5. Are there other debug sections I should enable? It doesn't appear that hash.c calls the debug routines at all.

Oh yeah, one other thing that I don't think is (directly?) related, when I look at the backtrace, idnsALookup is called with a callback value that it appears from the code it passes directly to idnsCachedLookup, but on all of these I've seen gdb shows the value of the callback as 0.

Thanks,

Pulu


Backtrace:
#0 0x008597a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0015f7a5 in raise () from /lib/tls/libc.so.6
#2 0x00161209 in abort () from /lib/tls/libc.so.6
#3 0x00158d91 in __assert_fail () from /lib/tls/libc.so.6
#4 0x080eedbc in hash_lookup (hid=0x8e5e080, k=0xadb16a0) at hash.c:194
#5 0x0807e470 in idnsCachedLookup (key=0xadb16a0 "login.yahoo.com", callback=0, data=0xa93e3f8) at dns_internal.c:1015
#6 0x0807e528 in idnsALookup (name=0xadb16a0 "login.yahoo.com", callback=0x80a0e0a <ipcacheHandleReply>, data=0xa93e3f8) at dns_internal.c:1041
#7 0x080a10a6 in ipcache_nbgethostbyname (name=0x9235f58 "login.yahoo.com", handler=0x80761f6 <commConnectDnsHandle>, handlerData=0x9358b58) at ipcache.c:456
#8 0x080761ee in commConnectStart (fd=521, host=0x9235f58 "login.yahoo.com", port=80, callback=0, data=0x935c4d8) at comm.c:283
#9 0x08084cd2 in fwdConnectStart (data=0x935c4d8) at forward.c:621
#10 0x08084d3b in fwdStartComplete (servers=0x938eed0, data=0x935c4d8) at forward.c:631
#11 0x080ae3a4 in peerSelectCallback (psstate=0x926f7c0) at peer_select.c:202
#12 0x080ae810 in peerSelectFoo (ps=0x926f7c0) at peer_select.c:310
#13 0x080ae104 in peerSelect (request=0x9235ed0, entry=0xd650d88, callback=0x8084cda <fwdStartComplete>, callback_data=0x935c4d8) at peer_select.c:157
#14 0x08085593 in fwdStart (fd=396, e=0xd650d88, r=0x9235ed0) at forward.c:881
#15 0x08071da4 in clientProcessMiss (http=0xd0fb890) at client_side.c:3459
#16 0x080719dd in clientProcessRequest (http=0xd0fb890) at client_side.c:3375
#17 0x0806b3e9 in clientCheckNoCacheDone (answer=0, data=0x1b2a) at client_side.c:667
#18 0x08055093 in aclCheckCallback (checklist=0x922f060, answer=ACCESS_DENIED) at acl.c:2273
#19 0x08054ef4 in aclCheck (checklist=0x922f060) at acl.c:2230
#20 0x08055390 in aclNBCheck (checklist=0x922f060, callback=0, callback_data=0x6) at acl.c:2408
#21 0x0806b38a in clientCheckNoCache (http=0xd0fb890) at client_side.c:655
#22 0x0806a778 in clientAccessCheck2 (data=0xd0fb890) at client_side.c:369
#23 0x0806b32d in clientRedirectDone (data=0xd0fb890, result=0x0) at client_side.c:647
#24 0x080afcca in redirectHandleReply (data=0x93593e8, reply=0x0) at redirect.c:70
#25 0x08090d3b in helperHandleRead (fd=7, data=0x8e6e698) at helper.c:769
#26 0x080792cc in do_comm_select (msec=10) at comm_generic.c:264
#27 0x08078cc1 in comm_select (msec=10) at comm_generic.c:386
#28 0x080a3b91 in main (argc=2, argv=0xbfeecfa4) at main.c:838
#29 0x0014cde3 in __libc_start_main () from /lib/tls/libc.so.6
#30 0x0804ebe9 in _start ()


Last three lines from cache.log (crashed immediately after this)
2007/04/19 13:26:28| ipcache_nbgethostbyname: Name 'login.yahoo.com'.
2007/04/19 13:26:28| ipcacheRelease: Releasing entry for 'login.yahoo.com'
2007/04/19 13:26:28| ipcache_nbgethostbyname: MISS for 'login.yahoo.com'

My GCC (should it matter):
Reading specs from /usr/lib/gcc/i386-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=i386-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)


- -----Original Message-----
From: Pulu Anau [mailto:panau@2wire.com]
Sent: Wednesday, April 18, 2007 5:35 PM
To: squid-users@squid-cache.org
Subject: [squid-users] SQUID2.6STABLE12 - Assertion failure in hash.c related to DNS cache.

* PGP Signed: 04/18/07 at 17:35:07

Hi,

Note: http://marc.info/?l=squid-users&m=115342544210559&w=2

I am having the exact same problem and in the same place as the message mentioned above. Unfortunately I attempted to turn the crash into a memory leak by removing the assertion that was causing the crash (assert(walker != walker->next); in lib/hash.c) and ended up with debug data that doesn’t match my cores. But, the crash just moved to another place in hash.c:

(gdb) bt
#0 0x008597a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x008997a5 in raise () from /lib/tls/libc.so.6
#2 0x0089b209 in abort () from /lib/tls/libc.so.6
#3 0x00892d91 in __assert_fail () from /lib/tls/libc.so.6
#4 0x080db31c in hash_remove_link (hid=Variable "hid" is not available.
) at hash.c:281
#5 0x0807552d in idnsCallback (q=0xaf9ee38, answers=0x9b12ab8, n=2, error=0x0) at dns_internal.c:650
#6 0x08076339 in idnsGrokReply (buf=Variable "buf" is not available.
) at dns_internal.c:826
#7 0x08076543 in idnsRead (fd=6, data=0x0) at dns_internal.c:883
#8 0x08071ff8 in comm_select (msec=153) at comm_generic.c:264
#9 0x080966c1 in main (argc=2, argv=0xbfe6f8b4) at main.c:838
#10 0x00886de3 in __libc_start_main () from /lib/tls/libc.so.6
#11 0x0804ee41 in _start ()

The assertion that triggers this is (I believe, I’m far from a C programming wizard) from the same thing, searching for a key in this hash that should be there but isn’t.

Until it crashes it runs along fine, the performance is so much better than previous versions (cpu usage on the server went from 85% squid to 2%) that I’d hate to go back. This is built from the source RPM at http://people.redhat.com/stransky/squid on RHES4 both 32bit and 64bit. The patches in the RPM are almost all related to file locations.

Does anyone have any idea how I might better try and track this down? Would making/finding an external DNS resolver help?

Any help would be greatly appreciated.

Pulu


* Pulu Anau <panau@2wire.com>
* 0x4A79A7C4(L)

-----BEGIN PGP SIGNATURE-----
Version: 9.6.1 (Build 1012)

wsFVAwUBRifax4WUwm1KeafEAQipNhAAkor5gkjXQj4FSMoTokg7GicV8VcUTNDs
ZUOdrFl3bk6dLIwdMz2ZzptiVv2szREXPTXtFyJyeVDAEHJtLhXg5si85wYi1X/w
YZaqN0kerYiOqqqEHsrzrIghnkGngfuqxZ/wDBnvcJaQdcjkNCfqct7qBRZfDB+u
ZEXB6TSSeVAuI37wetFv+6cwAqfqFKJgScwf1EvxHMoHaTy/05UTP9h67+3c8mcg
4PBXgjIVB1d3D493VZE8cVWFecggTeZ5hrHQles3SkTUi4/gdwFmbpBkK8yXPj1n
HV5LoL5S1Yw0d7iYzSqwrOLWrPgvHYG7dMaGlDABgKG7BOABZZCEhQE3dceQ+CPs
LUBuN9AQ8BMHHc5dWQZHgb6Tmo9c7Lqm6O2lKFjGDFw8yIU9qR0Kvf75C05XQAzE
ltQeIG2Yd1qKF3TyctHIvk2ARrCA92BbTcqASFEFUDj4mCOXfjLR1+kdEGyteEwY
o8mirxtjcWIK+27jjsyQM9d3wzhD06IWKwa0Y89ahuVdfPJbwDRrlgAG6UYidLue
zYyCYftXaWIyXRbnbAEf2UfZCEGmrKPm0fgaEiiflw20jKTdWtVFugzUCdLMnJs3
nLj9rBMqqfBd6rEeoEVcGIbq3UN5OewdySKFQ7XMF49dBfK4E2Lnq2GA2dl2xxNl
Jy5ms8tDshM=
=/B2m
-----END PGP SIGNATURE-----

Received on Thu Apr 19 2007 - 15:10:39 MDT

This archive was generated by hypermail pre-2.1.9 : Tue May 01 2007 - 12:00:01 MDT