[squid-users] COSS causing squid Segment Violation on FreeBSD 6.2S

From: Mark Powell <M.S.Powell@dont-contact.us>
Date: Thu, 26 Apr 2007 11:20:23 +0100 (BST)

Hi,
   I'm in the process of putting a small percentage of our web requests
through 3 new caches to test them. However, I'm encountering a SEGV,
seemingly due to COSS. Two of the caches ran for about a day and then
failed, e.g.:

2007/04/26 10:39:26| storeCossCompletePendingReloc: got failure (-1)
FATAL: Received Segment Violation...dying.

When the caches restart, they read the COSS dir, and as soon as they finish
reading it they die with the same error again. Both now seem to be stuck in
this restart loop indefinitely. However, the third cache is still running
happily (perhaps just luck?).
   COSS drives were completely blanked before use.
   All three are configured the same: Dell PowerEdge 2650, PERC 4/DC RAID
controller, 4GB RAM, 2x3.2GHz Xeon, 5x72GB 15Krpm drives:

2x72GB RAID 1 OS and everything except cache_dir
1x72GB JBOD COSS
1x72GB JBOD aufs
1x72GB JBOD aufs

Very recent 32bit FreeBSD 6.2-STABLE #94: Fri Apr 20 11:22:18 BST 2007.
   I'm using the FreeBSD squid port, which is currently 2.6-STABLE12. As both
COSS and aufs are specified, I believe both use squid's internal AIO code,
so the FreeBSD VFS_AIO module should not be required. Is that right?

cache_dir coss /dev/amrd1 65000 max-size=16384 block-size=4096
cache_dir aufs /2 56000 16 256
cache_dir aufs /3 56000 16 256
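
As a sanity check on the COSS line above, here is a minimal sketch of the size limit that block-size implies. It assumes the rule described in the squid.conf documentation for the block-size option: COSS uses 24-bit file numbers as block numbers, so the largest usable cache_dir is block-size << 24. The function names are hypothetical and are not part of squid.

```python
# Sketch: upper bound on a COSS cache_dir size implied by block-size.
# Assumption: COSS addresses blocks with 24-bit file numbers, as described
# in the squid.conf documentation for the block-size option.
FILE_NUMBER_BITS = 24
BYTES_PER_MB = 1024 * 1024


def coss_max_size_mb(block_size: int) -> int:
    """Largest COSS cache_dir size, in MB, for a block size given in bytes."""
    return (block_size << FILE_NUMBER_BITS) // BYTES_PER_MB


def coss_size_fits(size_mb: int, block_size: int) -> bool:
    """True if the configured size does not exceed the addressable limit."""
    return size_mb <= coss_max_size_mb(block_size)


if __name__ == "__main__":
    # Values from: cache_dir coss /dev/amrd1 65000 max-size=16384 block-size=4096
    print("limit with block-size=4096 (MB):", coss_max_size_mb(4096))
    print("65000 MB fits with block-size=4096:", coss_size_fits(65000, 4096))
    # With the default block-size of 512 the same size would not fit.
    print("65000 MB fits with block-size=512:", coss_size_fits(65000, 512))
```

Under that assumption the configured 65000 MB sits just below the limit for block-size=4096, so the size itself should be addressable.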

I switched squid to the libthr threading library using /etc/libmap.conf:

# ldd /usr/local/sbin/squid
/usr/local/sbin/squid:
         libcrypt.so.3 => /lib/libcrypt.so.3 (0x88158000)
         libm.so.4 => /lib/libm.so.4 (0x88171000)
         libpthread.so.2 => /usr/lib/libthr.so.2 (0x88187000)
         libc.so.6 => /lib/libc.so.6 (0x8819a000)

I always map to libthr as in the past it has been more stable. At least,
libpthread causes crashes with MySQL where libthr is OK.
   A quick look at the squid.core gives (I'm no debugging expert :( ):

(gdb) where
#0 0x88264bb7 in memset () from /lib/libc.so.6
#1 0x00004144 in ?? ()
#2 0x8826194e in calloc () from /lib/libc.so.6
#3 0x080fa334 in xcalloc (n=28, sz=2284278208) at util.c:561
#4 0x080e7a7d in storeCossDirWriteCleanStart (sd=0x82a4000) at coss/store_dir_coss.c:409
#5 0x080c94b5 in storeDirWriteCleanLogs (reopen=0) at store_dir.c:426
#6 0x080cc4dc in death (sig=0) at tools.c:314
#7 0xbfbfff94 in ?? ()
#8 0x0000000b in ?? ()
#9 0x0000000c in ?? ()
#10 0xbfbfe610 in ?? ()
#11 0x08047ffc in ?? ()
#12 0x080cc495 in uniqueHostname () at tools.c:556
#13 0x080e729f in aioCheckCallbacks (SD=0x82a40b0) at aufs/async_io.c:319
#14 0x080c97b2 in storeDirCallback () at store_dir.c:508
#15 0x0807b710 in comm_select (msec=10) at comm_generic.c:377
#16 0x080a568a in main (argc=2, argv=0xbfbfeea4) at main.c:837
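
The sz argument in frame #3 looks implausible for a 28-element allocation on a 32-bit system. The sketch below only does arithmetic on values copied from the backtrace and the ldd output above. The interpretation is an assumption, not a diagnosis: sz resembles an address just above the libc mapping rather than a real size, which would suggest the COSS state (or the stack) was already corrupt by the time death() ran its clean-log pass.

```python
# Values copied from the gdb backtrace and the ldd output in this message.
N = 28                  # xcalloc n argument, frame #3
SZ = 2284278208         # xcalloc sz argument, frame #3
LIBC_BASE = 0x8819A000  # libc.so.6 load address from ldd
MEMSET_PC = 0x88264BB7  # frame #0, inside memset()


def wrapped_product_32(n: int, sz: int) -> int:
    """n * sz truncated to 32 bits, as an unchecked 32-bit size_t multiply
    would produce. Whether calloc checks for overflow here is not assumed."""
    return (n * sz) & 0xFFFFFFFF


if __name__ == "__main__":
    print("sz in hex:", hex(SZ))
    print("sz lies above the libc load address:", SZ > LIBC_BASE)
    print("distance from the memset frame address:", hex(SZ - MEMSET_PC))
    print("full product n * sz:", N * SZ)
    print("product wrapped to 32 bits:", wrapped_product_32(N, SZ))
```

Either way, the request is far larger than the 4GB of RAM in these machines could satisfy sensibly, so the crash inside memset() is probably a secondary failure during cleanup rather than the original fault.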

   I just tried it with libpthread and got the same error once it had read
the COSS dir. The backtrace gives:

(gdb) where
#0 0x881a5abf in pthread_testcancel () from /lib/libpthread.so.2
#1 0x8819df3b in pthread_mutexattr_init () from /lib/libpthread.so.2
#2 0x882a3450 in ?? ()

Any ideas? Or is there more info I can gather to help nail this down?
   Many thanks for any pointers.

-- 
Mark Powell - UNIX System Administrator - The University of Salford
Information Services Division, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 4837  Fax: +44 161 295 5888  www.pgp.com for PGP key
Received on Thu Apr 26 2007 - 04:20:34 MDT
