Re: [squid-users] Squid crashing - assertion failed - using COSS

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 07 Jan 2011 17:31:00 +1300

On 07/01/11 06:48, Robert Pipca wrote:
> Hi All,
>
> I'm using LUSCA_HEAD-r14756 on an ISP network with about 3000 users.
>
> I tested squid-2.7-stable7 and had the same error.
>
> Everything is great, except that once in a while (i.e. 2 or 3 times a
> week) Lusca dies and I get this failed assertion in the COSS code:
>
> "-1 != sio->swap_filen"
>
> Now, looking at the code in fs/coss/store_io_coss.c, I saw this:
>
> sio->swap_filen = storeCossAllocate(SD, e, COSS_ALLOC_ALLOCATE);
>
> The code of storeCossAllocate actually can return -1 in several
> paths...so I'm wondering why lusca aborts on it, rather than returning
> an error.
>
> Since the COSS rebuild takes about an hour, the ISP takes a bandwidth
> blow to the head every time this happens.
>
> Any clue why this happens and how it can be fixed?
>
> My cache_dir setup is like this:
>
> cache_dir aufs /cache 69775 60 500 min-size=1048576
>
> cache_dir coss /coss1 65520 max-size=1048575 max-stripe-waste=32768 block-size=4096 maxfullbufs=20
> cache_dir coss /coss2 65520 max-size=1048575 max-stripe-waste=32768 block-size=4096 maxfullbufs=20
> cache_dir coss /coss3 65520 max-size=1048575 max-stripe-waste=32768 block-size=4096 maxfullbufs=20
>
> I don't have that much more on my logs, but if there's any info I can
> provide you, I can try and dig it up.
>
> Thanks for your help.
>
> I saw that the aufs code returns NULL instead of sio if some part
> doesn't work, so I changed from:
>
> assert (-1 != sio->swap_filen);
>
> to
>
> if (-1 == sio->swap_filen)
>     return NULL;
>
> This opened a memory leak, but it's better than squid crashing.
>
> But it didn't work either. I got "FATAL: Received Segment
> Violation...dying" this morning.
>
> Should I maybe assign the result of storeCossAllocate to a temporary
> variable and only create the sio if that temporary variable is not -1?
> Would this work?
>
> I don't get why both squid and lusca are crashing from this change, though.
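
On the temporary-variable idea above, here is a minimal sketch of that
ordering. It does not use the real COSS code: sketch_sio, sketch_allocate
and sketch_create are hypothetical stand-ins, and the value 42 is an
arbitrary placeholder for a valid file number. It only shows that
allocating first and creating the state object afterwards leaves nothing
to leak on the failure path. The caller still has to cope with a NULL
return, and whether the COSS callers do is not something this sketch
can confirm.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the store I/O state; not the real Squid/Lusca type. */
typedef struct {
    int swap_filen;
} sketch_sio;

/* Hypothetical stand-in for storeCossAllocate: returns -1 when nothing is free. */
static int sketch_allocate(int free_blocks)
{
    return free_blocks > 0 ? 42 : -1;
}

/* Allocate first; only build the state object once the allocation succeeded. */
static sketch_sio *sketch_create(int free_blocks)
{
    int filen = sketch_allocate(free_blocks);
    if (filen == -1)
        return NULL;            /* nothing created yet, so nothing to free */

    sketch_sio *sio = calloc(1, sizeof(*sio));
    if (sio == NULL)
        return NULL;

    sio->swap_filen = filen;
    return sio;
}
```

A caller would check the result, for example
"sketch_sio *sio = sketch_create(n); if (sio == NULL) { /* report failure */ }",
and free(sio) when done.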

Your config shows ~69 GB of small files. Each cache_dir has a maximum
count of 2^31 files. It looks like that file count is being exceeded and
the overflow handling is broken.
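
As a rough cross-check of the COSS sizes, here is a small sketch of the
arithmetic. It rests on an assumption taken from the squid.conf notes on
the COSS block-size option, not from the figure above: that COSS uses
file numbers as block numbers and that those are limited to 24 bits. The
names COSS_FILENO_BITS and coss_max_mb are made up for illustration.

```c
#include <stdint.h>

/* Assumption: COSS file numbers are 24 bits wide and double as block numbers. */
#define COSS_FILENO_BITS 24

/* Largest addressable COSS cache_dir, in MB (2^20 bytes), for a given block size. */
static uint64_t coss_max_mb(uint64_t block_size)
{
    return (block_size << COSS_FILENO_BITS) >> 20;
}
```

Under that assumption, block-size=4096 gives 65536 MB, so each 65520 MB
COSS dir in the config sits 16 MB below the ceiling. The default block
size of 512 would give 8192 MB.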

Try 2.7.STABLE9 and see if it's one of the bugs fixed there.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.10
   Beta testers wanted for 3.2.0.4
Received on Fri Jan 07 2011 - 04:31:20 MST
