Mysterious CPU eater in 3.0

From: Joe Cooper <joe@dont-contact.us>
Date: Tue, 03 Jun 2003 09:07:57 -0500

Hi all,

I've stumbled onto an interesting issue with 3.0. I'm pretty sure the
trigger is very large files (mean object size of 5MB on a cache_dir of
60GB--some files over 100MB, many around 20MB) being served very fast to
a few (~50) simultaneous clients. It is not a particularly hard
workload, though the single IDE disk is pushing a goodly amount of
data--it would hold up find under 2.5.

CPU load is fine at ~50% idle, until something seems to 'snap' in Squid
and CPU hits 100% usage and stays there for minutes or an hour or more,
until either a crash with an assertion failure (below), or occasionally
it hangs on and eventually recovers (very rare, and maybe I even
imagined it--I might have missed the crash in the chattiness of the
logs). During this time, data rates drop to almost nothing
(2-3Kbytes/sec, as opposed to potentially MB/sec).

The assertion failure:

2003/06/03 03:16:22| assertion failed: ../include/Array.h:298: "theVector"

I think the assertion failure is triggered by the CPU hogging bug rather
than the assertion being caused directly by the same problem. But I
could be wrong...I'm just thinking that the CPU peak and the assertion
failure never come at the same time, and always at somewhat randomized
intervals apart.

Robert suggested I take a look at it with the cpu-profile configure
option, and so I did...Everything looks mostly normal in the 1 and 5
second averages, in that comm_poll_normal takes 85% for the 1 second
average and 55% for the 5 second, with everything else dividing up in
tiny little pieces everything else. But then for the 30 sec and 1 min
averages, PROF_UNACCOUNTED gets 92% and everything else just gets a few
scraps, including comm_poll_normal.

Other possible data points:
range_offset_limit and quick_abort_min, when set to -1 seems to cause
the bug to happen more frequently (but maybe traffic was higher before I
turned these off).

Anyone have a clue where my CPU is going? Any thoughts on what this bug
is about? (I have a nice little budget for killing this bug fast, if
anyone wants to make a quick dig into it.)

-- 
Joe Cooper <joe@swelltech.com>
Web caching appliances and support.
http://www.swelltech.com
Received on Tue Jun 03 2003 - 08:08:02 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:20:06 MST