Re: Mysterious CPU eater in 3.0

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Tue, 03 Jun 2003 21:13:51 +0300

 Prof_unaccounted gets assigned time during which no timers are active.
 Most probes that are there have been added ad hoc to cover most busy
 parts of squid. Thus we can assume that cpu is hogged by some section
 of code thats not normally very cpu intensive.
 You could try either add more probes here and there, or try to get stack
 during peak cpu moments to get better hints on area of trouble.
 Is the cpu time userspace or kernel?

On 3 Jun 2003 at 9:07, Joe Cooper wrote:

> Hi all,
>
> I've stumbled onto an interesting issue with 3.0. I'm pretty sure the
> trigger is very large files (mean object size of 5MB on a cache_dir of
> 60GB--some files over 100MB, many around 20MB) being served very fast to
> a few (~50) simultaneous clients. It is not a particularly hard
> workload, though the single IDE disk is pushing a goodly amount of
> data--it would hold up find under 2.5.
>
> CPU load is fine at ~50% idle, until something seems to 'snap' in Squid
> and CPU hits 100% usage and stays there for minutes or an hour or more,
> until either a crash with an assertion failure (below), or occasionally
> it hangs on and eventually recovers (very rare, and maybe I even
> imagined it--I might have missed the crash in the chattiness of the
> logs). During this time, data rates drop to almost nothing
> (2-3Kbytes/sec, as opposed to potentially MB/sec).
>
> The assertion failure:
>
> 2003/06/03 03:16:22| assertion failed: ../include/Array.h:298: "theVector"
>
> I think the assertion failure is triggered by the CPU hogging bug rather
> than the assertion being caused directly by the same problem. But I
> could be wrong...I'm just thinking that the CPU peak and the assertion
> failure never come at the same time, and always at somewhat randomized
> intervals apart.
>
> Robert suggested I take a look at it with the cpu-profile configure
> option, and so I did...Everything looks mostly normal in the 1 and 5
> second averages, in that comm_poll_normal takes 85% for the 1 second
> average and 55% for the 5 second, with everything else dividing up in
> tiny little pieces everything else. But then for the 30 sec and 1 min
> averages, PROF_UNACCOUNTED gets 92% and everything else just gets a few
> scraps, including comm_poll_normal.
>
> Other possible data points:
> range_offset_limit and quick_abort_min, when set to -1 seems to cause
> the bug to happen more frequently (but maybe traffic was higher before I
> turned these off).
>
> Anyone have a clue where my CPU is going? Any thoughts on what this bug
> is about? (I have a nice little budget for killing this bug fast, if
> anyone wants to make a quick dig into it.)
> --
> Joe Cooper <joe@swelltech.com>
> Web caching appliances and support.
> http://www.swelltech.com
>
>
Received on Tue Jun 03 2003 - 11:13:45 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:20:06 MST