RE: Cacheoff results published.

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Fri, 13 Oct 2000 01:23:47 +0200

On 12 Oct 2000, at 13:52, Chemolli Francesco (USI) <ChemolliF@GruppoCredit.it> wrote:

> > Also worth noting is that the simulated environment includes
> > delays that
> > cause file descriptors to run at well over 1000...I think our box was
> > topping 2000 in the peak phases.
>
> That could be an important factor. I heard that poll is one of the biggest
> CPU hogs in squid. The bigger the FDset, the more it hogs.
> This is why I explicitly specified my FD usage info.

 I think this is a misconception. poll() itself is not the biggest CPU hog in
 squid. That said, for quite some time squid has had a bug that caused it to
 poll the DNS incoming socket after each and every operation on normal
 sockets, burning quite a lot of CPU uselessly. It seems that 2.4Devel4 still
 has this issue.

 Squid sits in poll() a lot of the time for many reasons, but mostly it is
 waiting for I/O, or for the OS to do I/O on its behalf. Lots of work is done
 by the system during squid's CPU time even if a second CPU is idle, but that
 is an issue of SMP scaling, not of poll overhead. Usually what is meant by
 poll CPU hogging is the pure handling and parsing of a large FDset, and that
 claim is wrong. The pure overhead of poll only reaches some 10% of system
 CPU (depending on poll frequency, obviously; here I assume 100 calls/sec)
 after reaching some 1000-2000 open files (depending on CPU speed).
 Of course, time spent in poll is lost to squid, and this could be solved
 with some sort of async notification or a separate thread, moving the system
 work the OS does for squid "into the background" and leaving more CPU time
 available for squid itself, but again that is a matter of SMP scaling.

 All this is quite a non-issue under high loads, because poll is called only
 when there are no serviceable FDs left. As load increases, the poll
 frequency drops, so poll's overhead diminishes. poll overhead is worst when
 there are thousands of FDs open and only one at a time becomes ready; then
 the poll rate skyrockets. Under realistic high loads most time is spent
 servicing ready sockets, and poll is called at quite a low rate. In fact,
 dropping poll in favour of async notification could add more overhead under
 such high loads than poll currently does.

 For a relatively idle Squid, one of the biggest CPU hogs is the preparation
 of FDs for poll, especially defer checking. With high-resolution timing I
 see that it consistently takes at least 5-6 times more CPU time to prepare
 the FD array for poll than it takes poll to update the FDset states and
 return whether any FD is ready. This shows that poll's own overhead is much
 less than the preparation for it. This preparation takes up to 15% of total
 system CPU time (under my load patterns) on average, and that overhead grows
 with the number of open files, faster than poll's overhead does. But as it
 runs as often as poll, its total overhead diminishes under high loads just
 as poll's does.

 The most CPU time goes to handling reads from the network, which most
 probably comes down to parsing headers, plus time spent in the system. This
 could also be somewhat offloaded, but lots of effort would have to go into
 optimising this part of squid.
 Under stress tests with some 3000 concurrent sessions I see that handling
 network reads takes up to 50% of system CPU and handling writes up to 20%;
 poll overhead per call goes up, but total CPU time spent in poll is reduced
 because of the lowered poll rate, as also happens with the FDset preparation
 overhead.
 Currently, squid needs to poll between the read and the write for the same
 client. When this is redesigned, the poll rate will drop even further,
 leaving more CPU for handling network reads.

 The next biggest CPU hog seems to be ACL matching. With some 30 ACLs in
 total on my box, ACL matching takes up to 10% of total system CPU. Given
 that these are not very complex ACLs, this seems quite excessive. Under
 stress tests ACL checks showed up to 20% of total system CPU.
 While looking into it I noticed that for some reason the same ACLs are
 evaluated multiple times over. It seems to happen when squid must resolve
 DNS for the request to proceed. What is weird is that urlpath_regex type
 ACLs are re-evaluated many times (I've seen up to 9-11 times per regex ACL
 per request).

 Quite amazing is the number of memory allocs/frees per request. At about 120
 allocs and 120 frees per request, the alloc/free rate can very easily reach
 the high thousands per second (how about 50000 mallocs/sec?). Surprisingly,
 it doesn't have a very high impact on CPU; I've seen up to 5% of total
 system CPU at 50K mallocs/sec. Still, this is definitely burning quite a lot
 of CPU beyond the mallocs themselves and could become a limiting factor
 under very high loads, so optimisations to reduce memory allocation and
 speed up release should be undertaken.

 That's what I observe on Squid 2.3 with async-io.

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia
Received on Thu Oct 12 2000 - 17:27:19 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:42 MST