Re: CPU profiling internally in squid

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Thu, 28 Sep 2000 23:40:39 +0200

 Disregard the previous patch. I've managed to solve my problems
 and updated patch is located at:
    http://www.online.ee/~andre/squid/profil/
 It integrates cleanly with 2.3S4-hno-CVS, adds configure option
 --enable-cpu-profiling, and pretty much works ;)

 I've not heard any comments on it, no good, no bad. please.

On 22 Sep 2000, at 11:48, Andres Kroonmaa <andre@online.ee> wrote:

>
> I've put together some profiling tools into squid, and I'd
> like to through it out for your review.
> The patch is clumsy, like my coding skills, and I had some
> trouble with integrating it nicely.
> It compiled ok when I patched and recompiled in already compiled
> tree, but it seems to fail now due to different approach needed
> to make both src/ and lib/ happy.
> This is something I'm not good at.
>
> Description.
> Patch adds new src file, profiling.c where profiling probes are
> implemented. include/profiling.h defines a macro, that can be
> inserted anywhere in the code, and be used to fire off and stop
> a measurement. Precision is ca 1 cpu clock tick. Works best for
> Intel (and only pentiums up) (this is where I'm on), I know it
> should work with small fixes on alpha, and for sparc some code
> needs to be added.
>
> I've added probes in several places in the source, to have some
> results. These can be seen at:
> http://www.online.ee/~andre/squid/profil/
> http://www.online.ee/~andre/squid/profil/sample-longterm.html
> and patch itself:
> http://www.online.ee/~andre/squid/profil/prof.diff
>
> Patch adds new menu to cachemgr, and keeps history stats for
> last 1,5,30 secs and 1,5,30 minutes.
>
> probes are measuring delta ticks between invocations, and
> accumulate this time into a structure. Every 1 seconds, from
> Event management, summary is generated for each historical
> period. Time is measured between event runs, 1sec, and time
> accumulated by probes is compared to it, resulting percentage
> of CPU use. Best, average and worst cases are also noted. In
> general, I believe this allows us to have a look at what is
> eating CPU inside squid. CPU clock timer moves on globally
> for the whole system, so all non-squid time is unaccounted
> and may be attributed wrongly to some probes, upping worst
> case dramatically.
>
> This patch seems stable. It has some functional problems: it
> cannot handle overlapping probes of same name (it would get timing
> wrong), it tries to measure "idle" time between different probes,
> (ie time when no probes in running state) and it doesn't workout
> well. Also, because several probes I've put into code do nest or
> overlap, totals are also off.
> Average added overhead seems to be about 650 ticks per probecall,
> which can be optimised somewhat, at a cost of smaller convenience.
>
> One more thing I wanted to ask you gurus. I've implemented probe
> matching by way of string pointers. To start a probe, you give a
> const char string as a parameter to a probe, it records the pointer
> to it, and subsequently compares not string itself, but the pointers.
> This allows for fast check, and the string itself is used as a
> description. Seems also to work ok. I understand that it could
> brake if probe with the same name is started in one module, and
> stopped in another. Then const char strings would have different
> pointers? This could result in the probes with the same name to
> be seen as different. This should also be immediately seen in
> cachemgr output. Or is such approach acceptable, generally?
>
> Please review the sample output, and the patch itself, and comment.
> Also, hints on how to integrate it properly, or (re)design hints
> are very welcome.
>
> PS. alpha timing macro is there, but I have no acces to alpha right
> now, so I cannot test. It would be helpful, if someone with skills
> to get it compiled could test it. Also, if other cpu-types can be
> added, would be very nice. All that is needed is a func or macro
> that is able to read out CPU onchip tick-counter, if supported.
> Actually, profiling code doesn't know or care in what units it
> is measuring. the only thing it assumes is that counter between
> sample is incrementing uniformly. It could be actually microsecs,
> nanosec, or whatever. For some CPU's that have 32bit counters
> (alpha, isn't it weird that 64bit cpu has 32bit counters?), it
> may wrap in few seconds. Currently, no code for handling that.
>
> PPS. preliminary results hints that squid spends considerable amount
> of time in preparing FD's for poll (setting up pfds.events), near
> 10% of total CPU time under about 50 reqs/sec load.
> Also, ACL matching seems comsuming alot, also approaching 10%.
> The rest is open for debate.

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia
Received on Thu Sep 28 2000 - 15:43:47 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:38 MST