Disregard the previous patch. I've managed to solve my problems
 and updated patch is located at:
    http://www.online.ee/~andre/squid/profil/
 It integrates cleanly with 2.3S4-hno-CVS, adds configure option
 --enable-cpu-profiling, and pretty much works ;)
 I've not heard any comments on it, no good, no bad. please.
On 22 Sep 2000, at 11:48, Andres Kroonmaa <andre@online.ee> wrote:
> 
>  I've put together some profiling tools into squid, and I'd
>  like to through it out for your review.
>  The patch is clumsy, like my coding skills, and I had some
>  trouble with integrating it nicely.
>  It compiled ok when I patched and recompiled in already compiled
>  tree, but it seems to fail now due to different approach needed
>  to make both src/ and lib/ happy.
>  This is something I'm not good at.
> 
>  Description.
>  Patch adds new src file, profiling.c where profiling probes are
>  implemented. include/profiling.h defines a macro, that can be
>  inserted anywhere in the code, and be used to fire off and stop
>  a measurement. Precision is ca 1 cpu clock tick. Works best for
>  Intel (and only pentiums up) (this is where I'm on), I know it
>  should work with small fixes on alpha, and for sparc some code
>  needs to be added.
> 
>  I've added probes in several places in the source, to have some
>  results. These can be seen at:
>   http://www.online.ee/~andre/squid/profil/
>   http://www.online.ee/~andre/squid/profil/sample-longterm.html
>  and patch itself:
>   http://www.online.ee/~andre/squid/profil/prof.diff
> 
>  Patch adds new menu to cachemgr, and keeps history stats for
>  last 1,5,30 secs and 1,5,30 minutes.
> 
>  probes are measuring delta ticks between invocations, and
>  accumulate this time into a structure. Every 1 seconds, from
>  Event management, summary is generated for each historical
>  period. Time is measured between event runs, 1sec, and time
>  accumulated by probes is compared to it, resulting percentage
>  of CPU use. Best, average and worst cases are also noted. In
>  general, I believe this allows us to have a look at what is
>  eating CPU inside squid. CPU clock timer moves on globally
>  for the whole system, so all non-squid time is unaccounted
>  and may be attributed wrongly to some probes, upping worst
>  case dramatically.
> 
>  This patch seems stable. It has some functional problems: it
>  cannot handle overlapping probes of same name (it would get timing
>  wrong), it tries to measure "idle" time between different probes,
>  (ie time when no probes in running state) and it doesn't workout
>  well. Also, because several probes I've put into code do nest or
>  overlap, totals are also off.
>  Average added overhead seems to be about 650 ticks per probecall,
>  which can be optimised somewhat, at a cost of smaller convenience.
> 
>  One more thing I wanted to ask you gurus. I've implemented probe
>  matching by way of string pointers. To start a probe, you give a
>  const char string as a parameter to a probe, it records the pointer
>  to it, and subsequently compares not string itself, but the pointers.
>  This allows for fast check, and the string itself is used as a
>  description. Seems also to work ok. I understand that it could
>  brake if probe with the same name is started in one module, and
>  stopped in another. Then const char strings would have different
>  pointers? This could result in the probes with the same name to
>  be seen as different. This should also be immediately seen in
>  cachemgr output. Or is such approach acceptable, generally?
> 
>  Please review the sample output, and the patch itself, and comment.
>  Also, hints on how to integrate it properly, or (re)design hints
>  are very welcome.
> 
> PS. alpha timing macro is there, but I have no acces to alpha right
>  now, so I cannot test. It would be helpful, if someone with skills
>  to get it compiled could test it. Also, if other cpu-types can be
>  added, would be very nice. All that is needed is a func or macro
>  that is able to read out CPU onchip tick-counter, if supported.
>  Actually, profiling code doesn't know or care in what units it
>  is measuring. the only thing it assumes is that counter between
>  sample is incrementing uniformly. It could be actually microsecs,
>  nanosec, or whatever. For some CPU's that have 32bit counters
>  (alpha, isn't it weird that 64bit cpu has 32bit counters?), it
>  may wrap in few seconds. Currently, no code for handling that.
> 
> PPS. preliminary results hints that squid spends considerable amount
>  of time in preparing FD's for poll (setting up pfds.events), near
>  10% of total CPU time under about 50 reqs/sec load.
>  Also, ACL matching seems comsuming alot, also approaching 10%.
>  The rest is open for debate.
------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia
Received on Thu Sep 28 2000 - 15:43:47 MDT
This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:38 MST