Re: CPU profiling internally in squid from Henrik Nordstrom on 2000-09-29 (squid-dev)

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Fri, 29 Sep 2000 08:18:17 +0200

And what makes it different from simply using gprof to get automatic
profiling of all functions?

/Henrik

Andres Kroonmaa wrote:
>
> Disregard the previous patch. I've managed to solve my problems
> and updated patch is located at:
> http://www.online.ee/~andre/squid/profil/
> It integrates cleanly with 2.3S4-hno-CVS, adds configure option
> --enable-cpu-profiling, and pretty much works ;)
>
> I've not heard any comments on it, no good, no bad. please.
>
> On 22 Sep 2000, at 11:48, Andres Kroonmaa <andre@online.ee> wrote:
>
> >
> > I've put together some profiling tools into squid, and I'd
> > like to through it out for your review.
> > The patch is clumsy, like my coding skills, and I had some
> > trouble with integrating it nicely.
> > It compiled ok when I patched and recompiled in already compiled
> > tree, but it seems to fail now due to different approach needed
> > to make both src/ and lib/ happy.
> > This is something I'm not good at.
> >
> > Description.
> > Patch adds new src file, profiling.c where profiling probes are
> > implemented. include/profiling.h defines a macro, that can be
> > inserted anywhere in the code, and be used to fire off and stop
> > a measurement. Precision is ca 1 cpu clock tick. Works best for
> > Intel (and only pentiums up) (this is where I'm on), I know it
> > should work with small fixes on alpha, and for sparc some code
> > needs to be added.
> >
> > I've added probes in several places in the source, to have some
> > results. These can be seen at:
> > http://www.online.ee/~andre/squid/profil/
> > http://www.online.ee/~andre/squid/profil/sample-longterm.html
> > and patch itself:
> > http://www.online.ee/~andre/squid/profil/prof.diff
> >
> > Patch adds new menu to cachemgr, and keeps history stats for
> > last 1,5,30 secs and 1,5,30 minutes.
> >
> > probes are measuring delta ticks between invocations, and
> > accumulate this time into a structure. Every 1 seconds, from
> > Event management, summary is generated for each historical
> > period. Time is measured between event runs, 1sec, and time
> > accumulated by probes is compared to it, resulting percentage
> > of CPU use. Best, average and worst cases are also noted. In
> > general, I believe this allows us to have a look at what is
> > eating CPU inside squid. CPU clock timer moves on globally
> > for the whole system, so all non-squid time is unaccounted
> > and may be attributed wrongly to some probes, upping worst
> > case dramatically.
> >
> > This patch seems stable. It has some functional problems: it
> > cannot handle overlapping probes of same name (it would get timing
> > wrong), it tries to measure "idle" time between different probes,
> > (ie time when no probes in running state) and it doesn't workout
> > well. Also, because several probes I've put into code do nest or
> > overlap, totals are also off.
> > Average added overhead seems to be about 650 ticks per probecall,
> > which can be optimised somewhat, at a cost of smaller convenience.
> >
> > One more thing I wanted to ask you gurus. I've implemented probe
> > matching by way of string pointers. To start a probe, you give a
> > const char string as a parameter to a probe, it records the pointer
> > to it, and subsequently compares not string itself, but the pointers.
> > This allows for fast check, and the string itself is used as a
> > description. Seems also to work ok. I understand that it could
> > brake if probe with the same name is started in one module, and
> > stopped in another. Then const char strings would have different
> > pointers? This could result in the probes with the same name to
> > be seen as different. This should also be immediately seen in
> > cachemgr output. Or is such approach acceptable, generally?
> >
> > Please review the sample output, and the patch itself, and comment.
> > Also, hints on how to integrate it properly, or (re)design hints
> > are very welcome.
> >
> > PS. alpha timing macro is there, but I have no acces to alpha right
> > now, so I cannot test. It would be helpful, if someone with skills
> > to get it compiled could test it. Also, if other cpu-types can be
> > added, would be very nice. All that is needed is a func or macro
> > that is able to read out CPU onchip tick-counter, if supported.
> > Actually, profiling code doesn't know or care in what units it
> > is measuring. the only thing it assumes is that counter between
> > sample is incrementing uniformly. It could be actually microsecs,
> > nanosec, or whatever. For some CPU's that have 32bit counters
> > (alpha, isn't it weird that 64bit cpu has 32bit counters?), it
> > may wrap in few seconds. Currently, no code for handling that.
> >
> > PPS. preliminary results hints that squid spends considerable amount
> > of time in preparing FD's for poll (setting up pfds.events), near
> > 10% of total CPU time under about 50 reqs/sec load.
> > Also, ACL matching seems comsuming alot, also approaching 10%.
> > The rest is open for debate.
>
> ------------------------------------
> Andres Kroonmaa <andre@online.ee>
> Delfi Online
> Tel: 6501 731, Fax: 6501 708
> Pärnu mnt. 158, Tallinn,
> 11317 Estonia
Received on Fri Sep 29 2000 - 00:18:20 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:38 MST