Re: more profiling

From: Gonzalo Arana <gonzalo.arana@dont-contact.us>
Date: Tue, 19 Sep 2006 09:58:54 -0300

On 9/19/06, Adrian Chadd <adrian@creative.net.au> wrote:
> On Tue, Sep 19, 2006, Andres Kroonmaa wrote:
>

...

> > Since then quite alot of changes have happened, so I'd
> > suggest to look at the gprof stats to decide what funcs
> > to probe with hires prof and add them.
>
> Yeah, I'm thinking that too.

Is hires profiling *that* heavy? I've used it in my production squids
(while I've used squid3) and the overhead was neglible.

There is a comment in profiling.h claiming that rdtsc (for x86 arch)
stalls CPU pipes. That's not what Intel documentation says (page 213
-numbered as 4-209- of the Intel Architecture Software Developer
Manual, volume 2b, Instruction Reference N-Z).

So, it should be harmless to profile as much code as possible, am I right?

> > Also review the probes already there - you'd want to make
> > sure a probe isn't left "running" at any function exit
> > point - this would lead to accounting to a probe sections
> > of code incorrectly.

This could be automatically done by the compiler, if the profile probe
was contained in an object. The object will get automatically
destroyed (and therefore the profiling probe will stop) when the
function exits.

> > There's something fishy with "best case" timers. They
> > shouldn't be zero, ever. Ditto "worst case" - they *can*
> > get high due to task switches, but your worst cases look
> > way too high, on P4 2.6G there should be 2.6G ticks per
> > second. Your worst case looks like probes have been
> > running for 8.9secs straight, seems unlikely.
> > So there seems to be a need to get hires profiling
> > uptodate with current squid code base.

> I did notice that but I don't know enough about the code to go digging.

On x86 architecture, timestamp counter may not run at external clock
rate. On different processor versions the meaning of a tick in this
clock may vary. From Intel Architecture * Volume 3b (chap 18.9):
"For Pentium 4 processors, Intel Xeon .... Intel Core Solo and Intel
Core Duo ...: the timestamp counter increments at a constant rate.
That rate may be set by the maximum core-clock to bus-clock ratio of
the processor or may be set by the frequency at which the processor is
booted. The specific processor configuration determines the
behaviour."

The only important issue is that TS counter runs at a contant rate,
but it is somewhat unknown (could be measured anyway).

> That said, the traces look much nicer now. There's definitely something
> weird going on with the nested traces though.
>
> I just don't have the time to go through the profiling code. Its definitely
> nicer to use than gprof but it'd be nice to keep counts of call graphs.

> Thats all I really use gprof for these days.

We could build something like gprof call graph (with some
limitations). Adding this shouln't be *that* difficult, right?

Is there interest in improving the profiling code this way? (i.e.:
somewhat automated probe collection & adding call graph support).

Regards,

-- 
Gonzalo A. Arana
Received on Tue Sep 19 2006 - 07:00:42 MDT

This archive was generated by hypermail pre-2.1.9 : Sun Oct 01 2006 - 12:00:06 MDT