Re: more profiling from Andres Kroonmaa on 2006-09-19 (squid-dev)

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Tue, 19 Sep 2006 21:10:15 +0300

On 19 Sep 2006 at 14:12, Gonzalo Arana wrote:

> On 9/19/06, Andres Kroonmaa <andre@online.ee> wrote:
> > > On Tue, Sep 19, 2006, Gonzalo Arana wrote:
> > >
> > > > There is a comment in profiling.h claiming that rdtsc (for x86 arch)
> > > > stalls CPU pipes. That's not what Intel documentation says (page 213
> > > > -numbered as 4-209- of the Intel Architecture Software Developer
> > > > Manual, volume 2b, Instruction Reference N-Z).
> >
> > Well, this is a somewhat mixed issue. Intel documented
> > usage of rdtsc (at the time when I coded this) with a
> > requirement of a pair of cpuid+rdtsc. Cpuid flushes all
>
> I guess that's because the time stamp counter may have different
> values on different CPUs. Am I right?

If you meant MP or multi-core systems, then no, tsc is in
sync on all cpus.

My understanding was that Intel required the cpuid+rdtsc
pair because it expected someone to use rdtsc to profile a
single-op code section and then blame Intel when the
measured code executed before or after the rdtsc
instructions timing it. cpuid was the simplest way to
guarantee on-cpu code execution order.
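
To make the cpuid+rdtsc pairing concrete, here is a minimal sketch of
a serialized timestamp read. It is not the code from profiling.h: the
helper name serialized_ticks is made up for illustration, the inline
asm assumes GCC or Clang on x86, and the clock_gettime fallback for
other architectures is only there so the sketch builds everywhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper for illustration, not squid's actual probe code. */
static inline uint64_t serialized_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;
    uint32_t lo, hi;

    /* cpuid is a serializing instruction: everything issued before it
     * must complete before the rdtsc that follows can execute. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;

    /* rdtsc returns the 64-bit counter split across EDX:EAX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumed fallback for non-x86 builds: nanoseconds, not clock ticks. */
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

The price is that cpuid itself costs far more than rdtsc, which is
why dropping it and accepting some reordering can be a reasonable
trade for a low-overhead probe.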

> > superscalar cpus, but I went on with the assumption that as
> > long as probe start and probe stop are similar pieces of
> > code, the added time uncertainty largely cancels out,
> > as we are measuring time *between* two invocations.
>
> Sounds reasonable to me: both the start and the stop could have (on
> average) the same offset error, so they would cancel each other. I
> just wonder if branch prediction does not give us some bias in this.

At some point we must stop worrying. After all, there are
*too many* things that impact precision as you approach
clock tick resolution. An uncertainty of ~50 clock ticks is
damn good by any standards.
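
One way to see how much of the probe cost fails to cancel is to time
an empty section: start a probe, stop it immediately, and keep the
smallest gap seen. The sketch below does that against a pluggable
tick source. All names here (tick_fn, min_empty_interval,
corrected_interval, fake_ticks) are made up for illustration, and
fake_ticks is a deterministic stand-in so the example does not depend
on any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*tick_fn)(void);

/* Deterministic stand-in for a real counter: advances 7 "ticks" per read. */
static uint64_t fake_ticks(void)
{
    static uint64_t t = 0;
    t += 7;
    return t;
}

/* Smallest interval observed between a probe start and an immediate
 * probe stop, i.e. an estimate of the fixed cost of the probe pair. */
static uint64_t min_empty_interval(tick_fn now, int rounds)
{
    assert(now != 0 && rounds > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < rounds; i++) {
        uint64_t start = now();
        uint64_t stop = now();
        uint64_t gap = stop - start;
        if (gap < best)
            best = gap;
    }
    return best;
}

/* Measured interval with the fixed probe cost removed, clamped at zero. */
static uint64_t corrected_interval(uint64_t start, uint64_t stop,
                                   uint64_t fixed_cost)
{
    uint64_t raw = stop - start;
    return raw > fixed_cost ? raw - fixed_cost : 0;
}
```

With a real counter in place of fake_ticks, the spread between the
minimum and typical empty interval gives a rough feel for the
remaining uncertainty; it says nothing about cache or branch
prediction effects inside the measured code itself.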

> We could have a
> matrix of profile information:
> M[caller][callee]. If you wish to get deeper levels, just add a new
> dimension to the 'profile matrix': M[grandfather][father][callee].

IMO that would add too much overhead, so it won't be
usable in production mode.
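
For reference, the M[caller][callee] idea might look roughly like the
sketch below. This is only an illustration of the proposal, not
anything in squid: the probe ids, the probe_stack array and the
probe_start/probe_stop signatures are all assumptions, and timestamps
are passed in by the caller to keep the example independent of the
tick source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical probe ids; PROBE_ROOT stands for "no enclosing probe". */
enum { PROBE_ROOT = 0, PROBE_COMM, PROBE_STORE, PROBE_MAX };
enum { MAX_DEPTH = 32 };

struct cell {
    uint64_t ticks; /* total ticks spent in callee under this caller */
    uint64_t calls; /* number of completed start/stop pairs */
};

/* The profile matrix: one accumulator per (caller, callee) pair. */
static struct cell M[PROBE_MAX][PROBE_MAX];

/* Stack of currently running probes, innermost on top. */
static struct { int id; uint64_t start; } probe_stack[MAX_DEPTH];
static int probe_depth = 0;

static void probe_start(int id, uint64_t now)
{
    assert(id >= 0 && id < PROBE_MAX && probe_depth < MAX_DEPTH);
    probe_stack[probe_depth].id = id;
    probe_stack[probe_depth].start = now;
    probe_depth++;
}

static void probe_stop(int id, uint64_t now)
{
    assert(probe_depth > 0 && probe_stack[probe_depth - 1].id == id);
    probe_depth--;
    /* The caller is whatever probe is still open below this one. */
    int caller = probe_depth > 0 ? probe_stack[probe_depth - 1].id
                                 : PROBE_ROOT;
    M[caller][id].ticks += now - probe_stack[probe_depth].start;
    M[caller][id].calls++;
}
```

Every probe now pays for a stack push and pop plus a two-dimensional
update, and the table grows as N^2 for N probes (N^3 with a
grandfather dimension), which is the overhead concern raised above.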

> Any way, if we trust & rely on gprof call tree, there is no point in
> doing any of this.
> Just as a note: why don't we trust gprof profile information but we
> do trust the gprof call graph?

Because the gprof call graph is deterministic, but the
profile information is a statistical approximation. For
the vast majority of cases it's good enough. E.g. in this
case it seems that gprof wasn't that far off after all,
as Adrian found a bug that caused abnormal cleanups.
Sometimes gprof can produce stats that are misleading.

------------------------------------
Andres Kroonmaa
Elion
Received on Tue Sep 19 2006 - 12:08:45 MDT
