Death by a thousand cuts: v3.0 vs v3.2

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 17 Nov 2010 19:30:13 -0700

Hello,

     During the last couple of weeks, we have spent a lot of time
comparing Squid v3.0 and v3.2 performance under various conditions to
understand why v3.2 is sometimes 3-10% slower than v3.0. This email
shares our findings and suggests actions for addressing the problems.

We made several discoveries that will improve v3.2 performance,
including one regression bug, but my overall conclusion is that most of
the observed slowdown can be attributed to code reorganization and
various new features added after v3.0.

I am referring to this phenomenon as "death by a thousand cuts" because
these changes have negligible overhead in most locations. We had to use
custom, low-level profiling to find the main culprits, but most of the
effects of the new code in an isolated function or method are
indistinguishable from noise. It is their combined effect that matters.
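To see how individually negligible costs can add up, here is a toy cost model in C++. It assumes, purely for illustration, that each new piece of code adds a fixed fraction of the baseline per-request CPU time. The function names and the "50 cuts of 0.1% each" scenario are hypothetical, not measurements from our tests.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Toy cost model (an assumption, not a Squid measurement): each "cut" adds
// a fixed fraction of the baseline per-request CPU time, so the extra
// costs add up linearly in time. Returns the resulting drop in request
// rate as a fraction (0.05 means 5% slower).
double combined_slowdown(const std::vector<double> &overheads) {
    const double extra =
        std::accumulate(overheads.begin(), overheads.end(), 0.0);
    assert(extra >= 0.0 && "overheads are added costs");
    // Rate is proportional to 1 / time, and time grows to base * (1 + extra).
    return 1.0 - 1.0 / (1.0 + extra);
}

// Hypothetical scenario: n cuts, each costing the same fraction.
double uniform_slowdown(std::size_t n, double each) {
    return combined_slowdown(std::vector<double>(n, each));
}
```

With these made-up inputs, `uniform_slowdown(1, 0.001)` is about 0.1%, which would be hard to distinguish from run-to-run noise, while `uniform_slowdown(50, 0.001)` is about 4.8%, which is clearly measurable.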

Here are a few cases where we were able to measure the performance
penalty by rewriting/optimizing the code:

    no addrinfo in comm_local_port: 2.0%
    no addrinfo in comm_accept: 0.2%
    no NtoA in client_db: 0.2% for a small number of clients
    no zeroOnPush for some MEMPROXY_CLASSes: 0.8%

The percentages above can be interpreted as "Squid became X% faster when
the corresponding overheads of the new code were removed". These numbers
are provided for illustration only; the exact values and the meaning of
"faster" are not important here. What's important is that most of the
isolated overheads are far _less_ than the above numbers, but add up to
a measurable 3-10% performance degradation.
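The first item above removed addrinfo use from comm_local_port. The sketch below shows the general idea on a POSIX system, assuming the removed overhead came from building and freeing a heap-allocated addrinfo on every call: here the local port is read with getsockname() into a stack-allocated sockaddr_storage, so nothing is allocated or freed per call. Both function names are hypothetical; this is not the actual Squid patch.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not Squid's comm_local_port(): returns the local
// port of a bound socket, or -1 on error. The address lands in a
// stack-allocated sockaddr_storage, so no addrinfo is allocated or freed.
int local_port_no_addrinfo(int fd) {
    sockaddr_storage ss{};
    socklen_t len = sizeof(ss);
    if (getsockname(fd, reinterpret_cast<sockaddr *>(&ss), &len) != 0)
        return -1;
    assert(len <= sizeof(ss)); // sockaddr_storage fits any address family
    if (ss.ss_family == AF_INET)
        return ntohs(reinterpret_cast<sockaddr_in *>(&ss)->sin_port);
    if (ss.ss_family == AF_INET6)
        return ntohs(reinterpret_cast<sockaddr_in6 *>(&ss)->sin6_port);
    return -1;
}

// Demo helper: a TCP socket bound to 127.0.0.1 on a kernel-chosen port,
// or -1 on failure. The caller closes the descriptor.
int bind_loopback_ephemeral() {
    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    sockaddr_in sa{};
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0; // let the kernel pick a free port
    if (bind(fd, reinterpret_cast<sockaddr *>(&sa), sizeof(sa)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```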

Two changes stand out the most in this "death by a thousand cuts"
category: asynchronous calls and Ip::Address. Both changes are
necessary, but they add performance overheads we should be aware of.

I am not sure whether asynchronous calls can be significantly optimized.
I have one idea that I am going to try, but if it does not work, then we
will have to accept the performance price of this important API and
optimize to compensate elsewhere. Few things can be worse than going
back to spaghetti code!
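The price of asynchronous calls comes from the extra work each hop performs compared with a direct function call. The toy queue below is not Squid's AsyncCall API (the real classes also deal with things like reference counting, cancellation, and debugging), and the ToyCallQueue name is hypothetical. It only makes the per-call work visible: each scheduled call allocates a call object, is pushed onto and popped from a queue, and is invoked indirectly when the main loop drains the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Toy model of an asynchronous call queue (hypothetical, not Squid code).
// Compared with calling fn() directly, every scheduled call pays for a heap
// allocation, a queue push and pop, and an indirect call through
// std::function when the main loop fires it.
class ToyCallQueue {
public:
    void schedule(std::function<void()> fn) {
        assert(fn && "an empty std::function would throw when fired");
        queue_.push_back(
            std::make_unique<std::function<void()>>(std::move(fn)));
    }

    // Fires every queued call, including calls scheduled while firing,
    // and returns how many calls ran.
    std::size_t fireAll() {
        std::size_t fired = 0;
        while (!queue_.empty()) {
            auto call = std::move(queue_.front());
            queue_.pop_front();
            (*call)();
            ++fired;
        }
        return fired;
    }

private:
    std::deque<std::unique_ptr<std::function<void()>>> queue_;
};
```

The design choice the real API makes is the same trade-off: deferring the callee to the main loop avoids reentrant "spaghetti" call chains, but each deferral adds the bookkeeping shown above.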

As for Ip::Address, its implementation and use may need to be optimized,
but I need your help to understand whether my suspicions are reasonable.
I will send a separate email discussing IPv6-related overheads.

There are other overheads that we inherited from Squid v3.0 and Squid2.
Ip::Address and async calls are special because, from the users' point of
view, these overheads did not exist in the Squid version they are
running now, so they want them gone.

It may be tempting to ignore these minor regressions and just fight
major expenses such as excessive memory copying or slow parsing. The net
result will be "better performance" anyway. However, I believe we have
to do both because even if we start with bigger problems and eliminate
them, the currently smaller problems will become relatively big. And
fixes for ignored problems may become costlier with time.

We should also discuss whether it makes sense to start doing semi-regular
and/or on-demand performance testing using a "standardized"
environment(s) and workload(s) so that performance regressions like the
ones described above can be detected and dealt with earlier. It would be
sad if we had to go through the same time-wasting exercise during the v3.3
release.

Thank you,

Alex.
Received on Thu Nov 18 2010 - 02:30:16 MST