Re: Death by a thousand cuts: v3.0 vs v3.2

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 18 Nov 2010 03:21:02 +0000

On Wed, 17 Nov 2010 19:30:13 -0700, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> Hello,
>
> During the last couple of weeks, we have spent a lot of time
> comparing Squid v3.0 and v3.2 performance under various conditions to
> understand why v3.2 is sometimes 3-10% slower than v3.0. This email
> shares our findings and suggests actions for addressing the problems.
>
>
> We made several discoveries that will improve v3.2 performance,
> including one regression bug, but my overall conclusion is that most of
> the observed slowdown can be attributed to code reorganization and
> various new features added after v3.0.
>
> I am referring to this phenomenon as "death by a thousand cuts" because
> these changes have negligible overhead in most locations. We had to use
> custom, low-level profiling to find the main culprits, but most of the
> effects of the new code in an isolated function or method are
> indistinguishable from noise. It is their combined effect that matters.
>
> Here are a few cases where we were able to measure the performance
> penalty by rewriting/optimizing the code:
>
> no addrinfo in comm_local_port: 2.0%
> no addrinfo in comm_accept: 0.2%
> no NtoA in client_db: 0.2% for small number of clients
> no zeroOnPush for some MEMPROXY_CLASSes: 0.8%
>

By "no" do you mean that removing it was the problem, or that removing it
was better?

> The percentages above can be interpreted as "Squid became X% faster when
> the corresponding overheads of the new code were removed". These numbers
> are provided for illustration only; the exact values and the meaning of
> "faster" are not important here. What's important is that most of the
> isolated overheads are far _less_ than the above numbers, but add up to
> a measurable 3-10% performance degradation.
>
>
> Two changes stand out the most in this "death by a thousand cuts"
> category: asynchronous calls and Ip::Address. Both changes are
> necessary, but they add performance overheads we should be aware of.
>
> I am not sure whether asynchronous calls can be significantly optimized.
> I have one idea that I am going to try, but if it does not work, then we
> will have to accept the performance price of this important API and
> optimize elsewhere to compensate. Few things can be worse than going
> back to spaghetti code!
>
> As for Ip::Address, its implementation and use may need to be optimized,
> but I need your help to understand whether my suspicions are reasonable.
> I will send a separate email discussing IPv6-related overheads.
>
> There are other overheads that we inherited from Squid v3.0 and Squid2.
> Ip::Address and async calls are special because, from the users' point of
> view, these overheads did not exist in the Squid version they are
> running now, so they want them gone.
>
> It may be tempting to ignore these minor regressions and just fight
> major expenses such as excessive memory copying or slow parsing. The net
> result will be "better performance" anyway. However, I believe we have
> to do both, because even if we start with the bigger problems and
> eliminate them, the currently smaller problems will become relatively
> big. And fixes for ignored problems may become costlier with time.
>
>
> We should also discuss whether it makes sense to start doing semi-regular
> and/or on-demand performance testing using "standardized"
> environment(s) and workload(s) so that performance regressions like the
> ones described above can be detected and dealt with earlier. It would be
> sad if we had to go through the same time-wasting exercise during the
> v3.3 release.
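On the async-call overhead mentioned above: the cost is easy to picture even
without Squid's code in front of you. An asynchronous call turns a plain
(often inlined) function call into a heap allocation, queue bookkeeping, and
a later indirect dispatch. The toy model below is not Squid's AsyncCall API,
just the general shape of the pattern; the class and method names are made
up for illustration.

```cpp
// Toy model of async-call dispatch: each scheduled call pays for a
// std::function allocation, a queue push/pop, and an indirect invocation --
// costs that a direct call would not incur. Individually negligible,
// collectively measurable.
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

class CallQueue {
public:
    // Record the call now; it runs later, from the event loop.
    void schedule(std::function<void()> call) {
        pending_.push(std::move(call));
    }

    // Dispatch everything queued so far, in order; returns the call count.
    std::size_t drain() {
        std::size_t n = 0;
        while (!pending_.empty()) {
            pending_.front()();   // indirect call through std::function
            pending_.pop();
            ++n;
        }
        return n;
    }

private:
    std::queue<std::function<void()>> pending_;
};
```

Each schedule/drain round trip is cheap in isolation, which is exactly the
"thousand cuts" point: no single call stands out in a profile, but millions
of them per second do.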

I completely agree we need to fix *all* of the "thousand cuts". I would
not call what you have done a waste of time, even if it needs repeating.
This analysis has been long overdue. Thank you.

My view is that picking a place to stop is where people go wrong with the
biggest-first approach. There *is* no acceptable half-measure, merely an
infinite series of incomplete steps toward the ultimate app. And yes, they
follow a decreasing work/benefit curve over time, just like everything else.

What we have to decide is whether now is the right time to start this
optimization process, or whether we should continue to wait until the
big/buggy code shuffling is complete.
I'm game for doing the optimization during the shuffle. I have been doing
so for the obvious ones since 3.1 went stable. As you point out, the
biggest culprits between 3.0 and 3.2 are the new features added in 3.1. Do
you have info on 3.1 for a 3.0->3.1 and 3.1->3.2 stepwise comparison?

Amos
Received on Thu Nov 18 2010 - 03:21:13 MST
