Re: Death by a thousand cuts: v3.0 vs v3.2 from Alex Rousskov on 2010-11-17 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 17 Nov 2010 21:01:17 -0700

On 11/17/2010 08:21 PM, Amos Jeffries wrote:
> On Wed, 17 Nov 2010 19:30:13 -0700, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> Hello,
>>
>> During the last couple of weeks, we have spent a lot of time
>> comparing Squid v3.0 and v3.2 performance under various conditions to
>> understand why v3.2 is sometimes 3-10% slower than v3.0. This email
>> shares our findings and suggests actions for addressing the problems.
>>
>>
>> We made several discoveries that will improve v3.2 performance,
>> including one regression bug, but my overall conclusion is that most of
>> the observed slowdown can be attributed to code reorganization and
>> various new features added after v3.0.
>>
>> I am referring to this phenomenon as "death by a thousand cuts" because
>> these changes have negligible overhead in most locations. We had to use
>> custom, low-level profiling to find the main culprits, but most of the
>> effects of the new code in an isolated function or method are
>> indistinguishable from noise. It is their combined effect that matters.
>>
>> Here are a few cases where we were able to measure the performance
>> penalty by rewriting/optimizing the code:
>>
>> no addrinfo in comm_local_port: 2.0%
>> no addrinfo in comm_accept: 0.2%
>> no NtoA in client_db: 0.2% for a small number of clients
>> no zeroOnPush for some MEMPROXY_CLASSes: 0.8%
>>
>
> By "no" do you mean removing it was the problem? or removing it was
> better?

The addrinfo/NtoA/zeroOnPush code is there now. Removing or optimizing
it makes things X% better. I will provide specific examples in the email
dedicated to Ip::Address.
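
As a rough illustration of how such small gains combine, here is a minimal
sketch. It assumes the individual gains are independent and compound
multiplicatively, which the measurements above do not establish. The helper
name combined_gain and the "100 cuts of 0.05% each" scenario are hypothetical,
chosen only to show the arithmetic.

```python
# Per-change gains (percent) copied from the list quoted above.
# Assumption: gains are independent and compound multiplicatively.
measured_gains = {
    "addrinfo in comm_local_port": 2.0,
    "addrinfo in comm_accept": 0.2,
    "NtoA in client_db": 0.2,
    "zeroOnPush for some MEMPROXY_CLASSes": 0.8,
}


def combined_gain(percent_gains):
    """Compound individual percentage gains into one overall percentage."""
    factor = 1.0
    for gain in percent_gains:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


# The four measured cases alone account for a little over 3%.
measured_total = combined_gain(measured_gains.values())

# Hypothetical: 100 call sites, each costing 0.05%, which would be
# hard to tell apart from noise when measured one at a time.
hypothetical_total = combined_gain([0.05] * 100)

print(f"measured cases combined: {measured_total:.2f}%")
print(f"100 hypothetical small cuts combined: {hypothetical_total:.2f}%")
```

Under these assumptions, the four measured cases land at the low end of the
3-10% range, and a hundred individually invisible overheads add several more
percent.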

>> The percentages above can be interpreted as "Squid became X% faster when
>> the corresponding overheads of the new code were removed". These numbers
>> are provided for illustration only; the exact values and the meaning of
>> "faster" are not important here. What's important is that most of the
>> isolated overheads are far _less_ than the above numbers, but add up to
>> a measurable 3-10% performance degradation.
>>
>>
>> Two changes stand out the most in this "death by a thousand cuts"
>> category: asynchronous calls and Ip::Address. Both changes are
>> necessary, but they add performance overheads we should be aware of.
>>
>> I am not sure whether asynchronous calls can be significantly optimized.
>> I have one idea that I am going to try, but if it does not work, then we
>> will have to accept the performance price of this important API and
>> optimize to compensate elsewhere. Few things can be worse than going
>> back to spaghetti code!
>>
>> As for Ip::Address, its implementation and use may need to be optimized,
>> but I need your help to understand whether my suspicions are reasonable.
>> I will send a separate email discussing IPv6-related overheads.
>>
>> There are other overheads that we inherited from Squid v3.0 and Squid2.
>> Ip::Address and async calls are special because, from the users' point
>> of view, these overheads did not exist in the Squid version they are
>> running now, so they want them gone.
>>
>> It may be tempting to ignore these minor regressions and just fight
>> major expenses such as excessive memory copying or slow parsing. The net
>> result will be "better performance" anyway. However, I believe we have
>> to do both because even if we start with bigger problems and eliminate
>> them, the currently smaller problems will become relatively big. And
>> fixes for ignored problems may become costlier with time.
>>
>>
>> We should also discuss whether it makes sense to start doing semi-regular
>> and/or on-demand performance testing using a "standardized"
>> environment(s) and workload(s) so that performance regressions like the
>> ones described above can be detected and dealt with earlier. It would be
>> sad if we had to go through the same time-wasting exercise during the
>> v3.3 release.
>
> I completely agree we need to fix *all* of the "thousand cuts". I would
> not call it a waste of time what you have done, even if it needs repeating.
> This analysis has been long overdue. Thank you.
>
> My view is that picking a place to stop is where people go wrong with the
> biggest-first approach. There *is* no acceptable half-measure, merely
> infinite incomplete steps toward the ultimate app. And yes they follow a
> decreasing work/benefit growth curve over time just like everything else.
>
> What we have to decide is whether now is the right time to start this
> optimization process? or do we continue to wait until the big/buggy code
> shuffling is complete?

Since we do not have a dedicated team doing reshuffling, I would not
wait for that to complete (in any context). It is impossible to say when
that will be 100% done.

> I'm game for doing the optimization during the shuffle. Have been doing so
> for the obvious ones since 3.1 went stable. As you point out the biggest
> culprits between 3.0 and 3.2 are the new features in 3.1. Do you have info
> on 3.1 for a 3.0->3.1 and 3.1->3.2 stepwise comparison?

We did not focus on v3.1 in this particular study, but I would not be
surprised if v3.1 is a little slower than both v3.0 and v3.2.

Cheers,

Alex.
Received on Thu Nov 18 2010 - 04:01:19 MST
