Re: Squid 3.2 performance question from Alex Rousskov on 2012-03-19 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 19 Mar 2012 23:26:15 -0600

On 03/18/2012 11:07 PM, Amos Jeffries wrote:
> On 13/03/2012 10:14 p.m., Alexander Komyagin wrote:
>> Hello. We're now trying to give a chance to the new Squid 3.2 on our
>> server, mainly because of its SMP feature. But our tests are showing
>> that 3.2 (3.2.0.14 and 3.2.0.16 were tested) performance is noticeably
>> lower than 3.1 (3.1.15).
>>
>> We're using "httperf --client=0/1 --hog --server x.x.x.x --rate=100
>> --num-conns=1000 --timeout=5 --num-calls=10" for testing. For 3.2 it
>> shows about 140 client timeouts (out of 1000), while for 3.1 there
>> are no errors at all.
>>
>> Different worker counts were checked (1, 2, 4), but the results are
>> the same -- completely unchanged -- which is rather _strange_, since as
>> far as I know (from the Squid website and source browsing), in our
>> configuration workers shall NOT share anything but one listening socket
>> (y.y.y.y:3128).
>> More than that, CPU use is _only_ about 20% per worker (2 CPUs, 2
>> workers), vmstat reports no high memory consumption and iostat reports
>> 0% iowait.
>>
>> Also, according to the logs, those client timeouts are caused by some
>> new connections never being noticed or accepted (they never go through
>> the doAccept() routine in TcpAcceptor.cc).
>
> That sounds very much like a kernel issue or a TCP accept rate
> limiting issue.

Why would Squid v3.1 results differ from single-worker Squid v3.2
results then? I assume both v3.1 and v3.2 use the same kernel and the
same OS configuration (including ulibc).

> Once a TCP connection is picked up by oldAccept() in the doAccept()
> sequence the results can be attributed to Squid, but if they never
> actually arrive there something is wrong at a deeper level down around
> the TCP stack or sockets libraries.

> So from your results I conclude that one worker grabbed almost all the
> traffic and responded OK. But there is insufficient data about the
> interesting part of the traffic. What was going on there? Which kid
> serviced it?

I agree that making one of the workers super fast essentially
invalidates the test (unless you do the same to v3.1 too, but then you
just removed or scaled up the problem so it may not be the best test
direction anyway).

My recommendation is to use a single v3.2 worker for now and figure out
why a single v3.2 worker is dropping or ignoring connections when v3.1
does not. There could be bugs in the new accept code that we need to
fix. Use no-daemon mode for both versions.

I would start by trying to understand whether those connection errors
result from connections never seen by Squid or from connections accepted
but later ignored/forgotten by Squid. I do not know much about httperf,
but with just 1000 transactions, that should be relatively easy to
determine because you can record and match each transaction on both
sides of the test.
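
A minimal sketch of that matching, assuming each connection can be identified by the client's (IP, source port) pair on both sides. How the three input lists are collected (for example, a packet capture on the client and Squid's own logs on the server) is left open, and the names match_connections, MatchResult and the sample addresses are hypothetical, not part of Squid or httperf:

```python
from typing import Iterable, NamedTuple, Set, Tuple

# Assumption: a connection is identified by the client's (IP, source port).
ConnId = Tuple[str, int]


class MatchResult(NamedTuple):
    never_seen: Set[ConnId]          # attempted by the client, never accepted
    accepted_then_lost: Set[ConnId]  # accepted by Squid, but never answered
    served: Set[ConnId]              # accepted and answered


def match_connections(
    client_attempts: Iterable[ConnId],
    squid_accepted: Iterable[ConnId],
    squid_answered: Iterable[ConnId],
) -> MatchResult:
    """Classify every client connection attempt by how far it got in Squid."""
    attempts = set(client_attempts)
    # Ignore server-side records that do not belong to this test run.
    accepted = set(squid_accepted) & attempts
    answered = set(squid_answered) & accepted
    return MatchResult(
        never_seen=attempts - accepted,
        accepted_then_lost=accepted - answered,
        served=answered,
    )


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the inputs.
    attempts = [("10.0.0.1", 40001), ("10.0.0.1", 40002), ("10.0.0.1", 40003)]
    accepted = [("10.0.0.1", 40001), ("10.0.0.1", 40003)]
    answered = [("10.0.0.1", 40001)]

    result = match_connections(attempts, accepted, answered)
    print("never seen by Squid:", sorted(result.never_seen))
    print("accepted, then lost:", sorted(result.accepted_then_lost))
    print("served:", sorted(result.served))
```

A large never_seen group would point below Squid (or at the accept code not being called), while a large accepted_then_lost group would point at what Squid does after accepting.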

HTH,

Alex.
Received on Tue Mar 20 2012 - 05:26:31 MDT
