Re: Squid 3.2 performance question

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 21 Mar 2012 01:24:49 +1300

On 20/03/2012 9:37 p.m., Alexander Komyagin wrote:
> On Mon, 2012-03-19 at 23:26 -0600, Alex Rousskov wrote:
>> On 03/18/2012 11:07 PM, Amos Jeffries wrote:
>>> On 13/03/2012 10:14 p.m., Alexander Komyagin wrote:
>>>> Hello. We're now trying out the new Squid 3.2 on our server,
>>>> mainly because of its SMP feature. But our tests show that 3.2
>>>> (3.2.0.14 and 3.2.0.16 were tested) performance is noticeably
>>>> lower than 3.1's (3.1.15).
>>>>
>>>> We're using "httperf --client=0/1 --hog --server x.x.x.x --rate=100
>>>> --num-conns=1000 --timeout=5 --num-calls=10" for testing. For 3.2
>>>> it shows about 140 client timeouts (out of 1000), while for 3.1
>>>> there are no errors at all.
>>>>
>>>> Different worker counts were tested (1, 2, 4), but the results stay
>>>> the same -- completely unchanged -- which is rather _strange_,
>>>> since, as far as I know (from the Squid website and source
>>>> browsing), in our configuration the workers should NOT share
>>>> anything but one listening socket (y.y.y.y:3128).
>>>> More than that, CPU use is _only_ about 20% per worker (2 CPUs, 2
>>>> workers), vmstat reports no high memory consumption, and iostat
>>>> reports 0% iowait.
>>>>
>>>> Also, according to the logs, those client timeouts are caused by
>>>> some new connections never being noticed and accepted (they never
>>>> go through the doAccept() routine in TcpAcceptor.cc).
>>> That sounds very much like a kernel issue, or a TCP accept
>>> rate-limiting issue.
>> Why would Squid v3.1 results differ from single-worker Squid v3.2
>> results then? I assume both v3.1 and v3.2 use the same kernel and the
>> same OS configuration (including ulibc).
> Actually, I don't know for sure. That's why I'm asking for help ;) I
> have even tried running squid 3.2 in non-daemon mode (pure single
> thread) - still no luck.
>
>>
>>> Once a TCP connection is picked up by oldAccept() in the doAccept()
>>> sequence, the results can be attributed to Squid; but if connections
>>> never actually arrive there, something is wrong at a deeper level,
>>> down around the TCP stack or the socket libraries.
>>> So from your results I conclude that one worker grabbed almost all
>>> the traffic and responded OK. But there is insufficient data about
>>> the interesting part of the traffic. What was going on there? Which
>>> kid serviced it?
> Nope. Both workers are doing their job. Just not very well.
>
>> I agree that making one of the workers super fast essentially
>> invalidates the test (unless you do the same to v3.1 too, but then you
>> just removed or scaled up the problem so it may not be the best test
>> direction anyway).
>>
>>
>> My recommendation is to use a single v3.2 worker for now and figure out
>> why a single v3.2 worker is dropping or ignoring connections when v3.1
>> does not. There could be bugs in the new accept code that we need to
>> fix. Use no-daemon mode for both versions.
>>
>> I would start by trying to understand whether those connection errors
>> result from connections never seen by Squid or from connections accepted
>> but later ignored/forgotten by Squid. I do not know much about httperf,
>> but with just 1000 transactions, that should be relatively easy to
>> determine because you can record and match each transaction on both
>> sides of the test.
>>
>>
>> HTH,
>>
>> Alex.
> Alex, I have performed some more tests (including oprofile profiling,
> no-daemon mode, 1 worker, 2 workers, etc.). For now, it seems the
> problem is closely related to the RSBAC Networking support enabled in
> our kernel. When I disabled it, the performance issue _went away_.
> According to the RSBAC logs, not a single operation is denied.
>
> With RSBAC-Net enabled, 3.2 shows the problem with 1 worker, with 2
> workers, and in no-daemon mode. 3.1, however, works fine.
>
> Without RSBAC-Net everything is fine.
>
> Comparing oprofile results for 3.2 with and without RSBAC-Net, I can
> only assume that the RSBAC-Net subsystem performs some internal
> operations on list structures, which are indeed protected by locks -
> and this, in my view, may block simultaneous Squid socket operations
> and hurt performance.
>
> Also, when I enable full RSBAC logging for the squid process, the 3.2
> and 3.1 logs differ on two points:
> - 3.2 performs some mysterious IOCTL operations on TCP sockets right
> after create, while 3.1 does not;
> - 3.1 issues BIND requests, while 3.2 does not.
>
> So far I agree that the problem probably resides at the socket level,
> but I still wonder: what is the significant difference between the
> 3.1 socket operations and the 3.2 ones?

The unnecessary use of bind() on outgoing connections was removed at
the request of several OS security teams. There were some
vulnerabilities when bind() was called with an unset IP address.
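
For illustration only, here is a rough sketch of the removed pattern
(my own sketch, not the actual Squid code). Calling bind() with an
all-zeroes "unset" address before connect() gains nothing, since
connect() selects a source address and ephemeral port anyway:

    #include <cstring>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int openOutgoingSocket()
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in local;
        std::memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = INADDR_ANY; // the "unset IP"
        local.sin_port = 0;                 // any ephemeral port

        // 3.1 did this; 3.2 skips it on outgoing connections.
        bind(fd, (struct sockaddr *)&local, sizeof(local));

        // ... connect() to the destination follows either way ...
        return fd;
    }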

The ioctl() calls would be the NAT lookups? Those also occur in 3.1,
but only after receiving and parsing a request, and they are repeated
for every request. 3.2 moves this up to a single lookup right after
connection establishment and removes the useless duplicate lookups
between pipelined requests, reducing vulnerability to NAT table expiry
and to wrong log details on early connection closures.
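
For what it's worth, on Linux/netfilter that lookup is done with
getsockopt(SO_ORIGINAL_DST) rather than an ioctl() (the ioctl()
variants belong to IPFilter-style stacks). A minimal sketch of the
netfilter form, assuming an intercepted client connection:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <linux/netfilter_ipv4.h> // SO_ORIGINAL_DST

    #ifndef SOL_IP
    #define SOL_IP IPPROTO_IP // same value (0) on Linux
    #endif

    // Fetch the pre-NAT destination the client was really heading to.
    // In 3.2 this runs once, just after accept(); in 3.1 it ran again
    // for every request parsed on the connection.
    bool lookupOriginalDst(int clientFd, struct sockaddr_in &orig)
    {
        socklen_t len = sizeof(orig);
        return getsockopt(clientFd, SOL_IP, SO_ORIGINAL_DST,
                          &orig, &len) == 0;
    }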

Or were the ioctl() calls the packet TOS / MARK processing? That has
had a bit of a redesign in 3.2 for better QoS management.
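
For reference, TOS / MARK tagging boils down to two setsockopt()
calls; a generic sketch (not the actual 3.2 QoS code; SO_MARK is
Linux-specific and needs CAP_NET_ADMIN):

    #include <cstdint>
    #include <sys/socket.h>
    #include <netinet/in.h> // IP_TOS

    void applyQosTags(int fd, int tos, uint32_t mark)
    {
        // Set the IP TOS/DSCP byte on outgoing packets.
        setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
    #ifdef SO_MARK
        // Tag packets with a netfilter mark for routing/QoS rules.
        setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
    #endif
    }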

Among the architectural changes:
  - the pending queue of deferred accept() operations was changed from
LIFO to FIFO, to serve multiple listening ports more equally under
load.
  - the acceptor callback was changed from a synchronous callback to a
scheduled async call, adding a small async I/O processing delay between
the accept and the first read (both changes are sketched below).
  - SMP workers compete for accept() on shared listening sockets.

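To make the first two of those concrete, here is a toy illustration
(my own sketch, not the TcpAcceptor code):

    #include <deque>
    #include <functional>

    // Listening FDs whose accept() was deferred (e.g. FD limits hit).
    std::deque<int> deferredListeners;
    // Calls scheduled for a later event-loop pass.
    std::deque<std::function<void()>> asyncCalls;

    void deferAccept(int listenFd)
    {
        // 3.1 effectively retried the most recently deferred port
        // first (LIFO); 3.2 pushes to the back and retries from the
        // front (FIFO), so ports are served roughly equally.
        deferredListeners.push_back(listenFd);
    }

    void onAccepted(int clientFd, std::function<void(int)> handler)
    {
        // 3.1: handler(clientFd) ran synchronously here.
        // 3.2: schedule it as an async call, adding one event-loop
        // hop between the accept() and the first read.
        asyncCalls.push_back([=] { handler(clientFd); });
    }
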
Those should be the only differences during regular operation.

Amos
Received on Tue Mar 20 2012 - 12:24:56 MDT
