Re: Squid 3.2 performance question from Alexander Komyagin on 2012-03-20 (squid-dev)

From: Alexander Komyagin <komyagin_at_altell.ru>
Date: Tue, 20 Mar 2012 12:37:04 +0400

On Mon, 2012-03-19 at 23:26 -0600, Alex Rousskov wrote:
> On 03/18/2012 11:07 PM, Amos Jeffries wrote:
> > On 13/03/2012 10:14 p.m., Alexander Komyagin wrote:
> >> Hello. We're now trying to give a chance to the new Squid 3.2 on our
> >> server, mainly because of its SMP feature. But our tests are showing
> >> that 3.2 (3.2.0.14 and 3.2.0.16 were tested) performance is noticeably
> >> lower than 3.1 (3.1.15).
> >>
> >> We're using "httperf --client=0/1 --hog --server x.x.x.x --rate=100
> >> --num-conns=1000 --timeout=5 --num-calls=10" for testing. And for 3.2
> >> it's showing about 140 client timeouts (from 1000), while for 3.1 there
> >> are no errors at all.
> >>
> >> Different workers numbers were checked (1,2,4), but results are still
> >> the same -- completely unchanged -- which is rather _strange_, since as
> >> far as I know (by squid website and source browsing), in our
> >> configuration workers shall NOT share anything but one listening socket
> >> (y.y.y.y:3128).
> >> More than that, CPU use is _only_ about 20% per worker (2 CPU's - 2
> >> workers), vmstat reports no high memory consumption and iostat reports
> >> 0% on iowait.
> >>
> >> Also, according to the logs, those client timeouts are caused by some
> >> of the new connections never being noticed or accepted (they never go
> >> through the doAccept() routine from TcpAcceptor.cc).
> >
> > That is sounding very much like a kernel issue, or TCP accept rate
> > limiting issue.
>
> Why would Squid v3.1 results differ from single-worker Squid v3.2
> results then? I assume both v3.1 and v3.2 use the same kernel and the
> same OS configuration (including ulibc).

Actually, I don't know for sure. That's why I'm asking for help ;) I
have even tried running squid 3.2 in non-daemon mode (pure single
thread) - still no luck.

>
>
> > Once a TCP connection is picked up by oldAccept() in the doAccept()
> > sequence the results can be attributed to Squid, but if they never
> > actually arrive there something is wrong at a deeper level down around
> > the TCP stack or sockets libraries.
>
> > So from your results I conclude that one worker grabbed almost all the
> > traffic and responded OK. But there is insufficient data about the
> > interesting part of the traffic. What was going on there? which kid
> > serviced it?

Nope. Both workers are doing their job. Just not very well.

>
> I agree that making one of the workers super fast essentially
> invalidates the test (unless you do the same to v3.1 too, but then you
> just removed or scaled up the problem so it may not be the best test
> direction anyway).
>
>
> My recommendation is to use a single v3.2 worker for now and figure out
> why a single v3.2 worker is dropping or ignoring connections when v3.1
> does not. There could be bugs in the new accept code that we need to
> fix. Use no-daemon mode for both versions.
>
> I would start by trying to understand whether those connection errors
> result from connections never seen by Squid or from connections accepted
> but later ignored/forgotten by Squid. I do not know much about httperf,
> but with just 1000 transactions, that should be relatively easy to
> determine because you can record and match each transaction on both
> sides of the test.
>
>
> HTH,
>
> Alex.

Alex, I have performed some more tests (including oprofile profiling,
no-daemon mode, 1 worker, 2 workers, etc.). For now, it seems that the
problem is closely tied to RSBAC Networking, which is enabled in our
kernel. When I disabled it, the performance issue _went away_. According
to the RSBAC logs, not a single operation is denied.

With RSBAC-Net enabled, 3.2 shows the problem with 1 worker, with 2
workers, and in no-daemon mode. However, 3.1 works fine.

Without RSBAC-Net everything is fine.

By comparing oprofile results for 3.2 with and w/o RSBAC-Net, I suspect
that the RSBAC-Net subsystem performs some internal operations on list
structures that are protected by locks, and this, in my view, may block
simultaneous Squid socket operations and hurt performance.
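
To make that reasoning concrete, here is a toy model (not a measurement)
of why a shared lock would make the worker count irrelevant. The function
name and all numbers are hypothetical; the only assumption is that every
accepted connection has to hold one global lock for a fixed time.

```python
def max_accept_rate(workers, per_worker_rate, lock_hold_s):
    """Upper bound on accepted connections per second in a toy model.

    workers         -- number of Squid workers (hypothetical input)
    per_worker_rate -- connections/s one worker could accept on its own
    lock_hold_s     -- seconds a single global lock is held per connection
    """
    # Without a shared lock, capacity grows with the number of workers.
    unlocked_bound = workers * per_worker_rate
    # With one global lock, at most 1 / lock_hold_s connections per second
    # can pass through it, no matter how many workers are waiting.
    lock_bound = 1.0 / lock_hold_s
    return min(unlocked_bound, lock_bound)


if __name__ == "__main__":
    # Made-up numbers, chosen only to show the shape of the effect.
    for workers in (1, 2, 4):
        rate = max_accept_rate(workers, per_worker_rate=200.0, lock_hold_s=0.0125)
        print(workers, "workers ->", rate, "conn/s")
```

In this model the bound stops depending on the worker count as soon as
the lock becomes the bottleneck, which would match identical results for
1, 2 and 4 workers at low CPU use.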

Also, when I enable full RSBAC logging for the squid process, the 3.2 and
3.1 logs differ in two respects:
- 3.2 has some mysterious IOCTL operations on TCP sockets, right after
create, while 3.1 hasn't;
- 3.1 produces BIND requests, while 3.2 doesn't.
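
A comparison like this can be automated roughly as follows. This is only
a sketch: it assumes each log has already been reduced to a plain list of
operation names, which is not the real RSBAC log format, and the LISTEN
and ACCEPT names in the sample data are made up for illustration.

```python
def diff_operations(ops_a, ops_b):
    """Return (only_in_a, only_in_b) as sorted lists of operation names."""
    set_a, set_b = set(ops_a), set(ops_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    # Hypothetical, hand-reduced operation sequences for one listening socket.
    ops_31 = ["CREATE", "BIND", "LISTEN", "ACCEPT"]
    ops_32 = ["CREATE", "IOCTL", "LISTEN", "ACCEPT"]

    only_31, only_32 = diff_operations(ops_31, ops_32)
    print("only in 3.1:", only_31)
    print("only in 3.2:", only_32)
```

This only compares which operation types occur at all; it ignores their
order and how often they occur.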

So far I agree that the problem probably resides at the socket level, but
I still wonder what the significant difference is between 3.1 and 3.2
socket operations.
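
As an illustration of how two programs can reach the same socket state
through different operations, the sketch below puts a TCP socket into
non-blocking mode once via fcntl() and once via ioctl(FIONBIO). I do not
know that this is what either Squid version does; it only shows that an
access-control layer could log an IOCTL for one code path and not for the
other. It assumes Linux, and the helper names are hypothetical.

```python
import fcntl
import os
import socket
import struct
import termios


def is_nonblocking(sock):
    """Check the O_NONBLOCK file status flag on the socket's descriptor."""
    return bool(fcntl.fcntl(sock.fileno(), fcntl.F_GETFL) & os.O_NONBLOCK)


def set_nonblocking_fcntl(sock):
    """Set non-blocking mode with fcntl(F_SETFL); no ioctl is involved."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
    fcntl.fcntl(sock.fileno(), fcntl.F_SETFL, flags | os.O_NONBLOCK)


def set_nonblocking_ioctl(sock):
    """Set non-blocking mode with ioctl(FIONBIO); this is an IOCTL request."""
    fcntl.ioctl(sock.fileno(), termios.FIONBIO, struct.pack("i", 1))


if __name__ == "__main__":
    for setter in (set_nonblocking_fcntl, set_nonblocking_ioctl):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            setter(s)
            print(setter.__name__, "->", is_nonblocking(s))
```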

I'll check the Squid sources again in the hope of finding the answers.

-- 
Best wishes,
Alexander Komyagin

Received on Tue Mar 20 2012 - 08:38:44 MDT
