Re: [squid-users] Squid performance profiling

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 22 Jun 2013 03:10:40 +1200

On 21/06/2013 10:34 p.m., Ahmed Talha Khan wrote:
> On Fri, Jun 21, 2013 at 10:41 AM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 06/20/2013 10:47 PM, Ahmed Talha Khan wrote:
>>> On Fri, Jun 21, 2013 at 6:17 AM, Alex Rousskov wrote:
>>>> On 06/20/2013 02:00 AM, Ahmed Talha Khan wrote:
>>>>> My test methodology looks like this
>>>>>
>>>>> generator(apache benchmark)<------->squid<------>server(lighttpd)
>>>> ...
>>>>> These results show that squid is NOT CPU bound at this point. Neither
>>>>> is it network I/O bound, because I can get much more throughput when I
>>>>> only run the generator with the server. In this case squid should be
>>>>> able to do more. Where is the bottleneck coming from?
>>
>>>> The "bottleneck" may be coming from your test methodology -- you are
>> allowing Squid to slow down the benchmark instead of the benchmark driving
>>>> the Squid box to its limits. You appear to be using what we call a "best
>>>> effort" test, where the request rate is determined by Squid response
>>>> time. In most real-world environments concerned with performance, the
>>>> request rate does not decrease just because a proxy wants to slow down a
>>>> little.
>>
>>> Then the question becomes: why is Squid slowing down?
>> I think there are 2.5 primary reasons for that:
>>
>> 1) Higher concurrency level ("c" in your tables) means more
>> waiting/queuing time for each transaction: When [a part of] one
>> transaction has to wait for [a part of] another before being served,
>> transaction response time goes up. For example, the more network sockets
>> are "ready" at the same time, the higher the response time is going to
>> be for the transaction whose socket happens to be the last one ready
>> during that specific I/O loop iteration.
>>
> Are these queues maintained internally by Squid? What can be done
> to reduce this?

The queue is created in a single step by the kernel: it responds with a
set of FDs that have I/O events to be handled, and Squid is then
expected to iterate over them and do the I/O.
As Alex said, there is nothing that can be done about that queue
itself. Looping over it quickly and scheduling multiple internal Calls
at once is tempting, but that just offloads the delay from the
select/poll/epoll/kqueue loop to the AsyncCall queue; the visible/total
delay stays the same (or possibly gets worse if they are double-queued).
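
To make the shape of that loop concrete, here is a minimal epoll-based
sketch of the pattern (illustration only, not Squid's actual comm code;
handle_io() is a hypothetical stand-in for the read/write handlers):

    #include <sys/epoll.h>
    #include <cstdint>
    #include <cstdio>

    // Hypothetical stand-in for the per-FD read/write handlers.
    static void handle_io(int fd, uint32_t ev) {
        std::printf("FD %d ready (events 0x%x)\n", fd, (unsigned)ev);
    }

    static void event_loop(int epfd) {
        const int kMaxEvents = 64;
        struct epoll_event events[kMaxEvents];

        for (;;) {
            // One kernel call hands back the whole batch of ready FDs.
            const int n = epoll_wait(epfd, events, kMaxEvents, -1);
            if (n < 0)
                continue;            // EINTR etc., ignored for brevity

            // The batch is then walked one FD at a time, so the
            // last-ready socket waits behind every handler scheduled
            // before it in this iteration.
            for (int i = 0; i < n; ++i)
                handle_io(events[i].data.fd, events[i].events);
        }
    }

That per-iteration wait is the queuing delay Alex described in #1; it
exists regardless of where in the loop the work is scheduled.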

>
>> 2a) Squid sometimes uses hard-coded limits for various internal caches
>> and tables. With higher concurrency level, Squid starts hitting those
>> limits and operating less efficiently (e.g., not keeping a connection
>> persistent because the persistent connection table is full -- I do not
>> remember whether this actually happens, so this is just an example of
>> what could happen to illustrate 2a).
> Can you point me to some of the key ones and their impact, so that I
> can test by changing these limits and seeing whether it enhances or
> degrades performance? Also, are there any tweaks in the network stack
> that might help with that? I am primarily interested in enhancing
> SSL performance.

Much of the lag in SSL is due to the handshake exchanges it requires.
A small number of bytes in each direction costs entire packet
round-trip times just to set the session up, followed by the processing
overhead of actually encrypting the bits.

Certificate generation is a well-known slow process, and there is
nothing that can be done about it as it relies heavily on the machine's
random number generator. SSL-Bump with dynamic certificate generation
uses caching to avoid that cost to some extent - it would be worthwhile
testing how often (if at all) your benchmarks are held up waiting for
new certificates to be created.
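
The idea behind that cache is roughly the following (a simplified
sketch only; the real ssl_crtd helper and its on-disk database are more
involved, and certForHost()/generateCert() are hypothetical names):

    #include <map>
    #include <memory>
    #include <string>

    // Stand-in for an X.509 certificate object.
    struct Certificate { std::string subjectHost; };

    // Hypothetical slow path: key + certificate generation, dominated
    // by the machine's random number generator.
    static std::shared_ptr<Certificate> generateCert(const std::string &host) {
        return std::make_shared<Certificate>(Certificate{host});
    }

    class CertCache {
    public:
        // Only the first request for a host pays the generation cost;
        // later requests reuse the cached certificate.
        std::shared_ptr<Certificate> certForHost(const std::string &host) {
            auto it = cache_.find(host);
            if (it != cache_.end())
                return it->second;            // fast path: reuse
            auto cert = generateCert(host);   // slow path: generate once
            cache_.emplace(host, cert);
            return cert;
        }
    private:
        std::map<std::string, std::shared_ptr<Certificate>> cache_;
    };

If your benchmark hits only a handful of distinct hostnames, almost
every request should take the fast path after warm-up.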

>
>> 2b) Poor concurrency scaling. Some Squid code becomes slower with more
>> concurrent transactions flying around because that code has to iterate
>> over more structures while dealing with more collisions and such.
>>
> Well, all that can be done on this front is to wait for the changes
> to go in.
>
>> There is nothing we can do about #1, but we can improve #2a and #2b
>> (they are kind of related).
>>
>>
>>> best effort tests also
>>> give a good measure of what the proxy(server) can do without breaking
>>> it.
>> Yes, but, in my experience, the vast majority of best-effort results are
>> misinterpreted: It is very difficult to use a best-effort test
>> correctly, and it is very easy to come to the wrong conclusions by
>> looking at its results. YMMV.
>>
> Do you see any wrong conclusion that I might have made in
> interpreting these results?
>
>> BTW, a "persistent load" test does not have to break the proxy. You only
>> need to break the proxy if you want to find where its breaking point
>> (and, hence, the bottleneck) is with respect to load (or other traffic
>> parameters).
>>
>>
> Sure
>
>>> Do you see any reason from the perf results/benchmarks
>>> why squid would not be utilizing all CPU and giving out more requests
>>> per second?
>> In our tests, Squid does utilize virtually all CPU cycles (if we push it
>> hard enough). It is just a matter of creating enough/appropriate load.
>>
> Why would it not do so in my test setup? It does use all CPU cores to
> the fullest in the case of HTTPS, but not in the case of HTTP, as I
> pointed out earlier.

For starters, you are not caching, so Squid services every request with
the I/O overhead of contacting the backend server.
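
If you want the HTTP numbers to include cache hits rather than pure
forwarding, something along these lines in squid.conf enables a memory
and a disk cache (the sizes and the cache_dir path are only
placeholders to tune for your hardware):

    # placeholder sizes - adjust for your RAM and disk
    cache_mem 256 MB
    maximum_object_size_in_memory 512 KB
    cache_dir ufs /var/spool/squid3 2048 16 256

With cacheable responses from your lighttpd backend, repeat requests
are then served from memory or disk instead of a new backend fetch.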

Amos
Received on Fri Jun 21 2013 - 15:10:57 MDT
