Re: [squid-users] Squid performance profiling

From: Ahmed Talha Khan <auny87_at_gmail.com>
Date: Fri, 21 Jun 2013 15:27:57 +0500

>
>
>> I want to share the results with the community on the squid wikis. How
>> to do that?
>
>
> We are collecting some ad-hoc benchmark details for Squid releases at
> http://wiki.squid-cache.org/KnowledgeBase/Benchmarks. So far this is not
> exactly rigorous testing, although following the methodology for stats
> collection (as outlined in the intro section) retains consistency and
> improves comparability between submissions.
>
> Since you are using a different methodology, please feel free to write up a
> new article on it. The details you just posted look like a good start. We
> can offer a wiki or static web page, or a reference from our benchmarking
> page to a blog publication of your own.
>

Yes, sure, that would be great. I will write a blog post and post it here as well.

>> Some results from the tests are:
>>
>> Server response size = 200 Bytes
>> "New" means keep-alive was turned off (a new connection per request)
>> "Keep-Alive" means keep-alive was used with 100 HTTP req/conn
>> c = concurrent requests
>>
>>
>>                          HTTP                    HTTPS
>>                    New  | Keep-Alive       New  | Keep-Alive
>>
>> RPS
>> c=50              6466  | 20227           1336  | 14461
>> c=100             6392  | 21583           1303  | 14683
>> c=200             5986  | 21462           1300  | 13967
>>
>> Throughput (mbps)
>> c=50              26    | 82.4            5.4   | 59
>> c=100             25.8  | 88              5.25  | 60
>> c=200             24    | 88              5.4   | 58
>>
>> Latency (ms)
>> c=50              7.5   | 2.7             36    | 3.75
>> c=100             15.8  | 5.27            80    | 8
>> c=200             26.5  | 11.3            168   | 18
>>
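
As a rough sanity check on the table above, the figures can be
cross-checked against each other. The sketch below assumes that the load
generator keeps exactly c requests in flight (so Little's law gives
latency = c / RPS) and that "mbps" means 10^6 bits per second; neither
assumption is stated in the post, and the function names are illustrative
only.

```python
# (RPS, throughput in Mbit/s, reported latency in ms), copied from the table.
# Keys are (protocol, connection mode, concurrency).
RESULTS = {
    ("HTTP", "new", 50): (6466, 26.0, 7.5),
    ("HTTP", "new", 100): (6392, 25.8, 15.8),
    ("HTTP", "new", 200): (5986, 24.0, 26.5),
    ("HTTP", "keep-alive", 50): (20227, 82.4, 2.7),
    ("HTTP", "keep-alive", 100): (21583, 88.0, 5.27),
    ("HTTP", "keep-alive", 200): (21462, 88.0, 11.3),
    ("HTTPS", "new", 50): (1336, 5.4, 36.0),
    ("HTTPS", "new", 100): (1303, 5.25, 80.0),
    ("HTTPS", "new", 200): (1300, 5.4, 168.0),
    ("HTTPS", "keep-alive", 50): (14461, 59.0, 3.75),
    ("HTTPS", "keep-alive", 100): (14683, 60.0, 8.0),
    ("HTTPS", "keep-alive", 200): (13967, 58.0, 18.0),
}


def littles_law_latency_ms(concurrency, rps):
    """Latency implied by Little's law, assuming `concurrency` requests in flight."""
    return 1000.0 * concurrency / rps


def wire_bytes_per_request(mbps, rps):
    """Average bytes moved per request, assuming mbps = 10^6 bits per second."""
    return mbps * 1_000_000 / 8 / rps


if __name__ == "__main__":
    for (proto, mode, c), (rps, mbps, reported_ms) in RESULTS.items():
        implied = littles_law_latency_ms(c, rps)
        per_req = wire_bytes_per_request(mbps, rps)
        print(f"{proto:5s} {mode:10s} c={c:3d}  "
              f"implied {implied:6.1f} ms  reported {reported_ms:6.1f} ms  "
              f"~{per_req:4.0f} bytes/request")
```

For example, HTTPS "New" at c=50 implies about 37 ms against the 36 ms
reported, and HTTP "New" at c=50 works out to roughly 500 bytes per
request, which is more than the 200-byte body and would be consistent with
headers and protocol overhead being counted in the throughput figure.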

The SSL numbers seem pretty low to me on such a powerful machine. Do
you think these can be improved somehow?
For HTTPS I was using a 1024 Bytes key size. The ciphers selected
between ab and Squid, and between Squid and lighttpd, were
TLS_RSA_WITH_AES_256_CBC_SHA and TLS_DHE_RSA_WITH_AES_256_CBC_SHA
respectively.
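
One way to read the gap between the "New" and "Keep-Alive" columns is to
split the average time per request (1 / RPS, measured across the whole
system) into a per-request part r and a per-connection part s. The sketch
below assumes a simple linear model that is not from the benchmark
itself: t_new = r + s for one request per connection, and
t_keepalive = r + s / n with n = 100 requests per connection. The function
name and the model are illustrative assumptions.

```python
def split_costs(rps_new, rps_keepalive, requests_per_conn=100):
    """Solve t_new = r + s and t_ka = r + s / n for (r, s), both in ms.

    r is the average system-wide time per request, s the average
    system-wide time per connection setup (TCP plus, for HTTPS, the TLS
    handshake). This is a model assumption, not a measurement.
    """
    t_new = 1000.0 / rps_new          # ms per request, new connection each time
    t_ka = 1000.0 / rps_keepalive     # ms per request, connection reused
    s = (t_new - t_ka) / (1.0 - 1.0 / requests_per_conn)
    r = t_new - s
    return r, s


if __name__ == "__main__":
    # c=50 rows from the RPS table.
    for label, rps_new, rps_ka in [("HTTP", 6466, 20227), ("HTTPS", 1336, 14461)]:
        r, s = split_costs(rps_new, rps_ka)
        print(f"{label:5s} per-request ~{r:.3f} ms, per-connection ~{s:.3f} ms")
```

Under this model the per-connection cost for HTTPS comes out several times
larger than for plain HTTP, which would point at connection setup rather
than bulk encryption as the main difference between the two "New" columns.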

>>
>> With these results I profiled Squid with the "perf" tool and got some
>> output that I could not understand. So my questions are related to
>> that.
>
>
> Thank you. Some very nice numbers. I hope they give a clue to anyone still
> thinking persistent connections need to be disabled to improve performance.
>
>
>> For the HTTPS case, the CPU utilization peaks around 90% on all cores
>> and the perf profiler gives:
>>
>> 24.63% squid libc-2.15.so [.] __memset_sse2
>> 6.13% squid libcrypto.so.1.0.0 [.] bn_sqr4x_mont
>> 4.98% squid [kernel.kallsyms] [k] hypercall_page
>> --- hypercall_page
>> |--93.73%-- check_events
>>
>> Why is Squid spending so much time in a single function, and a
>> memset at that? Any pointers?
>
>
> Squid was originally written in C and still has a lot of memset() calls
> around the place clearing memory before use. We have made a few attempts to
> track them down and remove unnecessary usages but a lot still remain.
> Another attempt was tried in the more recent code, so you may find a lower
> profile rating in the current 3.HEAD.
>
> Also check whether you have memory_pools on or off. That can affect the
> number of calls to memset().
>

Memory pools were ON. I did not change the default behavior.

>
>> Since in this case all CPU power is being used so it is understandable
>> that the performance cannot be improved here. The problem arises with
>> the HTTP case.
>
>
> On the contrary, code improvements can be made to reduce the CPU cycles
> Squid requires, which in turn raises performance. If your profiling can
> highlight things like memset() or Squid functions in the current code
> that consume large amounts of CPU, effort can be targeted at reducing
> those occurrences for the best work/performance gains.
>

Yes, obviously code improvements can be made. What I meant to say was
that in the current scenario, with the current code base, these numbers
will stay constant.
>
>> For the plain HTTP case, the CPU utilization is only around 50-60% on
>> all the cores and perf says:
>>
>>
>> 8.47% squid [kernel.kallsyms] [k] hypercall_page
>> --- hypercall_page
>> |--94.78%-- check_events
>>
>> 1.78% squid libc-2.15.so [.] vfprintf
>> 1.62% squid [kernel.kallsyms] [k] xen_spin_lock
>> 1.44% squid libc-2.15.so [.] __memcpy_ssse3_back
>>
>>
>> These results show that Squid is NOT CPU bound at this point. Neither
>> is it network IO bound, because I can get much more throughput when I
>> run only the generator against the server. In this case Squid should be
>> able to do more. Where is the bottleneck coming from?
>
>
> Your guesses would seem to be in the right direction. Your data should
> contain hints about where to look more closely. memcpy() and memory
> paging being so high are a suspicious hint.
>
>

I was looking for a deeper understanding of how and why this could
happen. Why would Squid not use all the CPU resources at its disposal?
Could you give me pointers on that? I am willing to do extra testing and
digging if it can help achieve better performance, especially in the
case of HTTPS.
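
To make the profile easier to compare between runs, the top-level lines of
the perf output can be summed per shared object. The sketch below is only
an illustration: it assumes the line layout shown in the quoted output
(overhead, command, shared object, [k] or [.] marker, symbol), and the
function name is hypothetical. Call-graph lines such as "--- hypercall_page"
are skipped because they do not start with a percentage.

```python
import re
from collections import defaultdict

# Lines copied from the plain-HTTP profile quoted in this thread.
HTTP_PROFILE = """\
8.47% squid [kernel.kallsyms] [k] hypercall_page
--- hypercall_page
|--94.78%-- check_events
1.78% squid libc-2.15.so [.] vfprintf
1.62% squid [kernel.kallsyms] [k] xen_spin_lock
1.44% squid libc-2.15.so [.] __memcpy_ssse3_back
"""

# overhead%, command, shared object, [k]ernel or [.]user marker, symbol
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)")


def overhead_by_object(report_text):
    """Sum overhead percentages per shared object, skipping call-graph lines."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            totals[match.group(3)] += float(match.group(1))
    return dict(totals)


if __name__ == "__main__":
    totals = overhead_by_object(HTTP_PROFILE)
    for dso, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{pct:6.2f}%  {dso}")
    print(f"{sum(totals.values()):6.2f}%  total of the quoted symbols")
```

For the plain-HTTP profile, the four quoted symbols add up to only about
13% of the samples, so no single hot spot stands out in the part of the
profile shown here.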
>
>> If anyone is interested in very detailed benchmarks, then I can provide
>> them.
>
>
> Yes please :-)

Will do

>
> PS. Could you CC the squid-dev mailing list as well with the details? The
> more developer eyes we can get on this data the better. Although please do
> test a current release first: we have significantly changed the ACL handling,
> which was one bottleneck in Squid, and have altered the mempools' use of
> memset() in several locations in the latest 3.HEAD code.
>

Done

> Amos

--
Regards,
-Ahmed Talha Khan
Received on Fri Jun 21 2013 - 10:28:12 MDT
