[squid-users] Squid performance profiling from Ahmed Talha Khan on 2013-06-20 (squid-users)

From: Ahmed Talha Khan <auny87_at_gmail.com>
Date: Thu, 20 Jun 2013 13:00:03 +0500

Hello All,

I have been benchmarking the performance of Squid for some time now,
for both plain HTTP and HTTPS traffic.

The key performance indicators that I am looking at are requests per
second (RPS), throughput (Mbps), and latency (ms).

My test setup looks like this:

generator(apache benchmark)<------->squid<------>server(lighttpd)

All three run on separate VMs on AWS.
The specs for each machine are:
8 vCPUs @ 2.13 GHz
16 GB RAM
Squid uses 8 SMP workers to utilize all cores.

In all these tests I have made sure that the generator and the server
are always more powerful than Squid. For the latency calculation, the
time per request is measured with and without Squid inline, and the
difference between the two is taken.
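
As a minimal sketch of that latency calculation (the function name and
the sample numbers below are hypothetical, chosen only for illustration;
they are not measurements from these tests):

```python
def added_latency_ms(with_squid_ms: float, direct_ms: float) -> float:
    """Latency attributed to the proxy.

    with_squid_ms: time per request with Squid inline
    direct_ms:     time per request from generator straight to server
    """
    return with_squid_ms - direct_ms


# Hypothetical inputs, for illustration only.
print(added_latency_ms(9.0, 1.5))
```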

I am using a 3.HEAD build from just prior to the release of 3.3.

I would like to share the results with the community on the Squid
wiki. How do I do that?

Some results from the tests are:

Server response size = 200 bytes
New = keep-alive turned off (a new connection for every request)
Keep-Alive = keep-alive used, with 100 HTTP requests per connection
c = number of concurrent requests

                          HTTP                     HTTPS
                    New  | Keep-Alive        New  | Keep-Alive

RPS
        c=50       6466  | 20227            1336  | 14461
        c=100      6392  | 21583            1303  | 14683
        c=200      5986  | 21462            1300  | 13967

Throughput (Mbps)
        c=50       26    | 82.4             5.4   | 59
        c=100      25.8  | 88               5.25  | 60
        c=200      24    | 88               5.4   | 58

Latency (ms)
        c=50       7.5   | 2.7              36    | 3.75
        c=100      15.8  | 5.27             80    | 8
        c=200      26.5  | 11.3             168   | 18
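
To make the gap between the two connection modes easier to see, the RPS
figures above can be turned into keep-alive/new ratios. This is only a
small sketch; the dictionary layout and the function name are
illustrative choices, and the numbers are copied from the RPS rows.

```python
# RPS figures copied from the results: (new connections, keep-alive).
rps = {
    ("HTTP", 50): (6466, 20227),
    ("HTTP", 100): (6392, 21583),
    ("HTTP", 200): (5986, 21462),
    ("HTTPS", 50): (1336, 14461),
    ("HTTPS", 100): (1303, 14683),
    ("HTTPS", 200): (1300, 13967),
}


def keepalive_ratio(new_rps: float, keepalive_rps: float) -> float:
    """How many times more requests per second keep-alive achieves."""
    return keepalive_rps / new_rps


for (proto, c), (new, ka) in rps.items():
    print(f"{proto:5} c={c:<3} keep-alive/new = {keepalive_ratio(new, ka):.1f}x")
```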

With these results in hand, I profiled Squid with the "perf" tool and
got some output that I could not understand, so my questions relate to
it.

For the HTTPS case, CPU utilization peaks at around 90% on all cores,
and the perf profiler gives:

24.63% squid libc-2.15.so [.] __memset_sse2
6.13% squid libcrypto.so.1.0.0 [.] bn_sqr4x_mont
4.98% squid [kernel.kallsyms] [k] hypercall_page
--- hypercall_page
|--93.73%-- check_events

Why does Squid spend so much time in a single function, and a memset
at that? Any pointers?

Since all the CPU power is being used in this case, it is
understandable that performance cannot be improved here. The problem
arises with the HTTP case.

For the plain HTTP case, the CPU utilization is only around 50-60% on
all the cores and perf says:

8.47% squid [kernel.kallsyms] [k] hypercall_page
--- hypercall_page
|--94.78%-- check_events

1.78% squid libc-2.15.so [.] vfprintf
1.62% squid [kernel.kallsyms] [k] xen_spin_lock
1.44% squid libc-2.15.so [.] __memcpy_ssse3_back
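
For anyone who wants to aggregate output like this, here is a rough
sketch that sums the overhead per shared object. It assumes the
five-column line layout shown above (percentage, command, shared
object, [.] or [k], symbol); the function name is made up for this
example, and call-graph lines such as "|--94.78%-- check_events" are
skipped.

```python
from collections import defaultdict


def overhead_by_object(report: str) -> dict[str, float]:
    """Sum the top-level overhead percentages per shared object."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Keep only top-level entries: "<pct>% <cmd> <object> <[.]|[k]> <symbol>"
        if (len(fields) >= 5 and fields[0].endswith("%")
                and fields[3] in ("[.]", "[k]")):
            totals[fields[2]] += float(fields[0].rstrip("%"))
    return dict(totals)


http_report = """\
8.47% squid [kernel.kallsyms] [k] hypercall_page
--- hypercall_page
|--94.78%-- check_events
1.78% squid libc-2.15.so [.] vfprintf
1.62% squid [kernel.kallsyms] [k] xen_spin_lock
1.44% squid libc-2.15.so [.] __memcpy_ssse3_back
"""

for obj, pct in overhead_by_object(http_report).items():
    print(f"{obj}: {pct:.2f}%")
```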

These results show that Squid is NOT CPU bound at this point. Nor is
it network I/O bound, because I get much more throughput when I run
the generator directly against the server. In this case Squid should
be able to do more. Where is the bottleneck coming from?

If anyone is interested in very detailed benchmarks, I can provide them.

--
Regards,
-Ahmed Talha Khan

Received on Thu Jun 20 2013 - 08:00:10 MDT
