Some performance measurements and questions

From: Arjan de Vet <Arjan.deVet@dont-contact.us>
Date: Mon, 17 Mar 1997 18:44:32 +0100 (MET)

Hi,

I've been doing some performance measurements because I still have a very
annoying problem with Squid becoming slow, even for cache hits. I've made
some graphs, which you can see at:

        http://www.iaehv.nl/users/devet/squid/perf/

and I will refer to them in the rest of this message. They were generated
from last Friday's access.log file.

Let me first explain the three graphs:

1). The number of parallel connections at a given moment.

2). The number of connections per hour (blue), the number of megabytes
    per hour (red) and the average number of kilobytes per hour (green,
    use the left y-axis for the numbers).

3). The minimum, average and median response times for TCP_HIT type
    cache hits only. Notice the enormous delays (up to 20 seconds) at
    some moments.

As you can see, there is a strong correlation between the number of
parallel connections and the response times for cache hits. My question
is: what exactly is causing this? Some inefficiency in Squid or the
kernel, a machine that is too slow, or something else?

Possible causes for the large number of parallel connections:

A) The growing number of parallel connections in the afternoon can
possibly be explained by the Internet becoming slow as the USA wakes up
(this site is in Europe). The average connection then takes longer, so
the number of parallel connections grows (the number of connections per
hour doesn't change that much, as you can see in the second graph); see
the rough calculation below.

B) Another explanation could be that at some point there isn't enough
CPU power left to handle all file descriptors at a satisfactory speed
(the disks are not that busy, so they are unlikely to be the cause).

The bad thing is, of course, that A) and B) will influence and worsen
each other :-(.
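
To put a rough number on A): the average number of parallel connections
is roughly the connection rate times the average connection duration, so
a slower Internet alone is enough to multiply the concurrency even when
the request rate stays flat. The figures below are made up purely for
illustration, they are not taken from the graphs:

        parallel connections ~= connections/second * average duration

        e.g.  10 conn/s *  5 s  =  50 parallel
              10 conn/s * 15 s  = 150 parallel   (same request rate,
                                                  slower connections)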

With respect to B), here is part of the gprof output corresponding to
the Squid run between 14:00h and 16:30h.

                0.06 3603.66 1/1 _start [1]
[2] 73.5 0.06 3603.66 1 main [2]
               12.45 3404.92 8560/8560 comm_select [3]
                0.08 174.90 8560/8560 eventRun [40]
                0.00 7.13 1/1 normal_shutdown [148]
                0.00 1.75 1/1 mainInitialize [245]
                1.44 0.00 1022/241043 _close [15]

As you can see, the main loop where comm_select and eventRun are called
was only executed 8560 times during 2.5 hours, i.e. approximately once
per second :-(. The result is that even when you ask for something that
is in the cache, you get one buffer (4K? 8K?) per second, and for some
reason I don't yet understand it takes at least 5 seconds before you get
the first data back (see graph 3, where the *minimum* response time
rises to 4-6 seconds). I verified this by doing requests by hand via
telnet.
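
To make the one-buffer-per-pass effect concrete, here is a minimal
sketch of the usual non-blocking handler pattern; this is not Squid's
actual code, and the helper names and the 4K buffer size are my own
assumptions:

    /* Sketch only -- not Squid's actual code; send_to_client() and
     * register_read_handler() are made-up stand-ins for the comm layer. */
    #include <unistd.h>

    #define HIT_BUF_SZ 4096    /* assumed buffer size (4K? 8K?) */

    extern void send_to_client(void *client, const char *buf, int len);
    extern void register_read_handler(int fd, void (*handler)(int, void *),
                                      void *data);

    static void hit_read_handler(int fd, void *client)
    {
        char buf[HIT_BUF_SZ];
        int n = read(fd, buf, sizeof(buf));   /* one buffer per call      */

        if (n > 0) {
            send_to_client(client, buf, n);
            /* re-register and wait for the *next* select pass */
            register_read_handler(fd, hit_read_handler, client);
        }
    }
    /* If the select loop runs only once per second, each cache hit moves
     * at most HIT_BUF_SZ bytes per pass, i.e. roughly 4 KB/s.            */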

Let's examine comm_select:

               12.45 3404.92 8560/8560 main [2]
[3] 69.7 12.45 3404.92 8560 comm_select [3]
                3.68 742.48 227434/227434 httpReadReply [6]
                1.33 681.56 104324/104324 clientReadRequest [7]
                2.54 603.64 351663/351663 commHandleWrite [10]
                6.00 456.42 1023172/1023172 comm_select_incoming [13]
                0.92 321.05 83959/83959 diskHandleRead [17]
                2.06 250.68 97214/97214 diskHandleWrite [30]
                0.64 86.33 64162/64162 httpSendRequest [62]
                1.47 83.12 65816/931195 comm_nbconnect <cycle 1> [121]
[...]
                0.09 1.55 8559/1030642 _select [34]

Notice that comm_select_incoming has been called many more times than
comm_select itself, and this ratio gets worse as more parallel
connections are active. Question: do we need to call
comm_select_incoming for each fd handled?
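
The call counts above (1023172 comm_select_incoming calls over 8560
comm_select passes, roughly 120 per pass) suggest that the incoming
check runs once per handled descriptor rather than once per pass. I
haven't checked the source closely, so the following is only a sketch of
the shape those numbers suggest, with made-up helper names:

    /* Sketch of the suspected pattern, not the real comm_select().     */
    #include <sys/select.h>

    extern void comm_select_incoming(void);  /* assumed signature       */
    extern void handle_ready_fd(int fd);     /* made-up stand-in        */

    static void comm_select_sketch(fd_set *readfds, int maxfd)
    {
        int fd;

        for (fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, readfds))
                continue;
            comm_select_incoming();   /* once per ready fd, so its cost */
            handle_ready_fd(fd);      /* grows with the number of       */
        }                             /* parallel connections           */
    }
    /* Calling comm_select_incoming() once per pass, or once every N
     * descriptors, would decouple its cost from the connection count.  */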

In the profile output I also get two cycles:

[4] 24.4 20.77 1175.99 931195+2249458 <cycle 1 as a whole> [4]
                1.43 254.34 173765 comm_close <cycle 1> [29]
                0.29 163.82 60394 storeComplete <cycle 1> [41]
                0.54 148.40 63216 icpProcessMISS <cycle 1> [42]
                0.00 134.06 199 info_get <cycle 1> [48]
                0.60 123.60 62236 proxyhttpStart <cycle 1> [51]
[...]

and

[5] 20.1 208.17 776.23 36686105+60984751 <cycle 2 as a whole> [5]
                0.00 620.69 95 _vsyslog <cycle 2> [8]
              143.11 48.68 11525175 _doprnt <cycle 2> [35]
                0.89 49.52 247677 localtime_u <cycle 2> [81]
                2.00 44.67 219818 mktime <cycle 2> [82]
               42.35 0.00 56639306 .div <cycle 2> [83]
               11.27 2.28 26954389 _db_print <cycle 2> [120]
                2.64 7.94 335312 offtime_u <cycle 2> [134]

Both consume a large amount of time, but I'm rather unfamiliar with
this particular kind of gprof output.
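
Cycle 2 at least looks like message formatting and time conversion:
_doprnt, _vsyslog, _db_print, localtime_u, mktime and offtime_u are all
about turning log/debug output and timestamps into text (some of the
mktime calls may well be HTTP date handling instead). If a large part of
it really is debug/log output, one common trick is to cache the
formatted timestamp, since it only changes once per second. A generic
sketch, not Squid's actual _db_print:

    /* Generic sketch, NOT Squid's _db_print(); it shows why calling
     * localtime()+formatting for every message is expensive, and one
     * cheap way around it.                                            */
    #include <stdarg.h>
    #include <stdio.h>
    #include <time.h>

    static void db_print_sketch(const char *fmt, ...)
    {
        static time_t last;            /* timestamp cache              */
        static char stamp[32];
        time_t now = time(NULL);
        va_list args;

        if (now != last) {             /* redo localtime()/strftime()  */
            strftime(stamp, sizeof(stamp), "%Y/%m/%d %H:%M:%S",
                     localtime(&now)); /* only once per second         */
            last = now;
        }
        va_start(args, fmt);
        fprintf(stderr, "%s| ", stamp);
        vfprintf(stderr, fmt, args);   /* the _doprnt work remains     */
        va_end(args);
    }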

I also ran a profile for a Squid run during the evening, and there you
see this:

                1.49 5113.65 1/1 _start [1]
[2] 83.9 1.49 5113.65 1 main [2]
              101.76 4388.04 1169557/1169557 comm_select [3]
                1.58 597.83 1169557/1169557 eventRun [17]
                0.00 5.33 1/1 normal_shutdown [181]
                0.27 4.29 1/2 storeWriteCleanLog [156]
                0.00 3.18 1/1 mainInitialize [219]

and for comm_select:

              101.76 4388.04 1169557/1169557 main [2]
[3] 73.7 101.76 4388.04 1169557 comm_select [3]
                6.71 1275.66 786639/786639 httpReadReply [5]
                3.41 1039.10 716391/716391 commHandleWrite [6]
                9.63 577.47 2099041/2099041 comm_select_incoming [19]
                1.14 466.01 101170/101170 clientReadRequest [24]
               17.44 226.53 1169572/3268582 _select [8]

Here the ratio between comm_select_incoming and comm_select is more
'normal', I think. There's only one cycle this time (cycle 1 from the
other profile).
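
For comparison, the comm_select_incoming/comm_select ratios in the two
profiles are:

        afternoon:  1023172 / 8560     ~= 120 incoming checks per pass
        evening:    2099041 / 1169557  ~= 1.8 incoming checks per pass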

Did anybody else ever experience and/or study this kind of behavior
before? What were your conclusions?

If you need more information, don't hesitate to ask.

Arjan