Re: [squid-users] Strange performance effects on squid during off peak hours

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 16 Sep 2010 01:41:11 +1200

On 16/09/10 01:01, Martin Sperl wrote:
> Hi everyone,
>
> we are seeing a strange response-time effect over 24 hours when delivering content via Squid plus an ICAP service (3.0.STABLE9 - I know it's old, but getting something changed in a production environment can be VERY hard...). The ICAP server we use rewrites some URLs and also rewrites some of the response content.
>
> Essentially we see that during peak hours the average response time (ART) is better than during off-peak hours.
> Here is a report for one day covering all CSS files delivered with cache status TCP_MEM_HIT (taken from Squid's extended access logs) for a single server (all servers show similar effects):
>
> Here is the quick overview (ART in seconds):
> +------+------+-------+
> | hour | hits | ART   |
> +------+------+-------+
> |    0 | 4232 | 0.016 |
> |    1 | 4553 | 0.015 |
> |    2 | 4238 | 0.015 |
> |    3 | 4026 | 0.018 |
> |    4 | 1270 | 0.024 |
> |    5 |  390 | 0.042 |
> |    6 |   61 | 0.054 |
> |    7 |  591 | 0.034 |
> |    8 |  445 | 0.038 |
> |    9 |  505 | 0.035 |
> |   10 |  716 | 0.034 |
> |   11 | 1307 | 0.030 |
> |   12 | 2552 | 0.023 |
> |   13 | 3197 | 0.021 |
> |   14 | 3567 | 0.020 |
> |   15 | 4095 | 0.019 |
> |   16 | 4037 | 0.019 |
> |   17 | 4670 | 0.017 |
> |   18 | 5349 | 0.016 |
> |   19 | 5638 | 0.017 |
> |   20 | 6262 | 0.014 |
> |   21 | 5634 | 0.014 |
> |   22 | 4809 | 0.016 |
> |   23 | 5393 | 0.016 |
> +------+------+-------+
> <snip>
> You can see that during off-peak hours (around 6am UTC) 91% of all requests with TCP_MEM_HIT for CSS files take >0.030 seconds.
> During "peak" hours, by contrast, most requests are answered in 0.011s or 0.001s (at 18:00 this applies to 5.5% of all requests).
>
> I know that the numbers reported by Squid also include some effects of the network itself.
> But we also see similar effects in active monitoring of HTML+image downloads within our span of control (this is one of our KPIs, which we are exceeding during graveyard-shift hours...).
>
> We have tried a lot of things:
> * virtualized versus real HW (a 0.002s improvement during peak hours)
> * removing the disk cache (Squid falls back to the defaults compiled in when no disk cache is defined - at least in the version of Squid that we have)
> * moving the disk cache to a ramdisk and increasing its size (this has a negative effect!!!) - I wanted to change to aufs, but the binary we have does not support it..
> * tuning some Linux kernel parameters to increase TCP buffers
>
> Has someone experienced similar behavior, and does anyone have recommendations for what else we can do/test (besides upgrading to Squid 3.1, which is a major effort from a testing perspective and may not resolve the issue either)?
>

Squid is still largely IO event driven. If the network IO is less than,
say, 3-4 req/sec, Squid can have a queue of things waiting to happen
which get delayed a long time (hundreds of ms) waiting to be kicked off.
Your overview seems to show that behaviour clearly.
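
The effect can be sketched with a toy select()-based loop (a minimal
illustration, not Squid's actual event code; the poll timeout value is
hypothetical): deferred work only runs when the loop wakes up, so under
idle conditions it waits out the full poll timeout, while incoming
traffic wakes the loop immediately.

```python
import select
import socket
import time

# Hypothetical poll interval, in seconds; Squid's real value differs.
POLL_TIMEOUT = 0.05

def run_once(sock, pending_work):
    """One loop iteration: wait for IO (or timeout), then run deferred work."""
    readable, _, _ = select.select([sock], [], [], POLL_TIMEOUT)
    for s in readable:
        s.recv(4096)  # a real loop would service the request here
    # Deferred callbacks only run after the loop wakes up.
    while pending_work:
        pending_work.pop(0)()

r, w = socket.socketpair()
delays = []

# Idle case: no data arrives, so select() sleeps the full timeout.
queued_at = time.monotonic()
run_once(r, [lambda: delays.append(time.monotonic() - queued_at)])
print(f"idle loop: work delayed {delays[0] * 1000:.0f} ms")

# Busy case: incoming IO wakes the loop at once, so queued work runs promptly.
queued_at = time.monotonic()
w.send(b"GET /")
run_once(r, [lambda: delays.append(time.monotonic() - queued_at)])
print(f"busy loop: work delayed {delays[1] * 1000:.1f} ms")
```

The idle iteration delays the queued work by roughly the poll timeout,
while the busy iteration runs it almost immediately - mirroring why
off-peak hit times look worse than peak-hour ones.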

There have been some small improvements and fixes to several of the
lagging things, but I think it's still present in even the latest Squid.

Given that it only happens under very low load and self-corrects as soon
as traffic picks up, is it still a problem? If so, you may want to
contact The Measurement Factory and see if they have anything to help
for 3.0.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.8
   Beta testers wanted for 3.2.0.2
Received on Wed Sep 15 2010 - 13:41:15 MDT

This archive was generated by hypermail 2.2.0 : Thu Sep 16 2010 - 12:00:03 MDT