Re: [squid-users] squid fails with a TCP_SWAPFAIL_MISS when handling 'n' concurrent requests for the same object

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 12 Sep 2012 11:18:04 +1200

On 12.09.2012 10:54, Saurabh Sheth wrote:
> Squid (versions 3.1 and 2.6) has an object in its cache and responds
> to individual requests for this object just fine (TCP_HIT:NONE). From
> the access.log ->
>
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 41136 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 24752 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 28848 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 41136 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 24752 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 45232 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 28848 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 49328 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 49328 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 32944 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 37040 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 37040 TCP_HIT:NONE
>
>
> However, when I make a huge number of concurrent requests for the
> same object, squid fails to load the object from the disk fast enough
> and gives a TCP_SWAPFAIL_MISS ->
>
>
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 53424 TCP_HIT:NONE
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 37031
> TCP_SWAPFAIL_MISS:DIRECT
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 28839 TCP_MISS:NONE
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 32935 TCP_MISS:NONE
>
>
>
> All subsequent requests hit the origin server directly, causing huge
> load on the origin server (TCP_MISS:NONE) ->
>
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 32935 TCP_MISS:NONE
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 28839 TCP_MISS:NONE
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 37031 TCP_MISS:NONE
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 37031 TCP_MISS:NONE
> 10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
> http://originserver/data/object HTTP/1.1" 200 32935 TCP_MISS:NONE
>
> This is undesirable in the production setup, since such a huge number
> of requests hitting the origin server directly has the effect of a
> DOS attack on the origin server. This has brought down our origin
> server more than once now.

Well, when you think about it, this is a DOS on Squid as well. The
backend server only faces the overflow that Squid can't absorb fast
enough. So any attacker trying this has to pass *two* DOS thresholds:
first the Squid one, then the backend one on top. There is always another
idiot infected PC, so DOS resolution is not about *solving* the traffic
problem, but about raising the bar and reducing the impact/damage when it
happens.

>
> I am looking for any help or pointers on how I can deal effectively
> with such a huge number of concurrent requests to squid for the same
> object; any help is highly appreciated. I am already considering
> the option of rate limiting using iptables. However, if there is an
> effective way to deal with this in the squid configuration itself, I
> would love to understand it.

You were a bit vague about which specific release versions of Squid you
have. 2.6 should have the collapsed forwarding feature, which acts as a
great DOS barrier. It has not been ported to squid-3 yet, but cache
handling has become more efficient, so you could try the latest 2.7 or
3.2 releases and see if this raises the bar high enough for you.
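
For anyone unfamiliar with the idea: collapsed forwarding means that
when many requests for the same uncached object arrive at once, only
one of them is forwarded to the origin and the others wait for and
share that single response. In Squid 2.6/2.7 it is switched on with
"collapsed_forwarding on" in squid.conf. The Python below is only a
rough sketch of the concept, not Squid's implementation; the names
CollapsingCache, fetch_from_origin and simulate_burst are made up for
illustration, and it ignores expiry, validation and disk storage.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class CollapsingCache:
    """Toy cache that merges concurrent misses for one URL into one origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical callable: url -> body
        self._lock = threading.Lock()
        self._cache = {}                 # url -> cached body
        self._in_flight = {}             # url -> Future shared by waiting requests

    def get(self, url):
        with self._lock:
            if url in self._cache:
                return self._cache[url]          # plain cache hit
            future = self._in_flight.get(url)
            leader = future is None
            if leader:
                # First miss for this URL: this request will go to the origin.
                future = Future()
                self._in_flight[url] = future
        if not leader:
            # Another request is already fetching; wait for its result.
            return future.result()
        try:
            body = self._fetch(url)
        except Exception as exc:
            with self._lock:
                del self._in_flight[url]
            future.set_exception(exc)            # wake the waiters with the error
            raise
        with self._lock:
            self._cache[url] = body
            del self._in_flight[url]
        future.set_result(body)                  # wake the waiters with the body
        return body


def simulate_burst(n_clients):
    """Fire n_clients concurrent requests; return (origin_calls, bodies)."""
    calls = []

    def fetch_from_origin(url):
        calls.append(url)        # count how often the origin is really hit
        time.sleep(0.05)         # pretend the origin is slow
        return "body of " + url

    cache = CollapsingCache(fetch_from_origin)
    url = "http://originserver/data/object"
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        bodies = list(pool.map(cache.get, [url] * n_clients))
    return len(calls), bodies


if __name__ == "__main__":
    origin_calls, bodies = simulate_burst(50)
    print("clients served:", len(bodies), "origin fetches:", origin_calls)
```

In this sketch a burst of 50 clients results in a single origin fetch,
because the cache entry is stored under the same lock that removes the
in-flight marker. Without the in-flight table, every request that
arrives before the first fetch completes would go to the origin, which
is the overflow pattern described above.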

Amos
Received on Tue Sep 11 2012 - 23:18:06 MDT
