[squid-users] squid fails with a TCP_SWAPFAIL_MISS when handling 'n' concurrent requests for the same object

From: Saurabh Sheth <saurabh.sheth_at_arrisi.com>
Date: Tue, 11 Sep 2012 15:54:21 -0700

Squid (versions 3.1 and 2.6) has an object in its cache and responds to
individual requests for this object just fine (TCP_HIT:NONE). From the
access.log ->

10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 41136 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 24752 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 28848 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 41136 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 24752 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 45232 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 28848 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 49328 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 49328 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 32944 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 37040 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:41:55 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 37040 TCP_HIT:NONE

However, when I make a huge number of concurrent requests for the same
object, squid fails to load the object from the disk fast enough and
gives a TCP_SWAPFAIL_MISS ->

10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 53424 TCP_HIT:NONE
10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 37031 TCP_SWAPFAIL_MISS:DIRECT
10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 28839 TCP_MISS:NONE
10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 32935 TCP_MISS:NONE
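
For anyone who wants to reproduce the burst against a test instance
rather than production, a minimal sketch follows. The proxy address
(127.0.0.1:3128), the timeout and the function names are assumptions for
illustration, not values from the setup described here.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed values for illustration only; adjust to the squid instance under test.
PROXY = "http://127.0.0.1:3128"
URL = "http://originserver/data/object"


def fetch_via_proxy(url, proxy=PROXY, timeout=10):
    """Fetch url through the given HTTP proxy and return the status code."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        resp.read()  # drain the body so the transfer completes
        return resp.status


def burst(url, n, fetch=fetch_via_proxy):
    """Request the same url n times with up to n requests in flight.

    Results are returned in submission order; an exception raised by
    any single fetch propagates to the caller.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fetch, [url] * n))
```

Calling burst(URL, 200) should issue roughly 200 simultaneous requests
for the one object; the access.log can then be checked for the
TCP_SWAPFAIL_MISS entry.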

All subsequent requests hit the origin server directly, causing huge load
on the origin server (TCP_MISS:NONE) ->

10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 32935 TCP_MISS:NONE
10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 28839 TCP_MISS:NONE
10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 37031 TCP_MISS:NONE
10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 37031 TCP_MISS:NONE
10.192.x.x - - [11/Sep/2012:15:42:23 -0700] "GET
http://originserver/data/object HTTP/1.1" 200 32935 TCP_MISS:NONE
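
To quantify how many requests fall through to the origin after the
swap failure, the result codes in the log can be tallied. The sketch
below assumes the log format shown in these excerpts, where each entry
ends with a RESULT:HIERARCHY pair; the function name and regular
expression are illustrative and may need adjusting for other logformat
settings.

```python
import re
from collections import Counter

# Matches the trailing result/hierarchy pair of an entry, e.g. TCP_HIT:NONE.
# Assumes the pair is the last field on the line, as in the excerpts above.
RESULT_RE = re.compile(r"\b(TCP_[A-Z_]+):([A-Z_]+)\s*$")


def tally_results(lines):
    """Count squid result codes (TCP_HIT, TCP_MISS, ...) in access.log lines.

    Lines without a trailing result code, such as the first half of an
    entry wrapped by a mail client, are skipped.
    """
    counts = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be run as tally_results(open("access.log")), where the file
path is again an assumption.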

This is undesirable in a production setup, since such a huge number of
requests hitting the origin server directly has the effect of a DoS
attack on it. This has brought down our origin server more than once
now.

I am looking for any help or pointers on how I can deal effectively with
such a huge number of concurrent requests to squid for the same object;
any help is highly appreciated. I am already considering rate limiting
with iptables, but if there is an effective way to deal with this in the
squid configuration itself, I would love to understand it.
Received on Tue Sep 11 2012 - 22:54:30 MDT
