Re: [squid-users] squid 3.1.10 page allocation failure. order:1, mode:0x20

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Mon, 19 Aug 2013 19:27:45 +1200

On 17/08/2013 6:45 a.m., inittab wrote:
> Hello,
>
> I wanted to get some suggestions on my current setup and ask if I'm
> expecting too much out of my hardware for the traffic load.

Sorry for the slow reply.

NOTE: If you determine that it is a memory leak, please upgrade to the
current Squid-3.3 or later. There are a few dozen leaks of various
sizes in the 3.1 and 3.2 series which have since been fixed. Not
everybody hits them, since each one is triggered by specific behaviour,
but you may be.

> it appears i am running into out of memory problems and hitting swap,
> squid processes then end up dying out.
> [root@squid01 squid]# dmesg | grep "page allocation"
> swapper: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> kswapd0: page allocation failure. order:1, mode:0x20
> squid: page allocation failure. order:1, mode:0x20
>
>
>
> I currently have 2 dell 2950's running squid 3.1.10, we generally see
> ~200Mbps total.

The number of HTTP requests per second is the most relevant traffic
speed metric for Squid.

FYI: 200Mbps of traffic could be coming from a single HTTPS / CONNECT
request per day, or from a million IMS requests. The effect on and by
Squid CPU and memory is drastically different for each of those cases
and varies greatly for all permutations in between.

Each request requires some KB of buffer memory - compare 1 request/day
against a million requests/day and you can see where the relevance
starts to appear for your particular problem.
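
If you are not sure of your request rate, the cache manager reports it.
For example (assuming squidclient is installed and your manager ACLs
permit the request from localhost):

   squidclient -p 3120 mgr:info | grep 'per minute'

Run that against each of the ten ports and sum the results to get the
per-box rate.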

> box stats are:
> 2x Six-Core AMD Opteron(tm) Processor 2427 @2.2Ghz
> 32gb ram
> 1x Intel E1G44HTBLK Server Adapter I340-T4 all 4 ports bonded with 802.3ad
> /var/spool/squid 512G raid5

Ah. RAID. Well, there is some disk I/O overhead there which you could
possibly avoid:
http://wiki.squid-cache.org/SquidFaq/RAID

Keep in mind that the cache data is effectively a local _backup_ of
data held elsewhere. It is non-critical. The only benefit you gain from
RAID is advance warning about disk failures and some time to correct
them without Squid crashing.
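
If you do rebuild the array, the usual alternative is one cache_dir per
physical disk, each on its own mount point, which avoids the RAID5
parity-write penalty and lets Squid spread the load across spindles. A
sketch, with hypothetical mount points:

   cache_dir aufs /cache1/p3120 100000 16 256
   cache_dir aufs /cache2/p3120 100000 16 256
   cache_dir aufs /cache3/p3120 100000 16 256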

> The boxes are both running 10 squid processes on different ports in
> transparent mode
> I am using iptables rules to redirect traffic to the different squid ports ex:
> 22M 1351M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3120
> 20M 1216M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3121
> 18M 1094M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3122
> 16M 985M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3123
> 15M 886M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3124
> 13M 798M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3125
> 12M 718M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3126
> 11M 647M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3127
> 9631K 582M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3128
> 8668K 524M REDIRECT tcp -- * * 10.96.0.0/15
> 0.0.0.0/0 statistic mode random probability 0.100000 tcp
> dpt:80 redir ports 3129
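
As an aside, the packet counters above (22M down through 8.6M) show the
flat 0.1 probability skews the split: each rule only sees the traffic
the earlier rules passed over. If an even spread matters, the usual
trick is to scale each probability by the number of rules remaining. A
sketch, assuming your same match criteria:

   iptables -t nat -A PREROUTING -s 10.96.0.0/15 -p tcp --dport 80 \
     -m statistic --mode random --probability 0.1000 -j REDIRECT --to-ports 3120
   iptables -t nat -A PREROUTING -s 10.96.0.0/15 -p tcp --dport 80 \
     -m statistic --mode random --probability 0.1111 -j REDIRECT --to-ports 3121
   (... continuing with probability 1/8, 1/7, ... for ports 3122-3128 ...)
   iptables -t nat -A PREROUTING -s 10.96.0.0/15 -p tcp --dport 80 \
     -j REDIRECT --to-ports 3129

The final rule takes everything left over, so all ten ports end up with
roughly equal shares.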
>
> sysctl.conf:
> net.ipv4.ip_forward = 0
> net.ipv4.conf.default.rp_filter = 1
> net.ipv4.conf.default.accept_source_route = 0
> kernel.sysrq = 0
> kernel.core_uses_pid = 1
> net.ipv4.tcp_syncookies = 1
> net.bridge.bridge-nf-call-ip6tables = 0
> net.bridge.bridge-nf-call-iptables = 0
> net.bridge.bridge-nf-call-arptables = 0
> kernel.msgmnb = 65536
> kernel.msgmax = 65536
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> net.netfilter.nf_conntrack_max = 196608
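
A side note on the conntrack setting: when the table reaches
nf_conntrack_max the kernel drops new connections, so it is worth
watching the live count against your 196608 limit. For example:

   watch -n5 'cat /proc/sys/net/netfilter/nf_conntrack_count'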
>
>
> example squid config file: squid-p3120.conf
> acl adminnet src 10.3.25.0/24
> acl proxyvlan src 10.5.22.0/24
> acl SSL_ports port 443
> acl Safe_ports port 80 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 # https
> acl Safe_ports port 70 # gopher
> acl Safe_ports port 210 # wais
> acl Safe_ports port 1025-65535 # unregistered ports
> acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 488 # gss-http
> acl Safe_ports port 591 # filemaker
> acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT
> http_access allow manager localhost
> http_access allow manager adminnet
> http_access allow manager proxyvlan
> http_access deny manager

For high-speed Squid-3.2 or later I recommend that people at least
place the manager ACL tests down ...

> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access deny to_localhost
... here, so that the faster port rejections can better protect
against some DDoS issues ("manager" has become a regex test).

> http_access allow localhost
> http_access allow customers

NP: since the above ACLs are all allow rules, in this proxy you could
even move the manager allow lines down to here. Manager requests will
be far rarer than your normal client traffic, I think.
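
For reference, with the port rejections moved up the list would read
roughly like this:

   http_access deny !Safe_ports
   http_access deny CONNECT !SSL_ports
   http_access deny to_localhost
   http_access allow manager localhost
   http_access allow manager adminnet
   http_access allow manager proxyvlan
   http_access deny manager
   http_access allow localhost
   http_access allow customers
   http_access deny all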

> http_access deny all
> hierarchy_stoplist cgi-bin ?
You can simplify the config by removing hierarchy_stoplist.

> coredump_dir /var/spool/squid/p3120
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern . 0 20% 4320
> hosts_file /etc/hosts
> dns_nameservers 10.5.7.13 10.5.7.23
> cache_replacement_policy heap LFUDA
> cache_swap_low 90
> cache_swap_high 95
> maximum_object_size_in_memory 96 KB
> maximum_object_size 100 MB
> cache_dir aufs /var/spool/squid/p3120 204800 16 256
> cache_mem 100 MB
> logfile_rotate 10
> memory_pools off

It does vary between installations, but memory_pools can remove a lot
of memory allocator overhead when it is enabled.
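
If unbounded pooling is the concern, the retained pool size can be
capped rather than pooling disabled outright. The 64 MB figure here is
only an illustration:

   memory_pools on
   memory_pools_limit 64 MB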

> quick_abort_min 0 KB
> quick_abort_max 0 KB
> log_icp_queries off
> client_db off
> buffered_logs on
> half_closed_clients off
> url_rewrite_children 20
> pid_filename /var/run/squid-p3120.pid
> unique_hostname squid01-p3120.eng.XXXXXX
> visible_hostname squid.eng.XXXXXXX
> icp_port 3100
> tcp_outgoing_address 10.5.22.101
> emulate_httpd_log on
>
>
>
> Anyone have any suggestions on whether or not I'm doing something
> terribly wrong here or missing some kind of performance tuning?

Your memory requirements in MB of RAM per proxy are:
   100 (cache_mem) + 15*0.1 (cache_mem index) + 15*205 (cache_dir index)
   + 0.25 * R (active request buffers, with R the concurrent request count)
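
Worked through for your configuration (204800 MB of cache_dir is ~205
GB, at roughly 15MB of index per GB of disk cache):

   100 + 1.5 + 3075 = ~3176 MB per proxy, before request buffers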

I note that this is already ~3.1GB just for the index values. So 10
proxies will leave only ~1GB of RAM for operating system use, other
processes, and Squid's active request buffering.

I suggest dropping the cache_dir size to 100000, then measuring the
RAM usage on the box to see how far you can increase it back up.
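
That is, in squid-p3120.conf and its nine siblings:

   cache_dir aufs /var/spool/squid/p3120 100000 16 256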

Amos
Received on Mon Aug 19 2013 - 07:28:16 MDT
