Re: [squid-users] squid consuming near all (95+ %) CPU, it is normal?

From: Amos Jeffries <squid3@treenet.co.nz>
Date: Mon, 12 Nov 2012 14:41:59 +1300

On 12.11.2012 03:25, Frantisek Hanzlik wrote:
> With this squid configuration:
> acl localnet src 172.16.0.0/12
> acl localnet src 192.168.0.0/16
> acl SSL_ports port 443
> acl SSL_ports port 85
> acl SSL_ports port 81
> acl SSL_ports port 5443
> acl Safe_ports port 80
> acl Safe_ports port 21
> acl Safe_ports port 443
> acl Safe_ports port 70
> acl Safe_ports port 210
> acl Safe_ports port 1025-65535
> acl Safe_ports port 280
> acl Safe_ports port 488
> acl Safe_ports port 591
> acl Safe_ports port 777
> acl Safe_ports port 5443
> acl Safe_ports port 85
> acl Safe_ports port 81
> acl CONNECT method CONNECT
> http_access allow manager localhost
> http_access deny manager
> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access allow localnet
> http_access allow localhost
> http_access deny all
> http_port 3128
> hierarchy_stoplist cgi-bin ?
> cache_dir ufs /var/spool/squid 1000 16 256 max-size=999000
> cache_mem 512 MB
> maximum_object_size 4096 KB
> memory_pools off
> cache_swap_low 90
> cache_swap_high 95
> dns_nameservers 172.16.1.1
> client_db off
> half_closed_clients off
> max_filedesc 4096
> coredump_dir /var/spool/squid
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern . 0 20% 4320
> acl users src 172.31.0.0/16
> delay_pools 1
> delay_class 1 2
> delay_parameters 1 5000000/10000000 5000000/10000000
> delay_access 1 allow users
>
> Squid very often loads the CPU at near 100%, with approx. 200 users
> and 4000 connections (~2000 to users, 2000 to the internet). Removing
> the delay pool configuration has no big effect.

User and connection counts are meaningless in HTTP. Requests per second
flowing over those connections is what counts.

The proxy might have all 4000 links idle (low CPU; zero bandwidth; zero
disk I/O), be downloading video (or .iso) images simultaneously (low
CPU; maxed out bandwidth; high disk I/O), or parsing and processing
header-only requests (100% CPU; moderate or low bandwidth; no disk I/O).

NP: 3.2 uses HTTP/1.1. Many of the protocol performance features in
HTTP/1.1 work by removing object bodies and reducing transactions to
header-only requests and responses.
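
For example, a revalidation in HTTP/1.1 is a header-only exchange (the
URL and validator here are illustrative):

  GET /logo.png HTTP/1.1
  Host: example.com
  If-None-Match: "abc123"

  HTTP/1.1 304 Not Modified
  ETag: "abc123"

No body crosses the wire, so the whole transaction is header parsing -
CPU work, not bandwidth or disk I/O.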

> HW configuration: Dual core E8500 @ 3.16GHz CPU, 4GB RAM, 2x SATA 7k2
> RAID Edition disks in SW RAID1 for the squid cache (disk performance
> doesn't seem to be the problem, IOWAIT is small), gigabit ethernet
> cards to the internet (~800 Mbps line) and to the LAN.
> It is squid-3.2.3.20121106.r11695-1.fc14.i686 on Fedora 14 i686
> (I tested some older squid 3.1 version with the same configuration
> too, but the results were the same, or rather worse).
>
> Is this CPU load normal, or can some performance tuning be done
> for it?

In order of likelihood:

Experiment #1 is to remove that SW RAID and test again.
  Sure, iowait is not bad (to the *master* disk it is the same as
accessing without RAID at all), but iowait is only half the story with
SW RAID. Being software, every I/O op consumes some CPU. Disk I/O CPU
loading in particular can be doubled, depending on the implementation's
buffering efficiency.
  If you need RAID at all, use HW RAID instead of SW. The only benefit
you get from RAID under Squid is some advance notice of disks failing
before the proxy crashes (or starts TCP_SWAPFAIL_MISS'ing - but the UFS
cache type still crashes, so maybe you do need RAID).
  By using SW RAID in particular you are taking CPU cycles away from
Squid, which would otherwise use them to process a higher req/sec peak
capacity. If your peak traffic is low req/sec this is not a problem,
but with a couple of hundred users pushing thousands of connections I
expect your peak capacity needs are high.
  The choice is yours, but I seriously advise moving away from SW RAID.
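
  For the test itself, a minimal sketch: mount one plain disk (no md
device) on /var/spool/squid and keep the existing cache_dir line
unchanged, so the only variable between runs is the storage layer:

    cache_dir ufs /var/spool/squid 1000 16 256 max-size=999000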

Experiment #2 is to set "memory_pools on".
  Up to 90% of malloc/free calls are for very short strings and small
objects. Squid can save a lot of system allocator cycles, and shrink
the overall system RAM requirements a little, by allocating these in
batches/blocks/pools. This helps raise req/sec capacity for traffic
that consists mostly of HTTP headers.
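
  A minimal sketch of the change in squid.conf (the 64 MB cap is my
assumption, not something derived from your stats - tune it to your
RAM):

    memory_pools on
    memory_pools_limit 64 MB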

Experiment #3 is to raise max_filedesc.
I suspect this setting is the only thing limiting your proxy to 2000
user connections and 2000 internet connections. Notice that makes 4000,
and with 100 connections held in reserve for unexpected disk access it
would seem that you are not using disks at all for many of these
connections (TCP_MISS and TCP_MEM_HIT being most of your load?).
  When there are not enough FDs to service all incoming requests Squid
starts limiting them and spends extra CPU cycles managing the waiting
client connection queue.
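
  A minimal sketch (16384 is an assumed ceiling, not a magic number -
pick something well above your peak concurrent connections, and make
sure the OS open-file limit (ulimit -n) for the squid user is at least
as high before starting Squid):

    max_filedesc 16384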

Experiment #4 is to set "cache_swap_low 97" and "cache_swap_high 98".
  Periodic garbage collection happens when the cache fills up. With a
1GB disk cache your current 90/95 watermarks cause about 50-100MB of
objects to be scanned in the cache and erased from disk on each pass.
Most objects will be only a few KB - see your avg object size stats.
  NOTE: this may only marginally appear in iowait, but shows up better
in the related erase/unlink operation stats.
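
  The arithmetic, as a worked example: garbage collection starts when
the cache passes the high watermark and evicts down to the low one.
With your 1000 MB cache_dir, 90/95 means evicting from ~950 MB down to
~900 MB, i.e. ~50 MB of objects per pass; 97/98 shrinks that band to
~10 MB:

    cache_swap_low 97
    cache_swap_high 98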

Amos