Re: [squid-users] squid consuming too much processor/cpu

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 22 Mar 2010 22:07:32 +1300

Muhammad Sharfuddin wrote:
> On Mon, 2010-03-22 at 08:47 +0100, Marcello Romani wrote:
>> Muhammad Sharfuddin ha scritto:
>>> On Mon, 2010-03-22 at 19:27 +1300, Amos Jeffries wrote:
>>>>> Thanks list for help.
>>>>>
>>>>> Restarting squid is not a solution; I noticed that only 20 minutes
>>>>> after restarting, squid started consuming/eating CPU again.
>>>>>
>>>>> On Wed, 2010-03-17 at 19:54 +1100, Ivan . wrote:
>>>>>> you might want to check out this thread
>>>>>> http://www.mail-archive.com/squid-users@squid-cache.org/msg56216.html
>>>>> I didn't install any package, i.e. I have not checked that.
>>>>>
>>>>> On Wed, 2010-03-17 at 05:27 -0700, George Herbert wrote:
>>>>>> or install the Google malloc library and recompile Squid to
>>>>>> use it instead of default gcc malloc.
>>>>> On Wed, 2010-03-17 at 15:01 +0200, Henrik K wrote:
>>>>>> If the system regex is the issue, wouldn't it be better/simpler to
>>>>>> just compile with PCRE (LDFLAGS="-lpcreposix -lpcre")? It doesn't
>>>>>> leak and as a bonus makes your REs faster.
>>>>> Nor did I recompile Squid, as I have to use the binary/RPM version
>>>>> of Squid that shipped with the distro I am using.
>>>>>
>>>>> The issue was resolved by removing the ACL that blocked almost 60K
>>>>> URLs/domains.
>>>>>
>>>>> Commenting out the following worked:
>>>>> ##acl porn_deny url_regex "/etc/squid/domains.deny"
>>>>> ##http_access deny porn_deny
>>>>>
>>>>> So how can I deny illegal content/websites?
>>>>>
>>>> If those were actually domain names...
>>> They are both URLs and domains.
>>>
>>>> * use "dstdomain" type instead of regex.
>>> ok nice suggestion
>>>
>>>
>>>> Optimize the order of ACLs so that most rejections happen as soon as
>>>> possible, using the fastest match types.
>>> I think it's optimized, as the rule (squeezing the CPU) is the first
>>> rule in squid.conf.
>> That's the exact opposite of "optimizing", as the cpu-consuming rule is
>> _always_ executed.
>> The first rules should be non-cpu-consuming (i.e. non-regex) and should
>> block most of the traffic, leaving the cpu-consuming ones at the bottom,
>> rarely executed.
>>
>>>> If you don't mind sharing your squid.conf access lines we can work
>>>> through optimizing with you.
>>> I posted squid.conf when I started this thread/topic, but I have no issue
>>> posting it again ;)
>> I think he meant the list of blocked sites / url
> It's 112K after compression; am I allowed to post/attach such a big
> file?

The mailing list will drop all attachments.

>>
>>> squid.conf:
>>> acl myFTP port 20 21
>>> acl ftp_ipes src "/etc/squid/ftp_ipes.txt"
>>> http_access allow ftp_ipes myFTP

The optimal form of that line is:

   acl myFTP proto FTP
   http_access allow myFTP ftp_ipes

NP: Checking the protocol is faster than checking a whole list of IPs or
ports.

>>> http_access deny myFTP
>>>

Since you only have two source IP ranges that could possibly be allowed
after the regex checks, it's a good idea to start the entire process by
blocking the vast range of IPs which are never going to be allowed:

  acl vip src "/etc/squid/vip_ipes.txt"
  acl mynet src "/etc/squid/allowed_ipes.txt"
  http_access deny !vip !mynet

NP: that line denies any request whose source address is in neither list,
so only possibly-allowed clients ever reach the regex tests below. The
later duplicate "acl vip" and "acl mynet" definitions can then be dropped.

>>> #### this is the acl eating CPU #####
>>> acl porn_deny url_regex "/etc/squid/domains.deny"
>>> http_access deny porn_deny
>>> ###############################
>>>
>>> acl vip src "/etc/squid/vip_ipes.txt"
>>> http_access allow vip
>>>
>>> acl entweb url_regex "/etc/squid/entwebsites.txt"
>>> http_access deny entweb

Applying the same process to entwebsites.txt that is described below for
domains.deny will stop this one from becoming a second CPU sink.
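
A sketch of what the end result might look like for entwebsites.txt (the
"entweb.*" file names and ACL names here are examples only):

   acl entwebDomains dstdomain "/etc/squid/entweb.dstdomain"
   acl entwebRegex url_regex -i "/etc/squid/entweb.regex"
   http_access deny entwebDomains
   http_access deny entwebRegex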

>>>
>>> acl mynet src "/etc/squid/allowed_ipes.txt"
>>> http_access allow mynet
>>>

This is the basic process for reducing a large list of regex patterns
down to an optimal set of ACL tests...

What you can do to start with is separate all the domain-only lines from
the real regex patterns:

   grep -E '^(\^?(https?|ftp)://)?[a-z0-9.-]+(/?\$?)$' \
      /etc/squid/domains.deny > dstdomain.deny

   grep -v -E '^(\^?(https?|ftp)://)?[a-z0-9.-]+(/?\$?)$' \
      /etc/squid/domains.deny > url_regex.deny

... check the output of those two files. Don't trust my 2-second pattern
creation.
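
For example (hypothetical list entries), domain-only lines like these
would land in dstdomain.deny:

   www.example.com
   ^http://ads.example.net/$

while anything carrying a path or real regex syntax stays in
url_regex.deny:

   example.org/videos/
   ads.*\.example\.com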

You will also need to strip any "^", "$", "http://" and "/" bits off the
dstdomain patterns.
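
A minimal sed sketch of that stripping step, assuming GNU sed (use -r in
place of -E on older versions, and verify the output by hand):

   sed -E 's#^\^?##; s#^(https?|ftp)://##; s#/?\$?$##' \
      dstdomain.deny > dstdomain.clean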

When that's done, see if there are any domains you can wildcard in the
dstdomain list. Loading the result into squid.conf may produce WARNING
lines about other duplicates that can also be removed. I'll call the ACL
using this file "stopDomains" in the following example.

For the other file, where the URL still needs a full pattern match, split
it to create another three files (see the examples just below):
   1) dstdomains where the domain is part of the pattern. I'll call this
"regexDomains" in the following example.
   2) the full URL regex patterns whose domains are in (1). I'll call
this "regexUrls" in the example below.
   3) regex patterns where the domain name does not matter to the match.
I'll call that "regexPaths".

When that's done, change your config so that your CPU-expensive lines:

   acl porn_deny url_regex "/etc/squid/domains.deny"
   http_access deny porn_deny

change into these:

# A
   acl stopDomains dstdomain "/etc/squid/dstdomain.deny"
   http_access deny stopDomains

# B
   acl regexDomains dstdomain "/etc/squid/dstdomain.regexDomains"
   acl regexUrls url_regex -i "/etc/squid/regex.urls"
   http_access deny regexDomains regexUrls

# C
   acl regexPaths urlpath_regex -i "/etc/squid/regex.paths"
   http_access deny regexPaths

As you can see, a regex is not run unless it really has to be.
  At "A" the domains which don't need regex at all get blocked very fast
with little CPU usage.
  At "B" the domains get checked first, and only the ones which might
actually match get a regex run against them.
  At "C" we have no choice, so a regex is run as before. But (a) the list
should now be very small and not use much CPU, and (b) most of the banned
domains have already been blocked by the earlier rules.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE8 or 3.0.STABLE25
   Current Beta Squid 3.1.0.18
Received on Mon Mar 22 2010 - 09:07:45 MDT

This archive was generated by hypermail 2.2.0 : Mon Mar 22 2010 - 12:00:05 MDT