Re: [squid-users] squid consuming too much processor/cpu

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Mon, 22 Mar 2010 11:48:02 -0300

Or use an alternative: ufdbGuard.

ufdbGuard is a URL filter for Squid with a much simpler configuration
file than Squid ACLs and their additional configuration files.
ufdbGuard is also multithreaded and very fast.
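For example, a minimal setup looks roughly like this (a sketch only:
ufdbGuard's configuration is squidGuard-compatible, but the paths,
category name and redirect URL below are illustrative, so check the
ufdbGuard documentation for the exact syntax):

  # ufdbGuard.conf
  dbhome /usr/local/ufdbguard/blacklists
  dest porn {
      domainlist porn/domains
  }
  acl {
      default {
          pass !porn any
          redirect http://yourserver/blocked.html
      }
  }

  # squid.conf: hand URLs to the ufdbGuard helper processes
  url_rewrite_program /usr/local/ufdbguard/bin/ufdbgclient
  url_rewrite_children 16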

And a tip: if you are really serious about blocking, you should
also block 'proxy sites' (i.e. sites used to circumvent URL
filters).

-Marcus

Amos Jeffries wrote:
> Muhammad Sharfuddin wrote:
>> On Mon, 2010-03-22 at 08:47 +0100, Marcello Romani wrote:
>>> Muhammad Sharfuddin ha scritto:
>>>> On Mon, 2010-03-22 at 19:27 +1300, Amos Jeffries wrote:
>>>>>> Thanks to the list for the help.
>>>>>>
>>>>>> Restarting Squid is not a solution; I noticed that only 20 minutes
>>>>>> after restarting, Squid started consuming CPU again.
>>>>>>
>>>>>> On Wed, 2010-03-17 at 19:54 +1100, Ivan . wrote:
>>>>>>> you might want to check out this thread
>>>>>>> http://www.mail-archive.com/squid-users@squid-cache.org/msg56216.html
>>>>>>>
>>>>>> I did not install any such package either, i.e. I have not checked that.
>>>>>>
>>>>>> On Wed, 2010-03-17 at 05:27 -0700, George Herbert wrote:
>>>>>>> or install the Google malloc library and recompile Squid to
>>>>>>> use it instead of the default glibc malloc.
>>>>>> On Wed, 2010-03-17 at 15:01 +0200, Henrik K wrote:
>>>>>>> If the system regex is the issue, wouldn't it be better/simpler to
>>>>>>> just compile with PCRE? (LDFLAGS="-lpcreposix -lpcre"). It doesn't
>>>>>>> leak and as a bonus makes your REs faster.
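>>>>>>> For example (an untested sketch; add your usual configure options
>>>>>>> in the obvious place):
>>>>>>>
>>>>>>>   LDFLAGS="-lpcreposix -lpcre" ./configure && make && make install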
>>>>>> Nor did I recompile Squid, as I have to use the binary/RPM version
>>>>>> of Squid that shipped with the distro I am using.
>>>>>>
>>>>>> The issue was resolved by removing the ACL that blocked almost
>>>>>> 60K URLs/domains.
>>>>>>
>>>>>> Commenting out the following worked:
>>>>>> ##acl porn_deny url_regex "/etc/squid/domains.deny"
>>>>>> ##http_access deny porn_deny
>>>>>>
>>>>>> So how can I deny illegal content/websites?
>>>>>>
>>>>> If those were actually domain names...
>>>> They are both URLs and domains.
>>>>
>>>>> * use "dstdomain" type instead of regex.
>>>> ok nice suggestion
>>>>
>>>>
>>>>> Optimize the order of your ACLs so that most rejections happen as
>>>>> early as possible, using the fastest match types.
>>>>
>>>> I think it is optimized, as the rule squeezing the CPU is the first
>>>> rule in squid.conf
>>> That's the exact opposite of "optimizing", as the CPU-consuming rule
>>> is _always_ executed.
>>> The first rules should be cheap (i.e. non-regex) and should
>>> block most of the traffic, leaving the CPU-consuming ones at the
>>> bottom, rarely executed.
>>>
>>>>> If you don't mind sharing your squid.conf access lines we can work
>>>>> through optimizing with you.
>>>> I posted squid.conf when I started this thread, but I have no issue
>>>> posting it again ;)
>>> I think he meant the list of blocked sites/URLs.
>> It's 112K after compression; am I allowed to post/attach such a big
>> file?
>
> The mailing list will drop all attachments.
>
>>>
>>>> squid.conf:
>>>> acl myFTP port 20 21
>>>> acl ftp_ipes src "/etc/squid/ftp_ipes.txt"
>>>> http_access allow ftp_ipes myFTP
>
> The optimal form of that line is:
>
> acl myFTP proto FTP
> http_access allow myFTP ftp_ipes
>
> NP: Checking the protocol is faster than checking a whole list of IPs or
> list of ports.
>
>>>> http_access deny myFTP
>>>>
>
> Since you only have two network IP ranges that might possibly be allowed
> after the regex checks, it's a good idea to start the entire process by
> blocking the vast range of IPs which are never going to be allowed:
>
> acl vip src "/etc/squid/vip_ipes.txt"
> acl mynet src "/etc/squid/allowed_ipes.txt"
> http_access deny !vip !mynet
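>
> Each of those files just holds one address or CIDR range per line, for
> example (example ranges only):
>
>   192.168.1.0/24
>   10.0.0.0/16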
>
>
>>>> #### this is the acl eating CPU #####
>>>> acl porn_deny url_regex "/etc/squid/domains.deny"
>>>> http_access deny porn_deny
>>>> ###############################
>>>>
>>>> acl vip src "/etc/squid/vip_ipes.txt"
>>>> http_access allow vip
>>>>
>>>> acl entweb url_regex "/etc/squid/entwebsites.txt"
>>>> http_access deny entweb
>
> Applying the same process to entwebsites.txt as is described below for
> domains.deny will stop this one becoming a second CPU drain.
>
>>>>
>>>> acl mynet src "/etc/squid/allowed_ipes.txt"
>>>> http_access allow mynet
>>>>
>
>
> This is the basic process for reducing a large list of regex patterns
> down to an optimal set of ACL tests...
>
>
> What you can do to start with is separate all the domain-only lines from
> the real regex patterns:
>
> grep -E "^(\^?(https?|ftp)://)?[a-z0-9.-]+(/?\$?)$" \
>     /etc/squid/domains.deny > dstdomain.deny
>
> grep -v -E "^(\^?(https?|ftp)://)?[a-z0-9.-]+(/?\$?)$" \
>     /etc/squid/domains.deny > url_regex.deny
>
> ... check the output of those two files. Don't trust my 2-second pattern
> creation.
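>
> For example, with made-up entries: a plain domain line such as
>
>   badsite.example.com
>
> should land in dstdomain.deny, while a real pattern such as
>
>   ^http://ads\..*/banner
>
> should land in url_regex.deny.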
>
> You will also need to strip any "^", "$", "http://" and "/" bits off the
> dstdomain patterns.
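>
> A quick sed (equally untested; check its output as well) can do most of
> that stripping:
>
>   sed -e 's/^\^//' -e 's|^\(https\?\|ftp\)://||' -e 's|/\?\$\?$||' \
>       dstdomain.deny > dstdomain.clean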
>
> When that's done, see if there are any domains you can wildcard in the
> dstdomain list. Loading the result into squid.conf may produce WARNING
> lines about other duplicates that can also be removed. I'll call the ACL
> using this file "stopDomains" in the following example.
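>
> For example, entries like these:
>
>   sex.example.com
>   www.sex.example.com
>   pics.sex.example.com
>
> collapse into the single wildcard entry (the leading dot matches the
> domain itself and all its subdomains):
>
>   .sex.example.com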
>
>
> For the other file, where the URL still needs a full pattern match,
> split it into another three files:
> 1) dstdomains where the domain is part of the pattern. I'll call this
> "regexDomains" in the following example.
> 2) the full URL regex patterns with domains in (1). I'll call this
> "regexUrls" in the example below.
> 3) regex patterns where domain name does not matter to the match.
> I'll call that "regexPaths".
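>
> For example (a made-up pattern): if url_regex.deny contains
>
>   ^http://ads\.example\.com/.*popup.*
>
> then "ads.example.com" goes into regexDomains (1), the pattern itself
> goes into regexUrls (2), and a domain-independent pattern such as
>
>   \.(exe|scr)$
>
> belongs in regexPaths (3).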
>
>
> When that's done, take your CPU-expensive config lines:
>
> acl porn_deny url_regex "/etc/squid/domains.deny"
> http_access deny porn_deny
>
> and change them into these:
>
> # A
> acl stopDomains dstdomain "/etc/squid/dstdomain.deny"
> http_access deny stopDomains
>
> #B
> acl regexDomains dstdomain "/etc/squid/dstdomain.regexDomains"
> acl regexUrls url_regex -i "/etc/squid/regex.urls"
> http_access deny regexDomains regexUrls
>
> #C
> acl regexPaths urlpath_regex -i "/etc/squid/regex.paths"
> http_access deny regexPaths
>
>
> As you can see, regex matching is not done unless it really has to be.
> At "A" the domains which don't need regex at all get blocked
> very fast with little CPU usage.
> At "B" the domains get checked first, and only requests which might
> actually match get a regex run against them.
> At "C" we have no choice, so a regex is done as before. But (a) the list
> should now be very small and not use much CPU, and (b) most of the
> blocked domains are already blocked.
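>
> Once the files are in place you can sanity-check the new config and
> load it without a full restart:
>
>   squid -k parse
>   squid -k reconfigure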
>
>
>
>
> Amos