Re: [squid-users] Squid network read()'s only 2k long?

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Mon, 01 Nov 2010 16:18:11 -0200

I am the author of ufdbGuard, a free URL filter for Squid.
You may want to check it out: ufdbGuard is multithreaded and supports
POSIX regular expressions.

If you do not want to use ufdbGuard, here is a tip:
ufdbGuard composes large REs from a set of "simple" REs:
largeRE = (RE1)|(RE2)|...|(REn)
which reduces the CPU time for the RE matching logic considerably.
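
To make the trick concrete, here is a minimal sketch using the standard POSIX
<regex.h> API (the pattern list and URL are made-up examples, not ufdbGuard's
actual code): the simple REs are glued into one alternation, compiled once with
regcomp(), and each URL then needs only a single regexec() call instead of one
match per pattern.

/* Sketch only: build largeRE = (RE1)|(RE2)|...|(REn), compile it once,
 * then test every URL with a single regexec() call. */
#include <regex.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *simple_res[] = { "ads\\.", "banner", "track(er|ing)" };
    const size_t n = sizeof(simple_res) / sizeof(simple_res[0]);

    char large_re[4096] = "";       /* big enough for this toy example */
    for (size_t i = 0; i < n; i++) {
        strcat(large_re, "(");
        strcat(large_re, simple_res[i]);
        strcat(large_re, ")");
        if (i + 1 < n)
            strcat(large_re, "|");
    }

    regex_t re;
    if (regcomp(&re, large_re, REG_EXTENDED | REG_NOSUB) != 0) {
        fprintf(stderr, "regcomp failed\n");
        return 1;
    }

    /* One regexec() per URL instead of n separate matches. */
    const char *url = "http://www.example.com/banner/img.gif";
    if (regexec(&re, url, 0, NULL, 0) == 0)
        printf("matched: %s\n", url);

    regfree(&re);
    return 0;
}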

Marcus

Henrik K wrote:
> On Mon, Nov 01, 2010 at 03:00:21PM +0000, declanw_at_is.bbc.co.uk wrote:
>> Besides that, I have a laaarge url_regex file to process, and I was
>> wondering if there was any benefit to trying to break this regexp out to a
>> perl helper process (and if anyone has a precooked setup doing this that I
>> can borrow).
>
> The golden rule is to run as few regexps as possible, no matter how big they
> are.
>
> Run your regexps through Regexp::Assemble:
> http://search.cpan.org/dist/Regexp-Assemble/Assemble.pm
>
> Then compile Squid with PCRE support (LDFLAGS="-lpcre -lpcreposix") for
> added performance.
>
> I've only modified Squid2 myself, but for Squid3 you probably need to change
> this in cache_cf.cc:
>
> - while (fgets(config_input_line, BUFSIZ, fp)) {
> + while (fgets(config_input_line, 65535, fp)) {
>
> ... because otherwise Squid can't read a huge regexp on a single line.
> Of course your script must not emit so many regexps that the line goes over
> that limit.
>
> I'm also assuming you've converted as many rules as possible to dstdomain
> etc., which is the first thing to do.
>
>
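
To illustrate the dstdomain advice above, here is a minimal squid.conf sketch
(the ACL names and file paths are hypothetical): plain hostname rules go into a
cheap dstdomain ACL, and url_regex is kept only for patterns that genuinely
need a regular expression.

# Hypothetical sketch: match plain hostnames with dstdomain (cheap),
# keep url_regex only for patterns that really need a regex.

# one entry per line in the file, e.g. ".example.com"
acl blocked_domains dstdomain "/etc/squid/blocked.domains"
acl blocked_urls url_regex -i "/etc/squid/blocked.regex"

http_access deny blocked_domains
http_access deny blocked_urls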
>
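One caveat on the fgets() change quoted above, assuming the sources still
declare the line buffer as char config_input_line[BUFSIZ] (as Squid's
cache_cf.cc has traditionally done): the array itself has to grow as well,
otherwise fgets() is told it may write past the end of the buffer. A sketch of
the paired change:

- char config_input_line[BUFSIZ];
+ char config_input_line[65536];

- while (fgets(config_input_line, BUFSIZ, fp)) {
+ while (fgets(config_input_line, sizeof(config_input_line), fp)) {

Using sizeof() keeps the two sizes from drifting apart if the buffer is ever
resized again.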