[squid-users] Re: Large ACLs and squid.

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Fri, 07 Dec 2001 18:16:00 +0100

dstdom_regex has the same problem as url_regex or urlpath_regex, only
the list is usually shorter.

dstdomain scales quite well with size. A 75000-entry dstdomain acl should
cost about the same as a 16-entry dstdom_regex list if my calculations
are correct (binary tree search rather than linear list, with some twists
to speed up common lookups).
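
The estimate above can be checked with a little arithmetic. The sketch
below is only an illustration, not Squid code: it assumes a reasonably
balanced binary tree (a splay tree is not strictly balanced) and counts
comparisons only, ignoring that one regex match usually costs more than
one string comparison. The function names are made up for this example.

```python
import math

def tree_comparisons(entries: int) -> float:
    """Approximate comparisons per lookup in a balanced binary tree."""
    return math.log2(entries)

def linear_comparisons(entries: int) -> int:
    """Worst-case comparisons per lookup in a linear list (nothing matches)."""
    return entries

if __name__ == "__main__":
    # 75000 ordered dstdomain entries: about 16 comparisons per lookup.
    print(round(tree_comparisons(75_000), 1))
    # 16 regex patterns: up to 16 pattern matches per lookup.
    print(linear_comparisons(16))
```

Under these assumptions both cases come out at roughly 16 steps per
request, which is where the comparison between the two list sizes comes
from.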

Please use the squid-users list for Squid questions.

Please note that my answers assume you are using one of the
supported Squid versions. Your mention of USE_BIN_TREE below makes me
wonder. USE_BIN_TREE existed in the ancient Squid-1.x releases, when
the splay code, which is now the default (and only) search method for
ordered ACLs (i.e. most of the non-regex based acl types), was still
somewhat experimental.

Btw, even if you did use an ancient Squid version with USE_BIN_TREE, it
cannot have made a single bit of difference for url_regex or any of the
other regex based ACL types, as such types cannot be ordered/sorted. The
regex based ACL types are, and always have been, linear lists of regex
patterns.
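
To illustrate the difference, here is a rough sketch in Python. It is
not how Squid is implemented: Python's re module stands in for the regex
library, a sorted list with bisect stands in for the splay tree, and the
ordered lookup is simplified to exact host matches (no subdomain
matching). All names and sample data are invented for the example.

```python
import re
from bisect import bisect_left

def regex_acl_match(patterns, host):
    """Try each pattern in turn; return (matched, patterns_tried)."""
    tried = 0
    for pattern in patterns:
        tried += 1
        if pattern.search(host):
            return True, tried
    # No ordering exists between patterns, so a miss tries all of them.
    return False, tried

def ordered_acl_match(sorted_hosts, host):
    """Binary search in a sorted list; about log2(n) string comparisons."""
    i = bisect_left(sorted_hosts, host)
    return i < len(sorted_hosts) and sorted_hosts[i] == host

if __name__ == "__main__":
    patterns = [re.compile(rf"ads{i}\.") for i in range(1000)]
    print(regex_acl_match(patterns, "www.example.com"))

    hosts = sorted(f"site{i}.example" for i in range(1000))
    print(ordered_acl_match(hosts, "site42.example"))
```

A request that matches nothing, which is the common case for a block
list, pays for every pattern in the regex list, while the ordered lookup
only needs a handful of comparisons.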

Regards
Henrik

Mike Bruno wrote:
>
> Hello Henrik.
>
> I saw the conversation below on a newsgroup, and hopefully you can
> answer a question for me.
>
> My question is: if I have, say, 75,000 domains and another few
> thousand dstdom_regex patterns within some ACLs which I'd like to
> deny, is Squid able to handle this? Or is it better to use a
> redirector such as SquidGuard?
>
> Naturally, as you state below, I see how url_regex doesn't scale, but
> there must be an upper limit on dstdom_regex and dstdomain, correct?
> I'm looking for some ballpark guidelines and advice.
>
> Any advice you could lend would be much appreciated.
>
> Thank you
> -Mike Bruno
> mbruno@onurb.com
>
> ---------------------------------------------------
> From: Henrik Nordstrom (hno@squid-cache.org)
> Subject: Re: [squid-users] Squid choking on large ACL lists--high CPU usage
> Newsgroups: mailing.unix.squid-users
> Date: 2001-11-04 03:48:13 PST
>
> url_regex has a scalability problem when the lists grow large,
> especially if most are word matches and not fixed matches. For each
> pattern it has to do a full match against every URL seen.
>
> If you want to block whole sites it is better to use the dstdomain or
> dst ACL types. These scale a whole lot better.
>
> If you want to block sites with a certain word or pattern in their
> domain name, it is better to use dstdom_regex. It has a much smaller
> search scope than url_regex (only the host.domain part, not the whole
> URL).
>
> Regards
> Henrik Nordstrom
> Squid Hacker
>
> On Sunday 04 November 2001 05.03, Adam Maynard wrote:
> > acl [block_sites,unblock_sites,direct] url_regex -i "textfile"
> >
> > Squid is being used here to filter web content. I wasn't around when
> > it was set up & I'm not all that familiar with it yet. I think the
> > block list has about 1000 entries. We tried to load a much larger
> > list using the same format (about 40x bigger I think) & CPU usage
> > went to max. USE_BIN_TREE reconfigure helped a little but not enough.
> > Any ideas?
> > Thanks,
> > Adam Maynard
> >
> > ----- Original Message -----
> > From: "Henrik Nordstrom" <hno@squid-cache.org>
> > To: "Adam Maynard" <ml@cirrusnetworks.com>; <squid-users@squid-cache.org>
> > Sent: Saturday, November 03, 2001 7:24 PM
> > Subject: Re: [squid-users] Squid choking on large ACL lists--high CPU usage
> >
> > > What kind of ACL lists are you using?
> > >
> > > And how large?
> > >
> > > Regards
> > > Henrik Nordstrom
> > > Squid Hacker
> > >
> > > On Saturday 03 November 2001 23.36, Adam Maynard wrote:
> > > > Anybody know why using a large acl list would push squid's cpu
> > > > usage through the roof? I don't remember exact version # or
> > > > config info. I know gnuregex is enabled & it's running on linux
> > > > 2.4.5. Any general insight?
> > > >
> > > > AM
Received on Fri Dec 07 2001 - 11:38:42 MST
