Re: [squid-users] Access control : How to block a very large number of domains

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 27 Jun 2009 16:07:40 +1200

hims92 wrote:
> hello,
> I performed the tests (to block sites using squidguard) with fewer
> domains, but squid did not respond properly; that is, the network got slow.
>
> squid-2.5.STABLE11.tar
> squidGuard-1.2.10.tar
> Berkeley DB 4.2.52
>
> number of domains in black list - 656490 (0.6 million) ; urls - 141581 (0.1
> million)
> Peak time requests - 200/sec

Squid 2.5 is rather old now. Even 2.6 is now obsolete.

That sounds like the dnsserver traffic cap being reached. Later versions
solved that by adding, and then improving, an internal DNS resolver.

>
> Amos Jeffries-2 wrote:
>> On Mon, 15 Jun 2009 12:26:16 -0700 (PDT), hims92
>> <himanshu.singh.cse07_at_itbhu.ac.in> wrote:
>>> Hi,
>>> As far as I know, SquidGuard uses Berkeley DB (which is based on BTree and
>>> Hash tables) for storing the urls and domains to be blocked. But I need to
>>> store a huge number of domains (about 7 million) which are to be blocked.
>>> Moreover, the search time to check whether a domain is in the block list
>>> has to be less than a microsecond.
>>>
>>> So, will Berkeley DB serve the purpose?
>>>
>>> I can search for a domain using a PATRICIA Trie in less than 0.1
>>> microseconds.
>>> So, if Berkeley DB is not good enough, how can I use the Patricia Trie
>>> instead of Berkeley DB in Squid to block the url?
>> To do tests with such critical timing you would be best to use an
>> internal ACL, which eliminates network transfer delays to an external
>> process.
>>
> Can you be a bit more specific about how to do that? I am pretty new to squid.
>
>
>> Are you fixed to a certain version of Squid?
>>
> No, I am not. But presently, my institution has:
> squid-2.5.STABLE11.tar
> squidGuard-1.2.10.tar
> Berkeley DB 4.2.52
>
> And I would like to find a solution for these versions only, if possible.
>
>
>
>> Squid-2 is not bad to tweak, but it is not very easy to add an ACL to
>> either.
>>
>> The Squid-3 ACLs are fairly easy to implement, and it is easy to drop a
>> new one in. You can create your own version of dstdomain and have Squid
>> do the test. At present dstdomain uses an unbalanced splay tree on full
>> reverse-string matches, which is good but not as good as it could be for
>> large domain lists.
>>
> How do we create our own version of dstdomain?
> Do the earlier versions (2.x) of squid also use an unbalanced splay tree for
> searching a url/domain, or do they use linear search, binary search or some
> other efficient search technique?

Ah, that I'm not sure of. I only joined the squid project 3 years ago.
2.5 was way before my time. There were a lot of improvements during 2.6,
apparently, when people's local 2.5 patches got merged.

I'm still learning things about 2.5 the way one would hear tales of walking
disk drives when I was a student :)

> Is it possible to maybe store all the domains and urls (0.7 million approx)
> in a vector (STL) and then perform binary_search to find the result of the
> query?

That has been tried and found slower than the existing splay methods.
Too much overhead in the STL.
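
For anyone wanting to repeat that comparison, here is a minimal C++ sketch
of the vector plus binary_search approach asked about above. It is not how
Squid's dstdomain works (that uses the splay tree), and the class name
DomainBlocklist and its isBlocked method are made up for illustration. It
assumes host names are already lowercased and that a listed domain should
also block its subdomains.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Squid: a blocklist kept as a sorted vector.
class DomainBlocklist {
public:
    explicit DomainBlocklist(std::vector<std::string> domains)
        : domains_(std::move(domains)) {
        // binary_search requires sorted input; duplicates are dropped too.
        std::sort(domains_.begin(), domains_.end());
        domains_.erase(std::unique(domains_.begin(), domains_.end()),
                       domains_.end());
    }

    // True if the host itself, or any parent domain of it, is listed.
    bool isBlocked(const std::string& host) const {
        std::string::size_type pos = 0;
        while (true) {
            // Try the name from the current label onwards.
            const std::string candidate = host.substr(pos);
            if (std::binary_search(domains_.begin(), domains_.end(), candidate))
                return true;
            // Strip the leading label and retry with the parent domain.
            const std::string::size_type dot = host.find('.', pos);
            if (dot == std::string::npos)
                return false;
            pos = dot + 1;
        }
    }

private:
    std::vector<std::string> domains_;
};
```

With a list holding only "example.com", isBlocked("www.example.com") is
true and isBlocked("notexample.com") is false. Each lookup costs one binary
search per label in the host name, so timing it against your own list is
the only way to know whether it is fast enough for you.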

> I tested the binary_search in a stand alone cpp program, and the query time
> was pretty satisfactory for me.
>
> How does squid handle the requests for domain ips? Does it store all domain
> ips somewhere, or does it first perform a dns lookup for the domain name and
> then search for whether it is in the deny/access list before giving access?

The access control lists are processed and converted to whatever native
type they need at configure time (thus a domain name can be entered in
'dst' type and will add all its current set of IPs to the dst list.)

During operation there is a fqdncache for rDNS results and an ipcache for
DNS results. When a domain needs converting it's looked up there first;
a DNS request is started if it is not found or the TTL has expired. Then,
when an IP is available, it's checked against the ACL list.
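
As a rough illustration of that order of operations (cache first, DNS on a
miss or an expired TTL, then the ACL test on the IP), here is a small C++
sketch. It is not Squid's code: DstChecker and Resolver are invented
names, the lookup is synchronous where Squid's is not, only one IP is kept
per host, and a single fixed TTL stands in for the TTL of the DNS record.

```cpp
#include <chrono>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the ipcache plus a 'dst' ACL; not Squid's code.
class DstChecker {
public:
    // Assumed interface: fills 'ip' and returns true when the lookup succeeds.
    using Resolver = std::function<bool(const std::string& host, std::string& ip)>;

    DstChecker(std::set<std::string> deniedIps, Resolver resolver,
               std::chrono::seconds ttl)
        : denied_(std::move(deniedIps)),
          resolver_(std::move(resolver)),
          ttl_(ttl) {}

    // True if the host's cached or freshly resolved IP is on the deny list.
    bool isDenied(const std::string& host, Clock::time_point now) {
        auto it = cache_.find(host);
        if (it == cache_.end() || it->second.expires <= now) {
            // Cache miss or expired entry: ask the resolver.
            std::string ip;
            if (!resolver_(host, ip))
                return false;  // no address available, nothing to match
            Entry fresh;
            fresh.ip = ip;
            fresh.expires = now + ttl_;
            cache_[host] = fresh;
            it = cache_.find(host);
        }
        // Only now is the IP tested against the ACL.
        return denied_.count(it->second.ip) > 0;
    }

private:
    struct Entry {
        std::string ip;
        Clock::time_point expires;
    };

    std::set<std::string> denied_;
    Resolver resolver_;
    std::chrono::seconds ttl_;
    std::unordered_map<std::string, Entry> cache_;
};
```

Passing the current time in as a parameter keeps the sketch easy to test:
a second call inside the TTL should not reach the resolver, and a call
after the TTL should.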

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
   Current Beta Squid 3.1.0.9
Received on Sat Jun 27 2009 - 04:07:50 MDT
