Re: [squid-users] Access control : How to block a very large number of domains from hims92 on 2009-06-26 (squid-users)

From: hims92 <himanshu.singh.cse07_at_itbhu.ac.in>
Date: Thu, 25 Jun 2009 23:03:38 -0700 (PDT)

Hello,
I performed the tests (blocking sites using squidGuard) with a smaller number
of domains, but Squid did not respond properly; that is, the network got slow.

squid-2.5.STABLE11.tar
squidGuard-1.2.10.tar
Berkeley DB 4.2.52

Number of domains in the blacklist: 656490 (0.6 million); URLs: 141581 (0.1
million)
Peak-time requests: 200/sec

Amos Jeffries-2 wrote:
>
> On Mon, 15 Jun 2009 12:26:16 -0700 (PDT), hims92
> <himanshu.singh.cse07_at_itbhu.ac.in> wrote:
>> Hi,
>> As far as I know, SquidGuard uses Berkeley DB (which is based on BTree and
>> Hash tables) for storing the urls and domains to be blocked. But I need to
>> store a huge number of domains (about 7 million) which are to be blocked.
>> Moreover, the search time to check if the domain is there in the block
>> list has to be less than a microsecond.
>>
>> So, will Berkeley DB serve the purpose?
>>
>> I can search for a domain using PATRICIA Trie in less than 0.1
>> microseconds.
>> So, if Berkeley DB is not good enough, how can I use the Patricia Trie
>> instead of Berkeley DB in Squid to block the url?
>
> To do tests with such critical timing you would be best to use an
> internal ACL, which eliminates the network transfer delays to an external
> process.
>
Can you be a bit more specific about how to do that? I am pretty new to Squid.
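
For reference, a minimal sketch of what an internal ACL could look like in
squid.conf. The ACL name "blocked_sites" and the file path are hypothetical
placeholders, and the list file is assumed to hold one domain per line:

```
# Load the domain list from a file (hypothetical path), one domain per line.
# A leading dot, e.g. ".example.com", also matches all subdomains.
acl blocked_sites dstdomain "/etc/squid/blocked_domains.txt"

# Deny matching requests; this must come before any broader "allow" rule.
http_access deny blocked_sites
```

With this arrangement Squid does the lookup inside its own process, so no
request has to be handed to an external redirector such as squidGuard.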

> Are you fixed to a certain version of Squid?
>
No, I am not. But at present my institution has:
squid-2.5.STABLE11.tar
squidGuard-1.2.10.tar
Berkeley DB 4.2.52

I would like to find a solution for these versions only, if possible.

> Squid-2 is not bad to tweak, but not very easy to add to ACL either.
>
> The Squid-3 ACL are fairly easy to implement and drop a new one in. You can
> create your own version of dstdomain and have Squid do the test. At present
> dstdomain uses unbalanced splay tree on full reverse-string matches which
> is good but not so good as it could be for large domain lists.
>
How do we create our own version of dstdomain?
Do the earlier versions (2.x) of Squid also use an unbalanced splay tree for
searching a URL/domain, or do they use linear search, binary search, or some
other efficient search technique?
Would it be possible to store all the domains and URLs (0.7 million approx.)
in a vector (STL) and then perform binary_search to answer the query?
I tested binary_search in a standalone C++ program, and the query time
was pretty satisfactory for me.
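
To make the idea concrete, here is a minimal sketch of the sorted-vector
approach, not the actual test program and not Squid code. The helper names
make_blocklist and is_blocked are hypothetical, and treating a host as
blocked when any of its parent domains is listed is an assumption added for
illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: sort and de-duplicate the list once, at load time,
// so that std::binary_search can be used for every later query.
std::vector<std::string> make_blocklist(std::vector<std::string> domains) {
    std::sort(domains.begin(), domains.end());
    domains.erase(std::unique(domains.begin(), domains.end()), domains.end());
    return domains;
}

// Hypothetical helper: returns true if `host` itself, or any parent domain
// of it ("a.b.example.com" -> "b.example.com" -> "example.com" -> "com"),
// is present in the sorted list. Each probe costs O(log n) comparisons.
bool is_blocked(const std::vector<std::string>& sorted_list,
                const std::string& host) {
    std::string::size_type pos = 0;
    while (true) {
        const std::string candidate = host.substr(pos);
        if (std::binary_search(sorted_list.begin(), sorted_list.end(),
                               candidate)) {
            return true;
        }
        const std::string::size_type dot = host.find('.', pos);
        if (dot == std::string::npos) {
            return false;  // no more parent domains to try
        }
        pos = dot + 1;  // strip the leftmost label and try again
    }
}
```

A caller would build the list once with make_blocklist(...) and then call
is_blocked(list, "www.example.com") per request. With about 0.7 million
entries each probe needs roughly 20 string comparisons, and a host with k
labels needs at most k probes.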

How does Squid handle requests for domain IPs? Does it store all domain
IPs somewhere, or does it first perform a DNS lookup for the domain name and
then check whether it is in the deny/access list before giving access?


Received on Fri Jun 26 2009 - 06:03:41 MDT
