Re: [squid-users] what are the Pros and cons filtering urls using squid.conf?

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Tue, 11 Jun 2013 16:33:30 +0300

On 6/11/2013 3:36 PM, Marcus Kool wrote:
>
>
> On 06/11/2013 09:09 AM, Jose-Marcio Martins wrote:
>> On 06/11/2013 12:50 PM, Marcus Kool wrote:
>>
>>>
>>> There is a big misunderstanding:
>>> in the old days, when the only URL filter was squidGuard, Squid
>>> had to be reloaded in order for squidGuard to reload its database.
>>> And when Squid reloads, *everything* pauses.
>>> _But things have changed since then_:
>>> - ICAP-based URL filters can reload a URL database without Squid
>>> reloading
>>> - ufdbGuard, which is a URL redirector just like squidGuard, can
>>> also reload a URL database without Squid reloading.
>>>
>>> The above implies that ICAP-based filters and ufdbGuard are good
>>> alternatives to squidGuard or filtering by ACLs.
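
For context, both styles hook into Squid via squid.conf rather than
via reloadable ACL lists; a minimal sketch (the helper path, service
name, port and path are examples, not defaults):

  # URL-redirector style (ufdbGuard ships a client helper;
  # the path below is an example install location)
  url_rewrite_program /usr/local/ufdbguard/bin/ufdbgclient
  url_rewrite_children 16

  # ICAP style (service name, port and path are illustrative)
  icap_enable on
  icap_service urlfilter reqmod_precache icap://127.0.0.1:1344/reqmod
  adaptation_access urlfilter allow all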
>>
>> ...
>>
>>> ufdbGuard loads the URL database in memory and is multithreaded.
>>
>> OK, if it can handle 50000 queries per second.
>>
>> So my question is a more direct and precise one, just about
>> ufdbGuard: while ufdbGuard reloads its URL database, does it pause
>> answering queries? If yes, how long does it take?
>
> ufdbGuard does not pause answering queries from Squid during a reload
> since that would pause Squid and is considered an interruption of service.
>
> ufdbGuard releases the current URL database, loads a new
> configuration and loads a new URL database in 10 seconds on average.
> ufdbGuard has configurable behaviour in this 10-second interval and
> does one of the following:
> - allow all URLs; send an "OK" back to Squid immediately (default)
> - allow all URLs but introduce artificial delays when sending
> replies back to Squid. The effect is that traffic is slowed down
> and the total number of unfiltered URLs is reduced.
> - deny all URLs; send a "not OK" back to Squid immediately. The end
> user receives a message like "try again in a few moments".
>
> The last option is for admins who need maximum control and worry
> that users or applications could benefit from the URL filter
> passing all URLs for 10 seconds.
>
> Marcus
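
To make the reload-window behaviour concrete, here is a minimal
sketch of a Squid url_rewrite helper implementing the three policies
Marcus describes. It is illustrative only, not ufdbGuard's actual
code; db_is_reloading(), lookup(), the delay, and the block-page URL
are all placeholder assumptions.

  #!/usr/bin/env python3
  # Sketch of a Squid url_rewrite helper: three policies for the
  # window in which the URL database is being reloaded.
  import sys
  import time

  POLICY = "allow"    # one of "allow", "delay", "deny"
  BLOCK_URL = "http://filter.example/retry.html"  # placeholder page

  def db_is_reloading():
      # Stub: a real filter would check its reload state here.
      return False

  def lookup(url):
      # Stub for the real database lookup; "" (blank line) = allow,
      # a URL = redirect the client there (e.g. to a block page).
      return ""

  for line in sys.stdin:
      fields = line.split()
      if not fields:
          continue
      url = fields[0]               # first field is the request URL
      if db_is_reloading():
          if POLICY == "allow":     # pass unfiltered, reply at once
              answer = ""
          elif POLICY == "delay":   # pass unfiltered, but slow down
              time.sleep(0.5)
              answer = ""
          else:                     # "deny": ask the user to retry
              answer = BLOCK_URL
      else:
          answer = lookup(url)
      sys.stdout.write(answer + "\n")
      sys.stdout.flush()            # Squid reads one reply per line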
That is a very clever idea.
I still prefer a real-time upgradeable DB which doesn't require a
reload etc.
The above would require more precise algorithms that work in another
way than categories alone.
I am almost sure that squidGuard actually compiles a basic algorithm
when loading the config files.
If someone who is familiar with the internals could think it through
together with me, a two-way tree (one way by category and the second
by filtering level) would be a very nice idea.
Something like:
porn is bad and in level -51
malware is bad and in level -50
news is bad and in level -30
etc...
This way we can filter with another approach than the one we are
used to.
The only difference is the static algorithm, which verifies the URL
by domain and by full URL:
check in the DB whether there is a domain+path entry and, if there
is, what level it is on;
then check whether there is a domain-only entry and, if there is,
what level it is on. A minimal sketch of this lookup order follows.
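
The sketch below assumes the example categories and levels above;
the database entries and the threshold semantics (allow when the
URL's level is at or above the requested filtering level) are my own
illustrative assumptions.

  # A minimal sketch of the two-step lookup: exact domain+path
  # first, then domain only.  Entries and levels are illustrative.
  from urllib.parse import urlsplit

  LEVELS = {"porn": -51, "malware": -50, "news": -30}
  DB = {                             # toy database: entry -> category
      "bad.example": "malware",
      "news.example/videos": "porn", # a path stricter than its domain
      "news.example": "news",
  }

  def allowed(url, filtering_level):
      parts = urlsplit(url if "://" in url else "//" + url)
      host, path = parts.hostname or "", parts.path
      for key in (host + path, host):  # domain+path, then domain only
          category = DB.get(key)
          if category is not None:
              return LEVELS[category] >= filtering_level
      return True                      # unknown URL: allow by default

  print(allowed("http://news.example/videos", -40))  # False (porn, -51)
  print(allowed("http://news.example/", -40))        # True (news, -30)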
It's a very simple idea which puts a lot of load on the DB but keeps
the algorithm very simple.
A YouTube video can be filtered very easily by categorizing the full
set of URLs in YouTube's format with a small "addon" algorithm which
knows all the ways a YouTube video can appear.
It's better than a simple REGEX and makes the search better.
Where in squidGuard I would categorize a video by one domain and a
video identifier, here we can use a set of domains, so that the next
search can be either an exact match or a check whether the ID exists.
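
A sketch of such an "addon" matcher, normalizing a few of the shapes
a YouTube video URL can take to a single video ID (the host list and
patterns are illustrative, not exhaustive):

  # Sketch: normalize several shapes of a YouTube video URL to one
  # video ID, so a single database entry can cover all of them.
  import re
  from urllib.parse import urlsplit, parse_qs

  YT_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}

  def youtube_id(url):
      p = urlsplit(url)
      if p.hostname not in YT_HOSTS:
          return None
      if p.hostname == "youtu.be":            # youtu.be/<id>
          return p.path.lstrip("/") or None
      if p.path == "/watch":                  # /watch?v=<id>
          return parse_qs(p.query).get("v", [None])[0]
      m = re.match(r"/(?:embed|v)/([\w-]+)", p.path)  # /embed/<id>
      return m.group(1) if m else None

  for u in ("http://www.youtube.com/watch?v=abc123",
            "http://youtu.be/abc123",
            "http://www.youtube.com/embed/abc123"):
      print(youtube_id(u))                    # abc123 each time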
It can work with lots of sites, since a URL should map to specific
content.
If there is more than one URL per object, then a HEADER will provide
enough data on the request to identify it.

What do you think about the above examples?

Eliezer