Re: [squid-users] what are the Pros and cons filtering urls using squid.conf?

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Tue, 11 Jun 2013 14:53:41 -0300

On 06/11/2013 10:33 AM, Eliezer Croitoru wrote:
>> ufdbGuard does not pause answering queries from Squid during a reload
>> since that would pause Squid and is considered an interruption of service.
>>
>> ufdbGuard releases the current URL database, loads a new configuration
>> and loads a new URL database in 10 seconds on average.
>> ufdbGuard has a configurable behaviour in this 10-second interval and
>> does one of the following:
>> - allow all URLs; send immediately an "OK" back to Squid (default)
>> - allow all URLs but also introduce artificial delays when sending
>> replies back to Squid.
>> The effect is that traffic is slowed down and the total number of
>> unfiltered URLs is reduced.
>> - deny all URLs; send immediately a "not OK" back to Squid. The end
>> user receives a message like "try again in a few moments".
>>
>> The last option is for the admins who need maximum control and are
>> afraid that users or applications can benefit from the URL filter
>> passing all URLs for 10 seconds.
>>
>> Marcus
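
To illustrate what I described above: during a reload the query loop degenerates into something like this (a minimal sketch in Python, not ufdbGuard's actual code; the mode names, delay value and reply format are invented for illustration):

    import sys, time

    RELOAD_MODE = "allow"      # hypothetical: "allow", "allow-delayed" or "deny"
    RELOAD_DELAY = 0.5         # hypothetical artificial delay in seconds

    def answer_while_reloading(request_id):
        # The URL database is gone while the new one loads, so every
        # query is answered immediately according to the configured mode;
        # Squid itself is never paused.
        if RELOAD_MODE == "allow":
            print(request_id, "OK")          # pass the URL unfiltered
        elif RELOAD_MODE == "allow-delayed":
            time.sleep(RELOAD_DELAY)         # slow traffic down first
            print(request_id, "OK")
        else:                                # "deny"
            print(request_id, "ERR")         # user sees "try again in a few moments"
        sys.stdout.flush()
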
> It is a very clever idea.
> I still prefer a real-time upgradeable DB which doesn't require a reload, etc.
> The above would require more precise algorithms that work in another way than by categories alone.
> I am almost sure that squidGuard actually compiles a basic algorithm when loading the config files.

squidguard and ufdbguard do the same thing:
they look at the username, IP address and hostname of the user to find out to which group the user belongs;
in squidguard/ufdbguard terminology: they determine the "source".
Then for each source there is an ACL based on URL categories, e.g.:
block adult, block proxies, allow socialnetworking, block news, and allow the rest.
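
In Python-like pseudo code the decision for one request is roughly the following (the tables and names are invented just to show the two-step logic; real configurations are of course larger):

    # Hypothetical tables, just to show the two-step logic.
    SOURCES = {
        # source (group) -> its members: usernames and/or network prefixes
        "admins":   {"users": {"marcus"}, "nets": ["10.0.1."]},
        "students": {"users": set(),      "nets": ["10.0.2."]},
    }

    ACLS = {
        # source -> (ordered category rules, default action)
        "admins":   ([("adult", "block")], "allow"),
        "students": ([("adult", "block"), ("proxies", "block"),
                      ("socialnetworking", "allow"), ("news", "block")],
                     "allow"),
        "default":  ([("adult", "block")], "allow"),
    }

    def find_source(username, ip):
        # step 1: which group does the user belong to?
        for name, m in SOURCES.items():
            if username in m["users"] or \
               any(ip.startswith(net) for net in m["nets"]):
                return name
        return "default"

    def decide(username, ip, url_categories):
        # step 2: apply that group's category ACL to the URL's categories
        rules, default = ACLS[find_source(username, ip)]
        for category, action in rules:
            if category in url_categories:
                return action
        return default
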

> If there is someone who is familiar with the internals, we could think together about a two-way tree (one by category and the second by filtering level); I think it would be a very nice idea.
> Something like:
> porn is bad and on level -51
> malware is bad and on level -50
> news is bad and on level -30
> etc...
> This way we can filter with another approach than the one we are used to.
> The only difference is the static algorithm, which verifies the URL by domain and by domain+path:
> check in the DB whether there is a domain+path entry and, if so, what level it is on;
> check in the DB whether there is a domain-only entry and, if so, what level it is on.
> It's a very simple idea which puts a lot of load on the DB but keeps the algorithm very simple.

You lost me here. What are you trying to achieve?
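
If the intent is simply a two-step lookup that returns a numeric level per URL, it amounts to something like this sketch (the table contents and the threshold are made up):

    from urllib.parse import urlsplit

    # Hypothetical levelled DB; the most specific entry wins.
    LEVELS = {
        "example.com/videos": -51,    # domain+path entry
        "example.com":        -30,    # domain-only entry
    }
    THRESHOLD = -40                   # block at this level or below

    def level_of(url):
        p = urlsplit(url if "://" in url else "//" + url)
        host = (p.hostname or "").lower()
        segs = [s for s in p.path.split("/") if s]
        # 1. try the longest domain+path prefix first ...
        for n in range(len(segs), 0, -1):
            key = host + "/" + "/".join(segs[:n])
            if key in LEVELS:
                return LEVELS[key]
        # 2. ... then fall back to the domain alone
        return LEVELS.get(host)

    def blocked(url):
        level = level_of(url)
        return level is not None and level <= THRESHOLD

The prefix walk is the expensive part; that is presumably the "lot of load on the DB" you mention.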

> A YouTube video can be filtered very easily by categorizing the full set of YouTube-format URLs with a small "add-on" algorithm which knows all the ways a YouTube video URL can appear.
> It's better than a simple regex and makes the search more precise.

Also here: what are you trying to achieve on Youtube?
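
If the goal is to recognise the same video regardless of which URL form it is requested under, a small normaliser that extracts the video ID does that; a sketch (the list of URL forms is illustrative, not complete):

    from urllib.parse import urlsplit, parse_qs

    def youtube_video_id(url):
        # Return the video ID if 'url' is one of the known YouTube
        # video URL forms, else None.  Illustrative, not exhaustive.
        p = urlsplit(url)
        host = (p.hostname or "").lower()
        if host == "youtu.be":                      # youtu.be/<id>
            return p.path.lstrip("/").split("/")[0] or None
        if host.endswith("youtube.com"):
            if p.path == "/watch":                  # /watch?v=<id>
                return parse_qs(p.query).get("v", [None])[0]
            for prefix in ("/embed/", "/v/"):       # /embed/<id>, /v/<id>
                if p.path.startswith(prefix):
                    return p.path[len(prefix):].split("/")[0] or None
        return None

With the ID extracted, one database entry per video covers every URL form, which is indeed more robust than a pile of regexes.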

> If in squidGuard I categorized a video by one domain and a video identifier, we could use a set of domains so that the next search is either an exact match or just a check whether the ID exists.
> It can work with lots of sites, since a URL should map to specific content.
> If there is more than one URL per object, then a header will provide enough data on the request to identify it.

Headers are not sent by Squid to a URL redirector. One needs an ICAP server for that.
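
For illustration, this is all a redirector receives from Squid: one line per request with the URL, client address, user and method (the exact fields vary with the Squid version), and a minimal helper just reads and answers those lines:

    import sys

    # Skeleton of a Squid URL redirector.  Each request arrives as one
    # line, roughly: URL client-ip/fqdn user method
    # No HTTP request headers appear anywhere on this line.
    for line in sys.stdin:
        fields = line.split()
        url = fields[0] if fields else ""
        print("")      # classic protocol: empty line = leave the URL unchanged
        sys.stdout.flush()

Anything that needs to see the headers has to hook in via ICAP instead.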

> What do you think about the above examples?
>
> Eliezer
Received on Tue Jun 11 2013 - 17:53:59 MDT
