Re: [squid-users] web filtering

From: Christoph Haas <email@dont-contact.us>
Date: Sat, 30 Sep 2006 16:19:13 +0200

On Saturday 30 September 2006 05:11, Chuck Kollars wrote:
> Our experience with web filtering is the differences
> in tools are _completely_ swamped by the quality and
> depth of the blacklists. (The reverse of course is
> also true: lack of good blacklists will doom _any_
> filtering tool.)
>
> We currently have over 500,000 (!) sites listed in
> just the porn section of our blacklist. With quality
> lists like these, any old tool will do a decent job.

And large portions of those half million sites are probably already
something other than porn sites by now, or the domains have been given up.
I wouldn't judge the quality purely by the quantity.

> Lots of folks need to get such lists reasonably and
> regularly (quarterly?).

Daily even.

> Useful lists are far far too
> large to be maintained by local staff. Probably what's
> needed is a mechanism whereby everybody nationwide
> contributes, some central site consolidates and
> sanitizes, and then publishes the lists.

I'd welcome such an effort. Some companies invest a lot of work into URL
categorisation - not just regarding porn sites. But they have several
employees working full-time on that and run a kind of editorial office.
For a free/open-source project you would need a lot of people and some
mechanism (e.g. a web spider) that searches for further sites. And that
job is boring, so compared to other free/open-source projects there is
much less motivation to contribute constantly.

> This would be a huge effort. It's not easily possible
> even with lots of clever scripts and plenty of compute
> power. We've already seen more than a handful of
> "volunteers" swallowed up by similar efforts.

I believe the only blacklist that has survived over the years is
http://urlblacklist.com/ - except that it is non-free now. I may be
mistaken about its history though.

There are already DNS-based blacklists that are very effective for mail
spam detection. Perhaps a DNS-based registry where you can look up whether
a certain domain belongs to a certain category would help. Large
installations like ISPs could mirror the DNS zone and private people could
just query it. Perhaps even the Squid developers could support such a
blacklist.
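
To illustrate what I mean (just a rough sketch - the zone name and the
category codes are made up, modelled on how mail DNSBLs encode their
answers in 127.0.0.x records):

  #!/usr/bin/env python
  # Rough sketch of a DNSBL-style category lookup. The zone
  # "cat.blacklist.example" and the answer codes are purely hypothetical.
  import socket

  # Hypothetical mapping from the returned A record to a category, the
  # same way mail DNSBLs encode list membership in 127.0.0.x answers.
  CATEGORIES = {
      "127.0.0.2": "porn",
      "127.0.0.3": "gambling",
      "127.0.0.4": "warez",
  }

  def lookup_category(domain, zone="cat.blacklist.example"):
      """Return the category of `domain`, or None if it is not listed."""
      try:
          answer = socket.gethostbyname("%s.%s" % (domain, zone))
      except socket.gaierror:
          # NXDOMAIN: the domain is not listed in any category.
          return None
      return CATEGORIES.get(answer, "unknown")

  if __name__ == "__main__":
      print(lookup_category("some-dubious-site.example"))

Large mirrors would just be secondary name servers for that zone, and
normal resolvers would cache the answers, which should keep the query
load bearable.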

So IMHO we lack both a source (volunteers, a spider, a web-based
contribution system) and a good way to use it. Huge static ACLs don't work
well with Squid.
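
On the Squid side an external ACL helper might be a better fit than huge
dstdomain files. Again only a sketch (untested; it assumes the
hypothetical DNS zone from above and the plain OK/ERR helper protocol):

  #!/usr/bin/env python
  # Sketch of a Squid external ACL helper that asks a (hypothetical)
  # DNS-based category zone whether the requested domain is listed.
  # squid.conf would use it roughly like this (untested):
  #
  #   external_acl_type domcheck %DST /usr/local/bin/category_helper.py
  #   acl blocked external domcheck
  #   http_access deny blocked
  #
  import socket
  import sys

  ZONE = "cat.blacklist.example"   # made-up zone name
  BLOCKED = {"127.0.0.2"}          # made-up answer meaning "porn"

  def is_blocked(domain):
      try:
          return socket.gethostbyname("%s.%s" % (domain, ZONE)) in BLOCKED
      except socket.gaierror:
          return False             # NXDOMAIN: not listed at all

  # Squid sends one request per line on stdin and expects "OK" or "ERR".
  for line in sys.stdin:
      fields = line.split()
      domain = fields[0] if fields else ""
      sys.stdout.write("OK\n" if domain and is_blocked(domain) else "ERR\n")
      sys.stdout.flush()           # Squid waits for the answer

Squid can also cache the helper's answers (the ttl= option of
external_acl_type), so not every request would trigger a DNS lookup.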

Since I had to tell our managers at work how well URL filtering works (we
use a commercial solution) I pulled some numbers. Around 3,000 domains are
registered at DeNIC (the German domain registry) alone every day. Now add
the other registries and you get a rough idea of how many domains need to
be categorised every day. That's the reason why it's so hard to create
reasonable blacklists. (And also the cause of my rants when people expect
decent filtering just from the currently available public blacklists.)

You didn't tell us much about your intentions though. :)

Kindly
 Christoph