Re: [squid-users] open webpage category database proposal

From: Bgs himself <bgs@dont-contact.us>
Date: Mon, 28 Jul 2003 12:37:16 +0200 (MEST)

 Hi!

> RBL works for mail servers because a hostname is either a mailserver, or it
> isn't. I don't see the idea working so neatly for web servers, because a
> single website can have many many different types of content in subpages -
> just think of www.geocities.com for a fairly extreme example of this.

You are right about the geocities.com example, but most internet traffic
goes to sites whose content is uniform across the whole domain.

> Also a reverse lookup when processing an incoming email may add a few seconds
> to the transmission time - but who cares? The same is not true of website
> access - people would care a lot (and complain).

This data caches very well, so the lookup overhead is small. And don't
forget that for many companies and institutions, manageable
supervision/filtering is more important than a few seconds of browsing
time a day.
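
To illustrate (the zone name and the answer are made up, purely to show the
mechanism): the category data would come back as an ordinary DNS answer with
a TTL, so the local resolver caches it and repeated requests for the same
site cost nothing extra.

    $ dig +noall +answer www.example.com.category.example.org
    www.example.com.category.example.org. 3600 IN A 11.3.4.1

The 3600 is the TTL; only the first hit per site per TTL period actually
leaves the network. The four octets would carry the category data described
below.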

> > 'a' is the category ID
> > 'b' is the sub category ID
> > 'c' is the rating
> > 'd' is the custom flag
>
> I think a big difficulty here would be the question of whether your idea of
> illegal / undesirable / objectionable / etc content is the same as mine - one
> person's humour site may be another person's idea of pornography...

This has absolutely nothing to do with my opinion of pornography and the
like. What is illegal or objectionable is decided on the proxy side:
"You think pornography is not good? Filter the appropriate category!"
You can use the information for filtering, statistics, partial
restrictions or anything else you like. With the sub categories, custom
flags and rating, the whole thing can be made rather objective.
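
To make the proxy side concrete, here is a minimal sketch of a Squid
external_acl helper that could consume such a database. Everything in it is
hypothetical: the zone "category.example.org", the helper path and the
category numbers are made up just to show the idea. It assumes the database
answers <host>.category.example.org with an A record a.b.c.d encoded as
above, and it speaks Squid's external acl protocol (read one line, answer
OK or ERR).

    #!/usr/bin/env python
    # Hypothetical Squid external_acl helper for the proposed category db.
    import sys
    import socket

    ZONE = "category.example.org"    # hypothetical category database zone

    def lookup(host):
        # Query <host>.<ZONE> like an RBL; the A record a.b.c.d carries
        # category, sub category, rating and custom flag.
        try:
            answer = socket.gethostbyname("%s.%s" % (host, ZONE))
        except socket.error:
            return None                       # site not listed in the db
        a, b, c, d = [int(octet) for octet in answer.split(".")]
        return a, b, c, d

    def main():
        # Squid sends one line per check: "<dst-host> <blocked-category>"
        # (external_acl_type format "%DST" plus the acl argument).
        while True:
            line = sys.stdin.readline()
            if not line:
                break
            fields = line.split()
            if len(fields) < 2:
                sys.stdout.write("ERR\n")
                sys.stdout.flush()
                continue
            host, blocked_category = fields[0], int(fields[1])
            record = lookup(host)
            if record is not None and record[0] == blocked_category:
                sys.stdout.write("OK\n")      # acl matches -> can be denied
            else:
                sys.stdout.write("ERR\n")
            sys.stdout.flush()

    if __name__ == "__main__":
        main()

It could be wired into squid.conf roughly like this (again with made-up
names and category numbers):

    external_acl_type category ttl=3600 %DST /usr/local/bin/category_lookup.py
    acl porn_sites external category 11
    http_access deny porn_sites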

>
> > The db would be user managed: everyone could add (with proper checking of
> > course) new sites. After a while this might grow into an up-to-date db.
>
> Email RBLs can be automatically checked - you don't need a person to decide
> whether a mail server is an open relay or not, and databases like Razor and
> DCC help to check for machines which spew out spam on a regular basis.

Every system has its weak points. The RBLs' automatism is one of them: it
produces a lot of false positives, mostly in DHCP address ranges. Some of my
customers have daily nightmares because of RBLs. They plainly hate them.
They are innocent, not open relays, yet regularly cannot send email to
certain domains.

In the end it is always about the balance of pros and cons.

> I don't see that this is possible with websites - there would be much more
> human decision-making involved (if the website contents can be checked
> automatically, why not just do it on your own proxy server instead of in a
> centralised database?)

Until we have proper AI, there won't be a proxy-side solution. Even then its
usefulness is questionable, as you would have to download everything just to
decide. That is a lot of wasted bandwidth.

This db wouldn't be that centralised. There can be as many mirrors as you
like. Just as with RBL...
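
And since it would be a plain DNS zone, a mirror is nothing more exotic than
a secondary name server. A hypothetical BIND fragment (made-up zone name and
master address), just to show how cheap mirroring would be:

    // hypothetical secondary for the category zone
    zone "category.example.org" {
        type slave;                        // refreshed by ordinary zone transfer
        file "slave/category.example.org.db";
        masters { 192.0.2.1; };            // made-up master address
    };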

Bye
Bgs