Re: [squid-users] open webpage category database proposal

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Tue, 29 Jul 2003 00:44:36 +0200

On Monday 28 July 2003 20.58, Antony Stone wrote:

> But it's not a few seconds per day - it's a few seconds per http
> request - that can add up to an awful lot once you start browsing
> around for some information. I really think that doing some sort
> of DNS lookup for each URL as it is requested would create an
> unacceptable overhead on browsing speed.

A properly designed system would not involve a DNS lookup for each URL
unless individual URLs are the only thing rated for that
domain/server/directory/whatever.

The only criterion is that you can make a hierarchy of where your
ratings are applied (domain/server/directory level). Given that, DNS
fits close to 100% as an open distribution medium with a very high
cache hit ratio, using already existing frameworks and technology for
distribution and caching.
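
As a rough sketch of the hierarchy idea (Python, not tied to any
existing ratings service): walk from the domain down to the directory
and stop as soon as a level says nothing finer-grained is rated below
it. Everything named here is an assumption made up for illustration:
the "has_finer" flag, the key format, and the ZONE_DATA dict, which
stands in for the DNS zone and resolver.

```python
from urllib.parse import urlsplit

# Hypothetical zone contents: level -> (categories, has_finer).
# has_finer=True means more specific ratings exist below this level.
ZONE_DATA = {
    "example.com": (frozenset({"news"}), False),
    "hosting.example": (frozenset(), True),
    "users.hosting.example": (frozenset(), True),
    "users.hosting.example/alice": (frozenset({"art"}), False),
}


def lookup(level):
    """Stand-in for one DNS query against the ratings zone."""
    return ZONE_DATA.get(level, (frozenset(), False))


def rating_levels(url):
    """List rating levels for an absolute URL, most general first."""
    parts = urlsplit(url)
    labels = parts.hostname.lower().split(".")
    start = max(len(labels) - 2, 0)
    # Domain level first, then each more specific server name.
    levels = [".".join(labels[i:]) for i in range(start, -1, -1)]
    host = levels[-1]
    prefix = ""
    # Directory levels; the final path component (the page) is skipped.
    for name in parts.path.split("/")[:-1]:
        if name:
            prefix += "/" + name
            levels.append(host + prefix)
    return levels


class CachedRatings:
    def __init__(self, resolve):
        self.resolve = resolve
        self.cache = {}
        self.queries = 0  # lookups that actually reached the resolver

    def _get(self, level):
        if level not in self.cache:
            self.queries += 1
            self.cache[level] = self.resolve(level)
        return self.cache[level]

    def rate(self, url):
        categories = frozenset()
        for level in rating_levels(url):
            found, has_finer = self._get(level)
            if found:
                categories = found
            if not has_finer:
                break  # nothing more specific is rated below this level
        return categories
```

With this data any URL under example.com costs one query in total,
since the domain-level answer is cached and marks itself as final.
Only hosts that really are rated per directory cause deeper lookups.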

Due to the sheer size of a decent ratings database you will however
need to implement your own DNS master servers, using some form of
replicated backend database with a DNS interface frontend. The
standard DNS servers are not suitable for these volumes of master
information or the rates of updates required.

> Yes, sure, but who puts the pages into the appropriate category?

Exactly. And this is THE major technical problem to solve for this to
become a reality: how to get someone to rate the pages with
reasonably good accuracy. With web pages coming and disappearing at a
rate of god knows how many hundreds of thousands per month, it is
quite a big problem to solve.

How to get the ratings distributed is minor and not really a problem.
There are many ways to do this, and DNS is one possible framework
which fits very well, given a correct application of DNS properties to
the distribution problem.

The reasons why no one is using DNS for this purpose today are
mainly business and commercial issues, not technical issues. The good
ratings systems are commercial databases with their own interfaces,
closely tied to licensing, and how you get access to the information
differs depending on the ratings provider. The thing you
license is access to their capability to rate content, not how these
ratings are distributed to your site.

> Suppose I want to block pornographic sites. I implement that
> policy by blocking all sites with "pornographic" in the category.
> Then I find I can still view some pornographic images because
> somebody else decided they should be in the "artistic" category.
> Their idea of pornography differs from mine - that's why it's
> important.

Then you ask the system to have them reclassified as pornography, and
if the system is well designed it allows for both classifications if
the content indeed fits both.
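
A minimal sketch of what allowing both classifications could mean on
the policy side (Python; the category names and the is_blocked helper
are made up for illustration): each page carries a set of categories,
and the local policy blocks it if any of them is on the blocked list.

```python
# Hypothetical local policy: categories this site chooses to block.
BLOCKED = frozenset({"pornographic"})


def is_blocked(categories, blocked=BLOCKED):
    """Block when at least one of the page's categories is blocked."""
    return not blocked.isdisjoint(categories)
```

A page rated as both "artistic" and "pornographic" is then blocked at
a site that blocks "pornographic", while a site with a different
policy can still allow it.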

Regards
Henrik

-- 
Donations welcome if you consider my Free Squid support helpful.
https://www.paypal.com/xclick/business=hno%40squid-cache.org
If you need commercial Squid support or cost effective Squid or
firewall appliances please refer to MARA Systems AB, Sweden
http://www.marasystems.com/, info@marasystems.com
Received on Mon Jul 28 2003 - 16:44:53 MDT
