Re: [squid-users] open webpage category database proposal

From: Antony Stone <Antony@dont-contact.us>
Date: Mon, 28 Jul 2003 19:58:04 +0100

On Monday 28 July 2003 11:37 am, Bgs himself wrote:

> > Also a reverse lookup when processing an incoming email may add a few
> > seconds to the transmission time - but who cares? The same is not true
> > of website access - people would care a lot (and complain).
>
> This data is well cache-able. The overhead is not that big. And don't
> forget that for many companies and institutions, manageable
> supervision/filtering is more important than a few seconds of browsing
> time a day.

But it's not a few seconds per day - it's a few seconds per HTTP request -
that can add up to an awful lot once you start browsing around for some
information. I really think that doing some sort of DNS lookup for each URL
as it is requested would create an unacceptable overhead on browsing speed.
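
To put rough numbers on that, here is a minimal sketch. The figures (30
objects per page, 0.5 seconds per uncached lookup) are assumptions for
illustration, not measurements, and the function name is made up. It also
assumes lookups happen one after another and that a cached answer costs
nothing, which is the most favourable case for caching.

```python
def added_delay_seconds(requests, lookup_seconds, cached=0):
    """Extra wait caused by category lookups for a batch of HTTP requests.

    requests       -- number of HTTP requests made (HTML, images, CSS, ...)
    lookup_seconds -- assumed cost of one uncached category lookup
    cached         -- how many of those requests are answered from a local cache
    """
    if not 0 <= cached <= requests:
        raise ValueError("cached must be between 0 and requests")
    # Only cache misses pay the lookup cost in this simplified model.
    misses = requests - cached
    return misses * lookup_seconds


# Hypothetical page made of 30 objects, 0.5 s per uncached lookup.
print(added_delay_seconds(30, 0.5))             # nothing cached yet
print(added_delay_seconds(30, 0.5, cached=27))  # 27 of 30 answers cached
```

Even with most answers cached, every URL not seen before still pays the
full lookup cost, and that is exactly the case when somebody is browsing
around for information.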

> > I think a big difficulty here would be the question of whether your idea
> > of illegal / undesirable / objectionable / etc content is the same as
> > mine - one person's humour site may be another person's idea of
> > pornography...
>
> This has absolutely nothing to do with my opinion of pornography and
> the like. What is illegal or objectionable is decided on the proxy
> side. "You think pornography is not good? Filter the appropriate
> category!".

Yes, sure, but who puts the pages into the appropriate category?

Suppose I want to block pornographic sites. I implement that policy by
blocking all sites with "pornographic" in the category. Then I find I can
still view some pornographic images because somebody else decided they should
be in the "artistic" category. Their idea of pornography differs from mine
- that's why it's important.
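
A small sketch of the problem, assuming a made-up category database and
made-up host names (nothing here is a real Squid feature or a real
database format): the proxy can only act on whatever label the classifier
chose.

```python
# Hypothetical category database: whoever filed each site decided its label.
CATEGORY_DB = {
    "site-a.example": {"pornographic"},
    "site-b.example": {"artistic"},  # a different classifier's judgement
}


def is_blocked(host, blocked_categories):
    """Block a host only if the database lists it under a blocked category."""
    return bool(CATEGORY_DB.get(host, set()) & blocked_categories)


policy = {"pornographic"}
print(is_blocked("site-a.example", policy))  # True
print(is_blocked("site-b.example", policy))  # False
```

The policy on the proxy is the same for both sites; the only thing that
differs is the opinion of the person who categorised them.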

> This db wouldn't be that centralised. There can be as many mirrors as you
> like. Just as with RBL...

I think it doesn't matter how many mirrors you have - it still takes a lot
more time to do the DNS-type lookup than it would take to fetch the webpage
without the lookup - and the lookup is needed for every URL, otherwise the
proxy doesn't know whether to allow it or not.

I'm basically saying I think it's a good idea, but the performance impact
would be too great in practice, the categories would be difficult to
standardise between different people who decide where a particular URL fits
into the classification system, and there's too much human judgement involved
to be able to do it for free.

I'm happy to be proved wrong, but that's what I think.

It would be good if something like this could work; I'm just not optimistic.
 

Antony.

-- 
The idea that Bill Gates appeared like a knight in shining armour
to lead all customers out of a mire of technological chaos
neatly ignores the fact that it was he who, by peddling
second-rate technology, led them into it in the first place.
 - Douglas Adams in The Guardian, August 25, 1995
Received on Mon Jul 28 2003 - 12:58:20 MDT
