Re: [squid-users] open webpage category database proposal

From: Bgs himself <bgs@dont-contact.us>
Date: Tue, 29 Jul 2003 10:51:20 +0200 (MEST)

 Hi !

> > This data is well cache-able. The overhead is not that big. And don't
> > forget that for many companies and institutions, manageable
> > supervision/filtering is more important than a few seconds of browsing
> > time a day.
>
> But it's not a few seconds per day - it's a few seconds per http request -
> that can add up to an awful lot once you start browsing around for some
> information. I really think that doing some sort of DNS lookup for each URL
> as it is requested would create an unacceptable overhead on browsing speed.

But it's not one lookup per URL, just one lookup per site. And you can
cache positives for days as this is not something that changes
frequently...
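
A rough sketch of what I mean by one lookup per site with cached
positives. The `lookup` callable is a hypothetical stand-in for the
RBL-style query (nothing like it exists yet), and the three-day lifetime is
only an assumed value for "days":

```python
import time
from urllib.parse import urlsplit

CACHE_TTL_SECONDS = 3 * 24 * 3600  # assumed: keep positive answers for three days


class SiteCategoryCache:
    """Keeps one category answer per site (hostname), not per URL."""

    def __init__(self, lookup, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._lookup = lookup   # hypothetical stand-in for the remote query
        self._ttl = ttl
        self._clock = clock
        self._entries = {}      # hostname -> (expiry_time, category)

    def category_for(self, url):
        # Every URL on the same host shares a single cache entry.
        site = (urlsplit(url).hostname or "").lower()
        now = self._clock()
        cached = self._entries.get(site)
        if cached is not None and cached[0] > now:
            return cached[1]    # cache hit: no remote lookup needed
        category = self._lookup(site)
        if category is not None:
            # Only positives (sites that have a listing) are cached.
            self._entries[site] = (now + self._ttl, category)
        return category
```

With something like this, a browsing session that fetches fifty URLs from
one site costs one remote query, and none at all if the site was seen in the
last few days.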

From an employer's point of view, it doesn't matter if he loses (let's say
a big number) 10 minutes a day to category filtering if he wins back half
an hour of unwanted browsing time (or blocks something that is illegal in
that specific location, etc.).

In my experience the pros outweigh the cons (which, you are right, do
exist). In places where speed is the key, people are simply not going to
use it.

> > > I think a big difficulty here would be the question of whether your idea
> > > of illegal / undesirable / objectionable / etc content is the same as
> > > mine - one person's humour site may be another person's idea of
> > > pornography...
> >
> > This has absolutely nothing to do with my opinion of pornography and
> > the like. What is illegal or objectionable is decided on the proxy
> > side. "You think pornography is not good? Filter the appropriate
> > category!".
>
> Yes, sure, but who puts the pages into the appropriate category?
>
> Suppose I want to block pornographic sites. I implement that policy by
> blocking all sites with "pornographic" in the category. Then I find I can
> still view some pornographic images because somebody else decided they should
> be in the "artistic" category. Their idea of pornography differs from mine
> - that's why it's important.

This is why there are fields other than category. For example, artistic
naked pictures may be considered pornographic in some places. In that case
you may also filter, let's say, category art, subcategory act. Or refine
it by a flag that tells you what kind of pictures are there.

You may filter politics altogether or just politics->neonazism. You have 3
entire fields to refine the category, which in fact is unusable on its own
for several reasons.
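
To make the refinement concrete, here is a minimal sketch of how a proxy
could apply its own policy to such a record. The record layout (category,
subcategory, a set of flags) and the flag name "nudity" are assumptions
made up for illustration; the actual fields are still to be defined:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRecord:
    """Assumed layout of one database answer; the field names are hypothetical."""
    category: str
    subcategory: str = ""
    flags: frozenset = frozenset()


def is_blocked(record, blocked_categories=(), blocked_subcategories=(),
               blocked_flags=()):
    """Apply a local policy: the proxy side decides what is objectionable."""
    # Block a whole category, e.g. everything under "politics".
    if record.category in blocked_categories:
        return True
    # Block a single (category, subcategory) pair, e.g. politics->neonazism.
    if (record.category, record.subcategory) in blocked_subcategories:
        return True
    # Block anything carrying one of the refused flags.
    return any(flag in record.flags for flag in blocked_flags)
```

Two sites can run against the same record and reach opposite decisions,
because the policy arguments are local to each proxy.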

>
> > This db wouldn't be that centralised. There can be as many mirrors as you
> > like. Just as with RBL...
>
> I think it doesn't matter how many mirrors you have - it still takes a lot
> more time to do the DNS-type lookup than it would take to fetch the webpage
> without the lookup - and the lookup is needed for every URL, otherwise the
> proxy doesn't know whether to allow it or not.

Nope. As I mentioned above, you only have to do one lookup per site. It
takes even less time than the normal DNS lookup for that site, which you
have to do for every site anyway and which is even less cacheable.

I agree that there are difficulties, and filling the db is not as
straightforward as in the case of RBL, but looking at the whole picture
I think it's something that could be done in practice.

Bye
Bgs
Received on Tue Jul 29 2003 - 03:46:48 MDT