RE: Adult Sites

From: Nottingham, Mark (Australia) <mark_nottingham@dont-contact.us>
Date: Tue, 23 Feb 1999 23:58:31 -0500

This is a difficult one; I agree that it's nearly impossible to block any
good portion of 'objectionable' content while leaving 'normal' content
untouched. However, practically speaking, there is a real need for SOME sort
of filtering, if only to discourage people from casually surfing porn on
work time (in a corporate setting).

The compromise that I usually go for is to use Squirm to block based on
*host name* in the url, like this:

regexi ^http:\/\/[^\/]*adult[^\/]*\/ http://redirect-host/youre/a/bad/boy.html
regexi ^http:\/\/[^\/]*hardcore[^\/]*\/ http://redirect-host/youre/a/bad/boy.html
regexi ^http:\/\/[^\/]*xxx[^\/]*\/ http://redirect-host/youre/a/bad/boy.html
[and so on... we have about 25 patterns like this, as well as a few for
specific hosts]

Certainly, this will not get even half of the sites out there (particularly,
anything that is in a ~user directory), and will block some legitimate
sites; however, there will be far fewer 'false blockings' than if it were by
the full URL, or content. The idea here is just to discourage people who are
trolling for porn because they're bored.
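
To make the host-versus-full-URL trade-off concrete, here is a rough sketch in
Python (not Squirm itself, just the same regular expressions run by hand). The
three keywords come from the patterns above; the sample URLs and the function
names are made up for illustration, and the "full URL" matcher is only an
assumed stand-in for a cruder filter, not something we actually run.

```python
import re

# Keywords taken from the three Squirm patterns shown above.
KEYWORDS = ["adult", "hardcore", "xxx"]

# Host-only: the keyword must appear before the first "/" after "http://",
# mirroring  regexi ^http:\/\/[^\/]*keyword[^\/]*\/
HOST_PATTERNS = [
    re.compile(r"^http://[^/]*" + re.escape(k) + r"[^/]*/", re.IGNORECASE)
    for k in KEYWORDS
]

# Full-URL: the keyword may appear anywhere (assumed here for comparison only).
URL_PATTERNS = [re.compile(re.escape(k), re.IGNORECASE) for k in KEYWORDS]


def blocked_by_host(url):
    """True if any keyword occurs in the host part of the URL."""
    return any(p.search(url) for p in HOST_PATTERNS)


def blocked_by_full_url(url):
    """True if any keyword occurs anywhere in the URL."""
    return any(p.search(url) for p in URL_PATTERNS)


# Hypothetical URLs, invented for this example.
samples = [
    "http://www.hardcore-example.test/index.html",
    "http://university.example.test/adult-education/courses.html",
    "http://isp.example.test/~user/xxx/pics.html",
]

for url in samples:
    print(blocked_by_host(url), blocked_by_full_url(url), url)
```

The second URL is the kind of legitimate page a full-URL match would block and
a host-only match lets through; the third is the ~user case that host-only
matching misses.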

This would certainly not be appropriate for an ISP, but it's worked fairly
well at a few companies where I've implemented it; the patterns, if well
chosen, do a remarkable job in getting the most flagrant violations out. I
should mention that the redirect page states that if the user believes the
requested page is legitimate, to contact the Webmaster, etc, etc.

Just my .02. (I know that I'm going to get a bunch of mail back from people
saying that it isn't perfect; what is?)

> -----Original Message-----
> From: Brian Ristuccia [mailto:brianr@osiris.978.org]
> Sent: Wednesday, February 24, 1999 3:20 PM
> To: David Luyer
> Cc: Josh Kuperman; Sherwin de Claro; squid-users@ircache.net
> Subject: Re: Adult Sites
>
>
> On Wed, Feb 24, 1999 at 11:41:13AM +0800, David Luyer wrote:
> >
> > Josh Kuperman wrote:
> >
> > > You could try to filter the regular expression "sex" which would stifle
> > > about 10%.
> >
> > Is that 10% of sex sites, or 10% of the net's legitimate content?
> >
>
> Most likely 10% of the net's legitimate content, and less than 1% of all sex
> sites...
>
> > Placenames such as Essex, Sussex and Middlesex, programming references
> > such as the header file 'bytesex.h', links to useful places like Sexual
> > Abuse Recovery, Sexual Harassment sites, Sexuality Information Department,
> > articles about sextuplets, the benefits of de-sexing of pets, ... and so
> > on and so on.
> >
> > There are appropriate words that can be used to filter out sex sites.
> > 'sex' just isn't one of them.
> >
>
> I urge you both to rethink your strategies. Site blocking by keywords, even
> very carefully chosen ones, is random at best. Whether you opt for keyword
> matching in the URL, filename, or document text, the risks are grave that
> you will inadvertently block a large percentage of the Internet while still
> missing many of the adult sites you intended to block.
>
> keyword  filename         document text
> -------  ---------------  ----------------------------------------------------
> ass      assemble.html    "..or the right of the people peaceably to
>                           assemble.."
>
> tit      petition.html    "..and to petition the government.."
>
> fuck     fucking-ie.html  "..After a week of debugging the proxy system, we
>                           tracked the problem to yet another fucking bug in
>                           IE. This patch will allow the proxy to work around
>                           the problem."
>
> breast   breast.gif       "Yesterday, we found the breast possible solution
>                           to the problem that was causing documents to be
>                           incorrectly cached."
>
> In cases 1 and 2, the "adult" keyword gets matched as a substring in another
> word. In case 3, it's used as an expletive by someone who's angry about
> having to work around someone else's broken software again. In case 4's
> document text, we have an easy typing error, where someone inadvertently
> entered breast instead of best. In case 4's filename, we have a picture from
> a chicken recipe or women's health site.
>
> > (I keep seeing this suggestion. It really _isn't_ a good idea.)
> >
>
> Unfortunately, neither would the use of any other words. No matter how "porn
> only" a word may sound, it's often just one or two characters away from a
> commonly used word, a commonly found substring (like the essex, sussex,
> middlesex examples you gave), or used as an expletive by casual programmers.
>
> --
> Brian Ristuccia
> brianr@osiris.978.org
> brianr@debian.org
> bristucc@cs.uml.edu
>
Received on Tue Feb 23 1999 - 21:43:08 MST
