Re: [squid-users] squid and regex help

From: Antony Stone <Antony@dont-contact.us>
Date: Fri, 5 Sep 2003 22:31:43 +0100

On Friday 05 September 2003 10:21 pm, Christoph Haas wrote:

> On Sat, Sep 06, 2003 at 02:22:00AM +1200, mdew wrote:
> > Using regex "/etc/squid.adservers" I'm attempting to block any URL's
> > with "penis" AND "large" in the url. Basically *penis*large* and
> > *large*penis* ..I was looking at doing like so..
> >
> > (/large/ && /penis/)
> > (/penis/ && /large/)
>
> See "man 7 regex". I would suggest something like:
> (large.*penis|penis.*large)

Beware of attempting this sort of thing without word boundaries. For
example, there is a town in the north of England called Penistone, and it's
not hard to find URLs (e.g. via Google) which include the five letters
"penis" without being the sort of thing you're trying to block:

http://www.penistonereinforcements.com

I didn't bother to look for a URL which had "large" somewhere in it as well,
but it's not hard to imagine such a false positive existing.

Maybe you're happy to block a few false positive web pages in exchange for a
higher number of true positives, but it's a choice you should be aware you're
making.
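
As a rough illustration of the difference word boundaries make, here is a
minimal sketch using Python's "re" module rather than Squid itself. The paths
and the example.com URL are made up for the test, and whether "\b" is accepted
in your Squid regex file depends on the regex library Squid was built against,
so treat that part as an assumption to check on your own system:

```python
import re

# Pattern suggested earlier in the thread: matches the two strings anywhere.
UNBOUNDED = re.compile(r"(large.*penis|penis.*large)", re.IGNORECASE)

# Same alternation with word boundaries (\b), so "penistone" no longer counts.
BOUNDED = re.compile(
    r"(\blarge\b.*\bpenis\b|\bpenis\b.*\blarge\b)", re.IGNORECASE
)

urls = [
    # Real hostname from this message, hypothetical path added for the test.
    "http://www.penistonereinforcements.com/large-orders",
    # Entirely hypothetical URL of the kind the filter is meant to catch.
    "http://example.com/large-penis-pills",
]

for url in urls:
    # Columns: URL, hit without boundaries, hit with boundaries.
    print(url, bool(UNBOUNDED.search(url)), bool(BOUNDED.search(url)))
```

The first URL is caught only by the pattern without boundaries; the second is
caught by both. Note that "\b" treats "_" and digits as word characters, so a
URL containing something like "large_penis" would slip past the bounded
version.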

Antony.

-- 
The only problem with the Universe as a platform, though, is that it is 
currently running someone else's program.
 - Ken Karakotsios, author of SimLife
Received on Fri Sep 05 2003 - 15:31:53 MDT