Re: [squid-users] bad regex is blocking the wrong sites

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 04 Oct 2011 13:41:22 +1300

 On Mon, 03 Oct 2011 14:42:43 -0700, devadmin wrote:
> Hello, I'm new to blocking with squid. Right now I'm using a bad-site
> list and that works fine, it blocks URLs as it should. But I'm also
> experimenting with the bad-regex style blacklist, because I see a lot
> of porn is still getting through, and the badregex is blocking
> FarmVille/Zynga content as well as AOL email! I would like to know why
> "gay" and "porn" would cause AOL and FarmVille to be blocked, and any
> suggestions that might be

 Welcome to the world of filtering. Just about every admin on this
 planet has tried it at some point, and none has succeeded yet. The best
 advice is not to bother; try other means. If you continue with it
 anyway, good luck.

> helpful would be so very much appreciated. I have teenagers on the LAN
> and need to protect them from this garbage to the best of my ability.

 Protection begins with education and awareness. The form of
 "protection" you are attempting is akin to blindfolding them and tying
 them up in a closet. As soon as they move out of the sanitised zone you
 are building they will have to face more hardened peers and come off the
 worse because of it.
  Denial of access to information (bad experiences included) is a
 violation of human rights.

 That said, I know there are places (certain countries and school
 systems) which mandate this kind of filtering. If you are operating
 inside one of those you will find it better practice to make extensive
 use of local whitelists and public blacklists. The public blacklists
 are maintained by paid professionals who keep them correct and up to
 date. It is more than a full-time job keeping up with the thousands of
 new websites which appear every day.

>
> here's the contents of the bad-regex blacklist I'm using, just a
> single line.
>
> .*porn*.*
>
> One entry, and this single entry causes all those sites/services and
> more to be blocked. What am I doing wrong?

 That regex matches the text "por", contained anywhere in the object
 being scanned.

  .* -> zero or more of any character
  por -> a 'p' followed by 'o' followed by 'r'
  n* -> a sequence of _zero_ or more 'n'
  .* -> zero or more of any character

 e.g. http://PORtal.facebook.com/ would be blocked.
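 If you really do want to block any URL containing the literal word
 "porn", the wildcards are not needed at all. A minimal sketch, written
 inline in squid.conf rather than in your file, just for clarity:

   # An unanchored regex already matches anywhere in the URL, so the
   # plain word is enough. Note it still matches longer words that
   # contain it, e.g. "pornography".
   acl badregex url_regex -i porn
   http_access deny badregex

 Even this corrected form over-blocks, which is why the whitelist advice
 below matters.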

 <snip>
>
> acl manager proto cache_object
> acl localhost src 127.0.0.1/32
> acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
> acl localnet src 10.10.1.0/24 # RFC 1918 possible internal network
> acl blacklist dstdomain "/etc/squid3/squid-block.acl"
> #acl badregex url_regex -i "/etc/squid3/badregex.acl"

 url_regex is not a great idea if you are writing the lists yourself. It
 matches the entire URL end-to-end including the query string portion and
 path. You want to have different word filters for each piece of the URL.
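 A rough sketch of splitting it, using squid's dstdom_regex (host name
 only) and urlpath_regex (path and query only) ACL types; the file names
 here are just placeholders:

   # words that should never appear in a host name
   acl badhost dstdom_regex -i "/etc/squid3/badhost.regex"
   # words that should never appear in the path or query string
   acl badpath urlpath_regex -i "/etc/squid3/badpath.regex"
   http_access deny badhost
   http_access deny badpath

 That way the word list you apply to host names does not also get tested
 against every search query and path your users request.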

>
> http_access deny blacklist
> http_access deny badregex

 The first step to using regex blocklists safely is to reduce the
 places where you test them.

 At minimum, add a whitelist:
   http_access deny !whitelistA blacklist
   http_access deny !whitelistB badregex
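 The whitelist ACLs themselves are plain dstdomain lists of sites you
 have checked by hand. A sketch (the file paths are only placeholders):

   # sites the domain blacklist must never block
   acl whitelistA dstdomain "/etc/squid3/whitelist-sites.acl"
   # sites the regex filter must never block
   acl whitelistB dstdomain "/etc/squid3/whitelist-regex.acl"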

> http_access allow manager localhost
> http_access deny manager
> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access allow localhost
> http_access allow localnet
> http_access deny all
>

 Amos
Received on Tue Oct 04 2011 - 00:41:31 MDT
