Re: [squid-users] Re: I need help with url_regex

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 10 Sep 2010 18:26:34 +1200

On 10/09/10 09:17, devlin7 wrote:
>
> Thanks Amos for the feedback.
>
> It must be that I am entering it incorrectly because anything with a * or ?
> doesn't work at all.
>
> Are you sure that the "." is treated as "any character"

I am. In posix regex...
  "." means any (single) character.
  "*" means any zero or more of the previous item.
  "*" means any one or more of the previous item.
  "?" means zero or one of the previous item.
  "\" means treat the next character as exact, even if its usually special.

by "item" above I mean one character or a whole bracketed () thing.

To be matched as literal source text, these reserved characters all
need to be escaped in the pattern, like \? for a question mark.
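
For example (hypothetical ACL names), note the difference escaping makes:

   # "." is a wildcard here, so this also matches www.sinfo.com
   acl dotInfoLoose url_regex .info
   # the escaped "." matches only a literal ".info"
   acl dotInfoStrict url_regex \.info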

>
> I would have thought that blocking .info would block any site that had .info
> in it like www.porn.info but from what you are saying it would also block
> www.sinfo.com. Am I correct?

Yes. These accidental matches are most of the problem with this type of
config.

>
> So is there a better way?

Yes, a few. Breaking the denial into several rules helps it match
faster and more precisely.

In most cases you will find you can do away with the regex part entirely
and ban a whole domain. This way you can also search online and download
lists of proxy domains to block wholesale. It's far easier than trying
to build the list yourself. SquidGuard, DansGuardian, ufdb tools provide
some lists like this. Also RHSBL anti-spam lists often include open
proxy domains.
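
A minimal sketch of that approach, assuming the downloaded list is
saved one domain per line (the file path here is just an example):

   # load a list of known open-proxy domains, one per line
   acl proxyDomains dstdomain "/etc/squid/proxy-domains.txt"
   http_access deny proxyDomains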

Some matches you can restrict to certain domains, running the regex
only against the path portion of the URL (urlpath_regex matches the
path plus query string):

   # fast check: only these domains are suspect
   acl badDomains dstdomain .example.com .info
   # slow check: the proxy-script paths within those domains
   acl urlPathRegex urlpath_regex ^/browse\.php \.php\?q= \.php\?u=i8v
   # deny only when both match
   http_access deny badDomains urlPathRegex

There will be some patterns which detect certain types of broken CMS
(usually via the search component "\?q=" I mentioned) which act like a
proxy even though they were never intended that way. A urlpath_regex
without the domain protection above is needed to catch the many sites
running these CMS. Just be sure of, and careful with, the patterns.
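
A sketch of such a catch-all rule (the pattern is illustrative only,
not a vetted list):

   # match proxy-like CMS search scripts on any domain; no dstdomain guard
   acl cmsProxyPaths urlpath_regex \?q=http
   http_access deny cmsProxyPaths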

NP: Ordering your rules in the same order I've listed them above will
even gain the proxy some measure of speed. dstdomain matching is rather
fast; regex is slow and resource-hungry.

To back everything up you need reliable management support behind the
blocking policy, with stronger enforcement for students caught actively
trying to evade it. Without those you are in the sad position of an
endless race.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.8
   Beta testers wanted for 3.2.0.2