Re: url_regex vs urlpath_regex, regex acl syntax?

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 23 Sep 1999 01:05:52 +0200

Josh Kuperman wrote:
>
> I am basically confused over when to use url_regex, urlpath_regex, and
> dstdom_regex. We are trying to restrict chatting. There are many very good
> websites that offer chat; most major commercial sites do.

Let's take http://server.example.com/some/path/to/an/object as an example:

url_regex matches against the whole URL above.

urlpath_regex matches against /some/path/to/an/object

dstdom_regex matches against server.example.com
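
To make the split concrete, here is a minimal Python sketch that prints
the string each ACL type would see for the example URL. It only
approximates the split with the standard library's URL parser; it is not
Squid's own code, and acl_subjects is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def acl_subjects(url):
    """Return the string each ACL type is matched against (approximation)."""
    parts = urlsplit(url)
    return {
        "url_regex": url,                # the whole URL
        "urlpath_regex": parts.path,     # only the path component
        "dstdom_regex": parts.hostname,  # only the destination host name
    }

example = "http://server.example.com/some/path/to/an/object"
for acl_type, subject in acl_subjects(example).items():
    print(f"{acl_type:14} {subject}")
```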

> Though if I had dstdom_regex instead of url_regex, would it work just as
> well?

Yes, unless some of the site entries include part of the path component.

> What is the most rational way to block any about.com site containing
> mpchat.htm or parachat.htm without blocking the rest of about.com?

If ACL processing speed is important, then:

acl about.com dstdomain about.com
acl about.com_chat urlpath_regex mpchat.htm parachat.htm
http_access deny about.com about.com_chat
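
The three lines above deny a request only when both ACLs match, and the
cheap domain test comes before the regex test. A rough Python sketch of
that logic follows. The names host_in_domain and is_denied are
hypothetical, and the sketch assumes the domain test is meant to cover
about.com and its subdomains; check how your Squid version treats a
dstdomain entry without a leading dot.

```python
import re
from urllib.parse import urlsplit

# Dots are escaped here so they match a literal "." only.
CHAT_PATHS = [re.compile(r"mpchat\.htm"), re.compile(r"parachat\.htm")]

def host_in_domain(host, domain="about.com"):
    # ASSUMPTION: the domain itself and any subdomain should match.
    return host == domain or host.endswith("." + domain)

def is_denied(url):
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Cheap string comparison first; skip the regexes if it fails.
    if not host_in_domain(host):
        return False
    # Both conditions must hold for the deny rule to apply.
    return any(p.search(parts.path) for p in CHAT_PATHS)

print(is_denied("http://chat.about.com/mpchat.htm"))
print(is_denied("http://www.about.com/index.htm"))
print(is_denied("http://other.example.com/mpchat.htm"))
```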

For a more generic approach, replace the first acl with an include of a
list of potential partial chat hosts, and the second with an include of
commonly seen chat path patterns.

If processing speed isn't that important, then you could write
url_regex patterns that match these.
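
As an illustration only, one possible whole-URL pattern is tried out
below in Python. The pattern is a hypothetical example, not one taken
from a real configuration, and it sticks to constructs that also exist
in extended regular expressions.

```python
import re

# Hypothetical pattern: an about.com host (optionally a subdomain and a
# port), followed somewhere in the path by mpchat.htm or parachat.htm.
ABOUT_CHAT = re.compile(
    r"^http://([^/]*\.)?about\.com(:[0-9]+)?/.*(mpchat|parachat)\.htm"
)

def is_chat_url(url):
    return ABOUT_CHAT.search(url) is not None

print(is_chat_url("http://chat.about.com/foo/mpchat.htm"))
print(is_chat_url("http://www.about.com/index.htm"))
```

Every request has to run through the full regex this way, which is why
the two-ACL form is preferable when speed matters.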

> Are the regular expressions understood by squid the same as in _Mastering
> Regular Expressions : Powerful Techniques for Perl and Other Tools_ by
> Jeffrey E. Friedl, Andy Oram? That is, are ^, ., $, etc. all understood
> in the conventional way?

Yes. Squid uses POSIX "extended" regular expressions, the same syntax
that egrep accepts.

> #acl aclname url_regex [-i] ^http:// ... # regex matches whole URL
>
> does this mean the regex matches the expression following http://, e.g
> 'a.b.c' would match http://a.b.c, but would '.b.c'?
>
> #acl aclname urlpath_regex [-i] \.gif$ ... # regex matches on URL path
>
> Does this simply match anything preceding the $? I am a little confused
> by the ^ and the $, since it looks like one matches from the front and
> the other matches from the back, though this is not the case.

It is the case: ^ anchors a pattern at the start of the string and $
anchors it at the end.

The first matches strings beginning with "http://" (i.e. HTTP URLs). The
second matches strings ending in ".gif". The "..." is only an
illustration that you may write more than one pattern on the same line.
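
A few quick checks in Python show the anchors at work, and also that an
unanchored pattern such as .b.c matches anywhere in the string, with
each "." standing for any character. Python's re module is not the
regex library Squid uses, but for patterns this simple the behaviour is
the same. The helper name matches is hypothetical.

```python
import re

def matches(pattern, text):
    """Unanchored search: the pattern may match anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"^http://", "http://a.b.c/"))        # True: starts with http://
print(matches(r"^http://", "ftp://a.b.c/"))         # False: wrong prefix
print(matches(r"\.gif$", "/images/logo.gif"))       # True: ends in .gif
print(matches(r"\.gif$", "/images/logo.gif.html"))  # False: .gif is not at the end
print(matches(r".b.c", "http://a.b.c"))             # True: found inside the URL
print(matches(r"a.b.c", "http://axbxc"))            # True: "." matches any character
```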

> If I have a file containing URLs extracted from the access log, is there
> a set of options I could use with grep (or PERL) to test?

egrep or "grep -E" should be fine.

I think Perl's regular expressions are slightly different, but for most
"simple" patterns any of the regexp tools can be used with identical
results.

If you want to be 100% sure, then compile Squid with GNU regex, and use
GNU "grep -E" for testing.
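
If a script is more convenient than grep, something along these lines
would do a similar job. This is a sketch under assumptions: the function
name matching_urls and the sample URLs are made up for illustration, and
Python's regex dialect differs from the extended syntax in some details,
so treat the results as a first check for simple patterns only.

```python
import re

def matching_urls(patterns, urls, ignore_case=False):
    """Return the URLs matched by at least one pattern (like grep -E)."""
    flags = re.IGNORECASE if ignore_case else 0  # rough analogue of -i
    compiled = [re.compile(p, flags) for p in patterns]
    return [u for u in urls if any(c.search(u) for c in compiled)]

# Hypothetical sample; in practice, read the URLs from your extracted file.
sample = [
    "http://chat.example.com/rooms/mpchat.htm",
    "http://www.example.com/index.html",
    "http://www.example.com/images/logo.GIF",
]

for url in matching_urls([r"mpchat\.htm", r"parachat\.htm"], sample):
    print(url)
for url in matching_urls([r"\.gif$"], sample, ignore_case=True):
    print(url)
```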

--
Henrik Nordstrom
Spare time Squid hacker
Received on Wed Sep 22 1999 - 17:17:30 MDT
