url_regex vs urlpath_regex, regex acl syntax?

From: Josh Kuperman <josh.kuperman@dont-contact.us>
Date: Wed, 22 Sep 1999 17:25:30 -0400

I am basically confused over when to use url_regex, urlpath_regex, and
dstdom_regex. We are trying to restrict chatting. There are many very good
websites that offer chat; most major commercial sites do.

I have no problem using url_regex to block out sites like
www.aol.com/community/chat. I have a file I call chatsites
and use this acl:

acl chatsites url_regex "/etc/squid/chatsites"

If I had used dstdom_regex instead of url_regex, would it work just as
well? I also have other problems. about.com is more difficult. They have
their various subject headings: a.about.com, b.about.com, c.about.com,
etc.about.com. Then they have a link to a URL containing the string
'mpchat.htm'
http://freebies.about.com/mpchat.htm?PID=2724&COB=home
and from there a login site
http://freebies.about.com/gi/chat/parachat/parachat.htm?CO....
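
On the url_regex versus dstdom_regex question, a rough way to see the
difference is to look at which piece of the URL each acl type is tested
against. The sketch below is Python, not Squid, and rests on assumptions:
that url_regex sees the whole URL, dstdom_regex only the host name, and
urlpath_regex the part after the host (taken here to include the query
string). The helper name acl_targets is made up for illustration.

```python
import re
from urllib.parse import urlsplit

def acl_targets(url):
    """Return the text each acl type is ASSUMED to be matched against."""
    parts = urlsplit(url)
    # Assumption: the path handed to urlpath_regex keeps the query string.
    path = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "url_regex": url,                       # whole URL
        "dstdom_regex": parts.hostname or "",   # host name only
        "urlpath_regex": path,                  # everything after the host
    }

targets = acl_targets("http://www.aol.com/community/chat")

# A pattern that spans host and path can only match where both are present.
pattern = re.compile(r"aol\.com/community/chat")
hits = {acl: bool(pattern.search(text)) for acl, text in targets.items()}
```

Under those assumptions the pattern only hits for url_regex, which suggests
that an entry mixing host and path would not work as a dstdom_regex.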

What is the most rational way to block any about.com site containing
mpchat.htm or parachat.htm without blocking the rest of about.com?

Will I need to use urlpath_regex or can this be done with url_regex?
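
One candidate pattern for the about.com case can be tried out in Python
before it goes anywhere near squid.conf. This is only a sketch: Python's re
module stands in for whatever regex library Squid is built with, and the
third URL is a hypothetical non-chat page added for contrast.

```python
import re

# Any about.com host, then a path containing mpchat.htm or parachat.htm.
# Dots are escaped so that they match a literal '.' only.
CHAT_ON_ABOUT = re.compile(
    r"^http://([^/]+\.)?about\.com/.*(mpchat|parachat)\.htm",
    re.IGNORECASE,  # plays the role of the acl's -i option
)

urls = [
    "http://freebies.about.com/mpchat.htm?PID=2724&COB=home",
    "http://freebies.about.com/gi/chat/parachat/parachat.htm",
    "http://freebies.about.com/index.htm",  # hypothetical non-chat page
]

def is_blocked(url):
    """True if the candidate pattern matches somewhere in the URL."""
    return CHAT_ON_ABOUT.search(url) is not None

results = [is_blocked(u) for u in urls]
```

Because the pattern spans both the host and the path, it would have to be
tested against the whole URL, which points towards url_regex rather than
urlpath_regex for this particular expression.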

Is there a way with a regular expression to capture, say, webchat.html
without worrying about blocking out Chattanooga, etc.?
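
The over-blocking worry can be illustrated with a small comparison between
a broad pattern and a narrower one. This is a Python sketch with made-up
example.com URLs, and Python's re module is only a stand-in for Squid's
regex library.

```python
import re

# Broad: matches "chat" anywhere, so it also catches Chattanooga.
BROAD = re.compile(r"chat", re.IGNORECASE)

# Narrow: the file name webchat.html right after a '/', followed by
# either the end of the URL or the start of a query string.
NARROW = re.compile(r"/webchat\.html($|\?)", re.IGNORECASE)

urls = [
    "http://www.example.com/webchat.html",
    "http://www.example.com/webchat.html?room=1",
    "http://www.example.com/travel/Chattanooga.html",
]

broad_hits = [u for u in urls if BROAD.search(u)]
narrow_hits = [u for u in urls if NARROW.search(u)]
```

Here the broad pattern flags all three URLs, while the narrow one leaves the
Chattanooga page alone.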

Are the regular expressions understood by squid the same as in _Mastering
Regular Expressions : Powerful Techniques for Perl and Other Tools_ by
Jeffrey E. Friedl, Andy Oram? That is, are ^, ., $, etc. all understood in
the conventional way?

#acl aclname url_regex [-i] ^http:// ... # regex matches whole URL

Does this mean the regex matches the expression following http://, e.g.
'a.b.c' would match http://a.b.c, but would '.b.c'?
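
The conventional behaviour of ^ and of an unescaped dot can be checked with
a few lines of Python. This is a sketch only: it assumes that the acl does
an unanchored search (a match anywhere in the URL counts unless ^ or $ pins
it down), and Python's re module stands in for Squid's regex library.

```python
import re

def found(pattern, url):
    """True if the pattern matches anywhere in the URL (search, not full match)."""
    return re.search(pattern, url) is not None

checks = {
    # ^ pins the match to the very start of the URL.
    "anchored_http": found(r"^http://", "http://a.b.c/"),
    "anchored_ftp": found(r"^http://", "ftp://a.b.c/"),
    # Without anchors the pattern may match in the middle of the URL.
    "a_b_c": found(r"a.b.c", "http://a.b.c/"),
    "dot_b_c": found(r".b.c", "http://a.b.c/"),
    # An unescaped '.' matches any character, not just a literal dot.
    "unescaped_dots": found(r"a.b.c", "http://aXbYc/"),
    "escaped_dots": found(r"a\.b\.c", "http://aXbYc/"),
}
```

In this sketch both 'a.b.c' and '.b.c' match http://a.b.c/, and the
unescaped version also matches the unrelated host aXbYc.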
 
#acl aclname urlpath_regex [-i] \.gif$ ... # regex matches on URL path

Does this simply match anything preceding the $? I am a little confused by
the ^ and the $, since it looks like one matches from the front and the
other matches from the back, though this is not the case.
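
In the conventional reading, $ does not match text by itself; it only
requires that the match end where the string ends. A short Python sketch
with made-up paths shows the effect of \.gif$ (Python's re module is a
stand-in here, not Squid's own regex library).

```python
import re

# A literal ".gif" that must sit at the very end of the string.
GIF_SUFFIX = re.compile(r"\.gif$")

paths = [
    "/images/logo.gif",       # ends in .gif
    "/images/logo.gif.html",  # contains .gif, but not at the end
    "/gifts/index.html",      # contains "gif" without the dot
]

ends_in_gif = {p: bool(GIF_SUFFIX.search(p)) for p in paths}
```

Only the first path matches, so ^ and $ are positions (start and end of the
string) rather than directions in which the match is read.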

If I have a file containing URLs extracted from the access log, is there
a set of options I could use with grep (or Perl) to test my patterns?
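
One possible way to test a set of patterns against a list of URLs is
sketched below in Python. The function name matching_urls and the sample
data are made up for illustration, the input is assumed to be one URL per
line, and Python's regex dialect is close to, but not the same as, the one
Squid uses, so a hit here is a hint rather than a guarantee.

```python
import re

def matching_urls(patterns, urls, ignore_case=True):
    """Map each pattern to the list of URLs it matches somewhere."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = [re.compile(p, flags) for p in patterns]
    report = {p: [] for p in patterns}
    for line in urls:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        for pattern, rx in zip(patterns, compiled):
            if rx.search(url):
                report[pattern].append(url)
    return report

# Stand-ins for the lines of a URL file and a pattern file.
sample_urls = [
    "http://freebies.about.com/mpchat.htm?PID=2724&COB=home\n",
    "http://www.example.com/travel/Chattanooga.html\n",
    "\n",
    "http://www.example.com/webchat.html\n",
]
sample_patterns = [r"/webchat\.html", r"(mpchat|parachat)\.htm"]

report = matching_urls(sample_patterns, sample_urls)
```

With real data the two lists could be read from files, for example the
lines of /etc/squid/chatsites and of a URL file such as urls.txt (a
hypothetical name). From the shell, something like
grep -E -f /etc/squid/chatsites urls.txt should print the URLs matched by
any pattern in the file (add -i if the acl uses -i), with the same caveat
about regex dialects.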

--
Josh Kuperman        Saratoga Springs Public Library
sar_kuper@sals.edu   49 Henry St  
518.584.7860x211     Saratoga Springs, NY 12866
http://www.library.saratoga.ny.us 
Received on Wed Sep 22 1999 - 15:39:50 MDT
