[squid-users] File download blocking.

From: <aanderson@dont-contact.us>
Date: Fri, 3 Oct 2003 09:56:24 +0100

I realise that this is probably a frequently posted topic. I have read a
lot about blocking file downloads with squid, but my research has raised
more questions than answers. I'm hoping someone can help.

My goal is to block downloads of executables and other 'undesirable' file
types such as audio and video files. However, 'normal' browsing should not
be interrupted and so files such as MS Word, Excel and PDFs should be
unaffected.

At this point I know this:

1. How to block ftp downloads completely. This is relatively easy by
blocking the FTP protocol, ftp port and URLs with FTP in them. (It is http
downloads I'm having problems with.)

2. It is possible to limit file sizes using reply_body_max_size. However,
this will also affect viewing of PDFs and similar documents larger than the
given size.

3. It is possible to filter on MIME type. However, there are so many MIME
types that it would be impractical to devise a list of acceptable or
unacceptable types. Allowing only text/html or text/plain will not work.

4. It is possible to filter on file extensions using regular expressions,
e.g:

acl rejected_extensions url_regex -i (https?://)(\w*.+/*)(\.wav|\.mov|\.mpeg.|\.mp.|\.avi|\.rm.|\.rar|\.wm.|\.divx|\.cda|\.midi|\.iso)(.*)
acl reject_extensions urlpath_regex -i (\.wav|\.mov|\.mpeg.|\.mp.|\.avi|\.rm.|\.rar|\.wm.|\.divx|\.cda|\.midi|\.iso|\.pls)

The problem I've found with this is, again, that the list is too vast to be
practical. How many executable file types are there?
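
To make the limits of point 4 concrete, here is a rough Python sketch that
runs the reject_extensions pattern against a few URL paths. It assumes that
urlpath_regex does an unanchored search, and Python's re module is only a
stand-in for the regex library squid actually uses. The sample paths and the
helper name is_rejected are made up for illustration.

```python
import re

# Pattern copied from the reject_extensions acl; squid's -i flag is
# approximated here with re.IGNORECASE.
REJECT = re.compile(
    r"(\.wav|\.mov|\.mpeg.|\.mp.|\.avi|\.rm.|\.rar|\.wm.|\.divx|\.cda"
    r"|\.midi|\.iso|\.pls)",
    re.IGNORECASE,
)


def is_rejected(url_path: str) -> bool:
    """Return True if the pattern matches anywhere in the URL path."""
    return REJECT.search(url_path) is not None


# Hypothetical sample paths, not taken from real traffic.
SAMPLES = [
    "/music/song.mp3",            # listed type
    "/docs/report.pdf",           # document that should pass
    "/files/setup.exe",           # executable that is not on the list
    "/staff/j.avila/index.html",  # ordinary page containing ".avi"
]

if __name__ == "__main__":
    for path in SAMPLES:
        print(path, "rejected" if is_rejected(path) else "passed")
```

In this sketch the .exe path passes because it is not on the list, while the
last path is rejected only because ".avi" appears in a directory name, since
the pattern is not anchored to the end of the path.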

5. One suggested way around the above problem is to filter on acceptable
file types rather than unacceptable ones, e.g.:

acl accepted_extensions url_regex -i (https?://)(\w*.+/*)(\..htm.|\.htm|\.css|\.srf|\.xml|\.nsf|\.asp|\.asp.|\.pl|\.cgi|\.php.|\.jsp|\.gif|\.jpg|\.jpeg|\.swf|\.png|\.bmp|\.pdf|\.txt|\.doc|\.xls|\.ppt)(.*)
acl accept_extensions urlpath_regex -i (\..htm.|\.htm|\.xml|\.nsf|\.asp|\.asp.|\.pl|\.cgi|\.php.|\.jsp|\.gif|\.jpg|\.jpeg|\.swf|\.png|\.bmp|\.pdf|\.txt|\.doc|\.xls|\.ppt)

However, both of these acls assume that the URL will contain a file
extension, and obviously not all URLs do. Take a URL that refers to a
directory with a default file, such as http://www.microsoft.com/security/.
The default file's name does not appear in the URL, so the request will not
pass this rule. So I came up with the following:

acl accept_directory_url url_regex -i (https?://)(\w*.+/$)

This works in conjunction with acl accepted_extensions url_regex, but
together they still block URLs with no file type and no terminating /, e.g.:

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=test

Any regex rule I come up with to deal with these URLs seems to be a
catch-all rule that lets everything through. Doh!
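
The behaviour described above can be reproduced outside squid. The following
is a rough Python sketch that combines the accepted_extensions and
accept_directory_url patterns. It assumes url_regex does an unanchored search
over the whole URL, and Python's re module is only an approximation of
squid's regex library. The helper name is_allowed and the example.com URLs
are made up for illustration.

```python
import re

# Patterns copied from the accepted_extensions and accept_directory_url acls;
# squid's -i flag is approximated with re.IGNORECASE.
ACCEPTED_EXTENSIONS = re.compile(
    r"(https?://)(\w*.+/*)(\..htm.|\.htm|\.css|\.srf|\.xml|\.nsf|\.asp|\.asp."
    r"|\.pl|\.cgi|\.php.|\.jsp|\.gif|\.jpg|\.jpeg|\.swf|\.png|\.bmp|\.pdf"
    r"|\.txt|\.doc|\.xls|\.ppt)(.*)",
    re.IGNORECASE,
)
ACCEPT_DIRECTORY_URL = re.compile(r"(https?://)(\w*.+/$)", re.IGNORECASE)


def is_allowed(url: str) -> bool:
    """Allow the URL if either pattern matches somewhere in it."""
    return bool(
        ACCEPTED_EXTENSIONS.search(url) or ACCEPT_DIRECTORY_URL.search(url)
    )


SAMPLES = [
    "http://www.microsoft.com/security/",  # allowed by the directory rule
    "http://www.example.com/index.html",   # allowed by the extension rule
    "http://www.example.com/setup.exe",    # blocked, as intended
    # Blocked although it is normal browsing: no extension, no trailing /.
    "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=test",
    # Allowed only because ".doc" appears in the host name.
    "http://www.docs.example.com/setup.exe",
]

if __name__ == "__main__":
    for url in SAMPLES:
        print("allowed" if is_allowed(url) else "blocked", url)
```

The last sample suggests a further weakness of the url_regex form: because it
is matched against the whole URL, an accepted extension that happens to
appear in the host name is enough to let a download through.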

So is it actually possible to do what I want using squid? Am I asking too
much? Should I be looking at a commercial solution rather than sticking
with squid, or is there an add-on I can use with squid to achieve my aims
(I'm already using squidGuard as well)?

Ash Anderson
MCP, MCSA, A+.

Received on Fri Oct 03 2003 - 02:56:28 MDT
