Content filtering

From: Simon Rainey <srainey@dont-contact.us>
Date: Tue, 07 Mar 2000 10:22:00 +0000

Hi,

I've asked this question in the past, but times have changed and there may
now be some product out there that fits the bill. I'm looking for a web
content filtering platform with the following spec:

1. URL-based filtering with the ability to categorize sites, e.g. porn,
offensive language, violence etc.
2. content-based filtering, e.g. block documents that contain offensive
language
3. virus scanning of HTTP and FTP downloads
4. user-level filtering policies - 250,000 users each with an arbitrary policy
5. group-level filtering policies - 5000 groups each with an arbitrary policy
6. hierarchical management structure, so the filtering policy of a group of
users can be delegated to a local manager
7. simple to use web-based management of filter lists, users, groups and
policies
8. scalable to cope with a T3 feed (45 Mbps)
9. cheap, because this is to be used in the education marketplace

Not much to ask, is it? ;-)

Using Squid and some tools we have written in-house we already have the
ability to do 1, 8 and 9. We also have 5 and 6 covered to a degree.
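
To make requirements 4 to 6 concrete, here is a rough sketch of how a per-user
policy might be combined with policies inherited through a group hierarchy.
All group names, users and categories are made up, and the rule that
restrictions accumulate from the user up through every parent group is an
assumption, not something the spec states.

```python
# Hypothetical data: each group has an optional parent and a set of
# blocked categories; each user belongs to one group.
GROUP_PARENT = {"county": None, "school-a": "county", "class-3b": "school-a"}
GROUP_BLOCKED = {"county": {"porn"}, "school-a": {"violence"}, "class-3b": set()}
USER_GROUP = {"alice": "class-3b"}
USER_BLOCKED = {"alice": {"chat"}}


def blocked_categories(user):
    """Union of the user's own blocks and those of every ancestor group."""
    blocked = set(USER_BLOCKED.get(user, ()))
    group = USER_GROUP.get(user)
    seen = set()  # guards against an accidental loop in the hierarchy
    while group is not None and group not in seen:
        seen.add(group)
        blocked |= GROUP_BLOCKED.get(group, set())
        group = GROUP_PARENT.get(group)
    return blocked


def is_blocked(user, category):
    """True if the category is blocked for this user at any level."""
    return category in blocked_categories(user)
```

With 250,000 users the resolved set would presumably be cached per user
rather than recomputed on every request.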

Content filtering is more of a challenge. There is no point scanning
non-text documents for offensive text, so the load might not be that high.
However, we need a weighted banned words list and the ability to set the
threshold on a per-user or per-group basis. Also, each document must be
fully scanned before it is sent to the client so that a decision can be
made before the user is exposed to any potentially dodgy material.
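
As an illustration of what I mean, a minimal sketch of weighted scoring with
a per-user or per-group threshold might look like this. The word list,
weights, thresholds and the rule that a user threshold overrides the group
one are all placeholders and assumptions, not an existing product's
behaviour.

```python
import re

# Hypothetical weighted word list; words and weights are placeholders.
BANNED_WEIGHTS = {"badword": 10, "rude": 5, "dodgy": 3}

# Hypothetical thresholds; a user entry overrides the group entry.
GROUP_THRESHOLDS = {"primary": 5, "staff": 50}
USER_THRESHOLDS = {"alice": 20}
DEFAULT_THRESHOLD = 10


def score_text(body):
    """Sum the weights of every banned word occurrence in the whole body."""
    words = re.findall(r"[a-z']+", body.lower())
    return sum(BANNED_WEIGHTS.get(word, 0) for word in words)


def threshold_for(user, group):
    """Per-user threshold if one is set, else the group's, else the default."""
    if user in USER_THRESHOLDS:
        return USER_THRESHOLDS[user]
    return GROUP_THRESHOLDS.get(group, DEFAULT_THRESHOLD)


def allow_document(content_type, body, user, group):
    """Decide on the complete document before any of it reaches the client."""
    if not content_type.startswith("text/"):
        return True  # non-text documents are not scanned for words
    return score_text(body) < threshold_for(user, group)
```

Note that the whole body has to be buffered before `allow_document` can
return, which is where the latency and memory cost comes from.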

Requirement 4 would be bad enough if users had to authenticate themselves
with the proxy. In fact, the spec requires that users already logged on to
the network (Win NT or Win 2k on the local LAN) should not have to
re-authenticate to get web access (via the centralized proxy farm at the
centre of the WAN). This implies integration with the existing
authentication system so that a user ID / client IP address mapping can be
established.
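
Roughly, the mapping I have in mind is something like the sketch below. It
assumes some hypothetical feed of logon and logoff events from the
authentication system; how those events would actually be obtained from NT
is exactly the open question. The class name, methods and expiry time are
invented for illustration.

```python
import time


class LoginMap:
    """Maps client IP -> (user ID, logon time), fed by hypothetical events."""

    def __init__(self, max_age_seconds=8 * 3600):
        self.max_age = max_age_seconds
        self._by_ip = {}

    def logon(self, ip, user, now=None):
        """Record that a user logged on at this client address."""
        self._by_ip[ip] = (user, time.time() if now is None else now)

    def logoff(self, ip):
        """Forget the mapping when the user logs off."""
        self._by_ip.pop(ip, None)

    def user_for(self, ip, now=None):
        """Return the user ID for a client IP, or None if unknown or stale."""
        now = time.time() if now is None else now
        entry = self._by_ip.get(ip)
        if entry is None:
            return None
        user, seen = entry
        if now - seen > self.max_age:
            del self._by_ip[ip]  # expire entries that missed a logoff event
            return None
        return user
```

This only works if each client IP address has a single user at a time; any
address translation or shared host between the LAN and the proxy farm would
break it.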

I-Gear will do some of the above, but it's expensive and doesn't get close
enough to the spec. I've looked at ActiveGuardian, but it needs work and
I'm not convinced it will scale economically.

Personally I think this sort of thing should be done at a local level
rather than centrally. The problem is the management overhead and the cost
of putting a proxy on every LAN (1,100 sites in this instance).

Any suggestions, bright ideas or reasoned arguments why this couldn't work
are welcomed.

Regards,
Simon.

-------------------------------------------------------------
Simon Rainey Direct Line: +44 1235 823238
Principal Internet Consultant Fax: +44 1235 823424
RM IFL Engineering E-mail: srainey@rmplc.net
Internet for Learning, Research Machines plc, New Mill House,
183 Milton Park, Abingdon, Oxfordshire, OX14 4SE, England.
Received on Tue Mar 07 2000 - 03:38:31 MST
