POESIA - an opensource Internet content filtering project

From: STARYNKEVITCH Basile <Basile.Starynkevitch@dont-contact.us>
Date: Fri, 15 Feb 2002 14:54:42 +0100

[[an email to the Squid cache developers' mailing list, with copy to the
POESIA mailing list]]

Dear All Squid Developers,

It is my pleasure, as technical coordinator, to announce to the Squid
developers team the start of the POESIA project:

    Public Opensource Environment for a Safer Internet Access
    (IAP2117/27572)

POESIA is an open-source (GNU General Public Licence) Internet content
filtering project, partially funded by the European Commission
(Information Society DG) under the Safer Internet Action Plan (IAP).
The total POESIA project budget is more than 1.9 million euros, of
which the E.C. funds 1.02 million euros. The motivations of the
European Safer Internet Action Plan include protecting European youth
from harmful Internet content.

The POESIA project started on 4 February 2002 and should last 24
months.

The abstract of the project is available on the following European
Commission page:

http://www.europa.eu.int/information_society/programmes/iap/projects/filtering/poesia/index_en.htm

The following two paragraphs are copied from that page:

   Development covers the creation of a library of filtering components,
   and the extension of existing Internet related open-source software to
   use this library. Library components will provide a set of two-layered
   (crude/elaborate) filtering functions covering multiple filtering
   modes (e.g. images, natural language text, URLs, etc.). Adaptive
   decision taking mechanisms will combine the output of these components
   to deliver a final filtering decision. POESIA uses caching (extending
   the open-source Squid cache) both for Internet content and for
   filtering scores, enabling mutualization of filtering costs and hence
   the use of more expensive filtering techniques. Communication
   mechanisms will be developed so that several POESIA systems in the
   same area can communicate to share their cached contents and scores.
   
   Filtering will cover a range of modes, including image filtering,
   natural language text filtering, URL, PICS and JavaScript
   filtering. [...]
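To make the two-layered (crude/elaborate) scheme and the caching of
filtering scores more concrete, here is a minimal Python sketch. It is
not POESIA code: the score range [0, 1], the weighted-average
combination, the 0.2/0.8 "inconclusive" band and the 0.5 reject
threshold are all assumptions, and a fixed weighted average stands in
for the adaptive decision mechanism the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumed convention for this sketch: a filtering component maps content
# to a score in [0, 1] (0 = harmless, 1 = harmful), or None if it has no opinion.
Component = Callable[[str], Optional[float]]
Layer = List[Tuple[Component, float]]  # (component, weight) pairs


def combine(layer: Layer, content: str) -> Optional[float]:
    """Weighted average of the scores returned by the components of a layer."""
    total = weights = 0.0
    for component, weight in layer:
        score = component(content)
        if score is not None:
            total += weight * score
            weights += weight
    return total / weights if weights else None


@dataclass
class TwoLayerFilter:
    crude: Layer        # cheap checks, e.g. URL or keyword lists
    elaborate: Layer    # expensive checks, e.g. text or image analysis
    low: float = 0.2    # assumed: crude score at or below this is a clear accept
    high: float = 0.8   # assumed: crude score at or above this is a clear reject
    threshold: float = 0.5  # assumed: final scores at or above this are rejected
    cache: Dict[str, float] = field(default_factory=dict)  # URL -> score

    def score(self, url: str, content: str) -> float:
        if url in self.cache:  # score already known: no filtering cost
            return self.cache[url]
        value = combine(self.crude, content)
        if value is None or self.low < value < self.high:
            # Crude layer inconclusive: pay for the elaborate layer.
            refined = combine(self.elaborate, content)
            if refined is not None:
                value = refined
        final = value if value is not None else 0.0  # no opinion at all: accept
        self.cache[url] = final
        return final

    def accept(self, url: str, content: str) -> bool:
        return self.score(url, content) < self.threshold


def keyword_component(text: str) -> Optional[float]:  # hypothetical crude filter
    return 1.0 if "casino" in text else None


def nlp_stub(text: str) -> Optional[float]:  # placeholder for real text analysis
    return 0.3


f = TwoLayerFilter(crude=[(keyword_component, 1.0)], elaborate=[(nlp_stub, 1.0)])
print(f.accept("http://a.example/", "online casino"))    # False (crude score 1.0)
print(f.accept("http://b.example/", "school homework"))  # True (elaborate score 0.3)
```

Because scores are cached per URL, a later request for the same URL
skips both layers; this is the cost-sharing effect the abstract
attributes to caching filtering scores.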

It should be noted that POESIA will incorporate highly innovative
technologies (including natural language processing, image processing
and static analysis) that go well beyond the usual positive/negative
URL lists or keyword lists used in most other filters.

The POESIA project will soon have its web site at
www.poesia-filter.org; it will probably be available in March 2002.

POESIA is mainly intended for educational settings, for instance as a
combined proxy, firewall and filter between an Internet connection and
a classroom. POESIA is meant to run on a Linux PC.

POESIA will very probably extend the Squid cache. Since the project
has only just begun, we do not yet have a definitive architectural
design for POESIA.

My first tentative impression (from looking at Squid-cache version
squid-2.5.PRE3-20020210) is that we might consider a shallow extension
of the Squid cache which:

   communicates with the POESIA master process (e.g. through Unix named
   pipes), which does the bulk of the content filtering;

   stores and manages the POESIA filtering scores in addition to the
   cached content;

   sends any needed content to the POESIA master process when it is
   requested;

   sends the filtering scores to the POESIA master process when they
   are available, or otherwise asks the POESIA master process to
   compute them;

   receives from the POESIA master process a filtering decision
   (accept/reject the content), so that from Squid's point of view the
   filtering decision appears as an extension of Squid's access control
   lists;

   communicates with other fellow Squid+POESIA systems.
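The exchange between the Squid extension and the POESIA master process
could be as simple as a line-oriented protocol. The sketch below is
hypothetical: the CHECK/ACCEPT/REJECT message format, the names
master_loop, SquidSide and connect, and the 0.5 threshold are invented
for illustration. So that it runs in a single process, an anonymous
os.pipe() pair and a thread stand in for Unix named pipes (os.mkfifo)
between two separate processes. Only the score and decision items of
the list are covered; sending content and talking to fellow systems
are left out.

```python
import os
import threading
from typing import Callable, Dict, Optional, TextIO, Tuple

# Hypothetical line protocol, not part of Squid or POESIA:
#   Squid  -> master: "CHECK <url> <score>\n"   (score is "-" if none is cached)
#   master -> Squid:  "ACCEPT <url> <score>\n" or "REJECT <url> <score>\n"
REJECT_THRESHOLD = 0.5  # assumed decision threshold


def master_loop(requests: TextIO, replies: TextIO,
                compute_score: Callable[[str], float]) -> None:
    """Stand-in for the POESIA master: answer each CHECK line until EOF."""
    for line in requests:
        _verb, url, field = line.split()
        # Reuse the score Squid sent, or compute it (the expensive part).
        score = compute_score(url) if field == "-" else float(field)
        decision = "REJECT" if score >= REJECT_THRESHOLD else "ACCEPT"
        replies.write(f"{decision} {url} {score:.3f}\n")
        replies.flush()
    replies.close()


class SquidSide:
    """Stand-in for the Squid extension: keeps scores next to cached content."""

    def __init__(self, requests: TextIO, replies: TextIO) -> None:
        self.requests = requests
        self.replies = replies
        self.scores: Dict[str, float] = {}

    def allowed(self, url: str) -> bool:
        """ACL-style check: True if the master accepts the URL."""
        cached: Optional[float] = self.scores.get(url)
        field = "-" if cached is None else f"{cached:.3f}"
        self.requests.write(f"CHECK {url} {field}\n")
        self.requests.flush()
        decision, _url, score = self.replies.readline().split()
        self.scores[url] = float(score)  # store the score for later requests
        return decision == "ACCEPT"


def connect(compute_score: Callable[[str], float]) -> Tuple[SquidSide, threading.Thread]:
    """Wire both sides with two anonymous pipes (named pipes in a real setup)."""
    req_r, req_w = os.pipe()
    rep_r, rep_w = os.pipe()
    master = threading.Thread(
        target=master_loop,
        args=(os.fdopen(req_r), os.fdopen(rep_w, "w"), compute_score),
        daemon=True,
    )
    master.start()
    return SquidSide(os.fdopen(req_w, "w"), os.fdopen(rep_r)), master


def toy_score(url: str) -> float:  # placeholder for real content analysis
    return 0.9 if "casino" in url else 0.1


squid, master = connect(toy_score)
print(squid.allowed("http://school.example/"))  # True
print(squid.allowed("http://casino.example/"))  # False
squid.requests.close()  # EOF on the request pipe ends the master loop
master.join()
```

Because Squid sends back the score it already stored, the master
computes a score at most once per URL in this sketch, while the final
answer stays a plain accept/reject that an access-control check can
consume.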

Is the Squid cache developer team interested in having such extensions
in Squid, or will POESIA have to fork its own branch of Squid?

Do the few ideas above appear compatible with the current Squid
design?

Given the tentative ideas above, which parts of Squid should be
patched? (We have already begun working on this, but would of course
appreciate any help or hints.)

Regards

N.B. Any opinions expressed here are mine only, not those of my organization.
N.B. The opinions expressed here are my own and do not commit the CEA.

---------------------------------------------------------------------
Basile STARYNKEVITCH ---- Commissariat à l'Energie Atomique * France
DRT/LIST/DTSI/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX
work email: Basile point Starynkevitch at cea point fr
home email: Basile at Starynkevitch point net
Received on Fri Feb 15 2002 - 08:52:23 MST

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:14:47 MST