Re: [squid-users] Squid and Search Engines

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Sun, 8 Feb 2004 13:54:45 +0100 (CET)

On Sat, 7 Feb 2004, OTR Comm wrote:

> Is it possible to setup a search engine, like Glimpse or Swish-e, to
> catalog and search against the information in the Squid cache?

Technically it should be possible, but you would need to write another
retriever (spider) for the engine, one that knows how to read the Squid
cache files instead of fetching from the web or indexing local files.

The format of the cache files is described in the Programmers Guide, and
IIRC there is even a Perl module on CPAN for reading these files.

> Does this even make sense? Should I ask this at the Squid Development
> list?

The developer list for your preferred search engine is a better place to
ask, I think. No modifications to Squid are required, but the search
engine needs to be slightly modified to know how to read the Squid cache
data.

Each file in the cache contains:

a) Metadata such as the URL of the file, its size, the time it was cached,
etc. Of this, the search engine needs to use the URL as the "name" of the
indexed object.

b) The object HTTP headers.

c) The object contents. This is what needs to be indexed.

b+c is the HTTP reply as received by Squid.
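
As a rough illustration of the b+c part only, here is a minimal Python
sketch of how a retriever could separate the headers from the contents.
It assumes the Squid metadata (a) has already been stripped off according
to the Programmers Guide, so the input is just the raw HTTP reply. The
names split_http_reply and is_indexable are hypothetical, made up for this
example, and the "index only text/*" rule is an assumption about what a
search engine would want, not anything Squid defines.

```python
def split_http_reply(reply: bytes):
    """Split a raw HTTP reply into (status_line, headers, body).

    Assumes the Squid metadata has already been removed, so `reply`
    starts with the HTTP status line.
    """
    # The header block ends at the first empty line (CRLF CRLF).
    head, sep, body = reply.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end-of-headers marker found")

    status_line, _, header_block = head.partition(b"\r\n")

    headers = {}
    for line in header_block.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so store them lowercased.
        # If a header repeats, the last occurrence wins in this sketch.
        key = name.strip().lower().decode("latin-1")
        headers[key] = value.strip().decode("latin-1")

    return status_line.decode("latin-1"), headers, body


def is_indexable(headers: dict) -> bool:
    """Hypothetical policy: only hand text/* objects to the indexer."""
    return headers.get("content-type", "").lower().startswith("text/")


if __name__ == "__main__":
    sample = (
        b"HTTP/1.0 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Content-Length: 5\r\n"
        b"\r\n"
        b"hello"
    )
    status, headers, body = split_http_reply(sample)
    if is_indexable(headers):
        # A real retriever would pass `body` to the indexer here,
        # filed under the URL taken from the metadata.
        print(status, len(body))
```

The body returned here is what would be passed to the indexer, filed
under the URL taken from the metadata.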

Regards
Henrik
Received on Sun Feb 08 2004 - 09:31:57 MST
