Re: [squid-users] Squid and Search Engines

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 9 Feb 2004 00:14:04 +0100 (CET)

On Sun, 8 Feb 2004, OTR Comm wrote:

> Do you by any chance know the name of the CPAN module?

No, and I am not sure I remember correctly. I may have been confused and
thinking of something else.

> When I do a 'file' on a particular cache file, I get back that it is
> DBase 3 format, is this correct, or is this just the closest that Linux
> can get on determining the type of file?

It is just that 'file' does not know how to identify the file format.

> For example, I have a cache file:
>
> /usr/local/squid/var/cache/00/09/0000092D
>
> with header information:
>
> ^Co
> Content-Length: 2173
> Content-Type: image/gif
> Last-Modified: Sun, 11 Jan 2004 05:20:46 GMT
> Accept-Ranges: bytes
> ETag: "5db8d2aa2d8c31:627d33"
> Server: Microsoft-IIS/6.0
> Date: Thu, 22 Jan 2004 03:02:01 GMT
> Connection: close
> <snip>
>
> and from that, the 'purge' utility returns the URL of:
>
> http://www.whitehouse.org/kids/images/tn-palm.gif
>
> How is the URL deciphered? For the life of me, I can't figure it out.

It is in the header. I do not know what tool you used to view the
header, but somehow it all got reduced to just a ^Co above.

The file header contains binary information and the output will almost
certainly look very strange if you just dump it on the screen.
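
To look at the binary part safely, a hex dump works better than printing
the raw bytes. The following is only a minimal sketch; the function name
hexdump and the 16-byte row width are my own choices for illustration,
not anything taken from Squid or the purge tool.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex columns, and a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Replace anything outside printable ASCII with a dot so that
        # control bytes cannot disturb the terminal.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


# Usage (path taken from the question above; 128 bytes is an arbitrary
# amount, chosen only to cover the start of the file):
#
#   with open("/usr/local/squid/var/cache/00/09/0000092D", "rb") as f:
#       print(hexdump(f.read(128)))
```

Reading the file in binary mode ("rb") matters here, since the header is
not text.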

The HTTP headers are not part of the file header but of the cached
HTTP response, which follows the file header.

> I read in the Programming Guide that "A cache swap file consists of two
> parts: the cache metadata, and the object data."

See Chapter 28 "Store ``swap meta'' Description". This chapter describes
the on-disk cache file format.
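
As a rough illustration of how the URL can be pulled out of the metadata,
here is a sketch in Python. It rests on assumptions that should be checked
against that chapter and your Squid version: that the file starts with a
marker byte 0x03, followed by a 4-byte header length in the machine's
native byte order (little-endian is assumed below), followed by entries
made of a 1-byte type, a 4-byte length and the value, and that type 4
holds the NUL-terminated URL. The names parse_swap_meta and cached_url are
made up for this example.

```python
import struct

STORE_META_OK = 0x03   # assumed marker byte at the start of a swap file
STORE_META_URL = 4     # assumed entry type that carries the URL


def parse_swap_meta(data: bytes, byteorder: str = "<"):
    """Return (entries, body_offset) for the assumed header layout.

    entries is a list of (type, value) pairs; body_offset is where the
    cached HTTP response would begin.
    """
    if len(data) < 5 or data[0] != STORE_META_OK:
        raise ValueError("not a recognised swap meta header")
    (header_len,) = struct.unpack_from(byteorder + "i", data, 1)
    if header_len < 5 or header_len > len(data):
        raise ValueError("implausible header length")
    entries = []
    pos = 5
    while pos + 5 <= header_len:
        entry_type = data[pos]
        (length,) = struct.unpack_from(byteorder + "i", data, pos + 1)
        pos += 5
        if length < 0 or pos + length > header_len:
            raise ValueError("truncated entry")
        entries.append((entry_type, data[pos:pos + length]))
        pos += length
    return entries, header_len


def cached_url(data: bytes, byteorder: str = "<"):
    """Return the URL stored in the metadata, or None if there is none."""
    entries, _ = parse_swap_meta(data, byteorder)
    for entry_type, value in entries:
        if entry_type == STORE_META_URL:
            return value.rstrip(b"\0").decode("latin-1")
    return None
```

If that layout holds, the ^Co in the quoted dump would be the marker byte
0x03 followed by the first byte of the length, with the NUL bytes after it
cutting the display short. That is a guess from the dump, not something
verified against the file.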

The purge tool you already mentioned is also a good reference on how this
works.

Regards
Henrik
Received on Sun Feb 08 2004 - 16:14:08 MST
