Re: Ideas/solutions to caching identical objects ...

From: <carson@dont-contact.us>
Date: Tue, 3 Dec 1996 16:42:08 -0500

>>>>> "Duane" == Duane Wessels <wessels@nlanr.net> writes:

Duane> I must've missed something. I thought the goal was to increase hit
Duane> counts on identical objects with different names. I don't see how
Duane> this helps since the request doesn't include the MD5.

This whole scheme assumes either that the MD5 checksum can be obtained from
the HTTP/1.1 headers (Content-MD5), or that computing the MD5 on the fly as
each object is fetched is acceptable.
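Both options are cheap to implement. The sketch below is an illustration in Python, not Squid code. When the server supplies the checksum, HTTP/1.1's Content-MD5 header carries the base64 encoding of the raw 16-byte digest. When it does not, MD5 can be updated chunk by chunk while the body streams through, so no second pass over the data is needed. The function names and the list used as a "sink" (standing in for the disk file and the client socket) are hypothetical.

```python
import base64
import hashlib
from typing import Iterable, List, Optional


def md5_from_headers(headers: dict) -> Optional[str]:
    """Return the hex MD5 from a Content-MD5 header, or None if absent.

    Content-MD5 is base64 of the raw 16-byte digest; header names are
    matched case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "content-md5":
            return base64.b64decode(value.strip()).hex()
    return None


def md5_while_streaming(chunks: Iterable[bytes], sink: List[bytes]) -> str:
    """Hash the body incrementally while passing each chunk on to `sink`."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)     # fold this chunk into the running digest
        sink.append(chunk)  # hypothetical stand-in for writing to disk/client
    return h.hexdigest()
```

Either path ends with the same hex string, so the rest of the cache does not need to care which one produced it.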

Disk files are stored as they are now, but instead of hashing the URL to
choose the bucket, we use part of the MD5 checksum (either the first n bits
or some other scheme).
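Taking the first n bits is straightforward, as this Python sketch shows. The `<root>/<bucket>/<md5>` directory layout is a hypothetical choice for illustration, not Squid's actual swap-directory layout. The point is that two URLs returning byte-identical bodies land on the same path.

```python
import os


def bucket_for(md5_hex: str, n_bits: int = 8) -> int:
    """Use the first n_bits of the 128-bit MD5 digest as the bucket number."""
    if len(md5_hex) != 32:
        raise ValueError("expected a 32-character hex MD5")
    if not 1 <= n_bits <= 128:
        raise ValueError("n_bits must be between 1 and 128")
    return int(md5_hex, 16) >> (128 - n_bits)


def path_for(cache_root: str, md5_hex: str, n_bits: int = 8) -> str:
    """Hypothetical layout: <cache_root>/<bucket in hex>/<full md5 hex>."""
    width = (n_bits + 3) // 4  # hex digits needed to print the bucket number
    bucket = format(bucket_for(md5_hex, n_bits), f"0{width}x")
    return os.path.join(cache_root, bucket, md5_hex)
```

For the empty body (MD5 `d41d8cd98f00b204e9800998ecf8427e`), 8 bits give bucket `d4`. Because MD5 output is close to uniformly distributed, buckets should fill about as evenly as they do with URL hashing.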

The log file changes from:

Filename URL Origdate Expiredate Size

to:

Filename MD5 URL Origdate Expiredate Size
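Reading and writing the extended log line is a matter of one extra field. This sketch assumes whitespace-separated fields and integer (epoch-second) dates. Both are assumptions, since the post does not specify an encoding for the dates.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    filename: str
    md5: str
    url: str
    origdate: int
    expiredate: int
    size: int


def parse_log_line(line: str) -> LogEntry:
    """Parse 'Filename MD5 URL Origdate Expiredate Size' (assumed format)."""
    filename, md5, url, orig, expire, size = line.split()  # ValueError if not 6 fields
    if len(md5) != 32:
        raise ValueError(f"bad MD5 field: {md5!r}")
    return LogEntry(filename, md5.lower(), url, int(orig), int(expire), int(size))


def format_log_line(e: LogEntry) -> str:
    """Write an entry back out in the same field order."""
    return f"{e.filename} {e.md5} {e.url} {e.origdate} {e.expiredate} {e.size}"
```

Placing the MD5 before the URL keeps the fixed-width fields together and lets a rebuild of the MD5-keyed index skip parsing the URL entirely.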

An in-memory and/or on-disk index maps URLs to MD5 checksums. Hmmm...
perhaps two cache trees: one hashed on the URL, containing symlinks into a
second tree hashed on the checksum...
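The two-tree idea can be sketched with plain symlinks. In this Python illustration, the `url/` and `md5/` subdirectory names are hypothetical, and hashing the URL with MD5 for the URL tree is an assumption rather than a description of Squid's existing URL hash. Only one copy of each unique body is written, and every URL that returned it gets a symlink.

```python
import hashlib
import os
from typing import Optional


def url_key(url: str) -> str:
    """Name for the URL-tree entry (MD5 of the URL here; an assumption)."""
    return hashlib.md5(url.encode()).hexdigest()


def store(root: str, url: str, body: bytes) -> str:
    """Store body under the checksum tree and link the URL tree to it."""
    md5 = hashlib.md5(body).hexdigest()
    obj_dir = os.path.join(root, "md5")
    url_dir = os.path.join(root, "url")
    os.makedirs(obj_dir, exist_ok=True)
    os.makedirs(url_dir, exist_ok=True)
    obj_path = os.path.join(obj_dir, md5)
    if not os.path.exists(obj_path):  # write each unique body only once
        with open(obj_path, "wb") as f:
            f.write(body)
    link = os.path.join(url_dir, url_key(url))
    if os.path.lexists(link):  # URL seen before: repoint its link
        os.remove(link)
    os.symlink(os.path.abspath(obj_path), link)
    return md5


def lookup(root: str, url: str) -> Optional[bytes]:
    """Follow the URL symlink; a missing or dangling link counts as a miss."""
    link = os.path.join(root, "url", url_key(url))
    try:
        with open(link, "rb") as f:  # open() follows the symlink
            return f.read()
    except FileNotFoundError:
        return None
```

One convenient property of this layout: deleting a file from the checksum tree leaves dangling symlinks, and `lookup` already treats those as misses.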

When a request is received, Squid looks up the URL hash as usual. The
behavior differs in three cases:

- The URL is not in the cache

In this case, Squid must do a "HEAD URL" to grab the HTTP/1.1 headers, which
may include the MD5 checksum. If an identical checksum is already in the
cache, Squid just adds a URL->MD5 link.

If the server does not support the HTTP/1.1 Content-MD5 header, Squid has to
fetch the entire document (just like the old days). However, once it has
fetched the document, it has the MD5 hash and can compare it with those it
already has, then either add a new unique file or just an additional
URL->MD5 link.

- The URL is in the cache but needs to be checked for freshness.

Squid can either send a standard If-Modified-Since (IMS) request, or do a
HEAD and compare the server's MD5 checksum (if available) with the cached one.

- The file needs to be removed

Squid must either chase down all URL links to the file, or leave invalid
links lying around and clean them up as they are referenced or during
garbage collection.
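The three cases can be modeled in memory to check that the logic hangs together. This Python sketch is hypothetical throughout. The `head` and `get` callables stand in for an origin server, with `head` returning the server's MD5 or None. When the server sends no MD5, the model simply refetches, whereas the proposal would fall back to an IMS request. Removal uses the lazy option: links are left in place and cleaned up when referenced or by a garbage-collection sweep.

```python
import hashlib
from typing import Callable, Dict, Optional


class DedupCache:
    """Objects keyed by MD5; URLs are just links to an MD5 (hypothetical model)."""

    def __init__(self, head: Callable[[str], Optional[str]],
                 get: Callable[[str], bytes]):
        self.head = head  # returns the server-supplied MD5 hex, or None
        self.get = get    # returns the full body
        self.url_to_md5: Dict[str, str] = {}
        self.objects: Dict[str, bytes] = {}
        self.fetches = 0  # full-body GETs, to show what deduplication saves

    def _fetch_and_store(self, url: str) -> bytes:
        body = self.get(url)
        self.fetches += 1
        md5 = hashlib.md5(body).hexdigest()  # the hash is known after any fetch
        self.objects.setdefault(md5, body)   # keep one copy per unique body
        self.url_to_md5[url] = md5
        return body

    def _resolve(self, url: str, remote: Optional[str]) -> bytes:
        if remote is not None and remote in self.objects:
            self.url_to_md5[url] = remote    # identical body cached: just link
            return self.objects[remote]
        return self._fetch_and_store(url)

    def request(self, url: str) -> bytes:
        md5 = self.url_to_md5.get(url)
        if md5 is not None and md5 not in self.objects:
            del self.url_to_md5[url]  # dangling link from a removal: clean lazily
            md5 = None
        remote = self.head(url)
        if md5 is not None and remote == md5:
            return self.objects[md5]  # case 2: checksums match, copy is fresh
        # Case 1 (URL not cached), or case 2 with a changed or missing MD5.
        return self._resolve(url, remote)

    def remove(self, md5: str) -> None:
        """Case 3: drop the object; URL links are left to be cleaned up later."""
        self.objects.pop(md5, None)

    def garbage_collect(self) -> None:
        """Sweep away URL links whose object no longer exists."""
        for url in [u for u, m in self.url_to_md5.items() if m not in self.objects]:
            del self.url_to_md5[url]
```

The trade-off noted above shows up directly: `remove` is constant time because it never searches for the URLs pointing at an object, at the cost of `request` checking each link before trusting it.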

I hope the above makes sense, as I'm fairly tired at the moment...

--
Carson Gaspar -- carson@cs.columbia.edu carson@lehman.com
http://www.cs.columbia.edu/~carson/home.html
<This is the boring business .sig - no outre sayings here>
Received on Tue Dec 03 1996 - 13:48:24 MST
