Re: Handling aliased download sites

From: Dancer <dancer@dont-contact.us>
Date: Mon, 10 Apr 2000 03:05:52 +0000

I've thought about this from time to time, specifically that multiple URLs
may refer to identical copies of an object, and that it would be preferable
(from a bandwidth perspective, maybe from others as well) to store only a
single copy of the object.

However, let's download...oh, I don't know...random.exe from a server.
http://www.somewhere.com/random.exe

We store that object.

Someone else comes along and gets
http://msdownload.com/4598AB024985700FF/Considerable_unpredictable_guff/RANDOM.EXE

Now, that's the same actual entity, in theory. However we don't know that it
is until we get it. Once we've gotten it, we then need to remember that the
second URL refers to the same object as the first one.

* What about Last-Modified and Expires times? What if they're different?
* What if one is 'Cache-Control: public' and one is 'Cache-Control: private'?
* Different ETags?
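
To make the "we don't know until we get it" point concrete, here is a minimal sketch in Python, not Squid code. It assumes the whole body is in hand and that a SHA-256 digest of the bytes is an acceptable identity test. The function name and the two dictionaries are hypothetical.

```python
import hashlib

def record_download(url, body, url_to_digest, digest_to_body):
    """Store body under its content digest.

    Returns True if identical bytes were already cached under some other
    URL. This can only be known after the download has completed.
    """
    digest = hashlib.sha256(body).hexdigest()
    already_cached = digest in digest_to_body
    if not already_cached:
        digest_to_body[digest] = body
    # Remember that this URL refers to the (possibly shared) object.
    url_to_digest[url] = digest
    return already_cached
```

The second fetch still costs the full transfer. Only requests made after it can be answered from the single stored copy, and the header questions above still apply to that copy.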

Do we revalidate the single object based on the most restrictive policy
specified in the headers or on the most relaxed? Do we create some notional
'average' value?
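
As an illustration of the "most restrictive" option only, here is a sketch in Python. The Policy type and its two fields are hypothetical simplifications: an absolute expiry time and a public/private flag. It does not address differing ETags or Last-Modified times.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    expires: float   # absolute expiry time, seconds since the epoch
    public: bool     # True for 'public', False for 'private'

def most_restrictive(a, b):
    """Combine two policies by taking the stricter value of each field."""
    return Policy(
        expires=min(a.expires, b.expires),  # the earlier expiry wins
        public=a.public and b.public,       # private if either copy is private
    )
```

The "most relaxed" variant would use max() and "or" instead. Whether either result is actually correct for both origin servers is exactly the open question.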

And of course, when the object is removed from the cache, we have to remember
every URL that it was stored under, and drop those as well.
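
That bookkeeping amounts to keeping a reverse index from object to URLs. A rough sketch in Python follows. The class and attribute names are made up for illustration, and the digest is assumed to be whatever key identifies the stored object.

```python
from collections import defaultdict

class AliasIndex:
    def __init__(self):
        self.digest_of = {}              # URL -> digest of the stored object
        self.urls_of = defaultdict(set)  # digest -> every URL stored under it

    def add(self, url, digest):
        """Record that url refers to the object identified by digest."""
        old = self.digest_of.get(url)
        if old is not None and old != digest:
            # The URL used to point at a different object; unlink it first.
            self.urls_of[old].discard(url)
            if not self.urls_of[old]:
                del self.urls_of[old]
        self.digest_of[url] = digest
        self.urls_of[digest].add(url)

    def evict(self, digest):
        """Drop the object and every URL alias; return the dropped URLs."""
        urls = self.urls_of.pop(digest, set())
        for url in urls:
            del self.digest_of[url]
        return urls
```

The cost is one extra set per stored object, plus the care needed to keep both maps consistent when a URL starts returning different content.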

I'm not saying that it can't be done, or shouldn't be done. I can see ways of
making 'magic' cachefiles that refer to other objects in the store. What I
_am_ saying though is that there are a whole slew of issues that need
thinking through.

D

Bert Driehuis wrote:

> Has anyone given any thought to the problem of multiple download sites?
>
> For example, downloads of Microsoft's Internet Explorer could be handled by
> a gazillion web sites, like msvaus.www.connxion.com or
> mskuys.www.connxion.com. Squid handles them as different objects, thereby
> increasing the bandwidth demands if two users download them from different
> sites. In IE4, this could be tweaked by setting up a redirector for the
> file IE5SITES.DAT, and replacing it with a stripped down copy to make
> everyone use the same download site. However, with IE5 Microsoft turned to
> redirects from their main download site, busting this workaround. Of
> course, rewriting the URL from a redirector is still possible, but that
> sort of defeats the purpose of allowing the user to pick a "good" download
> site.
>
> I've considered adding a feature to Squid that uses regexes to rewrite the
> URL MD5 hash for such sites, so that once a copy from msvaus has been
> cached and a request is then made for the identical URI but at mskuys, it
> would still get the cached copy.
>
> Thoughts? Am I once again overlooking features in post-2.2 Squids? :-)
>
> Cheers,
>
> -- Bert
>
> Bert Driehuis, MIS -- bert_driehuis@nl.compuware.com -- +31-20-3116119
> Every nonzero finite dimensional inner product space has an
> orthonormal basis. It makes sense, when you don't think about it.
Received on Sun Apr 09 2000 - 21:06:04 MDT
