Re: A More Aggressive Approach to Caching

From: Bruce R. Lewis <brlewis@dont-contact.us>
Date: Tue, 17 Dec 1996 22:32:37 GMT

I think most of us realize that the following is not true:

>The real problem is not the web sites themselves, but the fact that
>end-users enjoy getting personalized web pages.

The problem pages tend to be ones that are totally non-personalized, but
provide false Last-Modified/Expires headers or none at all.
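
For example (hypothetical responses, just to show the header pattern),
a page that never actually changes might still come back every time
looking like

    HTTP/1.0 200 OK
    Date: Tue, 17 Dec 1996 22:00:00 GMT
    Last-Modified: Tue, 17 Dec 1996 22:00:00 GMT
    Expires: Tue, 17 Dec 1996 22:00:00 GMT
    Content-Type: text/html

i.e. "modified just now, already expired", or with no Last-Modified or
Expires headers at all, leaving the cache nothing to go on.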

Mike Schwartz was right on target when he said:

>Content providers disable caching because their business models require
>access logs, and at present disabling caching is the only guaranteed means.

Enabling caching lets lots of users load your pages faster, thus making
them like your site better so they tell their friends and your site gets
lots of hits. But WWW business models presently aren't interested in
increasing their hits. They're only interested in increasing their
easy-to-measure hits.

Not all business models are this stupid. Take the TV business model as
an example. Suppose I wanted to sell CNN on a system for TV analogous
to the cache-disabling system they use on their web site. I could walk
into the office of a CNN exec and say, "I have this great system that
will enable you to know exactly how many viewers you have on any given
day. It has the slightly annoying side effect that viewers have to wait
two minutes between video clips, and often can't get any signal at all.
But you'll have great statistics to show the board!" I would be laughed
out of the office in no time.

Ideally, businesses would find ways other than access logs to measure
the value of their web sites. But most likely they won't, so the
long-term solution will have to be the hit-metering stuff Mike Schwartz
mentioned.

Duane's cache-hit-rate-based idea would definitely help cache server
load, but would also have the side effect of "punishing" servers that
aren't doing anything cache-unfriendly. Then again, depending on your
cache load, maybe that's what you want. It has the plus that it works
no matter what cache-unfriendly technique is used, including changing
the content of the pages, e.g. with the time of day.

I want to throw out another idea that would not punish cache-friendly
servers. Call it the "fool me once" algorithm. When a cache retrieves
an object with a "suspect" Last-Modified or Expires header, it sets a
bit to remember to treat it specially. The next time it retrieves the
object, it compares the new copy with the old. If there hasn't been a
change, the old Last-Modified is used to determine a TTL. Then the
cache will refresh based on how often objects actually change rather
than when the fake Last-Modified header claims they change.
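
To make that concrete, here's a rough sketch in C. All the types and
names (cache_entry, is_suspect, LM_FACTOR and so on) are made up for
illustration, and the TTL formula is just one guess at what "determine
a TTL" might mean; this isn't code from any real cache.

#include <string.h>
#include <time.h>

#define LM_FACTOR        0.2   /* fraction of the object's age to trust */
#define SUSPECT_MIN_TTL  300   /* arbitrary floor, in seconds */

struct cache_entry {
    const char *body;
    size_t      body_len;
    time_t      last_modified;   /* 0 if the header was missing */
    time_t      expires;         /* 0 if the header was missing */
    time_t      fetched_at;      /* when this copy was retrieved */
    int         suspect;         /* the "fool me once" bit */
};

/* Set the bit when the object is first stored: headers missing, or a
 * Last-Modified that claims the object changed "just now". */
int is_suspect(const struct cache_entry *e)
{
    if (e->last_modified == 0 || e->expires == 0)
        return 1;
    return (e->fetched_at - e->last_modified) < 60;
}

/* On the next retrieval of a suspect object, compare new with old.
 * If nothing actually changed, compute the TTL from the old copy's
 * Last-Modified (falling back to the old retrieval time), so refresh
 * intervals track how often the object really changes. */
time_t refresh_ttl(const struct cache_entry *old_c,
                   const struct cache_entry *new_c)
{
    time_t since, ttl;

    if (!old_c->suspect)         /* honest headers: use them as usual */
        return new_c->expires > new_c->fetched_at
             ? new_c->expires - new_c->fetched_at : 0;

    if (old_c->body_len == new_c->body_len &&
        memcmp(old_c->body, new_c->body, old_c->body_len) == 0) {
        since = old_c->last_modified ? old_c->last_modified
                                     : old_c->fetched_at;
        ttl = (time_t)(LM_FACTOR * (new_c->fetched_at - since));
        return ttl > SUSPECT_MIN_TTL ? ttl : SUSPECT_MIN_TTL;
    }

    return SUSPECT_MIN_TTL;      /* it really did change; stay short */
}

is_suspect() would run when the object is first stored, and
refresh_ttl() on each later retrieval of an object whose bit is set.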

This may not be desirable, either because of the increased work during
refresh or because sites would start doing other things to thwart
caching, e.g. randomly changing the actual content. But I thought I'd
throw the idea out just for fun.

-- 
b(l)char *l;{write(1,"\r",1);write(1,l,strlen(l));sleep(1);}main(){b("Bruce ");
b("Lewis "); b("Analyst "); b("Programmer "); b("MIT Information Systems ");
b("<URL:http://web.mit.edu/brlewis/www/>\n");exit(0);}