Re: Handling of IMS

From: Henrik Nordstrom <henrik.nordstrom@dont-contact.us>
Date: Wed, 03 Jul 1996 23:58:03 +0200

Here is my answer to the IMS proposal from Thomas Schmidt.

It is not a very good idea to let every IMS request go through
the whole cache hierarchy. Your algorithm is effectively the same
as always sending the request directly to the original site, except
that it goes through the whole cache hierarchy to make sure that an
older page is never returned on a later request.

There are two competing sides to the same problem:
1. It should be fast.
2. It should give as recent information as possible.

And as usual it is not possible to get both in full.

Now some HTTP talk...

If-Modified-Since
is a modifier to GET, making it a conditional
GET of a single object.

Pragma: no-cache
is a modifier stating that this request can't be served from a cache,
i.e. it has to be handled by the source site.

In my opinion the only way to make sure that you get the absolutely
latest version is a combination of these two. With IMS alone a cache
is allowed to answer, so the reply may reflect an older object than
the one available at the source site.
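
The difference between the two headers can be sketched from the cache's
point of view. This is only a rough illustration in Python, not how any
particular cache is implemented. The names cache_decision and CachedEntry
are made up for this example, and it assumes that a cached copy is always
considered fresh (no expiry check) and that header names are spelled
exactly as shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


@dataclass
class CachedEntry:
    """Hypothetical record a cache keeps for one URL."""
    last_modified: datetime  # timezone-aware


def cache_decision(headers: Mapping[str, str], entry: Optional[CachedEntry]) -> str:
    """Return 'forward', 'not-modified' or 'serve-cached' for one GET request."""
    # Pragma: no-cache means the cache must not answer by itself.
    if headers.get("Pragma", "").strip().lower() == "no-cache":
        return "forward"
    # Nothing cached, so the request has to go upstream anyway.
    if entry is None:
        return "forward"
    ims = headers.get("If-Modified-Since")
    if ims is not None:
        since = parsedate_to_datetime(ims)
        # The cache answers from what it knows; the source site may
        # already hold a newer object that this cache has not seen.
        if entry.last_modified <= since:
            return "not-modified"
    return "serve-cached"
```

With IMS alone the cache may answer 'not-modified' from its own copy, which
is where a possibly older object can slip through. Adding Pragma: no-cache
always yields 'forward'.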

How Netscape handles this:
IMS is used to check with the proxy/cache whether it has a newer page
than the one in Netscape's own cache.
no-cache is used to reload a page. Plain reload sends IMS as well,
while shift-reload sends only no-cache.
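
The client behaviour described above could be written down roughly like
this. It is a sketch of the description in this mail, not of Netscape's
actual code. The function name request_headers and the action names
'visit', 'reload' and 'shift-reload' are my own labels for this example.

```python
from typing import Dict, Optional


def request_headers(action: str, cached_date: Optional[str]) -> Dict[str, str]:
    """Build the extra GET headers for one user action.

    cached_date is the date of the copy in the browser's own cache,
    as an HTTP date string, or None if there is no local copy.
    """
    if action not in ("visit", "reload", "shift-reload"):
        raise ValueError("unknown action: " + action)
    headers: Dict[str, str] = {}
    # A normal visit and a plain reload are conditional on the local copy.
    if action in ("visit", "reload") and cached_date is not None:
        headers["If-Modified-Since"] = cached_date
    # Both kinds of reload ask that no cache answers the request.
    if action in ("reload", "shift-reload"):
        headers["Pragma"] = "no-cache"
    return headers
```

Only the plain reload produces the combination of both headers.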

Back to more general talk...

One main problem with caching is references to other objects. HTTP only
has knowledge about a single object at a time, therefore there is no
(well, almost no) way to guarantee 100% consistency when using a
cache. It may be that the page is an old version and one or more objects
are new, or the other way around. Fortunately much of this can be resolved
with a little cooperation from the web authors (see the note below).

The only way to make sure that an object is always the latest is to always
ask the source site (by disabling the cache, or by using Pragma: no-cache).

Note: How to write pages for a cache

The basic rule is that different objects should have different URLs. For example
if you have a page that is changed from time to time with different images, then
you should use unique URLs for your images (i.e. the URL should indicate which
version of the page it belongs to). It might be a good idea
to keep the old versions of the images for about one week (or half the lifetime
of the page) if the old page still has some relevance.
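
One possible way to get such unique URLs is to put the page version into
the image file name. The following is just a sketch; the function name
versioned_image_url and the choice of a date string as version label are
assumptions for the example, any scheme that changes the URL with the
page version will do.

```python
import posixpath


def versioned_image_url(image_path: str, page_version: str) -> str:
    """Insert the page version before the file extension of an image URL path."""
    directory, filename = posixpath.split(image_path)
    stem, ext = posixpath.splitext(filename)
    # A new page version gives a new URL, so a cache can never pair
    # the new page with an image that belongs to the old one.
    return posixpath.join(directory, f"{stem}-{page_version}{ext}")


print(versioned_image_url("/images/logo.gif", "1996-07-03"))
# prints /images/logo-1996-07-03.gif
```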

Another rule is that periodically updated objects should have correct expiry dates.
Take for example a weekly newspaper that is replaced with the current issue
each week. Here the expiry date should be set to the time when the next issue
will be released.
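
For the weekly newspaper case the expiry date could be computed roughly as
follows. This is a sketch under my own assumptions: the function name
next_issue_expires is made up, the release time is given in UTC as a
weekday (Monday is 0) and a whole hour, and the result is meant as the
value of an Expires header.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def next_issue_expires(now: datetime, release_weekday: int, release_hour: int) -> str:
    """Return the next weekly release time as an HTTP date string."""
    now = now.astimezone(timezone.utc)
    # Release time on the current day, then move forward to the release weekday.
    candidate = now.replace(hour=release_hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(release_weekday - now.weekday()) % 7)
    # If this week's release has already happened, use next week's.
    if candidate <= now:
        candidate += timedelta(days=7)
    return format_datetime(candidate, usegmt=True)


now = datetime(1996, 7, 3, 21, 58, 3, tzinfo=timezone.utc)  # a Wednesday
print("Expires:", next_issue_expires(now, release_weekday=0, release_hour=6))
# prints Expires: Mon, 08 Jul 1996 06:00:00 GMT
```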

---
Henrik Nordström

Thomas Schmidt <tcs@morini.in-berlin.de> wrote:
> Some weeks ago I posted a proposal how to handle IMS:
> 
> 1     From requestor to page source through the cache hierarchy each cache
> has to check, if it owns a modified version:
> 1.1   No, our page isn't more recent or we've no such page. Forward query
> to next cache.
> 1.2   Yes, we've a modified page. Ask next cache in hierarchy, if it has a
> more recent page than ours.
> 
> 2.    At last the request arrives the pages owner. It should also look for
> modification. If its page is
> 2.2   not modified against the requests date it should return NOT-MODIFIED.
> 2.1   modified it should return this page.
> 
> 3.    From page source to requestor. If a cache
> 3.1   gets a full page it should forward it.
> 3.2   gets a NOT-MODIFIED it should
> 3.2.1 return this answer if check 1.1 applied
> 3.2.2 return its page if check 1.2 applied.
> 
> At last the requestor gets the most recent version of this page from the
> closest possible site.
> 
> 
> OK ?
> 
>         Thomas
> 
> --
> |Thomas Schmidt    | Email: tcs@morini.in-berlin.de | Phone: +49-30-7829537|
> |Leuthener Str. 4a |[Email: tcs@cs.tu-berlin.de]    | Fax: +49-30-7828103  |
> |D-10829 Berlin    |                                | Data: +49-30-7828103 |
> 
> 
Received on Wed Jul 03 1996 - 14:58:36 MDT
