RE: Proxying PHP Pages

From: Nottingham, Mark (Australia) <mark_nottingham@dont-contact.us>
Date: Mon, 14 Dec 1998 09:36:52 +1100

> I've been a little concerned about some of the refresh patterns going
> around on this list recently. The implications of the above are as
> follows:
>
> - No document will have its freshness checked for the first 24 minutes
> in cache (min 1440). For documents that are rapidly modified this could
> cause a staleness problem. Apparently this has not been an issue since
> it has not caused complaints, so that's good.

AFAIK, Squid will only follow refresh_patterns if the object can be
cached; i.e., has a validator (practically speaking, a Last-Modified). I
could be wrong, but in my experience it does the right thing.

> - Every document in the cache longer than 12 hours will generate an IMS
> request (max 43200). What we and others have found is that most
> documents are rarely modified. A recent paper by Douglis, Feldmann,
> Krishnamurthy, and Mogul explores this in depth (sorry I don't have it
> with me so can't report what the mean and median ages were, but if
> memory serves they were weeks to months, not hours to days). The impact
> of max is then to cause IMS requests for not-modified documents more
> often than necessary. While these requests are small in bandwidth they
> contribute directly to user latency. Given the observed modification
> patterns a larger max would make sense.

Bingo. See below re: Squij work.
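
To put a rough number on the quoted point about max, here is a
back-of-the-envelope sketch in Python. It is not Squid code: the function
name and the 30-day horizon are made up for illustration, and it assumes
one unchanged document that is requested often enough to be revalidated
as soon as max expires, with max as the only limit on freshness.

```python
def forced_revalidations(horizon_hours: float, max_hours: float) -> int:
    """Upper bound on IMS requests for one unchanged document.

    Assumes (hypothetically) that the document is requested the moment
    it goes stale and that max is the only thing limiting its freshness.
    """
    if max_hours <= 0:
        raise ValueError("max_hours must be positive")
    return int(horizon_hours // max_hours)


# One unchanged document watched for 30 days:
print(forced_revalidations(30 * 24, 12))       # max of 12 hours -> 60
print(forced_revalidations(30 * 24, 21 * 24))  # max of 3 weeks  -> 1
```

Each of those IMS requests returns "not modified", so under these
assumptions the smaller max buys latency for the user and nothing else.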

> - The Alex protocol is based upon the assumption that a document
> modified recently will be more likely to be modified soon than a
> document modified a long time ago. This is based upon a lot of prior
> work (decades of it in fact - pre web if you can remember back that far!
> :-), and seems to behave quite well, in other words it matches human
> behavior patterns. The percent factor says how old you'll let the
> document get before checking to see if it's been modified. 200% says
> let it get to be twice its age before checking - but this seems kind of
> long to me. Instead, I think the percent parameter should be smaller
> (20%, 50%), and increase max to prevent needless IMS requests. The
> Squid default for max is, I believe, 3 weeks; but documents won't get
> more than 20% of their age in the cache without refreshing. This
> exhibits geometric scaling in network traffic, which is pretty
> attractive.

Interesting. Do you have any references for this?
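
For anyone who wants to see the percent rule from the quoted paragraph in
concrete terms, here is a minimal Python sketch. It is a simplification
for illustration, not Squid's actual refresh code: the function and
parameter names are invented, all times share one unit, Expires headers
are ignored, and the document is assumed to be requested the instant it
goes stale.

```python
def is_fresh(age, lm_age, min_age, percent, max_age):
    """Simplified min/percent/max freshness check (illustrative only).

    age     -- time since the object was fetched or last validated
    lm_age  -- how old the object was (fetch time minus Last-Modified)
    percent -- fraction of lm_age the object may sit in cache unchecked
    """
    if age > max_age:
        return False              # max always forces a revalidation
    if lm_age > 0 and age <= lm_age * percent:
        return True               # still within percent of its age
    return age <= min_age         # otherwise fresh only inside min


def validation_times(initial_lm_age, percent, max_age, horizon):
    """Ages (since Last-Modified) at which an unchanged document is
    revalidated, assuming it is requested as soon as it goes stale."""
    if initial_lm_age <= 0 or percent <= 0 or max_age <= 0:
        raise ValueError("all arguments must be positive")
    times = []
    t = initial_lm_age
    while True:
        t += min(t * percent, max_age)   # next check: percent of age, capped by max
        if t > horizon:
            return times
        times.append(t)


# A day-old document that never changes, watched until it is 30 days old,
# with max set to 21 days:
print(len(validation_times(1, 0.2, 21, 30)))  # 20%  -> 18 checks
print(validation_times(1, 2.0, 21, 30))       # 200% -> [3.0, 9.0, 27.0]
```

Until max kicks in, each check happens at (1 + percent) times the age of
the previous one, which is the geometric spacing the quoted text refers
to.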

> The interaction of refresh pattern (which drives TTL in cache),
> staleness, and network traffic is fairly complicated. One thing I've
> been interested in lately is the relationship of the Squid
> administrator's desire to reduce bandwidth demand and the user and
> content publisher's desire for lower latency and staleness. How much of
> an issue has this been for the ISPs and other cache administrators out
> there? Have others tried such "fairly aggressive caching practices" and
> run into problems? I'd be interested in your feedback.

I'm thinking about (and have started writing) a paper on these issues for
the next caching workshop, in part to verify my work on Squij
(http://mnot.cbd.net.au/squij/). If anyone would like to collaborate,
make suggestions, or tell me I'm wrong, you'd be more than welcome; I can
provide a draft.
Received on Sun Dec 13 1998 - 15:40:53 MST
