Re: Pre-fetching some URLs

From: Dancer <>
Date: Tue, 22 Oct 1996 12:39:14 +1000

Adrian Havill wrote:
> Dancer wrote:
> > > By the way, what is the typical hit rate that you might
> > > be getting ? I am getting around 40% rate hit and for
> > > some of the people that I speak to, they said it is
> > > pretty high.
> >
> > We're running a 900MB cache here. We peak at around a 25% hit-rate. We
> We use a two squid setup (v1.0.18) running as neighbors. Our Tokyo squid
> reported the following hit percentages this morning: HTTP: 27%, FTP:
> 15%, Gopher: 7%. Our Osaka neighbor squid reported HTTP 31% FTP 8%
> Gopher 0%.
> We usually average just under 30% for HTTP.
> 40% is weirdly high, but not impossible. I find that we can increase
> our HTTP hit rate by about 5% if we pre-fetch every URL that's in the
> current issue of "Yahoo Online" (the magazine, Japanese edition).
> Typing those URLs in is the chore we give to the data-entry guy in the
> office we hate. >:-> It'd be nice if Yahoo published a disk every month
> with nothing but the URLs in the magazine.

Our central hub cache _does_ identify the subnet caches as neighbours,
but the subnets are connected to the hub by 28K8 modems, and thus full
sharing isn't really in order.

I've thought about pre-fetching, but it seems a false economy, mostly.
The first person to access the URL will bring it into the cache. If
_nobody_ accesses it, then caching it is a waste of space.

Sure, it speeds up the first access to an entity, but that's the only
saving you get. Subsequent accesses hit the copy that the first one
cached, and you may as well have just let the clients do it by making
requests through the proxy. You've spent the time pre-fetching, for ...
what? Saving a few seconds per document on the first access of each

Doesn't add up, for me.
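
To put rough numbers on that, here is a small Python sketch that counts
origin fetches and client-visible misses with and without pre-fetching.
The request trace and URL names are made up for illustration, and it
assumes an unbounded cache with no expiry, which no real cache has.

```python
def origin_fetches(trace, prefetched=()):
    """Return (origin fetches, client-visible misses) for a request trace.

    Assumes an unbounded cache with no expiry, which is a simplification.
    """
    cache = set(prefetched)
    fetches = len(cache)  # every pre-fetched URL costs one origin fetch up front
    client_misses = 0
    for url in trace:
        if url not in cache:
            # First client request for this URL: fetch it and cache it.
            cache.add(url)
            fetches += 1
            client_misses += 1
    return fetches, client_misses


# Hypothetical trace: three distinct URLs, six requests.
trace = ["/a", "/b", "/a", "/c", "/a", "/b"]
# Hypothetical pre-fetch list: two URLs that get used, one that never does.
prefetch_list = ["/a", "/b", "/unused"]

print(origin_fetches(trace))                 # no pre-fetching
print(origin_fetches(trace, prefetch_list))  # with pre-fetching
```

In this toy case pre-fetching takes two first-access misses away from
the clients, but the cache makes four origin fetches instead of three,
because one pre-fetched document is never requested.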

OTOH, prefetching documents that are routinely requested first during
peak load periods (the comics at for
example) during the off-peak period makes sense. If you can automate it.
Like I said, doing it by hand, or even semi-automated, just isn't
worth the time it takes.
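
For what it's worth, the automated version could look something like
the Python sketch below: read a list of URLs and request each one
through the proxy so the cache picks it up. The proxy address, the
one-URL-per-line file format and all the function names are my own
assumptions here, not anything Squid ships with.

```python
import urllib.request

# Hypothetical proxy address; 3128 is the usual Squid port.
PROXY = "http://proxy.example.com:3128"


def parse_url_list(text):
    """Return the URLs in text, one per line, skipping blanks and # comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def make_proxy_fetcher(proxy=PROXY, timeout=30):
    """Build a fetch(url) function that sends HTTP requests via the proxy."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )

    def fetch(url):
        with opener.open(url, timeout=timeout) as resp:
            # Read and discard the body; the point is that the proxy caches it.
            while resp.read(65536):
                pass

    return fetch


def prefetch(urls, fetch):
    """Request each URL via fetch; return the list of URLs that failed."""
    failed = []
    for url in urls:
        try:
            fetch(url)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            failed.append(url)
    return failed
```

Something like prefetch(parse_url_list(open("urls.txt").read()),
make_proxy_fetcher()) run from cron in the off-peak hours would do it,
where urls.txt is a hypothetical file holding the documents people
usually ask for first at peak time.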

Received on Mon Oct 21 1996 - 20:34:49 MDT
