Hit Rates and TTL Settings

From: David J N Begley <david@dont-contact.us>
Date: Sat, 26 Oct 1996 15:17:32 +1000 (EST)

Wow! Without even trying (or even intending to do so), I appear to have
started a little "mine's bigger than yours" comparison here (of cache sizes
and hit rates, of course!). :-)

Well, to round things out...

John Heaton wrote:
>Considering M*&^%*^&soft's habit of setting the Expired: field down to 0
>seconds, I'm surprised that any of their stuff sits in a cache long enough
>to be a problem. :-)

Well, for October 21, 1996, our stats for "www.microsoft.com" were (no more
comparisons, please!):

                         UDP COUNTS           TCP COUNTS           TCP BYTES
Server                   counts  %all  %hit   counts  %all  %hit   Mbytes  %all  %hit
-----------------------  ------------------   ------------------   ------------------
http://com.microsoft.ww    2645    1%   12%      890    1%   36%    11.40    1%   40%

I think it's possible to cache Microsoft's crap! ;-) (See below.)

Andrew Brennan wrote:
>Cachemgr.cgi's utilization report has 0.32 for HTTP objects. Is this
>32% or 0.32% ?? And I suppose people will ask why I even try if it's
>0.32% ...

I read it as "32%" (since "per centum" is really "per hundred", and the two
decimal places used measure to the hundredths ... so sue me for trying to get
mileage out of what I was taught at school!).

I don't pay much attention to cachemgr.cgi's figures, probably because
they're measured over a different (usually longer) time period than the
stats you're after (I do daily reports, which produce numbers like the
Microsoft stuff above), and they don't show that as the contents of the
cache change, so will the hit rates.

FWIW, I also pay more attention to the traffic ("TCP Bytes") figures than
the raw request hit rates, since it's the former for which we pay, not the
latter. :-)
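
To make the distinction concrete, here is a minimal Python sketch that
computes both rates from per-request records. The record layout (a hit flag
plus a byte count) and the sample values are made up for illustration; they
are not taken from our logs.

```python
def hit_rates(records):
    """Return (request_hit_rate, byte_hit_rate) as fractions in [0, 1].

    records: iterable of (was_hit, size_in_bytes) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    total_requests = hit_requests = 0
    total_bytes = hit_bytes = 0
    for was_hit, size in records:
        total_requests += 1
        total_bytes += size
        if was_hit:
            hit_requests += 1
            hit_bytes += size
    # Guard against empty input so we never divide by zero.
    request_rate = hit_requests / total_requests if total_requests else 0.0
    byte_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return request_rate, byte_rate


# Made-up sample: two large objects served from cache, two small misses.
sample = [(True, 40000), (False, 2000), (False, 3000), (True, 55000)]
request_rate, byte_rate = hit_rates(sample)
print(f"request hit rate: {request_rate:.0%}, byte hit rate: {byte_rate:.0%}")
```

In this sample half the requests are hits, but they account for most of the
bytes, which is why the two figures can differ so much.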

Martin Gleeson wrote:
>Ours is a little bigger at 15Gb :-)

Okay Marty, so I'm just quoting figures for the temporary central cache
(doesn't include distributed internal peers, nor faculty-based child
proxies). :-)

Martin Gleeson wrote:
>This certainly helped us a lot as well. We've set netscape.com to 12 hours
>ttl. How long do you give them?

See below.

Tai Jin wrote:
>Those are good numbers. I'd also like to know the size of your user
>population, how long you set the TTLs (what is the distribution of
>TTLs for cached objects), and whether users are reloading more often
>as a result.

User population? Good question. Well, the organisation as a whole has over
12,000 students and over 1,000 staff (not incl. casuals, &c.). There are
five peer proxies to this one, and two child proxies. It's known that many
decide not to use any of the proxies (they won't have much choice
soon). Since usage varies from day-to-day it's hard to get a
"representative sample" (especially since all our logs are compressed away
at the moment) - but if I look at the number of unique IP addresses
accessing just this proxy yesterday (Friday), the number is 323 (this
includes dial-in lines and lab machines, which more than one person would be
using during the course of a single day, plus the child proxies which others
may be using).
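
For anyone wanting to produce the same kind of count, a rough Python sketch
follows. It assumes a whitespace-separated access log with the client
address in the third column; that column position, the sample lines and the
addresses are assumptions for illustration, so adjust them for your own log
format.

```python
def count_unique_clients(lines, client_field=2):
    """Count distinct client addresses in an access log.

    client_field is the zero-based index of the whitespace-separated
    column holding the client address (an assumption, not a given).
    """
    clients = set()
    for line in lines:
        fields = line.split()
        if len(fields) > client_field:  # skip blank or truncated lines
            clients.add(fields[client_field])
    return len(clients)


# Made-up lines: timestamp, elapsed ms, client, result, bytes, method, URL
sample_log = [
    "846220652.123 310 192.0.2.5 TCP_HIT/200 4120 GET http://example.com/",
    "846220653.456 980 192.0.2.9 TCP_MISS/200 8812 GET http://example.com/a",
    "846220660.789 120 192.0.2.5 TCP_HIT/200 4120 GET http://example.com/",
]
print(count_unique_clients(sample_log))
```

Any iterable of lines works here, so an open file object (including one from
gzip.open(path, "rt") for compressed logs) can be passed in directly. As
noted above, a unique address is not the same thing as a unique user.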

The timing values I'm currently using are:

gopher 4 8640
http 4 8640
ftp 4 8640
ttl_pattern . 8640 20% 8640
ttl_pattern/i netscape\.com 10080
ttl_pattern/i mcom\.com 10080
ttl_pattern/i excite\.com 10080
ttl_pattern/i yahoo\.com 10080
ttl_pattern/i lycos\.com 10080
ttl_pattern/i infoseek\.com 10080
ttl_pattern/i webcrawler\.com 10080
ttl_pattern/i altavista 10080
ttl_pattern/i geocities\.com 10080
ttl_pattern/i mckinley\.com 10080
ttl_pattern/i southernaccess\.com 10080
ttl_pattern/i healthychoice\.com 10080
ttl_pattern/i theglobe\.com 10080
ttl_pattern/i sportszone\.com 10080
ttl_pattern/i stargalaxy\.com 10080
ttl_pattern/i microsoft\.com 10080
ttl_pattern/i pathfinder\.com 10080
ttl_pattern/i sunsite 10080
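
To put those numbers in perspective, here is a small Python sketch that
parses ttl_pattern lines and converts the first value to days. It assumes
the numeric fields are in minutes (check your cache's documentation before
relying on that), and the field names abs_ttl, pct and max_ttl are labels
chosen for this example. It only lists which patterns match a URL; which
rule the cache actually applies when several match is not modelled.

```python
import re

MINUTES_PER_DAY = 24 * 60  # assumption: TTL values are given in minutes

# A few lines copied from the settings above.
CONFIG = r"""
ttl_pattern . 8640 20% 8640
ttl_pattern/i netscape\.com 10080
ttl_pattern/i microsoft\.com 10080
"""


def parse_ttl_patterns(text):
    """Parse ttl_pattern lines into dicts; all other lines are ignored."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or not parts[0].startswith("ttl_pattern"):
            continue
        # A trailing "/i" on the directive means case-insensitive matching.
        flags = re.IGNORECASE if parts[0].endswith("/i") else 0
        rules.append({
            "regex": re.compile(parts[1], flags),
            "abs_ttl": int(parts[2]),
            "pct": parts[3] if len(parts) > 3 else None,
            "max_ttl": int(parts[4]) if len(parts) > 4 else None,
        })
    return rules


def matching_rules(rules, url):
    """Return every rule whose regex matches somewhere in the URL."""
    return [rule for rule in rules if rule["regex"].search(url)]


rules = parse_ttl_patterns(CONFIG)
for rule in matching_rules(rules, "http://WWW.Microsoft.COM/index.html"):
    days = rule["abs_ttl"] / MINUTES_PER_DAY
    print(f"{rule['regex'].pattern}: {rule['abs_ttl']} -> {days:g} days")
```

Under the minutes assumption, 8640 works out to 6 days and 10080 to 7 days.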

I also had the watermarks set much higher when I was initially filling the
cache (they're back to the stated defaults now). I know the settings above
may seem ridiculously high (and one day I'll get back to tuning them
further), but as stated they've helped fill the cache to capacity and
provide the hit rates previously noted.

To be honest, I really don't know how many people are hitting "reload"; I
wouldn't guess many, based on the hit rates we're seeing. We plan to
increase the cache size (and further tune the parameters above) as more
caches appear locally and as time permits.

Yes, there is the possibility of someone seeing a "stale" page; however,
this doesn't appear to happen often. Most people seem to get what they want
within at most a 24-hour turn-around. When uncacheable things are added
(like CGI output), that all factors in as well.


(sorry for the length)
Received on Fri Oct 25 1996 - 22:19:05 MDT
