Re: Trub cacheing www.cnn.com

From: Joern Clausen <joern@dont-contact.us>
Date: Wed, 13 Nov 1996 12:42:35 +0100 (MET)

-----BEGIN PGP SIGNED MESSAGE-----

>
> I'm finding that my Squid (1.0.15) is often serving out-of-date pages
> from www.cnn.com. The CNN site does not seem to use Last-modified or
> Expires headers.
>
> So I devised the following rules, which don't seem to work!
>
> ttl_pattern www.cnn.com/.*htm 360 20% 360
> ttl_pattern www.cnn.com/$ 360 20% 360
>
> What I am trying to achieve is that http://www.cnn.com/ (the root page)
> and all htm/html files are only cached for a maximum of 6 hours, but that
> graphics are cached by my default rules as normal.
>
> I realise I could use:
>
> ttl_pattern www.cnn.com 360 20% 360
>
> But then everything on the whole site would only be cached for 6 hours
> and most of the graphics (headers, menus etc) don't change from day to day.
>
> Does anyone else have this problem and have you found a suitable solution?

I had exactly the same problem with cnn.com. It first appeared a few
weeks ago, so I guess they changed their HTTP server and forgot to
emit sensible Expires or even Last-Modified headers. So I sat down and
tried to figure out the ttl mechanism, and this is what I came up with
(inspired a little by another mail on this list):

- ----8<--snip--------------------------------------------------------------

# We try to improve the caching by modifying the ttls.
# Regular documents are cached for three days, or for 20% of their
# current age. URLs ending in "/", "index.html" or "welcome.html"
# (and variants of these) are considered index pages, which change
# more often, so they are checked at shorter intervals.
ttl_pattern ^http:// 4320 20% 43200
ttl_pattern/i \.(gif|jpe?g|xbm|png)$ 7200 50% 43200
ttl_pattern /$ 360 10% 43200
ttl_pattern/i /index\.html? 360 10% 43200
ttl_pattern/i /welcome\.html? 360 10% 43200

- ----8<--snap--------------------------------------------------------------
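
As a rough illustration of how rules like these might be applied, here is a
small Python sketch. It is not Squid's code. It assumes that the last matching
pattern wins (which the ordering above, general rule first, suggests), that
the three numbers are an absolute ttl in minutes, a percentage of the object's
age, and an upper limit for the percentage-based ttl, and that the percentage
is only used when the object's age is known. The names TtlRule, pick_rule and
ttl_minutes are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class TtlRule(NamedTuple):
    """One ttl_pattern line: regex, absolute ttl, percent of age, age limit."""
    pattern: re.Pattern
    abs_ttl: int   # minutes (the thread treats 360 as 6 hours)
    pct_age: int   # percent of the object's age
    age_max: int   # cap, in minutes, for the percentage-based ttl


# The five rules from the snippet above, in the same order.
RULES = [
    TtlRule(re.compile(r"^http://"), 4320, 20, 43200),
    TtlRule(re.compile(r"\.(gif|jpe?g|xbm|png)$", re.I), 7200, 50, 43200),
    TtlRule(re.compile(r"/$"), 360, 10, 43200),
    TtlRule(re.compile(r"/index\.html?", re.I), 360, 10, 43200),
    TtlRule(re.compile(r"/welcome\.html?", re.I), 360, 10, 43200),
]


def pick_rule(url: str) -> Optional[TtlRule]:
    """Return the last rule whose regex matches (assumed behaviour)."""
    match = None
    for rule in RULES:
        if rule.pattern.search(url):
            match = rule
    return match


def ttl_minutes(url: str, age_minutes: Optional[int] = None) -> Optional[int]:
    """Compute a ttl in minutes under the assumptions stated above.

    age_minutes is the time since Last-Modified, or None if the server
    sent no such header (as CNN reportedly did).
    """
    rule = pick_rule(url)
    if rule is None:
        return None
    if age_minutes is None:
        # No Last-Modified: fall back to the absolute ttl.
        return rule.abs_ttl
    # Percentage of the age, capped at age_max.
    return min(age_minutes * rule.pct_age // 100, rule.age_max)
```

Under these assumptions, ttl_minutes("http://www.cnn.com/") falls through to
the "/$" rule and gives 360 minutes, while a GIF on the same site gets the
longer image ttl.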

I'm not sure if this makes sense, but these are all the ttls I have
defined. The effect was immense. The expected result was that I always
received the current pages from CNN. The surprising result was that our
cache finally started filling up. We use a relatively small cache; we
can only afford 700 MBytes of hard disk space. I was running it with the
default ttls, and I observed that on Fridays about 120-140 MBytes were
in use, and over the weekends the cache shrank to sometimes less than
60 MBytes. With these new ttls, the cache grew within a few days to
550 MBytes, and the growth seems to be slowing down (the new semester
has just started, so I guess it will keep growing a little more, due to
more requests, before it finally stabilizes).

I guess somebody is working hard on some nice documentation? Maybe
this could include some advice and examples on setting ttls. This
is especially important because these ttls won't help me much if
my parent uses different settings. So some guidelines for agreeing on
settings among neighbors and parents/children would be handy.

- --
     Joern Clausen email: joern@TechFak.Uni-Bielefeld.DE

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: ascii

iQCVAgUBMom0JN0jU/8wKfL5AQGe+QQAwNZSnvRiYx5XJtPkx/az3yYGYItikfhx
73SCXfx4VkIia3EBq645uMHVp6YAaxNwNgCwgLKgepuIVlWeHO0t66kF8kiE5hpn
xM4aImf4m6WZJopc0rHa6gyBRNhTwWMQ/mbiNizLUjpU37HcW8KHcm+CZBMChpoP
1HA0hs4kJ4E=
=LdER
-----END PGP SIGNATURE-----
Received on Wed Nov 13 1996 - 03:43:10 MST
