Re: Need following information from squid logs?

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Mon, 15 Dec 1997 11:37:04 +0200 (EETDST)

>
> > Look at squid's store log, and produce output on how much data in megabytes
> > squid is releasing from the disk cache over a specified time period
...
> > I would like to somehow do this ...
> > because with this information you can see, when the proxy cache is full and
> > doing LRU replacement, how much data it is actually discarding, and therefore
> > judge whether you need to add more disk cache for a busy proxy.
>
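 A minimal sketch of such a script, in Python. I'm assuming field 1 of
 store.log is a UNIX timestamp, field 2 is the action tag, and the object size
 shows up somewhere as an "expected/actual" byte pair -- field positions differ
 between squid versions, so check against your own logs first:

    #!/usr/bin/env python
    # Sum up RELEASEd megabytes per hour from store.log on stdin.
    # Assumptions (verify for your squid version): field 1 is a UNIX
    # timestamp, field 2 the action tag, and the size appears as an
    # "expected/actual" byte pair somewhere on the line.
    import sys
    from collections import defaultdict

    released = defaultdict(int)              # hour bucket -> bytes released
    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 3 or fields[1] != "RELEASE":
            continue
        hour = int(float(fields[0])) // 3600
        for f in fields[2:]:
            parts = f.split("/")
            if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
                released[hour] += int(parts[1])   # actual bytes on disk
                break
    for hour in sorted(released):
        print("%d  %.1f MB" % (hour * 3600, released[hour] / 1048576.0))
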
> Let gurus on the list correct me if I am wrong, but I would not rely much on
> the RELEASE traffic to estimate "optimal" cache size. If you look through the
> logs, objects are often released for reasons _other_ than LRU replacement
> (e.g., updates and "reloads"). And even if you filter out those "exceptions"
> with a smart script, it is unlikely that you will guess how many hits you
> have lost because of those objects being purged from the cache!
>
> Estimating the "best" cache size is very tricky, IMHO. After a certain
> [relatively small] threshold, cache "utilization" does not increase with the
> cache size. That is, you are getting fewer and fewer hits per GB you add.
>
> Nevertheless, people continue to increase the size of their caches because,
> they say, "it will payoff in a long term". In other words, you are bying disk
> space once, but getting hits from it every day. Thus, to find the optimum
> size you have to estimate the benefits you are getting from a single hit and
> then calculate how much time it will take to pay for added disk capacity.
>
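 To make that payback idea concrete, a back-of-the-envelope sketch with
 made-up numbers -- every figure below is an assumption, plug in your own disk
 prices and traffic costs:

    # hypothetical numbers -- substitute your own
    disk_cost      = 400.0    # USD for an extra GB of cache disk
    extra_hits_day = 2000     # additional hits/day that GB yields
    avg_hit_kb     = 10.0     # average object size served from cache
    cost_per_mb    = 0.05     # USD per MB of upstream traffic saved

    saved_per_day = extra_hits_day * avg_hit_kb / 1024.0 * cost_per_mb
    print("payback in %.0f days" % (disk_cost / saved_per_day))
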
> To estimate how many hits a given cache size generates, you probably need a
> trace-driven program that will simulate LRU replacement and other things for
> a given cache size (unless you know somebody who already maintains such a
> cache in a similar environment). Any better ideas?
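
 No better ideas here, but such a simulator is easy to sketch. A toy version
 in Python, assuming a trace of "timestamp url size" lines (e.g. distilled
 from access.log) and plain strict-LRU replacement -- squid's real policy
 differs, see below:

    import sys
    from collections import OrderedDict

    def simulate(trace, cache_bytes):
        """Replay (url, size) requests against a strict-LRU cache
        and return the hit ratio."""
        cache = OrderedDict()            # url -> size, oldest first
        used = hits = reqs = 0
        for url, size in trace:
            reqs += 1
            if url in cache:
                hits += 1
                cache.move_to_end(url)   # freshen last-reference time
                continue
            while cache and used + size > cache_bytes:
                _, old = cache.popitem(last=False)    # evict LRU victim
                used -= old
            if size <= cache_bytes:      # skip objects bigger than the cache
                cache[url] = size
                used += size
        return float(hits) / max(reqs, 1)

    def read_trace(f):
        for line in f:                   # "timestamp url size" per line
            _, url, size = line.split()[:3]
            yield url, int(size)

    gb = float(sys.argv[1])              # cache size to simulate, in GB
    ratio = simulate(read_trace(sys.stdin), gb * 2 ** 30)
    print("%.1f GB -> hit ratio %.1f%%" % (gb, 100 * ratio))

 Run it with several cache sizes over the same trace; the knee in the curve of
 hit ratio vs. gigabytes is as close to an "optimal" size as you will get.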

 Squid's current LRU blindly discards the 8 oldest of every 256 objects sorted
 by last-reference time. Even if all 256 were fetched the same day, 8 of them
 get released when LRU gets to them. So LRU is not optimal in its details, but
 on a large scale it is satisfactory. The trouble with this last-reference time
 is that it is lost during squid restarts, and upon startup it is reset to the
 date the object was fetched. So, to make LRU more efficient, you'd want to
 keep squid running as long as possible, ideally non-stop.
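
 In pseudo-code, the heuristic described above boils down to something like
 this -- a toy illustration in Python of the 8-of-256 scan, not squid's actual
 code, and I assume each object carries a lastref timestamp:

    import heapq

    def lru_scan(objects, sample=256, evict=8):
        """Pick `evict` victims out of a window of `sample` objects,
        choosing the ones with the oldest last-reference timestamps."""
        window = objects[:sample]
        # blindly trusts lastref, even if all 256 were fetched today
        return heapq.nsmallest(evict, window, key=lambda o: o.lastref)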

 To estimate the needed disk size, you'd want to analyse the average frequency
 of object reuse, or per-object hit rate. Some work on this subject has shown
 that most hits on a cached object happen during the 24h after the initial
 fetch (about 40%, if I recall), another chunk during the next week or two
 (20%, I guess), and the remaining hits are spread over a timeframe of 4 weeks
 (20%). All longer-term hits are pretty rare and make up a negligible share of
 the total. Thus, it is believed that any object should stay in cache for 2
 weeks to give the most bang for the buck, better yet if it can live there for
 4 weeks. All this is about static content, like zips, gifs, etc., of course.
 Thus, you'd watch how old the objects are that LRU is releasing, and if they
 are not older than 1 week, you'd want to think about adding disks. The
 simplest place to check, IMHO, is the cachemgr's Info page, "Storage LRU
 expiration age". I dunno how accurate it is, but if it says that your storage
 expiration age is more than 7 days, you are OK; if it says it's less than 2-4
 days, you could increase your hit rate by adding disks. My opinion.
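
 If you distrust that counter, the release age can be measured from store.log
 itself. A rough sketch, with the same caveats as above on field positions
 (the action tags SWAPOUT/CREATE/RELEASE may also differ between versions); it
 tracks when each URL was stored and reports how old the RELEASEd objects were:

    import sys

    stored = {}          # url -> time the object was stored
    ages = []
    for line in sys.stdin:
        f = line.split()
        if len(f) < 3:
            continue
        t, action, url = float(f[0]), f[1], f[-1]
        if action in ("SWAPOUT", "CREATE"):
            stored[url] = t
        elif action == "RELEASE" and url in stored:
            ages.append(t - stored.pop(url))
    if ages:
        ages.sort()
        print("median release age: %.1f days" % (ages[len(ages) // 2] / 86400.0))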

 ----------------------------------------------------------------------
  Andres Kroonmaa                         mail: andre@online.ee
  Network Manager
  Organization: MicroLink Online          Tel:       6308 909
  Tallinn, Sakala 19                      Pho:  +372 6308 909
  Estonia, EE0001    http://www.online.ee Fax:  +372 6308 901
 ----------------------------------------------------------------------