Re: Cache Digests

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Tue, 6 Oct 1998 13:24:12 -0600 (MDT)

On Tue, 6 Oct 1998, Niall Doherty wrote:

> > 1) Entries: Suppose your cache has 1000 objects (entries: count). When
> > building a local (store) digest, Squid guessed that allocating space for 500
> > cache objects would be enough (entries: capacity). "Entries: utilization"
> > shows 200%. That's OK if "bits: utilization" is normal, see below.
>
> When does it make a decision on how much space to allocate ?

Every time the local (store) digest is rebuilt, which happens every hour by
hard-coded default.

> And where does it allocate the space ? (disk/ram ?).

We use RAM to build a digest. The ready-to-use digest is then swapped to
disk. As far as I can see, the current code does not free the in-memory copy.
There are some subtle reasons for that.

> Would it not be able to guess very accurately because it *knows*
> how many items it has ?

It does not know. To avoid false hits, Squid does not digest stale, private,
etc. entries. The number of those entries is unknown a priori.
See storeDigestCalcCap().
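
For illustration only, here is a rough sketch of the kind of guess
storeDigestCalcCap() has to make (hypothetical names and structure; the real
function differs in detail). The point is that the guess is derived from
quantities Squid does know, such as the cache size and an assumed average
object size, rather than from an exact count of digestible entries:

    /* Hypothetical sketch, not the actual storeDigestCalcCap().
     * The capacity is a guess because the number of digestible
     * (fresh, public, etc.) entries is unknown when the rebuild
     * starts. */
    static int
    digest_guess_capacity(double cache_size_bytes,
                          double avg_object_size,
                          int min_capacity)
    {
        int cap = (int) (cache_size_bytes / avg_object_size);

        /* Enforce a lower limit so even a nearly empty cache
         * produces a usable digest (a lower limit on the digest
         * size is mentioned later in this message). */
        if (cap < min_capacity)
            cap = min_capacity;
        return cap;
    }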

> Even if it is changing, it doesn't take very long to build the digest
> so its final value wouldn't have changed much ?

True, but the final value is still unknown when we start building.

> Does it change the size of the digest each time it is being created ?

No. If the expected size change is small, it reuses the old digest.
The hard-coded default for a "small change" is 10%.
See storeDigestResize().
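
As a hedged sketch of what "reuse if the change is small" amounts to (assumed
names; see storeDigestResize() for the real decision): the new capacity
estimate is compared against the old one, and the bit array is re-allocated
only when the relative change exceeds the hard-coded 10% threshold.

    #define DIGEST_REUSE_THRESHOLD 0.10   /* hard-coded "small change" default */

    /* Hypothetical sketch: return 1 if the digest should be
     * re-allocated, 0 if the old bit array can be cleared and reused. */
    static int
    digest_needs_resize(int old_capacity, int new_capacity)
    {
        int diff = new_capacity - old_capacity;
        if (diff < 0)
            diff = -diff;
        return diff > (int) (old_capacity * DIGEST_REUSE_THRESHOLD);
    }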

> It creates it every hour doesn't it ? or does (will?) it work out
> how often would be best to create it... - if it assigns an expiry
> time this expiry time has to be the same as the period above ?

The expiration time is the time of the next rebuild. Thus, peers know when to
request a new digest. A rebuild happens every N hours; currently N is
set to 1.
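
In other words (a minimal sketch with made-up names, just to show the
relationship), the expiry stamped on the digest is simply the time of the
next scheduled rebuild:

    #include <time.h>

    #define DIGEST_REBUILD_PERIOD (1 * 3600)   /* N hours; currently N = 1 */

    /* Peers can schedule their next fetch for this moment, since a
     * fresh digest will exist by then. */
    static time_t
    digest_expires(time_t rebuild_time)
    {
        return rebuild_time + DIGEST_REBUILD_PERIOD;
    }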

> My cache (that you saw the stats for) is only ~10% full at the moment,
> so I'm wondering when did it make the guess and why did it get it wrong
> given that it's not going to change much over the couple of minutes
> it takes to build the digest ?!

Your Squid guessed the size right! Bit-utilization is 49%, very close to the
optimal 50%.
 
> (I just spotted that util = count*100/capacity - I'm slowly getting
> there :-)

Right.
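
(For example, the "util: 318%" figure quoted further down comes straight out
of that formula: 281373 * 100 / 88517 is about 318.)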

> > 2) Bits: For each entry in "entries: capacity" Squid allocates bits_per_entry
>
> Will bits_per_entry be configurable ? or is there any need for it -
> have you found if 5 is a "good enough" value for all circumstances ?

We designed everything so that it can be made configurable. That is, the
exchange protocols, hashing functions, etc. do not rely on hard-coded values.
What's left is to write squid.conf entries for this and the other digest
parameters.

> > Ideally, the bits utilization should be close to 50%.
> That's coz the algorithm is "supposed" to be optimized to set 50% of
> the bits on/off, right ? so that this minimizes the chances of collisions
> in the table - and hence, in this case, more than 1 URL having the same
> entry ?

Close, but not exactly. A 1% bit utilization would be ideal for a low
false-hit ratio (an underutilized digest). 99% would be ideal for digest size
(an over-utilized digest). 50% is the "optimal" tradeoff between the two.
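
To put numbers on the tradeoff (my own back-of-the-envelope, not anything
from the Squid code): if a fraction f of the bits is set and a lookup probes
bits_per_entry independent bit positions, the chance of a false hit is
roughly f raised to the power bits_per_entry. At 50% utilization with 5 bits
per entry that is about 3%; at 99% it is about 95%, which is why the digest
is sized to land near 50%.

    #include <math.h>
    #include <stdio.h>

    /* Back-of-the-envelope false-hit estimate for a digest with a
     * given fraction of bits set and a given number of bits probed
     * per lookup (assumes independent, uniform bit positions). */
    static double
    false_hit_prob(double bits_on_fraction, int bits_per_entry)
    {
        return pow(bits_on_fraction, bits_per_entry);
    }

    int
    main(void)
    {
        printf("50%% full: %.3f\n", false_hit_prob(0.50, 5));  /* ~0.031 */
        printf("99%% full: %.3f\n", false_hit_prob(0.99, 5));  /* ~0.951 */
        return 0;
    }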

> What's a very long average sequence length, example ?

I do not know. My guess is that anything beyond 3 would be suspicious, beyond
10 very suspicious. I think the ideal value is "1", but I might be
calculating/reporting it wrong (Squid might be showing "2" when in fact it is
"1.5").

> > > entries: count: 281373 capacity: 88517 util: 318%
> still wondering why this is at 318%...

The big difference is due to the cache filling up. Squid sees two values:
"maximum cache size" and "current cache size". In most cases it will use the
"current" size. While the cache is filling up, the "current" size is not a
good measure. Note that the stats you see via the cache manager report
_current_ values for the entry count, but "old" capacity values (the ones
that were actually used to build the digest).

Anyway, your digest has a perfect size because of ~50% bit-utilization:

> > > bits: per entry: 5 on: 216176 capacity: 442592 util: 49%

Look at storeDigestCalcCap(). I think the code is self-explanatory. Just note
that memInUse() returns the total number of allocated objects, not their
total size.

> Now - would you be able to comment on the following ? These
> are cache digests from peer servers in SE, DK and DE (all of them
> are filling up at the moment):
>
> I assume it's pure fluke (coincidence) that the following two
> caches have identical values for the utilisations !

Maybe not. If the caches started filling up at approximately the same time
and have constant data rates, the percentages would be close.
 
> X1 digest: size: 11665 bytes
> entries: count: 6301 capacity: 18663 util: 34%
> deletion attempts: 0
> bits: per entry: 5 on: 22042 capacity: 93320 util: 24%
> bit-seq: count: 33752 avg.len: 2.76
>
> X2 digest: size: 88574 bytes
> entries: count: 47826 capacity: 141718 util: 34%
> deletion attempts: 0
> bits: per entry: 5 on: 167581 capacity: 708592 util: 24%
> bit-seq: count: 256269 avg.len: 2.77
>
>
> What would cause the following values to be so low ?

There is a lower limit on the digest size. I guess it kicked in.
Also, the "wrong" values of average object size may have affected these.
Again, see storeDigestCalcCap().

> X3 digest: size: 787692 bytes
> entries: count: 70233 capacity: 1260307 util: 6%
> deletion attempts: 0
> bits: per entry: 5 on: 274671 capacity: 6301536 util: 4%
> bit-seq: count: 525332 avg.len: 12.00

Alex.