Re: Cache Digest rejection %

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Tue, 20 Oct 1998 11:06:58 -0600 (MDT)

On Tue, 20 Oct 1998, Niall Doherty wrote:

> > > conf_max_age_stale 1846649 95.33
> >
> > Here is your answer! 95% of your entries expire in less than an hour because
> > of your max age settings in squid.conf. Let's see why:
>
> > > # Everything else (gopher)
> > > refresh_pattern . 720 100% 720
> >
> > And here is why!
>
> Boy - that was a tricky one :-)
>
> > Cache Digests do not have a URL (only the MD5 is stored in
> > memory) when digests are built. So the default "." pattern is used. The line
> > above works for gopher _and_ digests... :-/
>
> Ok - teaching time again... so you're saying that the Cache Digest is built
> and *THEN* the entries are checked for "freshness" ? That can't work coz
> you don't (can't ?) delete objects from the bloom filter. So at what step
> do you check for freshness ? If you do it entry by entry can you not pass
> the URL to the refreshCheck function ? I just checked the source quickly;
> there's a function refreshCheckDigest() - you call this for each StoreEntry
> in the digest - and at that stage the StoreEntry doesn't have a URL anymore?

Exactly. We check freshness _while_ we are building the digest. The digest is
built from in-memory store entries. Most of those entries do not have URLs
(only memory-cached objects and the like do). Thus, we have no URL to match
against the refresh patterns and end up using the default "." pattern.
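
A rough sketch of that fallback, in Python rather than Squid's C, and not
taken from the Squid source. The names (RefreshRule, find_rule,
stale_by_max_age) and the "^ftp:" rule are made up for illustration; only the
"." rule comes from the squid.conf line quoted above. The staleness test is
reduced to the max-age check alone.

```python
import re
from typing import NamedTuple, Optional


class RefreshRule(NamedTuple):
    pattern: str       # regular expression matched against the URL
    min_minutes: int
    percent: int
    max_minutes: int


# Rules are tried in order; the last one is the catch-all "." pattern.
RULES = [
    RefreshRule(r"^ftp:", 1440, 20, 10080),  # hypothetical example rule
    RefreshRule(r".", 720, 100, 720),        # the rule quoted in this thread
]
DEFAULT_RULE = RULES[-1]


def find_rule(url: Optional[str]) -> RefreshRule:
    """Return the first matching rule, or the default when there is no URL."""
    if url is None:
        # Digest build: the entry carries only an MD5 key, so nothing to match.
        return DEFAULT_RULE
    for rule in RULES:
        if re.search(rule.pattern, url):
            return rule
    return DEFAULT_RULE


def stale_by_max_age(url: Optional[str], age_minutes: int) -> bool:
    """Simplified check: an entry older than the rule's max age is stale."""
    return age_minutes > find_rule(url).max_minutes


if __name__ == "__main__":
    # Same age, different verdicts depending on whether the URL is known.
    print(stale_by_max_age("ftp://example.com/file", 1000))
    print(stale_by_max_age(None, 1000))
```

With no URL, every entry is judged by the "." rule, whatever its real scheme.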

> This is kind of awkward ! I guess doing the refreshCheck for Cache Digest
> entries is just to get a "rough" guess at whether it's worth storing an
> object or not.

Correct.

> If you can't use the refresh rules in the file, it might
> be just better to store every entry w/out checking at all ?! It isn't
> really all that much extra space...

It's not a space issue. If you include everything, then there will be a lot
of false-hits. Your peers will request digested objects using HTTP. At the
time of the request, actual refresh patterns will be used and stale entries
will be marked for revalidation. However, revalidation on behalf of a peer is
prohibited unless you are a parent or have miss_access enabled. Thus, your
peers will receive error messages instead of revalidated objects (false-hit).
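
The decision described above can be sketched as follows. This is an
illustration in Python, not Squid code; the function name and its three
boolean parameters are assumptions chosen to mirror the paragraph.

```python
def answer_peer_request(entry_is_fresh: bool,
                        we_are_parent: bool,
                        miss_access_allowed: bool) -> str:
    """Outcome of a peer's HTTP request for an object listed in our digest."""
    if entry_is_fresh:
        # The actual refresh patterns say the object is still good.
        return "hit"
    if we_are_parent or miss_access_allowed:
        # Stale, but we are allowed to revalidate on the peer's behalf.
        return "revalidate"
    # Stale, and revalidating for a sibling is prohibited: false-hit.
    return "error"


if __name__ == "__main__":
    # A sibling asks for a stale object that the digest advertised.
    print(answer_peer_request(False, False, False))
```

The more stale entries a digest advertises, the more often a sibling ends up
in the last branch.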
 
> > Perhaps we should have a separate refresh_pattern pattern for digests..
>
> Doesn't make sense though ? You can't arbitrarily decide what refresh
> pattern to use when you've no idea what the URL is (HTTP, FTP / GIF, HTML
> etc.)

You cannot decide precisely, but you can give a rough estimate. It also gives
you an option to specify a pattern which would mean "ignore refresh_pattern"
to include as many entries as possible (only internal object flags will be
considered).
 
> > Temporary cure: change "." pattern or hack refreshCheck to use pattern other
> > than "." when "uri" is NULL...
>
> Just changed the refresh pattern - easy to check quickly:
>
> refresh_pattern . 60 50% 43200 override-lastmod
>
> Looks like that was it:
>
> store digest: size: 444316 bytes
> entries: count: 503393 capacity: 710905 util: 71%
> deletion attempts: 0
> bits: per entry: 5 on: 1537667 capacity: 3554528 util: 43%
> bit-seq: count: 1745298 avg.len: 2.04
> added: 503393 rejected: 189825 ( 27.38 %) del-ed: 0
> collisions: on add: 0.85 % on rej: 0.87 %
>
> 27.38% rejected. If you don't bother w/ refresh checks then it's not
> that much extra space. Well, it's OK for RAM, but it does mean extra
> bandwidth when transferring it to neighbours...

How many extra bytes? 10K?
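
For reference, the figures in the report above can be cross-checked with a
little arithmetic. The sketch below uses only the numbers you quoted; the
formula size = capacity * bits-per-entry, rounded up to whole bytes, is an
assumption that happens to reproduce the reported size. It is not taken from
the source.

```python
# Figures copied from the "store digest" report quoted above.
capacity = 710905
bits_per_entry = 5
added = 503393
rejected = 189825
bits_on = 1537667

# Assumed sizing rule: capacity * bits per entry, rounded up to whole bytes.
size_bytes = (capacity * bits_per_entry + 7) // 8
bits_capacity = size_bytes * 8

entry_util = round(100 * added / capacity)          # percent of capacity used
bit_util = round(100 * bits_on / bits_capacity)     # percent of bits set
rejected_pct = round(100 * rejected / (added + rejected), 2)

# Would the rejected entries still fit within the existing capacity?
fits = added + rejected <= capacity

print(size_bytes, bits_capacity, entry_util, bit_util, rejected_pct, fits)
```

If the size really is fixed by capacity, as these numbers suggest, then
accepting the rejected entries would not change the bytes sent to neighbours.
It would only raise utilization, and with it the false-positive rate.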

Alex.
Received on Tue Oct 20 1998 - 11:11:15 MDT
