Re: memory-mapped files in Squid from Alex Rousskov on 1999-01-28 (squid-dev)

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Thu, 28 Jan 1999 10:12:54 -0700 (MST)

On Thu, 28 Jan 1999, Andres Kroonmaa wrote:

> Of 100% fresh objects that we fetch, some part will survive a cycle.

Correct me if I am wrong, but the recommended LRU expiration age is 3-7
days. The "cycle" is not something short. If a cache receives 3GB of
cacheable traffic per day and has a 30GB disk cache, it would take 10 days
to complete a cycle (just an example). If an object was not referenced
twice in 3-7 days, it is useless with very high probability.
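
To make the arithmetic in the example explicit, here is a minimal sketch. The
function name and its parameters are hypothetical, and it assumes that all
cacheable traffic is written to disk and that object sizes do not matter:

```python
def cycle_days(disk_cache_gb: float, cacheable_gb_per_day: float) -> float:
    """Days needed for new cacheable traffic to turn over the whole disk cache.

    Assumption: every cacheable byte received is stored, so the cache is
    overwritten at the same rate at which cacheable traffic arrives.
    """
    if cacheable_gb_per_day <= 0:
        raise ValueError("daily cacheable traffic must be positive")
    return disk_cache_gb / cacheable_gb_per_day


# The example from the text: 30GB of disk, 3GB of cacheable traffic per day.
print(cycle_days(30, 3))  # 10.0
```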

Thus, FIFO may work for most reasonably configured caches, especially
if augmented with a small buffer for popular objects (say, those
referenced at least twice).
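
One possible shape for "FIFO plus a small buffer for popular objects" is
sketched below. This is only an illustration under my own assumptions, not
Squid code: the class name, both capacities, and the rule "promote on the
first hit after admission" are all hypothetical, and objects are treated as
unit-size.

```python
from collections import OrderedDict


class FifoWithPopularBuffer:
    """FIFO cache plus a small protected buffer for objects referenced twice."""

    def __init__(self, fifo_capacity, popular_capacity):
        self.fifo_capacity = fifo_capacity
        self.popular_capacity = popular_capacity
        self.fifo = OrderedDict()     # insertion order is eviction order
        self.popular = OrderedDict()  # objects hit at least once after admission

    def get(self, key):
        if key in self.popular:
            return self.popular[key]
        if key in self.fifo:
            # Second reference: move the object into the protected buffer.
            value = self.fifo.pop(key)
            self.popular[key] = value
            if len(self.popular) > self.popular_capacity:
                self.popular.popitem(last=False)  # drop the oldest popular object
            return value
        return None  # miss

    def put(self, key, value):
        if key in self.popular:
            self.popular[key] = value
            return
        if key in self.fifo:
            self.fifo[key] = value  # update in place; FIFO position is unchanged
            return
        self.fifo[key] = value
        if len(self.fifo) > self.fifo_capacity:
            self.fifo.popitem(last=False)  # plain FIFO eviction


cache = FifoWithPopularBuffer(fifo_capacity=2, popular_capacity=1)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # second reference to "a": it is now protected
cache.put("c", 3)
cache.put("d", 4)   # FIFO part overflows and "b" is evicted
print(cache.get("a"), cache.get("b"))  # 1 None
```

The popular buffer is itself evicted in FIFO order here only to keep the
sketch short; any other policy could be used for that small part.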

> Some part is expired long before. But just this part that survives is
> what increases Squid's hit/miss ratio, and we want to keep it. The
> other part is a waste, and we'd want to replace it first.

For a lot (most?) of reasonable configurations, it does not matter much
what you are replacing as long as that stuff is several days old. The
latter is the reason why all "traditional" caching policies perform the
same on any reasonably sized cache. Essentially, you are selecting between
old garbage and very old garbage.

> 1) we want to have our disks as full as possible. 90% at least.
> 2) we want that these disks are full of _useful_ stuff, not waste.

Sure. However, none of the traditional policies caches useful data! All
traditional policies that I know of fill cache with garbage, on average.
Provided your cache is of a reasonable size, of course.

The real challenge is to avoid caching waste in the first place. And not to
reduce disk space requirements, but to decrease disk _bandwidth_
contention and hence speed up hits.

> So, we have to look at allocation algorithms that integrate both
> 1) intelligent replacement policy
> 2) and does not suffer speed loss at high disk utilisation.

IMO, all reasonable _replacement_ policies will perform the same no matter
how intelligent they are. I have not seen a single study that shows an
alternative super-intelligent policy outperforming LRU-TH by a significant
margin on a reasonably sized cache.

We may need an intelligent _caching_ (or cache admission) policy, but I am
afraid we are not ready for that yet.

> If we just overwrite useful stuff on every pass, what's left behind?

We overwrite garbage on every pass. What is left is garbage. On average,
only a small portion of your cache actually gives you hits. Preserving
that portion is probably feasible.

> Perhaps some 40% of waste. Having effective use of 60% of available
> disk space seems very expensive. Performance win on disks can easily
> transform into performance loss on network.

Perhaps some 80% is waste? 90%? 95%? That is why I said that a few
[simple] experiments/calculations are needed to (a) estimate disk
"utilization" (portion of useful objects) and (b) average object age with
FIFO policy.
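
A rough sketch of what such an experiment could look like is below. It is
a toy under stated assumptions, not a measurement: objects are unit-size, an
object counts as "useful" if it produced at least one hit while cached, age
is measured in requests rather than days, and the function name is
hypothetical. A real run would replay an actual access log.

```python
from collections import OrderedDict


def simulate_fifo(requests, capacity):
    """Replay a request trace against a FIFO cache of `capacity` objects.

    Returns (useful_fraction, mean_age_at_eviction), both computed over the
    objects that were evicted during the replay.
    """
    cache = OrderedDict()  # key -> [admission time, hits while cached]
    evicted = 0
    useful = 0
    total_age = 0
    for now, key in enumerate(requests):
        entry = cache.get(key)
        if entry is not None:
            entry[1] += 1          # hit: the object proved useful
            continue
        cache[key] = [now, 0]      # miss: admit unconditionally
        if len(cache) > capacity:
            _, (admitted, hits) = cache.popitem(last=False)  # oldest goes first
            evicted += 1
            useful += hits > 0
            total_age += now - admitted
    if evicted == 0:
        return 0.0, 0.0
    return useful / evicted, total_age / evicted


# Tiny hand-checkable trace: "a" gets one hit before eviction, "b" gets none.
print(simulate_fifo(["a", "b", "a", "c", "d"], capacity=2))  # (0.5, 3.0)
```

With a real trace, (a) is the first number and (b) is the second, converted
to days using the trace's request rate.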

$0.02,

Alex.
Received on Tue Jul 29 2003 - 13:15:56 MDT
