Re: V2.3 & Cache Digest

From: Jens-S. Voeckler <voeckler@dont-contact.us>
Date: Fri, 30 Jun 2000 10:16:49 +0200 (CEST)

On Fri, 30 Jun 2000, Clement wrote:

]I am still using ICP for peering. I wonder how much better it would be
]if I switched to cache digests? If you are using them, can you tell
]me about your experience with them in 2.3? Your information is appreciated.

Some knowledge I recently gained from Seafood prompts me to suggest
the following procedure; please substitute your own numbers for mine.
And please, gurus, correct me if I am wrong:

- To obtain your own numbers, use a busy weekday (Tue, Wed, Thu).
- I have configured my client caches to use four of the (new) parent caches.
- Each of my parent caches has a digest of about 3 MB.
  Your mileage may vary, so check your cache manager interface.
- Therefore, each client fetches about 295 MB of digests per day
  (a little over 3 MB * 4 parents * 24 fetches per day).
  Use the appropriate numbers for your own cache(s).
- To add some safety margin: if any of my siblings (not peers) transfers
  less than 300 MB (or, to be safe, 200 MB) per day, it might be better
  off with ICP. On the other hand, if it transfers more than 300 MB
  (or, to be safe, 500 MB or even 1 GB) per day, it is a prime candidate
  for cache digests. As you can see, the safety margins leave a band of
  hysteresis between the two decisions.
- Insert your own numbers. With smaller digests, the break-even point
  drops, so digests become worthwhile at lower daily object volumes.
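The volume rule above boils down to a little arithmetic. The Python sketch below assumes the numbers from this message (four parents, digests of about 3 MB, 24 digest fetches per parent per day, MB meaning 2**20 bytes, and the 200 MB / 500 MB safety bounds); the function names are hypothetical and nothing here is part of Squid itself.

```python
MB = 2**20  # the figures in this message work out with MB = 2**20 bytes

def daily_digest_bytes(digest_bytes, parents, fetches_per_day=24):
    """Bytes per day spent fetching digests from all parents."""
    return digest_bytes * parents * fetches_per_day

def volume_advice(daily_bytes, current, icp_below=200 * MB, digest_above=500 * MB):
    """Pick 'icp' or 'digest' from a cache's daily transfer volume.

    Between the two bounds (the hysteresis band) the current choice is kept,
    so a cache hovering near the break-even point does not flip back and forth.
    """
    if daily_bytes < icp_below:
        return "icp"
    if daily_bytes > digest_above:
        return "digest"
    return current

if __name__ == "__main__":
    overhead = daily_digest_bytes(3 * MB, parents=4)
    print(f"digest overhead: {overhead / MB:.0f} MB/day")
    for volume in (150 * MB, 300 * MB, 1024 * MB):
        print(f"{volume // MB} MB/day, currently ICP ->", volume_advice(volume, "icp"))
```

With digests of exactly 3 MB this gives 288 MB per day; the 295 MB quoted above corresponds to digests of a little over 3 MB (about 3.07 MB) each.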

Wait, this is just half of the story. Peer caches or client caches
with a high request rate might still benefit from cache digests:

- The average size of a URL (excluding the part after the ?) is
  53.737 bytes. I checked this number this morning against more than
  1 million URLs, and only 7300 contained a question mark (hmm, one of
  my clients is misconfigured). The part past the question mark, which
  Squid chops off, might be heavy-tailed and thus increase the average
  noticeably.
- Therefore the average size of an ICP message is about 78 bytes
  (20 bytes header + 4 bytes requester address + URL size). Do your
  own calculations, if you like, e.g.:

     perl -ne '@x = split(/ +/, $_, 8); $n++; $s += length($x[6]);
        END { printf "%d documents, %.3f MB URL size, %.3f Byte/URL\n",
              $n, $s/1048576.0, $s/$n; }' access.log.0
     # the URL is field 7 ($x[6]) of Squid's native access.log format
     # I never claimed efficiency with that

- On the other hand, for comparison: the HTTP header of a digest
  transfer is negligible, given the size of the message (= digest)
  and how rarely it is transferred.
- Hence, a client which would send *more* than 3.9 million ICP requests
  per day (295 MB / 78 bytes) will profit from cache digests (in my
  case). A client with fewer than 3 million requests per day might be
  better off with ICP, unless the first constraint from above applies.
  Again, there is a band of hysteresis between the two decisions.
- Insert your own numbers. With smaller digests, the break-even point
  drops to fewer requests per day.
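To check the 78-byte figure and the break-even rate, the sketch below packs an ICP query using the ICPv2 layout of RFC 2186 (a 20-byte header, a 4-byte requester address, then the NUL-terminated URL) and divides the daily digest volume by the per-message size. The helper names, the example URL, and the use of 0.0.0.0 for both addresses are assumptions for illustration. Note that the NUL terminator adds one byte that the estimate above leaves out.

```python
import struct

ICP_OP_QUERY = 1  # ICP_OP_QUERY opcode (RFC 2186)
ICP_VERSION = 2   # ICPv2

def icp_query(url, reqnum=1):
    """Build an ICPv2 query: 20-byte header + requester address + URL + NUL."""
    # Payload: requester host address (0.0.0.0 here) and the NUL-terminated URL.
    payload = struct.pack("!I", 0) + url.encode("ascii") + b"\0"
    length = 20 + len(payload)  # message length covers header and payload
    header = struct.pack("!BBHIIII",
                         ICP_OP_QUERY, ICP_VERSION, length,
                         reqnum,  # request number
                         0, 0,    # options, option data
                         0)       # sender host address (0.0.0.0 here)
    return header + payload

def break_even_requests(daily_digest_bytes, icp_message_bytes):
    """ICP messages per day whose total size equals the daily digest volume."""
    return daily_digest_bytes / icp_message_bytes

if __name__ == "__main__":
    url = "http://www.example.com/"
    print(len(icp_query(url)))  # 20 + 4 + len(url) + 1
    print(f"{break_even_requests(295 * 2**20, 78):,.0f} ICP messages/day")
```

With MB taken as 2**20 bytes, 295 MB / 78 bytes comes to just under 4 million messages per day, in line with the 3.9 million figure above.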

This is still not the full story. If your peer is behind a high-latency
link (satellite), cache digests (CDs) might still be beneficial; ok, I am
walking on thin ice with this last remark. My suggestions look at
requests and at volume, but not at latency.

Use both constraints in conjunction, and add a good pinch of salt to
make up for inaccuracies, e.g. go for cache digests at more than
500 MB per day or more than 3 million requests per day.
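The combined rule of thumb can be written as a single check. This is a minimal sketch assuming the 500 MB and 3 million thresholds from the previous paragraph and reading "in conjunction" as: prefer cache digests if either threshold is exceeded, ICP otherwise. Latency is deliberately ignored, as in the text, and the function name is hypothetical.

```python
MB = 2**20

def prefer_cache_digests(daily_bytes, daily_requests,
                         volume_limit=500 * MB, request_limit=3_000_000):
    """True if either the volume or the request-rate threshold favours digests."""
    return daily_bytes > volume_limit or daily_requests > request_limit

if __name__ == "__main__":
    print(prefer_cache_digests(800 * MB, 1_000_000))  # volume rule applies
    print(prefer_cache_digests(100 * MB, 4_000_000))  # request rule applies
    print(prefer_cache_digests(100 * MB, 1_000_000))  # neither: stay with ICP
```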

The picture will turn in favour of cache digests once digest diffs
(incremental digest updates) are implemented!

My calculations apply to CD-only peering (the no-query option to
cache_peer) and ICP-only peering (the no-digest option to cache_peer).

With best wishes,
Dipl.-Ing. Jens-S. Vöckler (voeckler@rvs.uni-hannover.de)
Institute for Computer Networks and Distributed Systems
University of Hanover, Germany; +49 511 762 4726
Received on Fri Jun 30 2000 - 02:19:58 MDT
