Re: Cache Digests Diffs

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Sun, 8 Aug 1999 18:23:59 -0600

On Mon, 9 Aug 1999, Henrik Nordstrom wrote:

> I have been looking into this, but I have some trouble finding digests
> matching your statistics. I get a much higher rate of single bit changes
> (more in line with what I expected). Maybe the digests I have collected
> are a-typical or your bit count is busted or something odd is going on
> here which I can't explain..

Henrik,

        My stats are based on consecutive digests. The differences grow
fast as you start "skipping" digests. Anyway, I will try to apply your
program. However, I may not have enough time to do it promptly. The
"contest" rules were explicitly designed to limit any influence from our
statistics or approach. So you can just apply the simple algorithm (100
line program posted on the Web) to your dataset and see how it competes
with your smart algorithm. Just please use consecutive digests.

Again, I will try to run your program through my digest collection as soon
as I can.

Thanks for the interest!

Alex.

> Extreme example taken from bo2.us.ircache.net with several hours in
> between the digests (31% of the bytes changed):
>
> -- Basic information --
> Size: 861667 bytes / 6893336 bits
> Chaged bytes: 273016 (31%)
> Changed bits: 320481 (4%)
> -- Changed bits / byte --
> 0: 588651 68%/total
> 1: 229925 84%/diff 26%/total
> 2: 38965 14%/diff 4%/total
> 3: 3894 1%/diff 0%/total
> 4: 216 0%/diff 0%/total
> 5: 16 0%/diff 0%/total
> 6: 0 0%/diff 0%/total
> 7: 0 0%/diff 0%/total
> 8: 0 0%/diff 0%/total
>
> Attached is a program to print fairly detailed statistics about digest
> changes. Can you please run this on what you consider a couple of
> typical digest updates and send me the result. The program assumes MIME
> input (HTTP headers + body), if you have raw digests without HTTP
> headers then remove the call to skipHttpHeader..
>
> The goal you have set up is certainly within reach, but the algorithm
> may have to differ depending on the change rate.. The main approach I am
> leaning towards are huffman encoded changed bit offset with some
> variations to handle higher bit change rates.
>
> /Henrik
>
>
> Alex Rousskov wrote:
> >
> > The best I can do is to give you the stats from our caches. Here is the
> > percentage of changed _bytes_ between two consecutive digests from ten
> > caches: (4.95%, 3.99, 2.83, 3.08, 5.49, 4.18, 3.41, 1.65, 7.24, 4.45%).
> >
> > I do not have bit stats, but here is a "typical" distribution of the number
> > of changed bits in a _changed_ byte (1 means 1 bit has changed, 7 means 7
> > bit have changed):
> >
> > #bits count percent
> > 1 12605 ( 29.40%)
> > 2 10201 ( 23.79%)
> > 3 11856 ( 27.65%)
> > 4 5903 ( 13.77%)
> > 5 1916 ( 4.47%)
> > 6 352 ( 0.82%)
> > 7 43 ( 0.10%)
>
Received on Tue Jul 29 2003 - 13:15:59 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:16 MST