Re: Cache Digests Diffs

From: Kevin Littlejohn <darius@dont-contact.us>
Date: Fri, 16 Jul 1999 10:15:46 +1000

>>> Alex Rousskov wrote
> On Fri, 16 Jul 1999, Kevin Littlejohn wrote:
>
> > Slightly different approach: Have you looked at http://rsync.samba.org/ ?
> > This is a package that generates a diff without a second copy - it uses
> > rolling checksums and other niftiness.
>
> As far as I can tell, you have misunderstood the rsync algorithm. The
> algorithm does require two copies, of course. It does not require the
> _transfer_ of the second copy over a [slow] link to generate a diff.
> In our case, both copies are local (reside in the same proxy process).
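For readers who haven't met the "rolling checksums" mentioned above, here is a minimal Python sketch of the weak checksum described in the rsync technical report. a is the byte sum of a window and b is a position-weighted sum, both mod 2^16. Either can be slid forward one byte in constant time. This illustrates the idea only; it is not code from the rsync package, and the function names are hypothetical.

```python
from __future__ import annotations

M = 1 << 16  # modulus used for both halves of the weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) from scratch: a is the byte sum, b weights byte i by (n - i)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M  # uses the updated a
    return a, b


def combined(a: int, b: int) -> int:
    """Pack both halves into one 32-bit value, s = a + 2^16 * b."""
    return a + (b << 16)
```

Because of this, the side holding only block checksums can test every byte offset of its own data without rehashing each window from scratch. Only windows whose weak checksum matches then need confirming with a strong hash.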

Suppose cache A generates a digest (and the rsync signatures for that
digest). It transfers that first digest to its neighbours and keeps only the
signatures. When it comes time to send a diff, it uses the signatures and its
own current digest to generate the diff.

Thus, no need for a second local copy to be held on the cache.
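Here is a minimal sketch of that signatures-only scheme. It assumes the digest is a fixed-size byte string modified in place. BLOCK_SIZE, block_signatures and SignatureOnlyDiffer are hypothetical names. MD5 is simply a convenient strong block hash, not something rsync or Squid prescribes for this. Bits in a digest flip in place and nothing is inserted or deleted, so blocks never shift and comparing aligned block hashes is enough. rsync's rolling checksum only earns its keep when data can move.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 512  # hypothetical block size in bytes


def block_signatures(digest: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """MD5 of each aligned block; the last block may be shorter."""
    return [hashlib.md5(digest[i:i + block_size]).digest()
            for i in range(0, len(digest), block_size)]


class SignatureOnlyDiffer:
    """Holds signatures of the last digest sent to neighbours, not the digest."""

    def __init__(self, sent_digest: bytes, block_size: int = BLOCK_SIZE):
        self.block_size = block_size
        self.length = len(sent_digest)
        self.signatures = block_signatures(sent_digest, block_size)

    def make_diff(self, current: bytes) -> list[tuple[int, bytes]]:
        """Return (offset, block bytes) for each changed block, then adopt
        the current digest's signatures as the new baseline."""
        if len(current) != self.length:
            raise ValueError("digest size changed; send a full digest instead")
        new_sigs = block_signatures(current, self.block_size)
        diff = []
        for idx, (old, new) in enumerate(zip(self.signatures, new_sigs)):
            if old != new:
                off = idx * self.block_size
                diff.append((off, current[off:off + self.block_size]))
        self.signatures = new_sigs
        return diff
```

With 16-byte MD5 signatures per 512-byte block, the state held between diffs is 16/512, about 3%, of the digest size. Refreshing the signatures inside make_diff is what lets them be "held over" to the next diff-generation point. The cost is that a changed block whose hash happened to collide would be missed, which is negligible for a 128-bit hash.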

> The fundamental difference from the rsync environment is that we have a
> single source of modification (the digest generating proxy). Rsync has
> to deal with modifications on both ends.

Well, no, not really - it deals with differences _between_ the files. It doesn't
matter which end made those changes - and if you know the other end won't
change the cache-digest, you can simply hold the signatures over from the
last diff-generation point.

> > It
> > would seem that if diffs can be generated without holding an entire second
> > copy of the cache-digest in memory, it might be a big win...
>
> I do not consider an additional 1-2MB of RAM a big deal for this
> particular purpose. There are alternative techniques that avoid keeping
> two digests in memory (Pei Cao's Summary Cache is one of them). However,
> they have similar, if not larger, memory overhead.

Fair enough - I seemed to recall, from the last time I looked at cache digests,
that they were fairly large beasties on decent-sized caches. If not, then my
mistake, and obviously not worth the effort. Hate to be solving the
wrong problem :(

> Also, the diff format we have proposed does not require holding the
> entire diff file in memory on the receiving end to "patch" the stale
> digest. This may be a nice feature because (as opposed to generating a
> diff) a receiving proxy may have many digests that it has to update
> simultaneously.
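To illustrate that streaming property, here is a sketch that patches a stale digest one record at a time, straight from a file-like stream. Memory use per digest being updated is then bounded by one record rather than the whole diff. The record layout (32-bit big-endian offset, 16-bit length, then the bytes) is an assumption for illustration only. It is not the diff format Alex's group proposed, and encode_records and apply_diff_stream are hypothetical names.

```python
from __future__ import annotations

import io
import struct

# Hypothetical record header: 32-bit offset, 16-bit length, big-endian.
HEADER = struct.Struct(">IH")


def encode_records(records: list[tuple[int, bytes]]) -> bytes:
    """Serialize (offset, data) pairs; used here only to build a demo stream."""
    return b"".join(HEADER.pack(off, len(data)) + data for off, data in records)


def apply_diff_stream(digest: bytearray, stream) -> int:
    """Patch digest in place, reading one record at a time from a binary
    file-like stream. Returns the number of records applied."""
    applied = 0
    while True:
        header = stream.read(HEADER.size)
        if not header:  # clean end of stream
            return applied
        if len(header) != HEADER.size:
            raise EOFError("truncated record header")
        offset, length = HEADER.unpack(header)
        if offset + length > len(digest):
            raise ValueError("record falls outside the digest")
        data = stream.read(length)
        if len(data) != length:
            raise EOFError("truncated record body")
        digest[offset:offset + length] = data
        applied += 1


if __name__ == "__main__":
    stale = bytearray(1024)
    diff = io.BytesIO(encode_records([(0, b"\x01\x02"), (1000, b"\xff" * 4)]))
    print(apply_diff_stream(stale, diff))  # prints 2
```

Because each record is applied and discarded before the next is read, a proxy can hold many such streams open, one per neighbour digest. It buffers at most one record (under 64 KB with a 16-bit length field) for each.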

*nod* The streaming nature of rsync was one of the other things that seemed
to make it a nice option - again, no need to hold an entire diff in memory.
Your approach and the rsync stuff do share a bit in that respect...

Ah well, so much for that idea ;)

KevinL

--------------- qnevhf@obsu.arg.nh ---------------
Kevin Littlejohn,
Technical Architect, Connect.com.au
Don't let the Govt censor our access to the 'net -
http://www.efa.org.au/Campaigns/stop.html
Received on Tue Jul 29 2003 - 13:15:59 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:12:16 MST