Re: Rv: Why not BerkeleyDB based object store?

From: Mark Nottingham <mnot_at_yahoo-inc.com>
Date: Thu, 27 Nov 2008 10:55:21 +1100

Just a tangential thought: has there been any investigation into
reducing the amount of write traffic with the existing stores?

E.g., establishing a floor for the reference count: if an object hasn't
had n refs, don't write it to disk? This will reduce the hit rate, of
course, but may help in situations where disk caching is desirable but
writing is the bottleneck...
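
To make the idea concrete, here is a minimal Python sketch of such a
reference-count floor. It is not Squid code: the class name
RefFloorAdmission, the replay() helper, and the max_tracked bound are
all hypothetical, and it assumes an unbounded disk store with no
eviction, so it only illustrates the trade-off between writes and hits.

```python
from collections import OrderedDict


class RefFloorAdmission:
    """Write an object to disk only once it has been requested
    at least `floor` times (hypothetical policy, not Squid code)."""

    def __init__(self, floor, max_tracked=100_000):
        if floor < 1:
            raise ValueError("floor must be >= 1")
        self.floor = floor
        self.max_tracked = max_tracked  # bound on remembered candidates
        self.refs = OrderedDict()       # key -> request count, not yet on disk
        self.on_disk = set()            # assumption: disk store never evicts
        self.disk_writes = 0
        self.disk_hits = 0

    def request(self, key):
        """Process one request; return True if it was served from disk."""
        if key in self.on_disk:
            self.disk_hits += 1
            return True
        count = self.refs.pop(key, 0) + 1
        if count >= self.floor:
            # The object reached the floor: admit it and pay one write.
            self.on_disk.add(key)
            self.disk_writes += 1
        else:
            # Remember the candidate; re-inserting moves it to the end.
            self.refs[key] = count
            if len(self.refs) > self.max_tracked:
                self.refs.popitem(last=False)  # forget the oldest candidate
        return False


def replay(trace, floor):
    """Replay a list of request keys; return (disk_writes, disk_hits)."""
    policy = RefFloorAdmission(floor)
    for key in trace:
        policy.request(key)
    return policy.disk_writes, policy.disk_hits


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "a", "b"]
    print(replay(trace, floor=1))  # (3, 3): every miss is written
    print(replay(trace, floor=2))  # (2, 1): fewer writes, fewer hits
```

On this toy trace a floor of 2 cuts disk writes from 3 to 2 and disk
hits from 3 to 1, which is the hit-rate cost mentioned above. Whether
that is worth it would depend on the real request distribution.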

On 26/11/2008, at 9:14 AM, Kinkie wrote:

> On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti
> <pablorosatti_at_yahoo.com.ar> wrote:
>> Amazon uses BerkeleyDB for several critical parts of its website.
>> The Chicago Mercantile Exchange uses BerkeleyDB for backup and
>> recovery of its trading database. And Google uses BerkeleyDB to
>> process Gmail and Google user accounts. Are you sure BerkeleyDB is
>> not a good idea to replace the Squid filesystems, even COSS?
>
> Squid3 uses a modular storage backend system, so you're more than
> welcome to try to code it up and see how it compares.
> Generally speaking, the needs of a data cache such as squid are very
> different from those of a general-purpose backend storage.
> Among the other key differences:
> - the data in the cache has little or no value:
>   it's important to know whether a file was corrupted, but it can
>   always be thrown away and fetched from the origin server at a
>   relatively low cost
> - the workload is mostly writes:
>   a well-tuned forward proxy will have a hit rate of roughly 30%,
>   which means about 7 writes for every 3 reads on average
> - data is stored in incremental chunks
>
> Given these characteristics, many of the mechanisms that database-like
> systems provide, such as journaling and transactions, are a waste of
> resources.
> COSS is explicitly designed to handle a workload of this kind. I would
> not trust any valuable data to it, but it's about as fast as it gets
> for a cache.
>
> IMHO BDB might be much more useful as a metadata storage engine, as
> metadata has a very different access pattern from a general-purpose
> cache store.
> But if I had any time to devote to this, my priority would be in
> bringing 3.HEAD COSS up to speed with the work Adrian has done in 2.
>
> --
> /kinkie

--
Mark Nottingham       mnot_at_yahoo-inc.com
Received on Wed Nov 26 2008 - 23:55:48 MST
