Re: Rv: Why not BerkeleyDB based object store?

From: Adrian Chadd <adrian_at_squid-cache.org>
Date: Thu, 27 Nov 2008 02:00:18 -0500

I thought about it a while ago, but I'm just out of time, to be honest.
Writing objects to disk only when they're popular, or when you need the
RAM back to handle concurrent accesses to large objects, would probably
improve disk performance enormously, since the amount of writing would
drop drastically.
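
Something along these lines, hand-waving wildly (the names and
thresholds here are made up for illustration, not actual Squid
internals):

    #include <cstddef>

    // Sketch of a popularity-gated swapout decision: only pay the disk
    // write for objects that have proven popular, or for large objects
    // when memory pressure forces them out.
    struct Object {
        int hits;            // times served from the memory cache
        std::size_t size;    // object size in bytes
    };

    bool shouldSwapOut(const Object &obj,
                       std::size_t memUsed, std::size_t memLimit) {
        const int popularityFloor = 2;                 // re-requested only
        const std::size_t largeObject = 512 * 1024;    // 512 KB, say

        if (obj.hits >= popularityFloor)
            return true;    // popular: worth keeping on disk
        if (obj.size >= largeObject && memUsed > memLimit)
            return true;    // large, and we need the RAM back
        return false;       // everything else just expires from memory
    }

Every object that fails this test is a disk write we never do, which is
where the performance win would come from.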

Sponsorship for investigating and developing this is gladly accepted :)

Adrian

2008/11/26 Mark Nottingham <mnot_at_yahoo-inc.com>:
> Just a tangential thought: has there been any investigation into reducing
> the amount of write traffic with the existing stores?
>
> E.g., establishing a floor for the reference count: if an object doesn't
> have n refs, don't write it to disk? This will impact the hit rate, of
> course, but may help in situations where disk caching is desirable but
> writing is the bottleneck...
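>
> Something like this, maybe (pseudo-C++; the names are invented for
> illustration, not Squid's actual API):
>
>     // Gate the disk write on a minimum in-memory reference count.
>     struct CacheObject {
>         int n_refs;    // times the object was requested while in memory
>     };
>
>     void swapOut(CacheObject &) { /* the existing disk-write path */ }
>
>     const int swapout_ref_floor = 2;   // tunable; 0 keeps today's behaviour
>
>     void maybeSwapOut(CacheObject &e) {
>         if (e.n_refs < swapout_ref_floor)
>             return;        // not hot enough: skip the write entirely
>         swapOut(e);        // hot enough: pay for the disk write
>     }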
>
>
> On 26/11/2008, at 9:14 AM, Kinkie wrote:
>
>> On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti
>> <pablorosatti_at_yahoo.com.ar> wrote:
>>>
>>> Amazon uses BerkeleyDB for several critical parts of its website. The
>>> Chicago Mercantile Exchange uses BerkeleyDB for backup and recovery of
>>> its trading database. And Google uses BerkeleyDB to process Gmail and
>>> Google user accounts. Are you sure BerkeleyDB is not a good idea to
>>> replace the Squid filesystems, even COSS?
>>
>> Squid3 uses a modular storage backend system, so you're more than
>> welcome to code it up and see how it compares.
>> Generally speaking, the needs of a data cache such as Squid are very
>> different from those of general-purpose backend storage.
>> Among the key differences:
>> - the data in the cache has little or no value: it's important to know
>>   whether a file was corrupted, but it can always be thrown away and
>>   fetched again from the origin server at a relatively low cost
>> - the workload is mostly writes: a well-tuned forward proxy has a hit
>>   rate of roughly 30%, and since every miss gets written to the cache
>>   while every hit is a read, that works out to roughly two to three
>>   disk writes for every read on average (0.7 / 0.3 ~= 2.3)
>> - data is stored in incremental chunks
>>
>> Given these characteristics, many of the mechanisms that database-like
>> systems provide, such as journaling and transactions, are a waste of
>> resources.
>> COSS is explicitly designed to handle a workload of this kind. I would
>> not trust any valuable data to it, but it's about as fast as it gets
>> for a cache.
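>>
>> The core trick, very roughly (a toy sketch to show the idea, not the
>> real COSS code):
>>
>>     #include <cstdio>
>>
>>     // Cyclic append-only store: every write goes at the head of one
>>     // large file; when the head hits the end it wraps around and
>>     // silently overwrites the oldest objects. No journal, no
>>     // transactions: a clobbered object is just a cache miss.
>>     class CyclicStore {
>>         std::FILE *f;
>>         long head = 0;        // current write offset
>>         long capacity;        // total size of the store file
>>     public:
>>         CyclicStore(const char *path, long cap) : capacity(cap) {
>>             f = std::fopen(path, "r+b");
>>             if (!f)
>>                 f = std::fopen(path, "w+b");
>>         }
>>         ~CyclicStore() { if (f) std::fclose(f); }
>>
>>         // Append an object; returns its offset (the "key"), -1 on error.
>>         long put(const void *data, long len) {
>>             if (head + len > capacity)
>>                 head = 0;              // wrap: reclaim the oldest space
>>             if (!f || std::fseek(f, head, SEEK_SET) != 0)
>>                 return -1;
>>             if ((long)std::fwrite(data, 1, len, f) != len)
>>                 return -1;
>>             long off = head;
>>             head += len;
>>             return off;
>>         }
>>
>>         // Read an object back; the caller must still validate it (e.g.
>>         // with a checksum), because it may have been overwritten since.
>>         bool get(long off, void *buf, long len) {
>>             if (!f || std::fseek(f, off, SEEK_SET) != 0)
>>                 return false;
>>             return (long)std::fread(buf, 1, len, f) == len;
>>         }
>>     };
>>
>> All writes are sequential, which is exactly what a mostly-write,
>> low-value workload wants from a disk.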
>>
>> IMHO BDB might be much more useful as a metadata storage engine, since
>> metadata has a very different access pattern than a general-purpose
>> cache store.
>> But if I had any time to devote to this, my priority would be bringing
>> COSS in 3.HEAD up to speed with the work Adrian has done in Squid 2.
>>
>> --
>> /kinkie
>
> --
> Mark Nottingham mnot_at_yahoo-inc.com
>
>
>
Received on Thu Nov 27 2008 - 07:00:26 MST
