Re: Large Rock Store

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 18 Oct 2012 12:06:39 +1300

On 18.10.2012 03:34, Alex Rousskov wrote:
> On 10/16/2012 04:45 PM, Amos Jeffries wrote:
>> On 17.10.2012 11:02, Alex Rousskov wrote:
>>> Hello,
>>>
>>> We have started working on caching "large" (i.e., multi-slot)
>>> responses in Rock Store. Initial design overview is available at the
>>> following wiki page, along with several design alternatives we have
>>> considered. Feedback is welcomed.
>>>
>>> http://wiki.squid-cache.org/Features/LargeRockStore
>>>
>>>
>>> As a part of Large Rock work, I will also try to fix at least one of
>>> the Store API quality problems that complicate any Store-related
>>> improvements:
>>>
>>> 1) Store class is a parent of too many, barely related classes:
>>> StoreController, StoreHashIndex, SwapDir, MemStore (and TestStore).
>>> This "top heavy" design forces us to add pure virtual methods to
>>> Store and then supply dummy implementations in many Store kids. And,
>>> more importantly, it makes it difficult to understand which part of
>>> the storage API each Store kid is responsible for, leading to
>>> boundary violations and other problems.
>>>
>>> 2) There is no class dedicated to a non-shared memory cache.
>>> StoreController currently implements most of the non-shared memory
>>> cache logic while MemStore implements the shared memory cache.
>>> StoreController's non-shared memory caching code should be moved to
>>> a dedicated class instead.
>>>
>>> 3) StoreHashIndex should probably become responsible for all disk
>>> caches (SwapDirs) as a whole, while StoreController will coordinate
>>> disk and memory caching (using StoreHashIndex and MemStore objects).
>>> Currently, some disk-related manipulations reside in StoreHashIndex
>>> and some in StoreController.
>>>
>>> 4) Get rid of the global store_table. Make it local to non-shared
>>> caching code. As we have discussed previously, the existence of this
>>> global makes shared caching life very difficult because we cannot
>>> share such a complex table and yet some older code expects to find
>>> all entries there. It also leads to problems with entry locking,
>>> where the code assumes that an idle entry will remain in the global
>>> table at least for a while (I believe there are a few bugzilla
>>> reports about associated core dumps). I doubt I can solve all the
>>> StoreEntry locking problems (it may require a significant
>>> client-side rewrite), but removing the store_table global is a step
>>> in the right direction.
>>>
>>>
>>> If you have suggestions on priorities or want to add other
>>> large-scale Store annoyances to the above list, please let me know.
>
>
>
>> Thank you; architecture clarification for store would be very
>> welcome.
>>
>> IMHO:
>> - "Store" should be the namespace - or the name of a base class for
>> shared API accessors to a *single* store, regardless of type.
>
> Good idea! A common namespace (and source directory?) for core "object
> store" classes would be a nice addition. If we need a single store
> interface class, it should probably be named "Store". The namespace
> would then be "Cache" (too general?), "Swap" (too disk-specific), or
> "Storage" (too long?).

Or reverse it: "Store" for the namespace, "ObjectCache" for the
interface class.

I agree "Cache" is too general, but "ObjectCache" is tighter and still
general enough to cover all the store cache types.
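
To make that concrete, roughly what I'm picturing - just a sketch, and
everything other than the Store/ObjectCache names is an invented
placeholder, not real Squid API:

#include <cstdint>

class StoreEntry;                 // existing Squid entry class
typedef unsigned char cache_key;  // Squid's cache key byte type

namespace Store
{

// One cache area (a memory cache, or all the disk caches taken
// together), regardless of backend type.
class ObjectCache
{
public:
    virtual ~ObjectCache() {}

    // synchronous index lookup within this cache area
    virtual StoreEntry *lookup(const cache_key *key) = 0;

    // whether this area is willing to start caching the given entry
    virtual bool willCache(const StoreEntry &entry) const = 0;

    // approximate free space, for selection among cache areas
    virtual uint64_t availableSpace() const = 0;
};

} // namespace Store

The point being that callers only ever see this per-area interface; the
backend type decides the rest.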

>
>> - Low-level virtual methods should be in another class specific to
>> the type of store (disk, memory, shared, ...), which the particular
>> store controller inherits from alongside the "Store" interface class.
>
> Yes, of course, except I am not sure a common parent for all classes
> is actually needed. This will become clear during polishing.
>

Ack. When and IF needed. Sort of like *Data in the libacl design. I'm
thinking one for disk I/O management, one for memory read/write
management, one for DB slot management, one for network I/O (cloud
storage? yuck), etc., grouped by generic type. If we only find it useful
to have one concrete class for a type, there is no need to separate it
other than for design symmetry.

Implementing one of these classes supplies the "how" part of storage I/O
management - not the "what" or the "where".
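
A very rough shape of that split, using made-up names (nothing here
exists in the tree; it's only to show the inheritance arrangement I
mean):

#include <cstddef>

// "How" interfaces, one per generic backend type. A concrete store
// picks up whichever of these it needs, alongside the public
// Store::ObjectCache interface sketched earlier.

class DiskIoMechanics
{
public:
    virtual ~DiskIoMechanics() {}
    virtual void scheduleRead(size_t offset, size_t length) = 0;
    virtual void scheduleWrite(size_t offset, size_t length) = 0;
};

class SlotDbMechanics
{
public:
    virtual ~SlotDbMechanics() {}
    virtual size_t allocateSlot() = 0;
    virtual void freeSlot(size_t slotId) = 0;
};

// eg a rock-style disk cache would combine the two:
//
//   class RockLikeStore : public Store::ObjectCache,
//                         private DiskIoMechanics,
//                         private SlotDbMechanics
//   { ... };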

>
>> - Index hash object/s should be independent of the store backend
>> and
>> the class re-used by all (separate instances though).
>
> Actually, indexing objects is store's prerogative and its details are
> invisible to callers. Today, all indexing classes are required by some
> stores and cannot be reused by other stores, and I do not expect that
> to change. There is one serious challenge here. I will discuss it in a
> separate email.

I was thinking of a shared API for URL lookup, MetaVaryObject
interpretation, etc., not involving the deeper-layer particulars of
slots vs files vs other backends.
There would be one of these per cache area, ie the main Store API
accessor method / lookup Job could schedule parallel calls to each
instance, effectively saying "look up this URL", and manage sending the
best response found to the client-side.
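
Something like this, in rough pseudo-C++ (invented names; no LookupJob
like this exists today):

#include <cstddef>
#include <vector>

class StoreEntry;
typedef unsigned char cache_key;

// Hypothetical shared lookup interface, one instance per cache area.
// It only answers "do you have this key?"; it knows nothing about
// slots, files or other backend details.
class AreaIndex
{
public:
    virtual ~AreaIndex() {}
    virtual StoreEntry *lookup(const cache_key *key) = 0;
};

// Hypothetical lookup job querying every cache area. (A real job would
// schedule the slow/disk lookups asynchronously rather than loop
// synchronously like this sketch does.)
class LookupJob
{
public:
    explicit LookupJob(const std::vector<AreaIndex *> &areas): areas_(areas) {}

    StoreEntry *findBest(const cache_key *key) const {
        for (size_t i = 0; i < areas_.size(); ++i) {
            // placeholder policy: first hit wins; a real job would
            // compare freshness/completeness across the areas' answers
            if (StoreEntry *hit = areas_[i]->lookup(key))
                return hit;
        }
        return 0; // miss in every area
    }

private:
    std::vector<AreaIndex *> areas_;
};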

>
>> - store controller - managing the set of available stores for
>> startup/reload/clean/dirty/up/down/search/object-move handling?
>
> Yes, it already more-or-less does that. We just need to polish it by
> removing code that is meant to be elsewhere and bringing
> Controller-specific code into that class. It will manage/coordinate
> two objects: "memory store" and "disk stores".
>

Cool. Sounds like the easy part (famous last words).
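
ie something of this shape, if I read you right (sketch only; all names
invented):

class StoreEntry;
typedef unsigned char cache_key;

class MemoryStore;  // the memory cache
class DiskStores;   // all cache_dirs taken together

// The coordinator owns exactly two collaborators and only decides
// *which* of them to involve, never *how* they do their I/O. Method
// bodies are omitted since they are the actual polishing work.
class StoreCoordinator
{
public:
    StoreCoordinator(MemoryStore *mem, DiskStores *disks):
        memory_(mem), disks_(disks) {}

    StoreEntry *find(const cache_key *key);  // would consult both
    void startCaching(StoreEntry &entry);    // would offer to both

private:
    MemoryStore *memory_;  // at most one memory cache (today)
    DiskStores *disks_;    // coordinator for all disk caches
};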

>
>> Since you mention a difficulty identifying what each of the store API
>> classes does, can you suggest anything better than my arch brainstorm
>> above?
>
> I think we are in agreement on the high-level stuff. I do not think we
> should try to document most of the polishing/reshuffling details now
> because there are too many of them, they are determined by high-level
> class roles, the changes should be pretty clear in the patches, and I
> may not fix everything anyway. There is one exception that I will
> discuss separately.
>
>
>> For example, it's not clear to me in the slightest why StoreHashIndex
>> is relevant to disks but not in-memory caches. Surely the in-memory
>> caches have a hash index of their content as well? (or should, after
>> the global store_table is gone) - or is the class name irrelevant to
>> what it actually does nowadays?
>
> Please ignore the current StoreHashIndex class name. It will not
> survive this polishing.

Okay. :-)

>
> This class is needed to represent all disk caches taken together and
> coordinate activities among individual disk caches (e.g., cache_dir
> selection for a given miss object).
>
> Why separate the memory store from disk stores? While there is a lot
> in common between disk and memory caches, there are a few significant
> differences. For example:
>
> * There may be many disk caches (that need coordination) but there is
> at most one memory cache. That is why disk caches need a "disk cache
> coordination" class and memory cache does not.

There are three memory cache areas that I know of which we do, or will,
need in future, each having different internal needs:
  * shared memory cache (small objects)
  * local memory cache (larger objects)
  * active transactions (for collapsed-forwarding we will need an index
of these, with lookups the same as if it were a cache).

We may need to cater for groups of workers in future as well, where a
worker would be part of one or more groups, each sharing an mmap area
common to only that group. eg enforcing that different ISP clients don't
share cached data is one use-case we have been asked about.
  I *don't* propose implementing that anytime soon, but a design that
allows for multiple memory caches opens a lot of possibilities that are
currently blocked.
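
Purely as an illustration of the flexibility I'm after (none of these
names are a proposal):

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical descriptor for one memory cache area. A worker would end
// up holding a list of these - shared cache, local cache, the
// active-transaction index, and perhaps per-group shared areas later.
struct MemoryAreaConfig
{
    std::string name;      // eg "shared", "local", "transients", "group-isp1"
    bool shared;           // lives in a shared memory segment?
    size_t maxObjectSize;  // small objects only, or larger ones too
};

// A design that walks a std::vector<MemoryAreaConfig> instead of
// hard-coding "the" memory cache is what keeps the per-group option
// open without committing to it now.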

>
> * Memory cache access methods are synchronous. Most disk cache access
> methods are not (or should not be).
>

Ack. Which is why I separate the hash lookup interface from the I/O
management interface - suggesting a shared lookup interface and a
per-type I/O interface. I'm talking about hash lookup and VaryMetaObject
handling at this point.
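
In code terms, the split I mean is roughly this (invented names,
building on the AreaIndex sketch above):

#include <cstddef>

class StoreEntry;

// The lookup side (index + Vary handling) stays synchronous and shared
// across area types. The I/O side is where the sync/async difference
// lives, so it gets its own per-type interface.

class BodyReadCallback
{
public:
    virtual ~BodyReadCallback() {}
    virtual void noteBodyData(const char *data, size_t size) = 0;
};

class AreaIo
{
public:
    virtual ~AreaIo() {}
    // memory-backed implementations may invoke the callback before
    // returning; disk-backed ones schedule I/O and call back later
    virtual void readBody(StoreEntry &entry, BodyReadCallback &cb) = 0;
};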

> * An object may be placed in at most one disk cache, in at most one
> memory cache, but it can be cached both on disk and in memory. The
> first algorithm is (or will be) implemented by the StoreHashIndex
> replacement, the second by MemStore itself, and the third by
> StoreController.
>

For now, yes. In future, who knows - multiple memory areas could bring
multiple in-memory copies. Then there is the difficulty of layering I/O
across RAM, SSD, HDD, network and tape to consider. I don't think we
should hard-code the distinction about where something exists and when
it is allowed to have duplicates in a *separate* area - only that we
prevent duplicates within any single cache area as much as possible and
treat duplicated data as easily over-writable space.
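
For reference, Alex's "at most one disk cache per object" rule comes
down to something like this inside the disk-stores coordinator (sketch,
invented names; a real selection would also weigh load and per-dir
limits):

#include <cstddef>
#include <vector>

class StoreEntry;
class SwapDir;  // one cache_dir

// Pick a single cache_dir for a cacheable miss. Whatever policy is used
// (round-robin, least-load, ...), exactly one dir - or none - gets the
// object, so duplicates within the disk layer are avoided.
class DiskStoresSketch
{
public:
    explicit DiskStoresSketch(const std::vector<SwapDir *> &dirs): dirs_(dirs) {}

    SwapDir *selectDirFor(const StoreEntry &entry) const {
        for (size_t i = 0; i < dirs_.size(); ++i) {
            if (dirWants(*dirs_[i], entry))
                return dirs_[i];  // first willing dir wins in this sketch
        }
        return 0;  // no dir wants it; the object stays off disk
    }

private:
    // placeholder for per-dir admission checks (size limits, free space)
    bool dirWants(const SwapDir &, const StoreEntry &) const { return true; }

    std::vector<SwapDir *> dirs_;
};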

>
> One of the biggest challenges would be to handle a combination of
> original non-shared caches and modern shared caches. I will discuss
> this in a separate email.
>

Ack.

Amos