Re: [RFC] cache architecture

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 24 Jan 2012 23:41:28 +1300

On 24/01/2012 11:10 p.m., Pieter De Wit wrote:
> Sorry - my mail client messes with the layout, I will double space
> from now on :)
>
> Spotted a few mistakes with my suggestion:
>
> On 24/01/2012 23:02, Pieter De Wit wrote:
>> <snip>
>>>> Perhaps a 9) Implement dual IO queues - I *think* the IO has been
>>>> moved into its own thread; if not, the queuing can still be
>>>> applied. Any form of checking the cache is going to affect squid,
>>>> so how do we ensure we are idle? Dual queues :) Queue 1 holds the
>>>> requests for squid, queue 2 holds the admin/clean-up requests.
>>>> Before handling an admin/clean-up request, the IO "thread" (if
>>>> not threaded) checks Queue 1 for requests and empties it
>>>> *totally* before heading into Queue 2. This would allow the same
>>>> caching behaviour as now, relieving the start-up problems? It
>>>> might lead to the same double-caching of objects as above (if you
>>>> make the cache writable before the scan is done).
>>>
>>> I wonder about priority queues every now and then. It is an
>>> interesting idea. The I/O is currently done with pluggable modules
>>> for various forms. DiskThreads and AIO sort of do this but are FIFO
>>> queued in N parallel queues. Prioritised queues could be an
>>> interesting additional DiskIO module.
>> Would it be hard to implement, given the current "leg work" is
>> already done? How well does the current version of squid handle
>> multiple cores, and can this take advantage of them?

Should be easy. We have not exactly specified and documented the DiskIO
library API, but the current AIO handles SMP exactly as well as the
system AIO library can; the same goes for the pthreads library behind
DiskThreads.
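
To make that concrete, here is a minimal sketch of the strict
two-queue drain described above, as it might look inside a DiskIO
module. All names are hypothetical; this is not the real DiskIO API:

// A strict two-queue drain: client I/O always pre-empts maintenance
// I/O, and maintenance runs only once the client queue is fully empty.
// Hypothetical names throughout; not the real DiskIO module interface.
#include <queue>

struct IoRequest { /* file, offset, buffer, callback, ... */ };

class PriorityIoQueue
{
    std::queue<IoRequest> clientQ; // Queue 1: squid request I/O
    std::queue<IoRequest> adminQ;  // Queue 2: admin/clean-up I/O

public:
    void enqueueClient(const IoRequest &r) { clientQ.push(r); }
    void enqueueAdmin(const IoRequest &r) { adminQ.push(r); }

    // Called by the I/O thread to pick the next request to service.
    bool dequeue(IoRequest &out) {
        std::queue<IoRequest> &q = clientQ.empty() ? adminQ : clientQ;
        if (q.empty())
            return false; // both queues idle
        out = q.front();
        q.pop();
        return true;
    }
};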

>>>
>>> What I'm looking for is a little bit more abstracted towards the
>>> architecture level across cache type and implementation. At that
>>> scale we can't use any form of "totally empty" queue condition
>>> because on caches that receive heavy traffic the queue would be
>>> quite full, maybe never actually empty. Several of the problems we
>>> have now come from waiting until the cache load has completed (i.e.
>>> the load action queue is empty) before a cache is even considered
>>> for use.
>>>
>>> Amos
>> At that scale, no matter what you do, you will impact performance
>> or your "wanted" outcome. It's about reaching an acceptable balance,
>> which I think you, as a dev, will have a hard time predicting for
>> any real-life usage out there. Perhaps "we" ("we" in quotes, since I
>> have yet to contribute a single line of code :) ) can make it a
>> "Weighted Priority" and, as such, have squid.conf options to tune
>> it. The admin has to decide how aggressive squid must be at
>> rebuilding the cache (makes me think of the RAID rebuild options in
>> HP RAID controllers). I am thinking of:
>>
>> cache_rebuild_weight <0-"max int"> ?
>
> Can't be zero, since we won't rebuild at all then. But what if we
> want more than a 1:1 ratio? Maybe we should have two options:
>
> cache_rebuild_weight <1-max int>
> cache_request_weight <1-max int>
>
> ?

Each cache maintains its own ordering and block-loading algorithm(s),
so it would need to be a per-cache_dir setting. It should be
relatively easy to add there.

Each cache_dir can report this up to the top layer via its loading
factor when it is not servicing requests. I was considering using that
to prioritise CLEAN rebuilds before DIRTY ones, or to rank cache_dirs
by the speed of their storage type and their loading factor.
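
For illustration, a rough sketch of how such weighted draining could
work, reusing your proposed cache_request_weight /
cache_rebuild_weight names (which, to be clear, are not existing
squid.conf directives):

// A weighted variant: serve up to cache_request_weight client requests,
// then up to cache_rebuild_weight rebuild requests, and repeat. The two
// option names are the proposal from this thread, not real directives.
#include <queue>

struct IoRequest { /* file, offset, buffer, callback, ... */ };

class WeightedIoQueue
{
    std::queue<IoRequest> requestQ; // client read/write I/O
    std::queue<IoRequest> rebuildQ; // rebuild / admin clean-up I/O
    const int requestWeight;        // cache_request_weight (hypothetical)
    const int rebuildWeight;        // cache_rebuild_weight (hypothetical)
    int servedRequests = 0;
    int servedRebuilds = 0;

public:
    WeightedIoQueue(int reqW, int rebW)
        : requestWeight(reqW), rebuildWeight(rebW) {}

    void enqueueRequest(const IoRequest &r) { requestQ.push(r); }
    void enqueueRebuild(const IoRequest &r) { rebuildQ.push(r); }

    bool dequeue(IoRequest &out) {
        // An empty queue always yields its turn to the other one.
        bool rebuildTurn = servedRequests >= requestWeight &&
                           servedRebuilds < rebuildWeight;
        if (requestQ.empty())
            rebuildTurn = true;
        if (rebuildQ.empty())
            rebuildTurn = false;

        std::queue<IoRequest> &q = rebuildTurn ? rebuildQ : requestQ;
        if (q.empty())
            return false; // both queues idle
        out = q.front();
        q.pop();
        ++(rebuildTurn ? servedRebuilds : servedRequests);
        if (servedRequests >= requestWeight &&
                servedRebuilds >= rebuildWeight)
            servedRequests = servedRebuilds = 0; // start a new cycle
        return true;
    }
};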

>>
>> For every x requests, action one "admin/clean-up" request - unless
>> "Queue 1" is empty, in which case drain "Queue 2".
>>
>> I am also thinking of a "third" queue, something like:
>>
>> Queue 1 - Write requests (depends on cache state, but has the most
>> impact - writes are slow)
>> Queue 2 - Read requests (as above, but less of an impact)
>> Queue 3 - Admin/Clean up
>>
>> The only problem I have so far is that Queue 1 sits above Queue
>> 2... they might need to be swapped, since you are reading more than
>> writing? Perhaps another config option...
>>
>> cache_dir /var/dir1 128G 128 128 Q1=read Q2=write (cache_dir syntax
>> wrong....)
>> cache_dir /var/dir2 32G 128 128 Q1=write Q2=read (as above, but this
>> might be on ssd)
>>
>> I think this might be going too far ?
>>
>> Cheers,
>>
>> Pieter
>>
> Also, if we have the "squid.state" loaded, what stops us from
> writing objects into free space, if there is any? We know how big
> the cache is/was and how big it is allowed to be. As before, this
> will lead to the double storage of objects, but those can be freed.

This would be (5) and (6): permitting the 'done' areas of
partially-loaded caches to be used while the rest is still loading.
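
Roughly the kind of guard that implies at lookup time (a sketch with
invented names, not the real Store API):

// Serve hits only from the region the rebuild scan has already covered;
// everything else is treated as a miss until the scan finishes. All
// names here are invented for illustration, not the real Store API.
#include <cstdint>

struct StoreEntry { uint64_t swapFileOffset; };

class CacheDirLoadState
{
    uint64_t loadedUpTo = 0;  // offset the rebuild scan has reached
    bool rebuildDone = false;

public:
    void noteScanProgress(uint64_t offset) { loadedUpTo = offset; }
    void noteScanFinished() { rebuildDone = true; }

    // A 'done' area hit is safe to serve even mid-rebuild.
    bool canServe(const StoreEntry &e) const {
        return rebuildDone || e.swapFileOffset < loadedUpTo;
    }
};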

Amos