Re: [RFC] cache architecture

From: Pieter De Wit <pieter_at_insync.za.net>
Date: Tue, 24 Jan 2012 23:10:45 +1300

Sorry - my mail client messes with the layout, I will double space from
now on :)

Spotted a few mistakes in my suggestion:

On 24/01/2012 23:02, Pieter De Wit wrote:
> <snip>
>>> Perhaps a 9) Implement dual IO queues - I *think* the IO has been
>>> moved into its own thread; if not, the queuing can still be
>>> applied. Any form of checking the cache is going to affect Squid,
>>> so how do we ensure we are idle? Dual queues :) Queue 1 holds the
>>> requests for Squid, Queue 2 holds the admin/clean-up requests. The
>>> IO "thread" (if not threaded), before handling an admin/clean-up
>>> request, checks Queue 1 for requests and empties it *totally*
>>> before heading into Queue 2. This would allow you to have the same
>>> caching as now, relieving the start-up problems? It might lead to
>>> the same double caching of objects as above (if you make the cache
>>> writable before the scan is done).
>>
>> I wonder about priority queues every now and then. It is an
>> interesting idea. The I/O is currently done with pluggable modules
>> of various forms. DiskThreads and AIO sort of do this, but are FIFO
>> queued in N parallel queues. Prioritised queues could be an
>> interesting additional DiskIO module.
> Hard to implement, given the current "leg work" is already done? How
> well does the current version of Squid handle multiple cores, and
> can this take advantage of them?
>>
>> What I'm looking for is a little bit more abstracted, towards the
>> architecture level, across cache types and implementations. At that
>> scale we can't use any form of "totally empty" queue condition,
>> because on caches that receive a lot of traffic the queue would be
>> quite full, maybe never actually empty. Several of the problems we
>> have now come from waiting for the cache load to complete (ie the
>> load action queue to empty) before a cache is even considered for
>> use.
>>
>> Amos
> At that scale, no matter what you do, you will impact performance or
> your "wanted" outcome. It's about reaching an acceptable balance,
> which I think you, as a dev, will have a hard time predicting for
> any real-life usage out there. Perhaps "we" (in quotes, since I am
> yet to contribute a single line of code :) ) can make it "Weighted
> Priority" and, as such, have squid.conf options to tune it. The
> admin has to decide how aggressive Squid must be at rebuilding the
> cache (makes me think of the RAID rebuild options in HP RAID
> controllers). I am thinking of:
>
> cache_rebuild_weight <0-"max int"> ?

Can't be zero, since we won't rebuild at all then, but what if we want
more than a 1:1 ratio? Maybe we should have two options:

cache_rebuild_weight <1-max int>
cache_request_weight <1-max int>

?
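
To make the weighting concrete, here is a minimal sketch of how the
two options could drive a dual-queue dispatcher. All names and types
are made up for illustration; this is not Squid's actual DiskIO
module API:

    // Hypothetical weighted dual-queue IO dispatcher (not real Squid
    // code). Queue 1 holds Squid's own requests, Queue 2 the
    // admin/clean-up (rebuild) requests.
    #include <queue>

    struct IoRequest { /* file, offset, buffer, ... */ };

    void dispatch(const IoRequest &); // hand-off to the IO module

    class DualIoQueue {
        std::queue<IoRequest> client;   // Queue 1
        std::queue<IoRequest> rebuild;  // Queue 2
        int clientWeight;               // cache_request_weight
        int rebuildWeight;              // cache_rebuild_weight
    public:
        DualIoQueue(int reqW, int rebW)
            : clientWeight(reqW), rebuildWeight(rebW) {}

        // One scheduling round: serve up to clientWeight client
        // requests, then up to rebuildWeight rebuild requests, but
        // abandon the rebuild batch as soon as client work arrives.
        void runOnce() {
            for (int i = 0; i < clientWeight && !client.empty(); ++i) {
                dispatch(client.front());
                client.pop();
            }
            for (int i = 0; i < rebuildWeight && !rebuild.empty(); ++i) {
                if (!client.empty())
                    break; // client traffic always wins
                dispatch(rebuild.front());
                rebuild.pop();
            }
        }
    };

With cache_request_weight=10 and cache_rebuild_weight=1, one round
serves up to ten client requests and then one rebuild request; once
the client queue goes idle, every round goes straight to rebuild
work.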
>
> For every x requests, action one "admin/clean-up" request; if
> "Queue 1" is empty, drain "Queue 2".
>
> I am also thinking of a "third" queue, something like:
>
> Queue 1 - Write requests (depends on cache state, but has the most
> impact - writes are slow)
> Queue 2 - Read requests (as above, but less of an impact)
> Queue 3 - Admin/Clean up
>
> The only problem I have so far is that Queue 1 is above Queue 2...
> they might need to be swapped, since you are reading more than
> writing? Perhaps another config option...
>
> cache_dir /var/dir1 128G 128 128 Q1=read Q2=write (cache_dir syntax
> wrong...)
> cache_dir /var/dir2 32G 128 128 Q1=write Q2=read (as above, but this
> might be on SSD)
>
> I think this might be going too far?
>
> Cheers,
>
> Pieter
>
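On the three-queue idea with a per-cache_dir order, a rough sketch of
how that could look. All names are made up; this is not Squid's real
cache_dir code:

    // Hypothetical per-cache_dir triple queue with a configurable
    // service order (e.g. Q1=read Q2=write); not real Squid code.
    #include <queue>
    #include <vector>

    struct IoRequest { /* ... */ };

    enum QueueKind { WriteQ = 0, ReadQ = 1, AdminQ = 2 };

    struct CacheDirQueues {
        std::queue<IoRequest> q[3];
        // Parsed from the cache_dir line; admin/clean-up always last.
        std::vector<QueueKind> order{ReadQ, WriteQ, AdminQ};

        // Take the next request from the first non-empty queue in
        // the configured order; false means all three are idle.
        bool next(IoRequest &out) {
            for (QueueKind k : order) {
                if (q[k].empty())
                    continue;
                out = q[k].front();
                q[k].pop();
                return true;
            }
            return false;
        }
    };

A strict order like this would starve the lower queues under constant
load, so in practice it would probably be combined with the weights
above.
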
Also, if we have "squid.state" loaded, what stops us from writing
objects into free space, if there is any? We know how big the cache
is/was and how big it's allowed to be. As before, this will lead to
double storage of objects, but those duplicates can be freed later.
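
A minimal sketch of that check, assuming we track the configured limit
and the size recorded in squid.state (all names made up):

    // Hypothetical: can a new object go into known free space before
    // the rebuild scan finishes? Names are made up for illustration.
    struct CacheDirState {
        long long maxBytes;    // configured cache_dir size limit
        long long knownBytes;  // in-use bytes recorded in squid.state
    };

    bool canWriteDuringRebuild(const CacheDirState &s, long long objBytes)
    {
        // A write now may duplicate an object the scan has not seen
        // yet; the duplicate can be freed by later clean-up.
        return s.knownBytes + objBytes <= s.maxBytes;
    }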

Cheers,

Pieter