Re: [squid-users] Re: What are recommended settings for optimal sharing of cache between SMP workers?

From: Rajiv Desai <rajiv_at_maginatics.com>
Date: Wed, 19 Feb 2014 01:08:48 -0800

When the cache is nearing capacity, will the collision rate increase
dramatically?
Is there any special handling to reduce collisions and follow lru eviction?

On Tue, Feb 18, 2014 at 10:30 PM, Rajiv Desai <rajiv_at_maginatics.com> wrote:
> On Tue, Feb 18, 2014 at 9:28 PM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 02/18/2014 04:11 PM, Rajiv Desai wrote:
>>
>>> It seems like there is a bug where an object overwrites previously
>>> written object:
>>
>> This is not really a bug -- hash collisions do happen, as you have
>> discovered and Rock store resolves them by overwriting older entries if
>> possible (or not caching the newer ones otherwise). The smaller your
>> cache, the more collisions you will have for the same number of unique
>> cache keys.
>>
>> How many total slots does your rock cache_dir have?
>
> 13107198 (200 GB with 16KB slots).
>
> I think I was thrown off by the hit % in mgr:info stats. Are these
> stats accurate when using 8 SMP workers?
> Is the % underreported? I couldnt find an absolute value of
> hits/misses from stats. :/
> Any pointers?
>
>>
>>
>>> Needs a better hash as pointed out by the TODO :)
>>> I will send out a patch.
>>
>> Thank you. No hash function will eliminate collisions though. If
>> somebody wants to eliminate side-effects of hash collisions on recently
>> added objects, they would have to implement a different anchor
>> allocation algorithm that resolves hash collisions through chains (for
>> example). I doubt that would be easy though!
>>
>>
>
> Yes, I realized that as I read the code further. I will try to give it
> a shot but I agree that this will be non trivial.
> Currently sharding data across multiple cache_dirs will reduce
> collisions, correct?
>
>>> Is the key guaranteed to be a null terminated string?
>>
>> No. The key is a 128-bit opaque buffer, essentially.
>>
>>
>>> I intend to use std::tr1::hash
>>
>> Please keep in mind that the key is an MD5 sum already. See
>> store_key_md5.cc. I am not a hashing expert by any means, but the MD5
>> sum should not really need much further hashing.
>>
>> It is possible that the k[0]+k[1] sum in the code below should be
>> removed in favor of either k[1] or k[0] alone, but again, this may not
>> have any significant effect (except for any specific case where the two
>> given URLs used to collide).
>>
>
> I tried k[0] ^ k[1] which seemed to provide fewer collisions (67
> collisions in 39129 GETs).
> Haven't quantified and compared i detail though.
>
>
>>
>>> <code>
>>> sfileno
>>> Ipc::StoreMap::anchorIndexByKey(const cache_key *const key) const
>>> {
>>> const uint64_t *const k = reinterpret_cast<const uint64_t *>(key);
>>> // TODO: use a better hash function
>>> return (k[0] + k[1]) % shared->limit;
>>> }
>>> </code>
>>
>>
>> HTH,
>>
>> Alex.
>>
Received on Wed Feb 19 2014 - 09:09:02 MST

This archive was generated by hypermail 2.2.0 : Wed Feb 19 2014 - 12:00:06 MST