Re: [squid-users] is there a squid "cache rank" value available for statistics?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 20 Apr 2009 22:44:27 +1200

Gavin McCullagh wrote:
> Hi,
>
> On Mon, 20 Apr 2009, Amos Jeffries wrote:
>
>> Squid suffers from a little bit of an anachronism in the way it stores
>> objects. The classic UFS systems essentially use round-robin and a hash to
>> determine the storage location for each object separately. This works
>> wonders for ensuring no clashes, but not so well for retrieval optimization.
>> Adrian Chadd has done a lot of study and some work in this area,
>> particularly for Squid-2.6/2.7. His paper for the FreeBSD conference is a
>> good read on how disk storage relates to Squid.
>> http://www.squid-cache.org/~adrian/talks/20081007%20-%20NYCBSDCON%20-%20Disk%20IO.pdf
>
> Thanks, I'll take a look.
>
>>> I realise there are rules of thumb for cache size; it would be interesting
>>> to be able to analyse a particular squid installation.
>> Feel free. We would be interested in any improvements you can come up with.
>
> I didn't say anything about being able to improve anything! ;-)
>
> I was hoping that if a given user could plot the percentage of hits coming
> from each successive GB of the cache (or a rough guide to that), they could
> more easily work out, based on their own workload, what the best-value
> cache size would be. Obviously I don't mean physical disk location; I mean
> "if my cache were, e.g., halved in size, how many hits would I lose?".

Ah, in my experience that answer comes from long-term monitoring of the hit
ratios and tweaking.
For example, on the wiki cache we had a small disk outage and flipped it
over to RAM-only for a few weeks. The Munin graphs showed a ~20% reduction
in byte-hit ratio and a ~15% drop in request-hit ratio, on just 2 req/sec.
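
For reference, those two ratios are simple to derive from whatever interval
counters you are already collecting. A rough sketch in C++ (the struct and
field names here are illustrative assumptions, not Squid's actual counter
names):

  // Illustration only: compute request-hit and byte-hit ratios for an
  // interval. Field names are made up, not Squid's real counters.
  struct IntervalCounters {
      long long requests;        // client HTTP requests seen
      long long hits;            // of those, served from cache
      long long bytes_out;       // bytes delivered to clients
      long long hit_bytes_out;   // of those, bytes served from cache
  };

  double requestHitRatio(const IntervalCounters &c) {
      return c.requests ? 100.0 * c.hits / c.requests : 0.0;
  }

  double byteHitRatio(const IntervalCounters &c) {
      return c.bytes_out ? 100.0 * c.hit_bytes_out / c.bytes_out : 0.0;
  }

Plot both over weeks, and a cache-size (or policy) change shows up as a step
in the graphs, which is essentially what Munin showed us above.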

>
> I was naively hoping that the code might easily be leveraged to capture
> this "cache rank" and log it for each hit, but the more I think about it,
> the less likely this seems.
>
>> IIRC there is a doubly-linked list with tail pointer for LRU.
>
> That was my guess alright, so requeueing is O(1). I presume traversal is
> not necessary for a HIT though, which means the position in the removal
> queue may not be easy to determine.

No, the lookup is hashed, and on a HIT the entry gets immediately cut out of
the list and pasted back at the head. So it's still around O(1) for the list
actions.
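
To make that concrete, here is a minimal sketch (in C++, not Squid's actual
code) of the hash-plus-doubly-linked-list arrangement: lookup goes through
the hash, and a HIT is spliced to the head in O(1). Note that learning an
object's *position* in the list -- the "cache rank" you are after -- still
means walking from the head, which is O(n) per hit.

  // Minimal illustration of LRU via hash index + linked list.
  // A sketch of the general technique, not Squid's implementation.
  #include <list>
  #include <string>
  #include <unordered_map>

  struct Entry {
      std::string key;
      // ... object metadata (swap fileno, size, timestamps) ...
  };

  class LruIndex {
      std::list<Entry> lru_;     // front = most recently used
      std::unordered_map<std::string,
                         std::list<Entry>::iterator> index_;
  public:
      // O(1): hash lookup, then splice the node to the head.
      Entry *hit(const std::string &key) {
          auto it = index_.find(key);
          if (it == index_.end())
              return nullptr;                          // MISS
          lru_.splice(lru_.begin(), lru_, it->second); // re-link at head
          return &*it->second;                         // HIT
      }

      // O(n): the only way to get an LRU "rank" is to count how many
      // entries sit ahead of the object -- exactly the traversal cost
      // being discussed.
      size_t rank(const std::string &key) const {
          size_t pos = 0;
          for (const auto &e : lru_) {
              if (e.key == key)
                  return pos;
              ++pos;
          }
          return lru_.size();                          // not present
      }
  };

One way around that cost would be sampling: compute the rank for, say, 1 in
every N hits rather than all of them, which is probably enough to build the
histogram you described.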

>
> Looking over the HP paper, heap LRU seems to have been created so that the
> comparison between LRU, GDSF and LFUDA could be made more easily. I'm
> guessing LRU is as quick as a list as it is as a heap (and simpler).
>
>>> In the case of simple LRU, if the queue must be traversed to find each
>>> element and requeue it (perhaps this isn't the case?), I suppose one could
>>> count the position in the queue and divide by the total length.
>> Yes, that has the same big problems in LRU as displaying all objects in
>> the cache (>1 million objects is not an uncommon cache size) or doing
>> regex purges.
>
> Definitely not something you want to do often :-)
>
>>> With a heap, things are more complex. I guess you could give an
>>> indication of the depth in the heap, but there would be so many objects
>>> on the lowest levels that I don't suppose this would be a great guide.
>>> Is there some better value available, such as the key used in the heap,
>>> maybe?
>> There is the fileno or hashed value rather than the URL. You still have
>> the same traversal issues though.
>
> Hmmm. I guess I need to read up a little more on how heap LFUDA is
> implemented.
>
>> If you want to investigate, I'll gently nudge you towards Squid-3, where
>> the rest of the development is going on and improvements have the best
>> chance of survival.
>>
>> For further discussion you may want to bring this up in squid-dev where
>> the developers hang out.
>
> Yeah, fair points. Thanks.
>
> Gavin
>
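
On the heap LFUDA reading: as I understand the HP paper, each object carries
a priority key and the heap evicts the smallest key. The key is roughly
L + reference_count for LFUDA, or L + reference_count / size for GDSF, with
L being an "inflation" value bumped to the key of whatever was evicted last.
A back-of-envelope sketch of that (my reading of the paper, not a copy of
Squid's heap replacement code):

  // Sketch of the LFUDA/GDSF priority keys from the HP paper, for
  // illustration only -- not Squid's heap replacement implementation.
  struct HeapObject {
      double key;        // heap priority; the smallest key is evicted first
      double refcount;   // Fi: how often the object has been referenced
      double size;       // Si: object size in bytes
  };

  static double heap_age = 0.0;  // L: inflated on every eviction

  // LFUDA: key = L + Fi   (cost fixed at 1, size ignored)
  double lfudaKey(const HeapObject &o) {
      return heap_age + o.refcount;
  }

  // GDSF: key = L + Fi / Si   (small, popular objects rank higher)
  double gdsfKey(const HeapObject &o) {
      return heap_age + o.refcount / o.size;
  }

  // When the heap minimum is evicted, L catches up to its key, so
  // long-idle objects eventually lose out to freshly referenced ones.
  void onEvict(const HeapObject &victim) {
      heap_age = victim.key;
  }

Which is also why the raw key doesn't translate neatly into a "how deep in
the cache am I" percentage: it only means something relative to the current
L and to the other keys in the heap, so you would still end up walking or
sorting the whole structure to turn it into a rank.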

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE14
   Current Beta Squid 3.1.0.7