Re: [squid-users] Re: How to use tcp_outgoing_address with cache_peer from Amos Jeffries on 2013-05-01 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 01 May 2013 21:42:41 +1200

On 1/05/2013 10:21 a.m., babajaga wrote:
> Amos,
>
> although a bit off topic:
>
>> It does not work the way you seem to think. 2x 200GB cache_dir entries
>> have just as much space as 1x 400GB. Using two cache_dir allows Squid to
>> balance the I/O loading on the disks while simultaneously removing all
>> processing overheads from RAID.
>
> Am I correct in the following:
> The selection of one of the 2 cache_dirs is not deterministic for same URL
> at different times, both for round-robin or least load.
> This might have the consequence of generating a MISS, although the object
> is cached in the other cache_dir.
> Or, in other words: there is a finite possibility that a cached object is
> stored in one cache_dir and, because of the result of the selection
> algorithm when the object should be fetched,
> the decision to check the wrong cache_dir generates a MISS.
> If this is correct, one 400GB cache would have a higher HIT rate per
> se. AND, it would avoid double caching, therefore increasing effective
> cache space, resulting in an even greater increase in HIT rate.
>
> So, having one JBOD instead of multiple cache_dirs (one cache_dir per disk)
> would result in better performance, assuming even distribution of (hashed)
> URLs.
> Parallel access to the disks in the JBOD is handled at a lower level, instead
> of with multiple aufs, so this should not create a real handicap.

No, you are not.

Your whole chain of logic above depends on the storage areas (cache_dir)
being separate entities. This is a false assumption. They are only
separate to the operating system. They are merged into a collective
"cache" index model in Squid memory - a single lookup to this unified
store indexing system finds the object no matter where it is (disk or
local memory) with the same HIT/MISS result based on whether it exists
*anywhere* in at least one of the storage areas.
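
As a rough illustration of that point, here is a toy model in Python. It
is a sketch written for this explanation only, not Squid's actual data
structures; the names UnifiedIndex, store and lookup are hypothetical.
It shows one index covering every storage area, so a lookup never has to
"pick" a cache_dir first:

```python
# Toy model only: a single in-memory index shared by all storage areas.
# UnifiedIndex, store() and lookup() are hypothetical names, not Squid code.

class UnifiedIndex:
    def __init__(self, areas):
        # Storage areas known to the cache, e.g. two cache_dir paths.
        self.areas = list(areas)
        # One index for everything: URL -> storage area holding the object.
        self.index = {}

    def store(self, url, area):
        """Record that the object for `url` now lives in `area`."""
        if area not in self.areas:
            raise ValueError("unknown storage area: %s" % area)
        # Overwriting the entry means an object is indexed in one place only.
        self.index[url] = area

    def lookup(self, url):
        """Return ("HIT", area) or ("MISS", None) from a single lookup."""
        area = self.index.get(url)
        if area is None:
            return ("MISS", None)
        return ("HIT", area)


if __name__ == "__main__":
    idx = UnifiedIndex(["/cache1", "/cache2"])
    idx.store("http://example.com/a", "/cache2")
    # The lookup does not depend on which area would be chosen for writes.
    print(idx.lookup("http://example.com/a"))
    print(idx.lookup("http://example.com/b"))
```

In this model the selection algorithm only matters when an object is
written; reads go through the index and find the object wherever it is.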

It takes the same amount of time to search through N index entries for
one giant cache_dir as it does for the same N index entries spread across
M cache_dir. The difference is that when Squid is aware of the individual
disks' I/O loading and sizes, it can calculate accurate load values and
use them to optimize read/write latency on each disk.
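
A minimal sketch of what a "least load" style choice could look like,
again as a toy Python model and not Squid's real algorithm. The CacheDir
fields and the pending_io load measure are assumptions made up for this
example; the only point is that per-disk size and load information makes
an informed choice possible at write time:

```python
from dataclasses import dataclass

# Toy model only. CacheDir and its fields are hypothetical; pending_io is
# an assumed stand-in for whatever load measure the cache tracks per disk.

@dataclass
class CacheDir:
    path: str
    capacity_mb: int
    used_mb: int
    pending_io: int

    def free_mb(self):
        return self.capacity_mb - self.used_mb


def select_least_loaded(dirs, object_mb):
    """Pick the least busy cache_dir that still has room for the object."""
    candidates = [d for d in dirs if d.free_mb() >= object_mb]
    if not candidates:
        return None  # nothing can hold the object
    return min(candidates, key=lambda d: d.pending_io)


if __name__ == "__main__":
    dirs = [
        CacheDir("/cache1", capacity_mb=200000, used_mb=150000, pending_io=12),
        CacheDir("/cache2", capacity_mb=200000, used_mb=90000, pending_io=3),
    ]
    chosen = select_least_loaded(dirs, object_mb=5)
    print(chosen.path if chosen else "no space")
```

With a single large volume hiding several disks, the per-disk numbers in
this sketch are simply not available to the cache.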

Amos
Received on Wed May 01 2013 - 09:42:53 MDT
