Re: [squid-users] Re: How to use tcp_outgoing_address with cache_peer

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 13 May 2013 11:03:08 +1200

On 13/05/2013 7:27 a.m., Alex Domoradov wrote:
> On Wed, May 1, 2013 at 12:42 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> On 1/05/2013 10:21 a.m., babajaga wrote:
>>> Amos,
>>>
>>> although a bit off topic:
>>>
>>>> It does not work the way you seem to think. 2x 200GB cache_dir entries
>>> have just as much space as 1x 400GB. Using two cache_dir allows Squid to
>>> balance the I/O loading on the disks while simultaneously removing all
>>> processing overheads from RAID. <
>>>
>>> Am I correct in the following:
>>> The selection of one of the 2 cache_dirs is not deterministic for the
>>> same URL at different times, with both round-robin and least-load.
>>> That might have the consequence of generating a MISS although the
>>> object is cached in the other cache_dir.
>>> Or, in other words: there is a finite possibility that a cached object
>>> is stored in one cache_dir and, because the selection algorithm checks
>>> the wrong cache_dir when the object is fetched, a MISS is generated.
>>> If this is correct, one 400GB cache would have a higher HIT rate per
>>> se. AND it would avoid double caching, therefore increasing effective
>>> cache space and raising the HIT rate even more.
>>>
>>> So, having one JBOD instead of multiple cache_dirs (one cache_dir per
>>> disk) would result in better performance, assuming even distribution
>>> of (hashed) URLs.
>>> Parallel access to the disks in the JBOD is handled at a lower level,
>>> rather than with multiple aufs, so this should not create a real
>>> handicap.
>>
>> You are not.
>>
>> Your whole chain of logic above depends on the storage areas (cache_dir)
>> being separate entities. This is a false assumption. They are only separate
>> to the operating system. They are merged into a collective "cache" index
>> model in Squid memory - a single lookup to this unified store indexing
>> system finds the object no matter where it is (disk or local memory) with
>> the same HIT/MISS result based on whether it exists *anywhere* in at least
>> one of the storage areas.
>>
>> It takes the same amount of time to search through N index entries for one
>> giant cache_dir as for the same N index entries spread over M cache_dirs.
>> The difference is that when Squid is aware of the individual disks' I/O
>> loading and sizes, it can calculate accurate loading values to optimize
>> read/write latency on each disk.
>>
>> Amos
>>
>>
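[Editor's note: the load-aware selection Amos describes maps onto squid.conf directly. A minimal sketch, reusing the thread's two-disk layout (the paths are hypothetical; the directive and its values are standard Squid options):

```
# Two independent cache_dir entries, one per physical disk.
# Squid merges them into a single in-memory store index, so a
# lookup HITs regardless of which directory holds the object.
cache_dir aufs /var/spool/squid/disk1 200000 16 256
cache_dir aufs /var/spool/squid/disk2 200000 16 256

# How Squid picks a cache_dir when *writing* a new object:
# 'least-load' (the default) weighs each directory's current I/O
# load; 'round-robin' simply alternates between them.
store_dir_select_algorithm least-load
```
]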
> And what happens if we have 2 cache_dirs
>
> cache_dir aufs /var/spool/squid/ssd1 200000 16 256
> cache_dir aufs /var/spool/squid/ssd2 200000 16 256
>
> /var/spool/squid/ssd1 - /dev/sda
> /var/spool/squid/ssd2 - /dev/sdb
>
> User1 downloads a BIG psd file and Squid saves the file on /dev/sda (ssd1).
> Then sda fails and user2 tries to download the same file. What happens
> in that situation? Does Squid download the file again, place it on
> /dev/sdb, and then rebuild the "cache" index in memory?

Unfortunately when a UFS cache_dir dies, Squid halts. This happens
whether or not RAID is used. The exception is RAID-1 (but not
RAID-10), which provides a bit more protection than Squid does at present.

With multiple directories, though, you are in a position to quickly remove
the dead cache_dir and restart Squid with the second cache_dir while you
work on a fix. With RAID 0, 10, or 5 you are forced to rebuild the disk
structure while Squid is either offline or running without *any* disk cache.
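[Editor's note: in practice that recovery amounts to disabling the dead cache_dir in squid.conf and restarting on the surviving one. A sketch, reusing the paths from the question above and assuming /dev/sda is the failed disk:

```
# ssd1 (/dev/sda) has died: disable its cache_dir ...
#cache_dir aufs /var/spool/squid/ssd1 200000 16 256
# ... and carry on with the surviving disk alone.
cache_dir aufs /var/spool/squid/ssd2 200000 16 256
```

Check the edited file with "squid -k parse" before restarting. Objects that lived only on ssd1 will simply be fetched from the origin again and stored on ssd2.]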

Amos
Received on Sun May 12 2013 - 23:03:18 MDT
