Re: [squid-users] Re: does rock type deny being dedicated to specific process ?? from Amos Jeffries on 2013-10-29 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 30 Oct 2013 01:05:21 +1300

On 29/10/2013 10:28 p.m., Dr.x wrote:
>
> hi amos ,
> ===============================================
> Question 1 :
>
> about aufs dir ,
> about smp ,
> you say that it dont work with smp
> but my question is more accurate ,
> i c that it dont share workers , but it works if we assign specific
> process to specific cache dir .
>
> in my opinion we could solve our problem of cpu load partially ??!!!
> because without smp , all cache dir aufs or not aufs , were sharing the same
> process , and this process was not getting beneift of all cores ""uptill now
> this is my understanding "
>
> again , my question is ,
> we should not say that auf dir dont work with smp completely

Yes we should.

SMP means "symmetric multi-processing". It is achieved in Squid by
sharing memory blocks between workers, and inter-process messages about
processing state.
AUFS memory index contents and changes are *not* shared between workers.
=> Therefore "AUFS is not SMP-enabled" ... as we keep saying.

Also, if you attempt to use one AUFS cache_dir for two workers, each
worker handles the AUFS disk cache as if it were the *only* process
accessing it:

a) Worker 1 will save or update content on disk without informing worker
2, causing a random URL in worker 2 to point at the on-disk object saved
by worker 1 for another URL.
This also happens the other way around, as worker 2 overwrites objects
of worker 1.

b) Workers 1 and 2 will occasionally save different data to the same disk
file at the same time, causing disk file corruption and also leading to
problem (a).

c) Worker 2 is not notified about URL content stored by worker 1, so
it will never HIT on that content until it has downloaded a copy of its
own. That copy will also be stored in the cache, either resulting in
problem (a) immediately or adding a duplicate object to the cache
until such time as problem (a) causes it to get corrupted by worker 1.

Squid contains some protection against disk corruption like this. What
you will see is that when Squid starts up each worker gets some HITs
from that cache. As the cache contents age and get revalidated, the
HITs slowly all become SWAPFAIL_MISS and most of your existing disk
contents get erased.
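
To make problems (a) and (c) concrete, here is a toy Python model. It is
only a sketch under stated assumptions, not Squid code: the Disk and Worker
classes, the file numbering and the metadata check are all hypothetical
simplifications. Each worker keeps a private index and believes it owns the
on-disk file numbers.

```python
# Toy model only: all names here are hypothetical, not Squid internals.

class Disk:
    """Shared on-disk store: file number -> (URL recorded in metadata, body)."""
    def __init__(self):
        self.files = {}


class Worker:
    """A worker with a private, unshared in-memory index (URL -> file number)."""
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.index = {}
        self.next_fileno = 0  # each worker assumes it owns the numbering

    def store(self, url, body):
        fileno = self.next_fileno
        self.next_fileno += 1
        # May silently overwrite a file that another worker saved.
        self.disk.files[fileno] = (url, body)
        self.index[url] = fileno

    def lookup(self, url):
        fileno = self.index.get(url)
        if fileno is None:
            return "MISS"
        stored_url, _body = self.disk.files[fileno]
        if stored_url != url:
            # The file no longer holds what the private index promised.
            del self.index[url]
            return "SWAPFAIL_MISS"
        return "HIT"


disk = Disk()
w1, w2 = Worker("worker1", disk), Worker("worker2", disk)
w1.store("http://example.com/a", b"A")
w2.store("http://example.com/b", b"B")  # also file number 0: overwrites A

results = [
    w1.lookup("http://example.com/a"),  # index now points at B's data
    w2.lookup("http://example.com/b"),
    w2.lookup("http://example.com/a"),  # worker 2 was never told about it
]
print(results)
```

In this model worker 1 loses its object as soon as worker 2 stores
anything, and worker 2 never learns about content worker 1 fetched.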

> , but to be
> accurate we can say that aufs dont work with shared workers

You can only say this because workers and SMP are different things.
Worker is the type of process instance.
SMP is a design mechanism for how workers might interoperate.

AUFS is multi-threaded. Which is very similar to SMP but without the P.

> , but we can
> get benefit of smp to load cpu cores and set each core to instance aufs dir
> and as a result we solved our cpu load partially ,

The "A" stands for Asynchronous and is an algorithm for scalable
multi-threaded I/O. The OS kernel decides which core each thread runs on.

AUFS is currently designed to benefit from multiple cores within a
single process only. This does not change no matter how many workers you use.
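
As an analogy for "multi-threaded but single-process", here is a minimal
Python sketch. It is not how AUFS is implemented; the file names and pool
size are made up for illustration. Blocking disk reads are handed to a pool
of threads, the kernel schedules those threads, and everything still
happens inside one process.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking disk read, executed on one of the pool threads."""
    with open(path, "rb") as f:
        return os.getpid(), threading.get_ident(), f.read()


# Create a few small files standing in for cached objects (hypothetical names).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(4):
    path = os.path.join(tmpdir, "obj%d" % i)
    with open(path, "wb") as f:
        f.write(b"object %d" % i)
    paths.append(path)

# The reads run on pool threads; the caller is not blocked per file.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_file, paths))

pids = {pid for pid, _tid, _body in results}
bodies = [body for _pid, _tid, body in results]
print(len(pids), bodies)
```

Every read reports the same process ID: the threads spread the I/O, but
there is only one process and one in-memory index.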

> i just want to understand this issue and with to correct me about the info i
> posted here ,

Complexity.

Each component in Squid has three separate properties
(multi-process Y/N, multi-threaded Y/N, shared-memory Y/N).
AUFS in Squid right now is the particular combination (No, Yes, No).

Amos

>
> ===============================================
>
> Amos Jeffries-2 wrote
>> PS: if you want to experiment, you could try giving the frontend and
>> backend config two slightly different cache_dir lines. So the frontend
>> has a "read-only" flag but otherwise identical settings. In theory that
>> would make the frontend able to HIT on the rock cache, but only the
>> backends able to store things there.
>>
>> Amos
> not understanding yet ,
> why add to front end a cahe dir ???

So it can respond with HIT immediately.
Each request passed to the backend needs serializing into HTTP text
protocol, sending to the backend, then re-parsing by the backend. The
reply coming back also has to go through the same number of extra
parse+serialize steps on its way back. This is slower than a
shared-memory lookup and sending the HIT back directly.
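
The extra work can be sketched in Python. This is a toy illustration, not
Squid's parser: the function names, the dict standing in for the shared
cache and the step counter are all assumptions made for the example.

```python
def serialize_request(method, url, headers):
    """Turn a request into HTTP/1.1 text, as needed to pass it to a backend."""
    lines = ["%s %s HTTP/1.1" % (method, url)]
    lines += ["%s: %s" % (name, value) for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_request(raw):
    """Re-parse the HTTP text on the receiving side."""
    head = raw.decode("ascii").split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, url, _version = request_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, url, headers


# A plain dict stands in for the shared cache (hypothetical).
shared_cache = {"http://example.com/a": b"cached body"}


def frontend_direct_hit(url):
    """Frontend looks in the shared cache itself: no extra steps."""
    return shared_cache.get(url), 0


def via_backend(url):
    """Frontend forwards to a backend: count the extra serialize/parse steps."""
    steps = 0
    raw = serialize_request("GET", url, {"Host": "example.com"})
    steps += 1  # frontend serializes the request
    _method, parsed_url, _headers = parse_request(raw)
    steps += 1  # backend parses the request
    body = shared_cache.get(parsed_url) or b""
    raw_reply = b"HTTP/1.1 200 OK\r\n\r\n" + body
    steps += 1  # backend serializes the reply
    reply_body = raw_reply.split(b"\r\n\r\n", 1)[1]
    steps += 1  # frontend parses the reply
    return reply_body, steps


print(frontend_direct_hit("http://example.com/a"))
print(via_backend("http://example.com/a"))
```

Both paths return the same body, but the backend path pays four extra
parse/serialize steps in this model, plus the data copies between processes.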

Note: The main benefit of CARP is to de-duplicate cached contents while
simultaneously sharing traffic load between processes. These same
properties are gained by workers sharing a rock cache in parallel. The
remaining problem in Squid is the rock cache max-size limit of
32KB. Once that is gone (expected in Squid-3.5) there should not be any
need for CARP frontend/backend separation; all workers can be both
caching and frontend.
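
The de-duplication idea can be sketched as hash-based routing. The Python
below is a simplified highest-score scheme and is NOT the real CARP hash
function (it ignores load factors, for example); the backend names and the
use of MD5 are assumptions made for illustration.

```python
import hashlib


def pick_backend(url, backends):
    """Return the backend with the highest hash score for this URL."""
    def score(backend):
        digest = hashlib.md5((backend + "|" + url).encode("utf-8")).digest()
        return int.from_bytes(digest, "big")
    return max(backends, key=score)


backends = ["backend1", "backend2", "backend3"]  # hypothetical names
urls = ["http://example.com/%d" % i for i in range(1000)]

# Each URL has exactly one owner, so it is cached in only one place.
owners = {url: pick_backend(url, backends) for url in urls}

# The URLs, and so the traffic, are spread across the backends.
counts = {b: sum(1 for owner in owners.values() if owner == b) for b in backends}
print(counts)
```

Because the mapping is deterministic, a given URL always goes to the same
backend, which is what prevents duplicate copies while still sharing load.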

> did u mean to put cache dir into process squid # 1??

I mean let process #1 have read-only access to the rock cache. So it
can benefit from quick HITs, but not cause problems with both frontend
and backend storing the same transaction to different places in the cache.

Amos
Received on Tue Oct 29 2013 - 12:05:32 MDT
