Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

From: <david_at_lang.hm>
Date: Wed, 4 May 2011 11:49:01 -0700 (PDT)

I don't know how many developers are working on squid, so I don't know if
you are the only person who can do this sort of work or not.

do you think that I should join the squid-dev list?

David Lang

On Wed, 4 May 2011, Alex Rousskov wrote:

> On 05/04/2011 11:41 AM, david_at_lang.hm wrote:
>
>> anything new on this issue? (including any patches for me to test?)
>
> If you mean the "ACLs do not scale well" issue, then I do not have any
> free cycles to work on it right now. I was happy to clarify the new SMP
> architecture and suggest ways to triage the issue further. Let's hope
> somebody else can volunteer to do the required legwork.
>
> Alex.
>
>
>> On Mon, 25 Apr 2011, david_at_lang.hm wrote:
>>
>>> Date: Mon, 25 Apr 2011 17:14:52 -0700 (PDT)
>>> From: david_at_lang.hm
>>> To: Alex Rousskov <rousskov_at_measurement-factory.com>
>>> Cc: Marcos <mczueira_at_yahoo.com.br>, squid-users_at_squid-cache.org,
>>> squid-dev_at_squid-cache.org
>>> Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
>>>
>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>
>>>> On 04/25/2011 05:31 PM, david_at_lang.hm wrote:
>>>>> On Mon, 25 Apr 2011, david_at_lang.hm wrote:
>>>>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>>>>> On 04/14/2011 09:06 PM, david_at_lang.hm wrote:
>>>>>>>
>>>>>>>> In addition, there seems to be some sort of locking between the
>>>>>>>> multiple worker processes in 3.2 when checking the ACLs
>>>>>>>
>>>>>>> There are pretty much no locks in the current official SMP code. This
>>>>>>> will change as we start adding shared caches in a week or so, but even
>>>>>>> then the ACLs will remain lock-free. There could be some internal
>>>>>>> locking in the 3rd-party libraries used by ACLs (regex and such), but I
>>>>>>> do not know much about them.
>>>>>>
>>>>>> what are the 3rd party libraries that I would be using?
>>>>
>>>> See "ldd squid". Here is a sample based on a randomly picked Squid:
>>>>
>>>> libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol
>>>>
>>>> Please note that I am not saying that any of these have problems in SMP
>>>> environment. I am only saying that Squid itself does not lock anything
>>>> at runtime, so if our suspect is SMP-related locks, they would have to
>>>> reside elsewhere. The other possibility is that we should suspect
>>>> something else, of course. IMHO, it is more likely to be something else:
>>>> after all, Squid does not use threads, where such problems are expected.
>>>
>>>
>>>> BTW, do you see more-or-less even load across CPU cores? If not, you may
>>>> need a patch that we find useful on older Linux kernels. It is discussed
>>>> in the "Will similar workers receive similar amount of work?" section of
>>>> http://wiki.squid-cache.org/Features/SmpScale
>>>
>>> the load is pretty even across all workers.
>>>
>>> with the problems described on that page, I would expect uneven
>>> utilization at low loads, but at high loads (with the workers busy
>>> servicing requests rather than waiting for new connections), I would
>>> expect the work to even out (and the types of hacks described in that
>>> section to end up costing performance, but not in a way that would
>>> scale with the ACL processing load)
>>>
>>>>> one thought I had is that this could be locking on name lookups. how
>>>>> hard would it be to create a quick patch that would bypass the name
>>>>> lookups entirely and only do the lookups by IP.
>>>>
>>>> I did not realize your ACLs use DNS lookups. Squid internal DNS code
>>>> does not have any runtime SMP locks. However, the presence of DNS
>>>> lookups increases the number of suspects.
>>>
>>> they don't, everything in my test environment is by IP. But I've seen
>>> other software that still runs everything through name lookups, even
>>> if what's presented to the software (both in what's requested and in
>>> the ACLs) is all done by IPs. It's an easy way to bullet-proof the
>>> input (if it's a name it gets resolved, if it's an IP, the IP comes
>>> back as-is, and it works for IPv4 and IPv6, no need to have logic that
>>> looks at the value and tries to figure out if the user intended to
>>> type a name or an IP). I don't know how squid is working internally
>>> (it's a pretty large codebase, and I haven't tried to really dive into
>>> it) so I don't know if squid does this or not.
>>>
>>>> A patch you propose does not sound difficult to me, but since I cannot
>>>> contribute such a patch soon, it is probably better to test with ACLs
>>>> that do not require any DNS lookups instead.
>>>>
>>>>
>>>>> if that regains the speed and/or scalability it would point fingers
>>>>> fairly conclusively at the DNS components.
>>>>>
>>>>> this is the only thing that I can think of that should be shared
>>>>> between multiple workers processing ACLs
>>>>
>>>> but it is _not_ currently shared from Squid's point of view.
>>>
>>> Ok, I was assuming from the description of things that there would be
>>> one DNS process that all the workers would be accessing. from the way
>>> it's described in the documentation it sounds as if it's already a
>>> separate process, so I was thinking that it was possible that if each
>>> ACL IP address is being put through a single DNS process, I could be
>>> running into contention on that process (and having to do name lookups
>>> for both IPv6 and then falling back to IPv4 would explain the severe
>>> performance hit far more than the difference between IPs being 128 bit
>>> values instead of 32 bit values)
>>>
>>> David Lang
>>>
>>>
>
>
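
The lookup bypass discussed in the quoted thread (skip name resolution
entirely when the value is already an IP address) can be sketched roughly
as follows. This is not Squid code and says nothing about how Squid
handles ACL values internally; it is a minimal Python illustration, and
the names addresses_for, system_resolver and the resolver parameter are
hypothetical, chosen only for this example.

```python
import ipaddress
import socket


def system_resolver(name):
    """Resolve a hostname through the OS resolver (may block on DNS)."""
    infos = socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def addresses_for(host, resolver=system_resolver):
    """Return the addresses for host, skipping the resolver for IP literals.

    Hypothetical helper: an IPv4 or IPv6 literal is returned as-is
    (normalized), and only real names are handed to the resolver.
    """
    try:
        # ip_address() accepts IPv4 and IPv6 literals and raises
        # ValueError for anything else, so literals never reach DNS.
        return [str(ipaddress.ip_address(host))]
    except ValueError:
        return resolver(host)
```

Passing a counting stub as the resolver is one way to check, in a test
setup, whether an IP-only configuration still triggers name lookups.
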
Received on Wed May 04 2011 - 18:49:10 MDT