Re: When can we make Squid using multi-CPU?

From: Adrian Chadd <adrian_at_squid-cache.org>
Date: Mon, 5 Jan 2009 11:48:28 -0500

I've been looking into what would be needed to thread squid as part of
my cacheboy squid-2 fork.

Basically, I've been working on breaking out a bunch of the core code
into libraries, which I can then check and verify are thread-safe. I
can then use these bits in threaded code.

My first goal is probably to break out the ACL and internal URL
rewriter code into threads, but the current callback data (cbdata)
setup in Squid makes passing cbdata pointers into other threads quite,
uhm, "tricky".

The basic problem is that although a given chunk of memory backing a
cbdata pointer will remain valid for as long as a reference exists,
the -data itself- may be invalidated at any point. So if thread A
creates a cbdata pointer and passes it into thread B to do something
(say an ACL lookup), there's no way (at the moment) for thread B to
guarantee at any/all points during its execution that the data behind
the pointer will stay valid, short of a whole lot of pissing around
with locking, which I'd absolutely like to avoid doing in a high
performance network application, even with the apparently wonderful
job current hardware does with lots of locking. :)
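
To make that concrete, here's a rough sketch of the check-then-use
race against the squid-2 cbdataLock()/cbdataValid()/cbdataUnlock()
calls; the hand-off and ACL functions here are made up purely for
illustration, so don't go grepping for them:

    /* Thread A (main event loop): hands a cbdata-backed request
     * state to an ACL worker thread. */
    void
    start_acl_lookup(void *request_state)
    {
        cbdataLock(request_state);           /* keeps the memory allocated */
        queue_to_acl_thread(request_state);  /* hypothetical hand-off */
    }

    /* Thread B (ACL worker): */
    void
    acl_worker(void *request_state)
    {
        if (cbdataValid(request_state)) {
            /*
             * Race: thread A may invalidate the object (cbdataFree())
             * right here, between the check and the use.  The memory
             * is still allocated because we hold a reference, but the
             * data in it is no longer meaningful.
             */
            do_acl_match(request_state);     /* hypothetical */
        }
        cbdataUnlock(request_state);         /* may actually free the memory */
    }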

So for the time being, I'm looking at what would be needed for a basic
inter-thread "batch" event/callback message queue, sort of like
AsyncCalls in squid-3 but minus 100% of the legacy cruft; and then
I'll see what kind of tasks can be pushed out to the threads.
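
Something like the following is the level of "batch" I have in mind --
a plain pthreads sketch with made-up names, not code from cacheboy or
squid-3: the producer posts calls one at a time, and the consumer takes
the whole pending list under one lock and runs it outside the lock.

    #include <pthread.h>
    #include <stdlib.h>

    /* One queued call: a function pointer plus an opaque argument. */
    typedef struct inter_call {
        void (*handler)(void *arg);
        void *arg;
        struct inter_call *next;
    } inter_call_t;

    typedef struct {
        pthread_mutex_t lock;   /* init with PTHREAD_MUTEX_INITIALIZER */
        pthread_cond_t wait;    /* init with PTHREAD_COND_INITIALIZER */
        inter_call_t *head, *tail;
    } call_queue_t;

    /* Producer side: append one call; one lock/unlock per post. */
    void
    call_queue_post(call_queue_t *q, void (*handler)(void *), void *arg)
    {
        inter_call_t *c = calloc(1, sizeof(*c));
        c->handler = handler;
        c->arg = arg;
        pthread_mutex_lock(&q->lock);
        if (q->tail)
            q->tail->next = c;
        else
            q->head = c;
        q->tail = c;
        pthread_cond_signal(&q->wait);
        pthread_mutex_unlock(&q->lock);
    }

    /* Consumer side: take the whole batch under the lock, run it outside. */
    void
    call_queue_drain(call_queue_t *q)
    {
        inter_call_t *batch, *next;

        pthread_mutex_lock(&q->lock);
        while (q->head == NULL)
            pthread_cond_wait(&q->wait, &q->lock);
        batch = q->head;
        q->head = q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        while (batch) {
            next = batch->next;
            batch->handler(batch->arg);
            free(batch);
            batch = next;
        }
    }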

Hopefully a bunch of stuff can be pushed out to threads with a minimum
of effort, such as some/all of the ACL lookups, some URL rewriting,
some GZIP and other basic content manipulation, and the freakishly
simple (comparatively) server-side HTTP code (src/http.c). But doing
that requires making sure a bunch of the low-level code is suitably
re-entrant/thread-safe/etc, and this includes a -lot- of stuff (lib/,
debug, logging, memory allocation, some statistics gathering, chunks
of the HTTP parsing and packing routines, the packer routines,
membufs, etc.)

Thankfully (in Cacheboy) I've broken out almost all of the needed
stuff into top-level libraries which can be independently audited for
thread-happiness. There are just some loose ends which need tidying
up. For example, almost all of the code in libhttp/ in cacheboy (ie,
the basic http header and header entry stuff, parsing, range request
headers, cc, headers, etc) is thread-safe, but some of the functions
-it- calls (such as the base64 functions) use static buffers and so
aren't thread-safe. Stuff which calls the legacy non-safe inet_*
routines, or perhaps the non thread-safe strtok() and other string.h
functions, all needs to be fixed.
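
The string.h/inet_* fixes are pretty mechanical -- keep the parser
state or the output buffer with the caller instead of hidden in a
static -- and the base64 static-buffer case wants the same
caller-supplied-buffer treatment. A sketch of the pattern (not actual
cacheboy code):

    #include <stdio.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    /* Not thread-safe: strtok() hides its position in static state and
     * inet_ntoa() returns a pointer into a static buffer. */
    static void
    log_addrs_unsafe(char *list, struct in_addr a)
    {
        char *tok;
        for (tok = strtok(list, ","); tok; tok = strtok(NULL, ","))
            printf("%s %s\n", tok, inet_ntoa(a));
    }

    /* Thread-safe: strtok_r() keeps its state in a caller-supplied pointer
     * and inet_ntop() writes into a caller-supplied buffer. */
    static void
    log_addrs_safe(char *list, struct in_addr a)
    {
        char *tok, *save = NULL;
        char buf[INET_ADDRSTRLEN];
        for (tok = strtok_r(list, ",", &save); tok; tok = strtok_r(NULL, ",", &save))
            printf("%s %s\n", tok, inet_ntop(AF_INET, &a, buf, sizeof(buf)));
    }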

Threading the rest of it would take a lot, -lot- more time. A
thread-aware storage backend (disk, memory, store index) is definitely
an integral part of making a threaded Squid, and a whole lot more code
modularity and reorganisation would have to take place for that to
occur.

Want to help? :)

Adrian

2009/1/4 ShuXin Zheng <zhengshuxin_at_gmail.com>:
> I've done this before, running multiple squid instances on one machine to
> use multiple CPUs, but they can't share the same store-fs, and you have to
> configure multiple IPs on the same machine. Can we rewrite squid as follows:
>
>  thread0 (client side, non-blocking,        thread1 ... threadn
>           accepts many connections)         (n = CPU number)
>         |                                        |
>         v                                        v
>    access check                             access check
>         |                                        |
>         v                                        v
>    http header parse                        http header parse
>         |                                        |
>         v                                        v
>    acl filter                               acl filter
>         |                                        |
>         v                                        v
>    check local cache                        check local cache
>         |                                        |
>         v                                        v
> ------------------------------------------------------------------
>    neighbor ----|                               |----- ufs
>    webserver ---|---- forward ---- store fs ----|----- aufs
>                                                 |----- coss
> ------------------------------------------------------------------
>         | (thread0)                              | (thread1) ...
>         v                                        v
>        ...                                      ...
>
>
> 2009/1/4 <anesthes_at_cisdi.com>:
>>
>> I've found the best way is to run multiple copies of squid on a single
>> machine, and use LVS to load balance between the squid processes.
>>
>> -- Joe
>>
>> Quoting Adrian Chadd <adrian_at_squid-cache.org>:
>>
>>> when someone decides to either help code it up, or donate towards the
>>> effort.
>>>
>>>
>>>
>>> adrian
>>>
>>> 2009/1/3 ShuXin Zheng <zhengshuxin_at_gmail.com>:
>>>>
>>>> Hi, Squid can currently only use one CPU, but multi-CPU hardware
>>>> is now very common, so a lot of capacity is wasted. How can we use
>>>> the extra CPUs? Can we separate out some CPU-intensive sections
>>>> that could run in parallel on different CPUs? OpenMP
>>>> (http://openmp.org/wp/) gives us some ideas about using multiple
>>>> CPUs, so can we use that technology in Squid?
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> zsxxsz
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
>
> --
> zsxxsz
>
>
Received on Mon Jan 05 2009 - 16:48:38 MST
