Re: squid-smp from Adrian Chadd on 2009-10-15 (squid-dev)

From: Adrian Chadd <adrian_at_squid-cache.org>
Date: Thu, 15 Oct 2009 17:56:04 +0800

Oh, I can absolutely give you guys food for thought. I was just hoping
someone else would have already done a bit of the legwork.

Things to think about:

* Do you really, -really- want to reinvent the malloc wheel? This is
separate from caching results and pre-made class instances. There's
been a lot of work on well-performing, thread-aware malloc libraries.
* Do you want to run things in multiple processes or multiple threads?
Or support both?
* How much of the application do you want to push out into separate
threads? Run lots of "copies" of Squid concurrently, with some locking
going on? Break up individual parts of the processing pipeline into
threads? (eg, what I'm going to be experimenting with soon - handling
ICP/HTCP in a separate thread for some basic testing)
* Survey the current codebase and figure out what depends upon what -
in a way that you can use for figuring out what needs to be made
re-entrant and what may need locking. Think about how to achieve all
of this. Best example of this - you're going to need to figure out how
to do concurrent debug logging and memory allocation - so see what
that code uses, what that code's code uses, etc.
* 10GE cards are dumping individual PCIe channels to CPUs; which means
that the "most efficient" way of pumping data around will be to
somehow throw individual connections onto specific CPUs, and keep them
there. There's no OS support for this yet, but OSes may be "magical"
(ie, handing you sockets in specific threads via accept() and hoping
that the NIC doesn't reorganise its connection->PCIe channel hash)
* Do you think it's worth being able to migrate specific connections
between threads? Or once they're in a thread, are they there for good?
* If you split up squid into "lots of threads running the whole app",
what and where would you envisage locking and blocking? What about
data sharing? How would that scale given a handful of example
workloads? What about in abnormal situations? How well will things
degrade?
* What about using message passing and message queues? Where would it
be appropriate? Where wouldn't it be appropriate? Why?

Here's an example:

* Imagine you're doing store lookups using message passing with your
"store" being a separate thread with a message queue. Think about how
you'd handle say, ICP peering between two caches doing > 10,000
requests a second. What repercussions does that have for the locking
of the message queues between other threads? What are the other
threads doing?
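
To make those questions concrete, here is a minimal sketch of the
scenario, written in Python purely as a language-neutral illustration
and not as Squid code. The names STORE, store_thread, lookup and
run_demo are all hypothetical. The store is owned by one thread and
every lookup is a message on a single shared queue, with the answer
coming back on a private per-request reply queue.

```python
import queue
import threading

# Hypothetical stand-in for the cache index; this is not Squid's store.
STORE = {"http://example.com/a": b"cached-a"}


def store_thread(requests):
    """Own the store exclusively and answer lookups arriving on one queue."""
    while True:
        msg = requests.get()              # blocks until a message arrives
        if msg is None:                   # sentinel: shut down
            break
        key, reply_q = msg
        reply_q.put((key, STORE.get(key)))  # hit -> bytes, miss -> None


def lookup(requests, key):
    """Send one lookup and wait for the answer on a private reply queue."""
    reply_q = queue.Queue(maxsize=1)
    requests.put((key, reply_q))          # takes the shared queue's lock
    return reply_q.get()[1]               # the caller stalls here


def run_demo(n_clients=4, lookups_each=1000):
    """Run several client threads against one store thread; return total hits."""
    requests = queue.Queue()
    worker = threading.Thread(target=store_thread, args=(requests,))
    worker.start()
    hits = []
    hits_lock = threading.Lock()

    def client():
        local = 0
        for i in range(lookups_each):
            # Alternate between a key that is cached and one that is not.
            key = "http://example.com/a" if i % 2 == 0 else "http://example.com/miss"
            if lookup(requests, key) is not None:
                local += 1
        with hits_lock:
            hits.append(local)

    clients = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    requests.put(None)                    # stop the store thread
    worker.join()
    return sum(hits)
```

Even in this toy version, every lookup locks the shared request queue
once and blocks the calling thread until the store thread replies. An
ICP peer generating > 10,000 lookups a second would be competing for
that one queue with every other thread that needs the store.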

With that in mind, survey the kinds of ways that current network apps
"do" threading:

* look at the various ways apache does it - eg, the per-connection
thread+process hybrid model, the event-worker thread model, etc
* look at memcached - one thread doing accept'ing, farming requests
off to other threads that just run a squid-like event loop. Minimal
inter-thread communication for the most part
* investigate what the concurrency hooks for various frameworks do -
eg, the boost asio library stuff has "colours" which you mark thread
events with. These colours dictate which events need to be run
sequentially and which can run in parallel
* look at all of the random blogs written by Windows networking coders
- they're further ahead on the massively-concurrent network
application stack because Windows has had it for a number of years.
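
As a rough illustration of the "colours" idea from the list above:
events that share a colour run one at a time, in submission order,
while events of different colours may run in parallel on a thread
pool. This is a sketch in Python, not Boost.Asio's API (as far as I
know, Boost.Asio's own name for this kind of serialisation is a
"strand"). ColouredExecutor and run_demo are hypothetical names, and
the sketch assumes handlers do not raise exceptions.

```python
import threading
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor


class ColouredExecutor:
    """Serialise events that share a colour; run different colours in parallel."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._pending = defaultdict(deque)   # colour -> queued (fn, args)
        self._active = set()                 # colours currently being drained

    def submit(self, colour, fn, *args):
        with self._lock:
            self._pending[colour].append((fn, args))
            if colour in self._active:
                return                       # a worker is already draining it
            self._active.add(colour)
        self._pool.submit(self._drain, colour)

    def _drain(self, colour):
        # Run this colour's events one by one until its queue is empty.
        while True:
            with self._lock:
                if not self._pending[colour]:
                    self._active.discard(colour)
                    return
                fn, args = self._pending[colour].popleft()
            fn(*args)                        # outside the lock: other colours proceed

    def shutdown(self):
        self._pool.shutdown(wait=True)       # waits for queued events to finish


def run_demo():
    """Submit interleaved events for three colours and record the run order."""
    colours = ("conn-1", "conn-2", "conn-3")
    log = {c: [] for c in colours}
    ex = ColouredExecutor(workers=4)

    def record(colour, i):
        log[colour].append(i)                # no lock: one event per colour at a time

    for i in range(100):
        for colour in colours:
            ex.submit(colour, record, colour, i)
    ex.shutdown()
    return log
```

Using one colour per connection keeps each connection's handlers
sequential without any per-connection locks. The cost is the single
lock inside the executor, which every submit and every dequeue takes.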

Now. You've mentioned you've looked at the others and you think major
replumbing is going to be needed. Here's a hint - it's going to be
needed. Thinking you can avoid it is silly. Figuring out what you can
do right now that doesn't lock you into a specific trajectory is -not-
silly. For example, figuring out what APIs need to be changed to make
them re-entrant is not silly. Most of the functions in lib/ that return
static char buffers need to be changed. That can be done
-now- without having to lock yourself into a particular concurrency
model.
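
A small sketch of why the static-buffer pattern is not re-entrant.
This is a Python stand-in for the C idiom, not code from lib/; the
function names and the 64-byte size are hypothetical. A module-level
bytearray plays the role of a `static char buf[64]`, and the returned
memoryview plays the role of the returned pointer.

```python
_STATIC_BUF = bytearray(64)   # plays the role of `static char buf[64]`


def addr_to_str_static(host, port):
    """Non-re-entrant: formats into the shared buffer and returns a view of it."""
    text = f"{host}:{port}".encode("ascii")
    if len(text) > len(_STATIC_BUF):
        raise ValueError("address does not fit in the static buffer")
    view = memoryview(_STATIC_BUF)
    view[:len(text)] = text               # overwrites whatever was there before
    return view[:len(text)]


def addr_to_str_r(host, port, buf):
    """Re-entrant variant: the caller supplies the buffer, so nothing is shared."""
    text = f"{host}:{port}".encode("ascii")
    if len(text) > len(buf):
        raise ValueError("buffer too small")
    view = memoryview(buf)
    view[:len(text)] = text
    return view[:len(text)]


def demo():
    """Show the aliasing bug, then the caller-supplied-buffer fix."""
    a = addr_to_str_static("10.0.0.1", 3128)
    b = addr_to_str_static("10.0.0.2", 3130)   # silently clobbers `a`
    clobbered = (bytes(a), bytes(b))

    buf_a, buf_b = bytearray(64), bytearray(64)
    c = addr_to_str_r("10.0.0.1", 3128, buf_a)
    d = addr_to_str_r("10.0.0.2", 3130, buf_b)
    independent = (bytes(c), bytes(d))
    return clobbered, independent
```

In demo(), both static results end up reading as the second address,
while the two caller-buffer results stay distinct. The same bug shows
up without any threads at all (two calls in one debug statement are
enough), which is why this clean-up does not depend on a concurrency
model. In C the usual fix has the same shape as inet_ntoa() versus
inet_ntop(): the caller passes in the buffer and its length.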

2c,

Adrian

2009/10/15 Amos Jeffries <squid3_at_treenet.co.nz>:
> Adrian Chadd wrote:
>>
>> 2009/10/15 Sachin Malave <sachinmalave_at_gmail.com>:
>>
>>> It's not like we want to make the project bad. Squid was not deployed on
>>> smp before because we did not have shared memory architectures
>>> (multi-cores), also the library support for multi-threading was like
>>> nightmare for people. Now things are changed, it is very easy to
>>> manage threads, people have multi-core machines at their desktops, and
>>> as hardware is available now or later somebody has to try and build
>>> SMP support. think about future.......
>>>
>>> To cope with internet speed & increase in number of users, Squid must
>>> use multi-core architecture and distribute its work............
>>
>> I 100% agree with your comments. I agree 100% that Squid needs to be
>> made scalable on multi-core boxes.
>>
>> Writing threaded code may be easier now than in the past, but the ways
>> of screwing stability, debuggability, performance and such -haven't-
>> changed. This is what I'm trying to get across. :)
>
> Aye, understood. Which is why I've made sure all this discussion is done in
> squid-dev. So those like yourself who might have anything to point at as
> good/bad examples can do so.
>
> Sure, Squid can be re-written from the ground up yet again. But none of us
> want the ten-year delay that will cause. The answer is either to drop eight
> years of improvements and use the Squid-2 code, or to go ahead with a
> somewhat incompletely upgraded Squid-3 code, leveraging some of the SMP work
> to further upgrade the remaining sections while just slipping SMP into the
> currently upgraded components.
>
> Do you actually have any relevant implementations you in your infinite
> wisdom and foresight want to point us at? Or just diss us for not knowing
> enough?
>
> I'm already aware of the overall models Varnish, Oops, Apache, Polipo,
> and Nginx are documented as using. Without looking at the code it's clear
> that their approaches are not beneficial to Squid without major re-plumbing.
>
> The solution we have to use is a mix, possibly unique to Squid, which
> retains Squid's features and niche coverage. The right mix of tools for each
> task to be performed: child processes, IPC, and events. Now adding threads
> for the pieces that are applicable. There is order in the chaos.
>
> Amos
> --
> Please be using
> Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
> Current Beta Squid 3.1.0.14
>
>
Received on Thu Oct 15 2009 - 09:57:02 MDT
