Re: Squid-smp : Please discuss

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 15 Sep 2009 14:27:24 +1200

On Mon, 14 Sep 2009 16:14:12 -0600 (MDT), Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On Mon, 14 Sep 2009, Amos Jeffries wrote:
>
>> I think we should take this on-list so the others with more detailed
>> knowledge can give advice in case I have my facts wrong about
>> AsyncCalls...
>
> I am afraid this discussion focuses on a small part of a much bigger
> problem so finalizing the design decisions here may be counter
> productive until we have an agreement on how to split Squid into
> threads in general.

Aye. That's the idea.

>
> There are a few different ways to partition Squid and most of them
> have been discussed in the past. I am not sure whether the discussions
> have ever reached a consensus point. I am also not sure there is
> consensus whether we should design for 2 cores, 8, cores, 16 cores, 32
> cores, or more, or all of the above?

The partitioning discussion must have happened well before my time. For the
last few years the consensus has been that the components get partitioned
into SourceLayout components and AsyncCalls codepaths. Further partitioning
we have not discussed recently.

>
> There is also a question on how configurable everything should be. For
> example, if the box has only two cores, will the user be able to
> specify which threads are created and which created threads run on
> which core? Also, will the user be able to specify whether memory or
> disk cache is shared?

This can be decided and added after the initial experiments and locking
tests, yes? That is, once Squid starts pushing CPU-intensive work into
threads other than the current main one.

>
> I also agree that the major difficulty here is not implementing the
> threading code itself, but making sure that no shared data is accessed
> unlocked. This is easy when you start from scratch, but we have a lot
> of globals and static objects accessed all around the code. Protecting
> each of them individually by hand would require a lot of coding and
> risk. IMO, we need to add a few classes that would make sharing global
> data simple and safe. This is where C++ would help a lot.
>
> And even the protection of globals requires a high-level design plan:
> do we protect global objects like Config or individual data types like
> SquidString?

IMHO, those objects/types which may be written to or deleted while a thread
is trying to read from them.

RefCounting, done properly, forms a lock on certain read-only types like
Config. Though we are currently handling that for Config by leaking the
memory at every opportunity.
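
As a sketch of that idea in stand-alone C++ (C++11 primitives for brevity;
in-tree it would be our own RefCount<>, and every name here is invented):

#include <memory>
#include <mutex>

struct ConfigSnapshot {                 // immutable once published
    int maxConnections;
    // ... whatever else a reconfigure produces ...
};

class ConfigHolder {
    std::shared_ptr<const ConfigSnapshot> current;
    std::mutex swapLock;                // held only for the pointer swap
public:
    // Readers take a counted reference; the snapshot stays alive for as
    // long as they hold it, even if a reconfigure replaces it meanwhile.
    std::shared_ptr<const ConfigSnapshot> get() {
        std::lock_guard<std::mutex> g(swapLock);
        return current;
    }
    // A reconfigure publishes a whole new snapshot rather than mutating
    // the old one; the last reference out frees it, so nothing leaks.
    void replace(std::shared_ptr<const ConfigSnapshot> next) {
        std::lock_guard<std::mutex> g(swapLock);
        current = std::move(next);
    }
};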

SquidString is not thread-safe. But StringNG, with its separately
refcounted buffers, is almost there. Each thread holding its own StringNG
copy that shares an SBuf amounts to a lock, with copy-on-write possibly
causing issues we need to look at if/when we get to that scope.
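
Roughly the shape of it, as a toy copy-on-write holder (not the real
SBuf/StringNG API; names invented, and note the detach test below is
itself one of the thread-safety issues to look at):

#include <memory>
#include <string>

// Copies share one buffer until somebody writes; writers detach first.
class CowString {
    std::shared_ptr<std::string> buf;
public:
    explicit CowString(const char *s)
        : buf(std::make_shared<std::string>(s)) {}

    const std::string &read() const { return *buf; } // shared, read-only

    void append(const char *s) {
        if (buf.use_count() > 1)                     // shared: clone first
            buf = std::make_shared<std::string>(*buf);
        buf->append(s);                              // write our own copy
    }
};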

>
> Finally, I agree that making accepting code into a thread may lead to
> "too aggressive" incoming stream of requests that would overwhelm the
> other parts of the system. I have recently observed that kind of
> behavior in another threaded performance-focused application. This
> does not mean that accept must not be a thread, but it means that we
> should think how the overall design would prevent one thread from
> overloading others with work.
>
> Cheers,
>
> Alex.
>

The biggest question underlying SMP, before we can even look at locking and
resources, is whether AsyncCalls is a suitable interface between threads
(thread A schedules a call, thread B runs it...) or whether we need yet
another queuing mechanism.

Locking can be done iteratively/incrementally as things are converted. Each
resource will have its own challenges and requirements. We can't face them
all now and expect to get it right. Once the async question is answered we
can look at exactly what the best way to lock the queue is.

This is a small scope though, suitable for experimenting with some easily
adapted area of the code as a starting point, with a clearly measurable
works/fails. Its only locking dependence is on the immediate data and on
which queue to add the next event/call/whatever to.
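
For a feel of how small that locking surface is, a toy stand-in for
AsyncCallQueue (again C++11 primitives for brevity; names invented):

#include <functional>
#include <mutex>
#include <queue>

// Any thread schedules a call; the owning thread drains and dials them.
class CallQueue {
    std::queue<std::function<void()> > calls;
    std::mutex lock;                 // the only locking dependence
public:
    void schedule(std::function<void()> call) {
        std::lock_guard<std::mutex> g(lock);
        calls.push(std::move(call));
    }
    bool dialOne() {                 // run only by the queue's owner
        std::function<void()> call;
        {
            std::lock_guard<std::mutex> g(lock);
            if (calls.empty())
                return false;
            call = std::move(calls.front());
            calls.pop();
        }
        call();                      // dial outside the lock
        return true;
    }
};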

Then, as you point out, there is the question of how to prevent the
accept()s overloading the main queue as multiple threads funnel down to the
old central one. The 'safe' approach is to convert the hard way: from
server-facing code out to client-facing. Unfortunately that approach does
involve a lot more locking design problems very early on.
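
One generic pressure-valve worth keeping in mind: make the handoff bounded,
so a too-eager producer blocks instead of growing the backlog without
limit. A sketch (toy names again, C++11):

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BoundedQueue {
    std::queue<T> q;
    std::mutex m;
    std::condition_variable notFull, notEmpty;
    const size_t cap;
public:
    explicit BoundedQueue(size_t capacity) : cap(capacity) {}

    void push(T v) {                 // accept thread stalls when full
        std::unique_lock<std::mutex> g(m);
        notFull.wait(g, [this]{ return q.size() < cap; });
        q.push(std::move(v));
        notEmpty.notify_one();
    }
    T pop() {                        // main loop drains at its own pace
        std::unique_lock<std::mutex> g(m);
        notEmpty.wait(g, [this]{ return !q.empty(); });
        T v = std::move(q.front());
        q.pop();
        notFull.notify_one();
        return v;
    }
};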

Amos

>
>> Sachin Malave wrote:
>>> On Sun, Sep 13, 2009 at 7:52 PM, Amos Jeffries <squid3_at_treenet.co.nz>
>>> wrote:
>>>> On Sun, 13 Sep 2009 07:12:56 -0400, Sachin Malave
>>>> <sachinmalave_at_gmail.com>
>>>> wrote:
>>>>> Hello Amos,
>>>>>
>>>>>
>>>>> As discussed before, I am analyzing codes for the changes as on
>>>>> http://wiki.squid-cache.org/Features/SmpScale, and have concentrated
>>>>> on epoll ( select ) implementations in squid. It is found that epoll
>>>>> is polling all the descriptors & processing them one by one. There is
>>>>> an important FD used by http port which is always busy, but has to
>>>>> wait for other descriptors in queue to be processed.
>>>>>
>>>>> Then, I also found that it is possible to separate the working of all
>>>>> fd handlers, e.g. the fd used by http_port (tried).
>>>>> This can be done by making some changes in the code.
>>>>> I have been trying to code & test these changes for the last few days;
>>>>> of course this may not be correct or may need some improvements to
>>>>> meet our requirements. Please give me feedback and tell me about
>>>>> dependencies I might not have considered.
>>>>>
>>>>> Again, one important issue: I know that doing the changes as
>>>>> mentioned below will create and kill a thread after each timeout, but
>>>>> we can extend it further and make a separate thread that will never
>>>>> exit. We will discuss this issue later; before everything, please
>>>>> check the proposed changes so that we can move further.
>>>>
>>>> You mean the main http_port listener (port 3128 etc)?
>>>> This is currently not handled specially. Since there is more than one
>>>> listener FD in many Squid setups (multiple http_port and https_port,
>>>> plus other protocols like ICP, HTCP, DNS), any threading solution
>>>> needs to handle the listeners agnostic of what they do. Though
>>>> splitting listener FD accepts into a separate loop from other FDs does
>>>> seem sound.
>>>>
>>>> Special pseudo-thread handling is already hacked up in a pseudo-thread
>>>> poller for DNS replies, which is complicating the FD handling there.
>>>> What I'd like to see is resource-locking added to the Async queue when
>>>> adding new queue entries.
>>>>
>>>> That allows making the whole select loop(s) happen in parallel to the
>>>> rest of Squid: it simply accepts and spawns AsyncJob/AsyncCall entries
>>>> into the main squid processing queue.
>>>>
>>>> Workable?
>>>>
>>>>
>>>>> *Changes are tagged with "NEW"
>>>>>
>>>>> 1.> inside client_side.cc
>>>>>
>>>>> void clientHttpConnectionsOpen(void)
>>>>> {
>>>>>     ...
>>>>>     httpfd = fd; // httpfd now holds the http listener FD (NEW)
>>>>>     ...
>>>>>     comm_accept(fd, httpAccept, s);
>>>>> }
>>>>>
>>>>>
>>>>> 2.> inside comm_epoll.cc
>>>>>
>>>>> int kdpfdHttp;
>>>>> int useHttpThread = 1;
>>>>> struct epoll_event *peventsHttp; //NEW
>>>>>
>>>>> void comm_select_init(void)
>>>>> {
>>>>>     ...
>>>>>     peventsHttp = (struct epoll_event *)
>>>>>         xmalloc(1 * sizeof(struct epoll_event)); //NEW
>>>>>
>>>>>     kdpfdHttp = epoll_create(1); //NEW
>>>>>     ...
>>>>> }
>>>>>
>>>>> void commSetSelect(int fd, unsigned int type, PF * handler, void
>>>>> *client_data, time_t timeout)
>>>>> {
>>>>>
>>>>>
>>>>>
>>>>>     if (!F->flags.open) {
>>>>>         if (useHttpThread) //NEW
>>>>>             epoll_ctl(kdpfdHttp, EPOLL_CTL_DEL, fd, &ev); //NEW
>>>>>         else
>>>>>             epoll_ctl(kdpfd, EPOLL_CTL_DEL, fd, &ev);
>>>>>         return;
>>>>>     }
>>>>>
>>>>>
>>>>>
>>>>>     if (fd == getHttpfd()) { //NEW
>>>>>         printf("********Setting epoll_ctl for httpfd=%d\n", getHttpfd());
>>>>>         if (epoll_ctl(kdpfdHttp, epoll_ctl_type, fd, &ev) < 0) {
>>>>>             debugs(5, DEBUG_EPOLL ? 0 : 8, "commSetSelect: epoll_ctl(," <<
>>>>>                    epolltype_atoi(epoll_ctl_type) << ",,): failed on FD " <<
>>>>>                    fd << ": " << xstrerror());
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> comm_err_t
>>>>> comm_select(int msec)
>>>>> {
>>>>>     int num, i, fd = -1;
>>>>>     fde *F;
>>>>>     PF *hdl;
>>>>>
>>>>>     //SHM: num2
>>>>>     int num2 = -1; //NEW
>>>>>     //SHM: End
>>>>>
>>>>>     struct epoll_event *cevents;
>>>>>     struct epoll_event *ceventsHttp;
>>>>>     printf("Inside comm_select:comm_epoll.cc\n");
>>>>>     //PROF_start(comm_check_incoming);
>>>>>
>>>>>     if (msec > max_poll_time)
>>>>>         msec = max_poll_time;
>>>>>
>>>>>
>>>>>     for (;;) {
>>>>>         printf("for(;;): Inside comm_select:comm_epoll.cc\n");
>>>>>         num = epoll_wait(kdpfd, pevents, 1, msec);
>>>>>
>>>>>         //SHM: epoll_wait for kdpfdHttp
>>>>>         if (useHttpThread) { //NEW
>>>>>             printf("for(;;):USEHTTP: Inside comm_select:comm_epoll.cc\n");
>>>>>             num2 = epoll_wait(kdpfdHttp, peventsHttp, 1, msec); //NEW
>>>>>             printf("\n\n\n num2=%d\n\n\n", num2);
>>>>>         }
>>>>>         //SHM: End
>>>>>
>>>>>         ++statCounter.select_loops;
>>>>>
>>>>>         if (num >= 0 || num2 >= 0) //NEW
>>>>>             break;
>>>>>
>>>>>         if (ignoreErrno(errno))
>>>>>             break;
>>>>>
>>>>>         getCurrentTime();
>>>>>
>>>>>         //PROF_stop(comm_check_incoming);
>>>>>
>>>>>         return COMM_ERROR;
>>>>>     }
>>>>>
>>>>>     // PROF_stop(comm_check_incoming); //PLEASE DISCUSS THIS...........
>>>>
>>>> The PROF_* bits are rarely used. Removing them from here is acceptable
>>>> as a temporary measure when it's initially threaded. Long-term they
>>>> need looking at to be made thread-safe. IMO the entire time spent in
>>>> one of the poll threads counts as comm_check_incoming, so maybe a new
>>>> API for PROF_* needs to be developed to account for thread times. This
>>>> is another SMP job though, slightly out of scope of your comm loop
>>>> upgrade.
>>>>
>>>>
>>>>>     getCurrentTime();
>>>>>
>>>>>     statHistCount(&statCounter.select_fds_hist, num);
>>>>>
>>>>>     if (num == 0 && num2 == 0) //NEW (took lots of my time to fix this)
>>>>>         return COMM_TIMEOUT; /* No error.. */
>>>>>
>>>>>     // PROF_start(comm_handle_ready_fd);
>>>>>
>>>>>     printf("\nChoosing thread..............................\n"); //NEW
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>     // HERE THE HTTP FD IS HANDLED SEPARATELY. This could be our new
>>>>>     // thread. //NEW START
>>>>>
>>>>>     if (num2 > 0) {
>>>>>         printf("\nINSIDE Thread now\n\n");
>>>>>         ceventsHttp = peventsHttp;
>>>>>         fd = ceventsHttp->data.fd;
>>>>>         F = &fd_table[fd];
>>>>>         debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): got FD " << fd <<
>>>>>                " events=" << std::hex << ceventsHttp->events <<
>>>>>                " monitoring=" << F->epoll_state <<
>>>>>                " F->read_handler=" << F->read_handler <<
>>>>>                " F->write_handler=" << F->write_handler);
>>>>>
>>>>>         if (ceventsHttp->events & (EPOLLIN|EPOLLHUP|EPOLLERR) ||
>>>>>                 F->flags.read_pending) {
>>>>>             if ((hdl = F->read_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling read handler on FD " << fd);
>>>>>                 PROF_start(comm_read_handler);
>>>>>                 F->flags.read_pending = 0;
>>>>>                 F->read_handler = NULL;
>>>>>                 hdl(fd, F->read_data);
>>>>>                 PROF_stop(comm_read_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no read handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>
>>>>>         if (ceventsHttp->events & (EPOLLOUT|EPOLLHUP|EPOLLERR)) {
>>>>>             if ((hdl = F->write_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling write handler on FD " << fd);
>>>>>                 PROF_start(comm_write_handler);
>>>>>                 F->write_handler = NULL;
>>>>>                 hdl(fd, F->write_data);
>>>>>                 PROF_stop(comm_write_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no write handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_WRITE, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>
>>>>> //NEW ENDS HERE
>>>>>
>>>>>
>>>>>     for (i = 0, cevents = pevents; i < num; i++, cevents++) {
>>>>>
>>>>>         fd = cevents->data.fd;
>>>>>         if (fd == getHttpfd()) // NEW: http fd was handled above
>>>>>             continue;
>>>>>         F = &fd_table[fd];
>>>>>         debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): got FD " << fd <<
>>>>>                " events=" << std::hex << cevents->events <<
>>>>>                " monitoring=" << F->epoll_state <<
>>>>>                " F->read_handler=" << F->read_handler <<
>>>>>                " F->write_handler=" << F->write_handler);
>>>>>
>>>>>         // TODO: add EPOLLPRI??
>>>>>
>>>>>         if (cevents->events & (EPOLLIN|EPOLLHUP|EPOLLERR) ||
>>>>>                 F->flags.read_pending) {
>>>>>             if ((hdl = F->read_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling read handler on FD " << fd);
>>>>>                 PROF_start(comm_read_handler);
>>>>>                 F->flags.read_pending = 0;
>>>>>                 F->read_handler = NULL;
>>>>>                 hdl(fd, F->read_data);
>>>>>                 PROF_stop(comm_read_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no read handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>
>>>>>         if (cevents->events & (EPOLLOUT|EPOLLHUP|EPOLLERR)) {
>>>>>             if ((hdl = F->write_handler) != NULL) {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): Calling write handler on FD " << fd);
>>>>>                 PROF_start(comm_write_handler);
>>>>>                 F->write_handler = NULL;
>>>>>                 hdl(fd, F->write_data);
>>>>>                 PROF_stop(comm_write_handler);
>>>>>                 statCounter.select_fds++;
>>>>>             } else {
>>>>>                 debugs(5, DEBUG_EPOLL ? 0 : 8,
>>>>>                        "comm_select(): no write handler for FD " << fd);
>>>>>                 // remove interest since no handler exists for this event.
>>>>>                 commSetSelect(fd, COMM_SELECT_WRITE, NULL, NULL, 0);
>>>>>             }
>>>>>         }
>>>>>     }
>>>>>
>>>>>     // PROF_stop(comm_handle_ready_fd);
>>>>>
>>>>>     return COMM_OK;
>>>>> }
>>>>>
>>>>>
>>>> Amos
>>>
>>>
>>>
>>>
>>> --------------------------------------------
>>> --------------------------------------------
>>> Yes, you are right: when there are multiple ports things will be
>>> different, but they can be embedded easily and all descriptors can be
>>> handled in the same way, I guess. This change will not be very
>>> effective, as the code is not optimized for all descriptors and is not
>>> a generalized solution for them.
>>>
>>> I found that there is a delay between two select loops, which can be
>>> minimized to the delay of its own descriptor, so I concentrated upon
>>> that....
>>>
>>
>> I'm not sure exactly what you are saying there. Two select loops being
>> two runs of the FD checking loop? Or a delay when two polling loops are
>> run?
>>
>> I read your earlier explanation to mean there was a delay doing accept()
>> on FD 1 (for example) when FD 1,2,3,4,5,6,7 had reads pending.
>>
>> And your solution to be:
>> thread #1 - looping on read() FD 2,3,4,5,6,7 + other stuff
>> thread #2 - looping on accept() FD 1
>>
>>> And I am also trying to find the locking mechanism for async queues;
>>> it seems to be taking time.
>>
>> I mean some mutex/lock on AsyncCallQueue so that multiple threads can do
>> AsyncCallQueue::schedule(call) without pointer collisions on theTail and
>> theHead when they set up the first read of an accept()'d FD.
>>
>> For example something like this...
>>
>> thread #1:
>> while ( <events to dial> ) {
>> queuedEventX.dial()
>> }
>>
>> thread #2:
>> while( <listening for new connections> ) {
>> newFD = accept()
>> readFromNewFd = new AsyncCallPointer...
>> AsyncCallQueue::schedule(readFromNewFd);
>> }
>>
>>
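
To flesh that old example out a little: a rough guess at what the locked
schedule() might look like, assuming the queue keeps its theHead/theTail
ends and a theNext link on each call (queueLock and the std::mutex choice
are invented for this sketch):

void AsyncCallQueue::schedule(AsyncCall::Pointer &call)
{
    std::lock_guard<std::mutex> g(queueLock); // new: serialize schedulers
    assert(!call->theNext);
    if (theTail != NULL) {                    // append to existing queue
        theTail->theNext = call;
        theTail = call;
    } else {                                  // first entry creates queue
        theHead = theTail = call;
    }
}
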
>>>
>>> Need some time for further analysis...........
>>>
>>> Thank you so much......
>>>
>>
>> Amos
>> --
>> Please be using
>> Current Stable Squid 2.7.STABLE6 or 3.0.STABLE19
>> Current Beta Squid 3.1.0.13
>>
Received on Tue Sep 15 2009 - 02:27:39 MDT
