Re: Squid-smp : Please discuss

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 14 Sep 2009 16:14:12 -0600 (MDT)

On Mon, 14 Sep 2009, Amos Jeffries wrote:

> I think we should take this on-list so the others with more detailed
> knowledge can give advice in case I have my facts wrong about AsyncCalls...

I am afraid this discussion focuses on a small part of a much bigger
problem, so finalizing the design decisions here may be counterproductive
until we have an agreement on how to split Squid into threads in general.

There are a few different ways to partition Squid, and most of them
have been discussed in the past. I am not sure whether those discussions
ever reached a consensus. Nor am I sure there is consensus on whether we
should design for 2 cores, 8 cores, 16 cores, 32 cores, more, or all of
the above.

There is also the question of how configurable everything should be. For
example, if the box has only two cores, will the user be able to
specify which threads are created and which created threads run on
which core? Also, will the user be able to specify whether the memory or
disk cache is shared?
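
For example, something like the following (purely hypothetical directives,
just to illustrate the kind of knobs we may end up needing):

    # pin the accept thread and two worker threads to specific cores
    thread_affinity accept core=0
    thread_affinity worker1 core=1
    thread_affinity worker2 core=2
    # share the memory cache across threads, keep disk caches private
    memory_cache_shared on
    disk_cache_shared off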

I also agree that the major difficulty here is not implementing the
threading code itself, but making sure that no shared data is accessed
unlocked. That is easy when you start from scratch, but we have a lot
of globals and static objects accessed all around the code. Protecting
each of them individually, by hand, would require a lot of coding and
carry a lot of risk. IMO, we need to add a few classes that would make
sharing global data simple and safe. This is where C++ would help a lot.
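
To make that concrete, I am thinking of something roughly like the sketch
below (untested, names invented; the real thing would also need const
access and a clear policy for nested locks):

    #include <pthread.h>

    // A wrapper that never exposes the bare global: callers must go
    // through Locked, which holds the mutex for its whole lifetime.
    template <class T>
    class SharedGlobal
    {
    public:
        SharedGlobal() { pthread_mutex_init(&mutex_, NULL); }

        class Locked
        {
        public:
            explicit Locked(SharedGlobal &g): g_(g) { pthread_mutex_lock(&g_.mutex_); }
            ~Locked() { pthread_mutex_unlock(&g_.mutex_); }
            T *operator->() { return &g_.value_; }
        private:
            SharedGlobal &g_;
        };

    private:
        friend class Locked;
        T value_;
        pthread_mutex_t mutex_;
    };

    // usage sketch:
    //   SharedGlobal<StatCounters> TheCounters;
    //   { SharedGlobal<StatCounters>::Locked c(TheCounters); c->select_loops++; }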

And even the protection of globals requires a high-level design plan:
do we protect global objects like Config or individual data types like
SquidString?
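
The two extremes look quite different. For example (again, just a sketch
with invented names):

    #include <pthread.h>
    #include <string>

    // Option A: one coarse lock around the whole Config object. Easy to
    // retrofit, but one long lookup or reconfigure stalls every thread:
    //   SharedGlobal<SquidConfig> TheConfig;

    // Option B: a lock inside every small object, e.g. each string.
    // Little contention, but every operation pays the locking cost,
    // including the many strings that never leave one thread.
    class LockedString
    {
    public:
        LockedString() { pthread_mutex_init(&mutex_, NULL); }
        void append(const char *s) {
            pthread_mutex_lock(&mutex_);
            buf_ += s;
            pthread_mutex_unlock(&mutex_);
        }
    private:
        std::string buf_;
        pthread_mutex_t mutex_;
    };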

Finally, I agree that moving the accepting code into its own thread may
lead to a "too aggressive" incoming stream of requests that would
overwhelm the other parts of the system. I have recently observed that
kind of behavior in another threaded performance-focused application.
This does not mean that accept must not be a thread, but it does mean
that we should think about how the overall design would prevent one
thread from overloading others with work.
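
One common answer is to put a bounded queue between the accept thread and
the workers, so the producer blocks (or temporarily stops accepting) once
it gets too far ahead. A rough sketch of what I mean (untested, invented
names, plain pthreads):

    #include <pthread.h>
    #include <cstddef>
    #include <deque>

    // A bounded FIFO: producers block while `limit` items are queued, so a
    // too-eager accept thread is throttled by the slower consumers.
    template <class Item>
    class BoundedQueue
    {
    public:
        explicit BoundedQueue(std::size_t limit): limit_(limit) {
            pthread_mutex_init(&mutex_, NULL);
            pthread_cond_init(&notFull_, NULL);
            pthread_cond_init(&notEmpty_, NULL);
        }

        void push(const Item &item) {
            pthread_mutex_lock(&mutex_);
            while (items_.size() >= limit_)
                pthread_cond_wait(&notFull_, &mutex_); // back-pressure happens here
            items_.push_back(item);
            pthread_cond_signal(&notEmpty_);
            pthread_mutex_unlock(&mutex_);
        }

        Item pop() {
            pthread_mutex_lock(&mutex_);
            while (items_.empty())
                pthread_cond_wait(&notEmpty_, &mutex_);
            Item item = items_.front();
            items_.pop_front();
            pthread_cond_signal(&notFull_);
            pthread_mutex_unlock(&mutex_);
            return item;
        }

    private:
        std::deque<Item> items_;
        std::size_t limit_;
        pthread_mutex_t mutex_;
        pthread_cond_t notFull_, notEmpty_;
    };

Whether the accept thread should block, or simply stop polling its
listening FDs until the queue drains, is itself a design decision;
blocking is just the simplest way to express the limit.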

Cheers,

Alex.

> Sachin Malave wrote:
>> On Sun, Sep 13, 2009 at 7:52 PM, Amos Jeffries <squid3_at_treenet.co.nz>
>> wrote:
>>> On Sun, 13 Sep 2009 07:12:56 -0400, Sachin Malave <sachinmalave_at_gmail.com>
>>> wrote:
>>>> Hello Amos,
>>>>
>>>>
>>>> As discussed before, I am analyzing the code for the changes described
>>>> on http://wiki.squid-cache.org/Features/SmpScale, and have concentrated
>>>> on the epoll (select) implementation in Squid. I found that epoll
>>>> polls all the descriptors and processes them one by one. There is an
>>>> important FD used by the http port which is always busy, but it has to
>>>> wait for the other descriptors in the queue to be processed.
>>>>
>>>> I also found that it is possible to separate the handling of
>>>> individual FD handlers, e.g. the FD used by the http port (I have
>>>> tried this). It can be done with some changes to the code. I have
>>>> been coding and testing these changes for the last few days; of
>>>> course they may not be correct or may need improvements to meet our
>>>> requirements. Please give me feedback and tell me about dependencies
>>>> I might not have considered.
>>>>
>>>> One more important issue: I know that the changes below will create
>>>> and kill a thread after each timeout, but we can extend this further
>>>> and make a separate thread that never exits. We can discuss that
>>>> later; first, please check the proposed changes so that we can move
>>>> further.
>>>
>>> You mean the main http_port listener (port 3128 etc)?
>>> This is currently not handled specially. Since many Squid setups have
>>> more than one listener FD (multiple http_port and https_port, plus
>>> other protocols like ICP, HTCP and DNS), any threading solution needs
>>> to handle the listeners agnostic of what they do. Splitting listener
>>> FD accepts into a separate loop from the other FDs does seem sound,
>>> though.
>>>
>>> Special pseudo-thread handling is already hacked up in a pseudo-thread
>>> poller for DNS replies, which complicates the FD handling there. What
>>> I'd like to see is resource-locking added to the Async queue when adding
>>> new queue entries.
>>>
>>> That would allow the whole select loop(s) to run in parallel to the rest
>>> of Squid: they simply accept and spawn AsyncJob/AsyncCall entries into
>>> the main Squid processing queue.
>>>
>>> Workable?
>>>
>>>
>>>> *Changes are tagged with "NEW"
>>>>
>>>> 1.> inside client_side.cc
>>>>
>>>> void clientHttpConnectionsOpen(void)
>>>> {
>>>> .
>>>> httpfd=fd; // httpfd now holds the http file descriptor (NEW)
>>>> .
>>>> .
>>>> .
>>>> comm_accept(fd, httpAccept, s);
>>>>
>>>> }
>>>>
>>>>
>>>> 2.> inside comm_epoll.cc
>>>>
>>>> int kdpfdHttp;
>>>> int useHttpThread = 1;
>>>>
>>>> void comm_select_init(void)
>>>> {
>>>>
>>>> peventsHttp = (struct epoll_event *) xmalloc(1 * sizeof(struct epoll_event)); //NEW
>>>>
>>>> kdpfdHttp = epoll_create(1); //NEW
>>>>
>>>>
>>>> }
>>>>
>>>> void commSetSelect(int fd, unsigned int type, PF * handler, void *client_data, time_t timeout)
>>>> {
>>>>
>>>>
>>>>
>>>> if (!F->flags.open) {
>>>> if (useHttpThread) //NEW
>>>> epoll_ctl(kdpfdHttp, EPOLL_CTL_DEL, fd, &ev); //NEW
>>>> else
>>>> epoll_ctl(kdpfd, EPOLL_CTL_DEL, fd, &ev);
>>>> return;
>>>> }
>>>>
>>>>
>>>>
>>>> if (fd==getHttpfd()) { //NEW
>>>> printf("********Setting epoll_ctl for httpfd=%d\n", getHttpfd());
>>>> if (epoll_ctl(kdpfdHttp, epoll_ctl_type, fd, &ev) < 0) {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "commSetSelect: epoll_ctl(," << epolltype_atoi(epoll_ctl_type) << ",,): failed on FD " << fd << ": " << xstrerror());
>>>> }
>>>> }
>>>>
>>>>
>>>> comm_err_t
>>>> comm_select(int msec)
>>>> {
>>>> int num, i,fd=-1;
>>>> fde *F;
>>>> PF *hdl;
>>>>
>>>> //SHM: num2
>>>> int num2=-1; //NEW
>>>> //SHM: End
>>>>
>>>> struct epoll_event *cevents;
>>>> struct epoll_event *ceventsHttp;
>>>> printf("Inside comm_select:comm_epoll.cc\n");
>>>> //PROF_start(comm_check_incoming);
>>>>
>>>> if (msec > max_poll_time)
>>>> msec = max_poll_time;
>>>>
>>>>
>>>> for (;;) {
>>>> printf("(for(;;):Inside comm_select:comm_epoll.cc\n");
>>>> num = epoll_wait(kdpfd, pevents, 1, msec);
>>>>
>>>> //SHM: epoll_wait for kpdfdHttp
>>>> if (useHttpThread) { //NEW
>>>> printf("(for(;;):USEHTTP:Inside
>>>> comm_select:comm_epoll.cc\n");
>>>> num2 = epoll_wait(kdpfdHttp, peventsHttp, 1, msec);
>>>> //NEW
>>>> printf("\n\n\n num2=%d\n\n\n", num2);
>>>> }
>>>> //SHM: End
>>>>
>>>> ++statCounter.select_loops;
>>>>
>>>> if (num >= 0 || num2 >= 0) //NEW
>>>> break;
>>>>
>>>> if (ignoreErrno(errno))
>>>> break;
>>>>
>>>> getCurrentTime();
>>>>
>>>> //PROF_stop(comm_check_incoming);
>>>>
>>>> return COMM_ERROR;
>>>> }
>>>>
>>>> // PROF_stop(comm_check_incoming); //PLEASE DISCUSS THIS...........
>>>
>>> The PROF_* bits are rarely used. Removing them from here is acceptable
>>> as a temporary measure when it is initially threaded. Long-term they
>>> need looking at to be made thread-safe. IMO the entire time spent in
>>> one of the poll threads counts as comm_check_incoming, so maybe a new
>>> API for PROF_* needs to be developed to account for thread times. That
>>> is another SMP job though, slightly out of scope of your comm loop
>>> upgrade.
>>>
>>>
>>>> getCurrentTime();
>>>>
>>>> statHistCount(&statCounter.select_fds_hist, num);
>>>>
>>>> if (num == 0 && num2 == 0) //NEW (this took a lot of my time to fix)
>>>> return COMM_TIMEOUT; /* No error.. */
>>>>
>>>> // PROF_start(comm_handle_ready_fd);
>>>>
>>>> printf("\nChoosing thread..............................\n");
>>>> //NEW
>>>>
>>>>
>>>>
>>>>
>>>> //HERE THE HTTP FD IS HANDLED SEPARATELY. This could be our new thread //NEW START
>>>>
>>>> if (num2 > 0) {
>>>> printf("\nINSIDE Thread now\n\n");
>>>> ceventsHttp = peventsHttp;
>>>> fd = ceventsHttp->data.fd;
>>>> F = &fd_table[fd];
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): got FD " << fd << " events=" <<
>>>> std::hex << ceventsHttp->events << " monitoring=" << F->epoll_state <<
>>>> " F->read_handler=" << F->read_handler << " F->write_handler=" << F->write_handler);
>>>>
>>>>
>>>>
>>>> if (ceventsHttp->events & (EPOLLIN|EPOLLHUP|EPOLLERR) || F->flags.read_pending) {
>>>> if ((hdl = F->read_handler) != NULL) {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): Calling read handler on FD " << fd);
>>>> PROF_start(comm_write_handler);
>>>> F->flags.read_pending = 0;
>>>> F->read_handler = NULL;
>>>> hdl(fd, F->read_data);
>>>> PROF_stop(comm_write_handler);
>>>> statCounter.select_fds++;
>>>> } else {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): no read handler for FD " << fd);
>>>> // remove interest since no handler exist for this event.
>>>> commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0);
>>>> }
>>>> }
>>>>
>>>> if (ceventsHttp->events & (EPOLLOUT|EPOLLHUP|EPOLLERR)) {
>>>> if ((hdl = F->write_handler) != NULL) {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): Calling write handler on FD " << fd);
>>>> PROF_start(comm_read_handler);
>>>> F->write_handler = NULL;
>>>> hdl(fd, F->write_data);
>>>> PROF_stop(comm_read_handler);
>>>> statCounter.select_fds++;
>>>> } else {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): no write handler for FD " << fd);
>>>> // remove interest since no handler exist for this event.
>>>> commSetSelect(fd, COMM_SELECT_WRITE, NULL, NULL, 0);
>>>> }
>>>> }
>>>> }
>>>> }
>>>>
>>>> //NEW ENDS HERE
>>>>
>>>>
>>>> for (i = 0, cevents = pevents; i < num; i++, cevents++) {
>>>>
>>>> fd = cevents->data.fd;
>>>> if ( fd == getHttpfd() )
>>>> continue;
>>>> F = &fd_table[fd];
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): got FD " << fd << " events=" <<
>>>> std::hex << cevents->events << " monitoring=" << F->epoll_state <<
>>>> " F->read_handler=" << F->read_handler << " F->write_handler=" << F->write_handler);
>>>>
>>>> // TODO: add EPOLLPRI??
>>>>
>>>> if (cevents->events & (EPOLLIN|EPOLLHUP|EPOLLERR) || F->flags.read_pending) {
>>>> if ((hdl = F->read_handler) != NULL) {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): Calling read handler on FD " << fd);
>>>> PROF_start(comm_write_handler);
>>>> F->flags.read_pending = 0;
>>>> F->read_handler = NULL;
>>>> hdl(fd, F->read_data);
>>>> PROF_stop(comm_write_handler);
>>>> statCounter.select_fds++;
>>>> } else {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): no read handler for FD " << fd);
>>>> // remove interest since no handler exist for this event.
>>>> commSetSelect(fd, COMM_SELECT_READ, NULL, NULL, 0);
>>>> }
>>>> }
>>>>
>>>> if (cevents->events & (EPOLLOUT|EPOLLHUP|EPOLLERR)) {
>>>> if ((hdl = F->write_handler) != NULL) {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): Calling write handler on FD " << fd);
>>>> PROF_start(comm_read_handler);
>>>> F->write_handler = NULL;
>>>> hdl(fd, F->write_data);
>>>> PROF_stop(comm_read_handler);
>>>> statCounter.select_fds++;
>>>> } else {
>>>> debugs(5, DEBUG_EPOLL ? 0 : 8, "comm_select(): no write handler for FD " << fd);
>>>> // remove interest since no handler exist for this event.
>>>> commSetSelect(fd, COMM_SELECT_WRITE, NULL, NULL, 0);
>>>> }
>>>> }
>>>> }
>>>>
>>>> // PROF_stop(comm_handle_ready_fd);
>>>>
>>>> return COMM_OK;
>>>> }
>>>>
>>>> }
>>>>
>>>>
>>> Amos
>>
>>
>>
>>
>> --------------------------------------------
>> --------------------------------------------
>> Yes, you are right: when there are multiple ports the situation will be
>> different, but they can be embedded easily and all descriptors can be
>> handled in the same way, I guess. This change will not be very effective
>> on its own, as the code is not optimized for all descriptors and it is
>> not a generalized solution for them.
>>
>> I found that there is a delay between two select loops, which can be
>> minimized to the delay of its own descriptor, so I concentrated on that....
>>
>
> I'm not sure exactly what you are saying there. Two select loops being two
> runs of the FD checking loop? Or the delay when two polling loops are run?
>
> I read your earlier explanation to mean there was a delay doing accept() on
> FD 1 (for example) when FD 1,2,3,4,5,6,7 had reads pending.
>
> And your solution to be:
> thread #1 - looping on read() FD 2,3,4,5,6,7 + other stuff
> thread #2 - looping on accept() FD 1
>
>> I am also trying to find the locking mechanism for the async queues; it
>> seems to be taking time.
>
> I mean some mutex/lock on AsyncCallQueue so that multiple threads can do
> AsyncCallQueue::schedule(call) without pointer collisions on theTail and
> theHead when they set up the first read of an accept()'d FD.
>
> For example something like this...
>
> thread #1:
> while ( <events to dial> ) {
> queuedEventX.dial()
> }
>
> thread #2:
> while( <listening for new connections> ) {
> newFD = accept()
> readFromNewFd = new AsyncCallPointer...
> AsyncCallQueue::schedule(readFromNewFd);
> }
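>
> Roughly, the queue side would then be (sketch only; the real member and
> method names in AsyncCallQueue differ, and theMutex is something we would
> have to add):
>
> void
> AsyncCallQueue::schedule(AsyncCall::Pointer &call)
> {
>     pthread_mutex_lock(&theMutex);   // one lock guards theHead/theTail
>     if (theHead != NULL) {           // non-empty queue: append at the tail
>         theTail->setNext(call);
>         theTail = call;
>     } else {                         // empty queue: call becomes head and tail
>         theHead = call;
>         theTail = call;
>     }
>     pthread_mutex_unlock(&theMutex);
> }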
>
>
>>
>> Need some time for further analysis...........
>>
>> Thank you so much......
>>
>
> Amos
> --
> Please be using
> Current Stable Squid 2.7.STABLE6 or 3.0.STABLE19
> Current Beta Squid 3.1.0.13
>
Received on Mon Sep 14 2009 - 22:14:15 MDT

This archive was generated by hypermail 2.2.0 : Tue Sep 15 2009 - 12:00:04 MDT