[MERGE] SMP implementation, part 1, take2

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 01 Jul 2010 17:38:37 -0600

On 07/01/2010 06:28 AM, Amos Jeffries wrote:
>> A brief list of changes and to-dos is provided below. Besides the
>> acceptance review, I need help with these two questions:
>>
>> 1. Should we rename the squid.conf option specifying the number of
>> processes from "main_processes" to "worker_processes" or even "workers"?
>> The forked processes do almost everything the old daemon process did,
>> but that will change as we add more specialized processes. There is also
>> a special "Coordinator" process that is launched automatically in SMP
>> mode. It does not handle regular HTTP transactions.

> Workers brings to mind the right meaning from use by apache and other
> software. +1 from me on calling it worker_processes (we will have
> threads later).

OK. Would you prefer "worker_strands"? Strand is sometimes used for
something that is either a process or a thread. We used Strand in the
code, but I am not sure it is recognizable enough for squid.conf use.

Or should we just use "workers" (with no _suffix) and avoid the
process-versus-thread distinction altogether?!

>> 2. We had to clone comm_openex into comm_open_uds because Unix Domain
>> Sockets do not use IP addresses. They use file names. We can unify and
>> simplify related code if we add the file name (struct sockaddr_un) to
>> Ip::Address. IIRC, Amos and I discussed this on IRC and decided that it
>> is OK to add sockaddr_un to Ip::Address (because it is used as a general
>> "network address" anyway), but I would prefer to hear more opinions
>> before altering Ip::Address.
>>
>>
>> Change log:
>>
>> * Added main_processes squid.conf option to specify how many worker
>> processes to fork and maintain. Zero means the old no-daemon mode. One means
>> the old non-SMP mode.
>>
>> * Added support for process_name and process_number macros and
>> if-statement conditionals in squid.conf. Search for .pre changes for
>> documented details. These features allow the admin to configure each
>> worker process differently if needed.
>
> Regarding the parser: avoid un-bracketed assignments in conditionals; as
> Henrik keeps reminding me, these fail in some compile modes.
> You are using a number of if(a=b) and when(a=b) statements,
> particularly in the parser. These will need to become if((a=b)) etc.

Sorry, I do not see any "if (a = b)". Please note that "if (type a = b)"
is a different (and highly recommended!) use. Here are all the
assignments that I could find in the patch:

> + } else if (const char* equation = strchr(expr, '=')) {
> + if (const char* expr = FindStatement(tmp_line, "if")) {
> + if ((found = HttpSockets[i] < 0))
> + if ((new_socket = socket(AI.ai_family, AI.ai_socktype, AI.ai_protocol)) < 0
> + if (StrandCoord* found = findStrand(strand.kidId))
> + if (const char *idStart = strrchr(processName, '-')) {
> + if ((pid = fork()) == 0) {

All seem kosher to me. Did I miss some?

>> * Support multiple workers listening on the same HTTP[S] port (port
>> sharing). This allows multiple workers to split the load without any
>> special rules.
>
> You add a comment question:
> fatal("No HTTP or HTTPS ports configured"); // defaults prohibit this?
>
> ... not quite. The defaults make an entry in the config for new
> installs; if it's removed, or an old 2.5 config without a port is used, that
> error case still occurs.
>
> The MacOS people have an open bug requesting the Mac-service -I option
> get ported to 3.x which will mean no port configured by default, but an
> open socket passed in on stdin to the master process later.

Removed. Thank you for answering the question.

>> * Support or prohibit port sharing for WCCP, DNS, ICP, HTCP, SMP, and
>> Ident protocols, depending on protocol-specific restrictions. Sharing is
>> implemented by registering listening socket descriptors with the
>> Coordinator process and obtaining them from the Coordinator as needed.
>> Here are protocol-specific notes:
>>
>> WCCP: Restricted to the Coordinator process due to how WCCP works.
>> Workers do not need access to the WCCP code.
>>
>> DNS: Done by each worker with no sharing. Fixed source ports not
>> supported unless each worker is given its own outgoing address
>> because we do not want to match outgoing queries and incoming
>> responses across processes.
>
> This is a good reason for adding the often pondered dns_outgoing_address
> directive. It solves a few issues in other low-priority use cases as well.
>
>>
>> SNMP: Workers share incoming and outgoing sockets.
>
> Does this really make sense?
> SNMP stats will be different for each worker and particularly in the
> client table where things are indexed by the particular client IPs which
> connected to each worker.
> I think better a single port managed by the master with a new OID for
> child process somewhere. But the design will need a good thinking out
> separate to this.

You are correct that more work is needed for SNMP stats to present a
"whole Squid" picture. I should have said that explicitly. This work is
similar to the cache manager problem. Neither problem is addressed with
this patch.

I am not sure whether we should make the Coordinator process responsible
for receiving SNMP requests. There are arguments for and against that
design. This question is not tightly related to the "whole Squid" stats
issue, and I will discuss it separately/later.

>> ICP and HTCP _clients_: Cannot be supported in SMP environment
>> unless each process has its own address (i.e., unique IP address
>> and/or unique [ICP] port) because we do not want to match outgoing
>> queries and incoming responses across processes.
>
> Huh? "or unique port" this is entirely doable right?

Yes, except HTCP does not officially support custom ports (which was a
surprise for me).

>> * Support management signals (squid -k ...) in SMP mode, acting as a
>> single Squid instance.
>>
>> * Refork dying workers, similar to how we reforked a dying process in
>> non-SMP daemon mode.
>
> Um, have you checked if this new process structure obsoletes the old
> kill-parent hack? That would be nice to kill off cleanly.

I have not investigated that hack, and I do not think we touched the
related code. However, I do not think we can just remove it because not
all Squids will run in SMP mode -- we worked hard to preserve old
non-SMP semantics for 1-worker daemon mode to the extent possible.

>> Detailed change descriptions are at
>> https://code.launchpad.net/~rousskov/squid/smp
>>
>>
>> To do (all these projects are in-progress now):
>>
>> - SMP-aware cache manager.
>
> This may be impacted by the local file server extension changes (PAC
> stuff). Making the internal server requests accept the http(s):// namespace
> will mean cachemgr requests can also be requested directly from a browser.

The work-in-progress code will handle that fine: Workers accept cache
manager requests and then route them to Coordinator for fulfillment,
including stats aggregation.

> Now the code...
>
> * Note: Careful with allocation methods for addrinfo. IIRC there were
> some problems if the sockaddr field was on the stack. Been a while so I
> can't point my finger at specifics any more, but it's worth watching for
> when this code goes mainstream.

I have a very vague recollection of those problems, but I think in the end
you concluded that the bugs were actually elsewhere. Also, if we extend
your Ip::Address to handle UDS, the related custom addrinfo code will no
longer be needed.

> * debugLogKid() looks like it results in the static buffer being filled
> with identical content on every debug write.

Optimized.

> - If the process is called a "worker" the log output should probably
> be "worker%d" to avoid confusion. Even if the class is Kid internally.

Coordinator is not a worker, so we cannot use the "worker" label for all
processes/kids. Also, there will be more specialized processes/kids in
the future.

> - given the first point the coordinator could return its "coordinator"
> string or somesuch for debugging.

It may be slightly better to use _one_ label for all processes in the
debugging log. We could use "process" but that is long and
implementation-dependent. "Kid" is better. A more complex naming scheme
makes searching and processing the log file more difficult.

FWIW, here is how the current SMP debugging log looks:

> 2010/07/01 17:30:50.741 kid3| event.cc(343) schedule: schedule: Adding 'Maintain
> 2010/07/01 17:30:50.741 kid3| leaving MaintainSwapSpace()
> 2010/07/01 17:30:50.741 kid3| Engine 0x7fffd9aa9a30 is idle.
> 2010/07/01 17:30:50.741 kid3| Engine 0x7fffd9aa9a20 is idle.
> 2010/07/01 17:30:50.780 kid1| Engine 0x7fff9e4285b0 is idle.
> 2010/07/01 17:30:50.780 kid1| event.cc(251) checkEvents: checkEvents
> 2010/07/01 17:30:50.780 kid1| The AsyncCall fqdncache_purgelru constructed, this
> 2010/07/01 17:30:50.780 kid1| event.cc(260) will call fqdncache_purgelru() [call
> 2010/07/01 17:30:50.780 kid1| Engine 0x7fff9e4285a0 is idle.
> 2010/07/01 17:30:50.780 kid1| entering fqdncache_purgelru()
> 2010/07/01 17:30:50.780 kid1| AsyncCall.cc(32) make: make call fqdncache_purgelr
> 2010/07/01 17:30:50.780 kid1| event.cc(343) schedule: schedule: Adding 'fqdncach
> 2010/07/01 17:30:50.780 kid1| fqdncache_purgelru: removed 0 entries
> 2010/07/01 17:30:50.780 kid1| leaving fqdncache_purgelru()
> 2010/07/01 17:30:50.780 kid1| Engine 0x7fff9e4285b0 is idle.
> 2010/07/01 17:30:50.780 kid1| Engine 0x7fff9e4285a0 is idle.
> 2010/07/01 17:30:50.780 kid2| Engine 0x7fff81643580 is idle.
> 2010/07/01 17:30:50.780 kid2| event.cc(251) checkEvents: checkEvents
> 2010/07/01 17:30:50.780 kid2| The AsyncCall fqdncache_purgelru constructed, this
> 2010/07/01 17:30:50.780 kid2| event.cc(260) will call fqdncache_purgelru() [call

If others prefer a collection of "workerN", "coordinator", and similar
specific labels, instead of the current "kidN" for all processes, I will
change the code.

> * Ipc::Coordinator::findStrand() should be const.

It should not if we want to follow STL conventions. The method returns a
non-const address of the strand array member. That is why
std::vector::at() and similar methods have two versions: const and
regular. Our code does not need the const version, for now.

> * CBDATA_CLASS2(Coordinator) needs to be the last entry in the class
> definition for consistency and long-term safety the way it plays with
> public/private.

Moved.

> * Kids::find() and Kids::get() methods should be const.

They should not. See Ipc::Coordinator::findStrand() comment above.

> * Ipc::UdsSender::UdsSender contains TODO about retries and timeout
> being configurable.... Yes please. sub-options to worker_processes N
> seems appropriate. With default less than 10s on timeout please. We
> already get complaints if total startup takes more than ~15 seconds.

Please note that the timeout does not affect _successful_ startups.

Since we are forking processes, it might take more than a few seconds to
[re]start a kid, especially on a box under heavy load. Default timeouts
should be permissive, not aggressive, IMO, so that things "work out of
the box" (albeit suboptimally) rather than require careful tuning.

I will leave these as TODOs for now, primarily because it is not yet
clear whether the current brute force "send until the other process is
started" algorithm is good enough. It is kind of annoying and I wonder
if it can be improved, which may result in a different set of cfg options.

I do agree that startups should be fast, and we have lots of work to do
in that area, but that work is outside this project's scope.

> * code formatting is needed in quite a few places. Do you have a machine
> with the right version of astyle available? Or you can checkout to
> squid-cache.org and format there please. It will likely help the update
> merge.

I did not format on purpose, to minimize noisy whitespace- and
indentation-only non-changes. Checkout on squid-cache.org is a great
idea! Will do just before merging.

> * src/main.cc - while doing this process rework would you mind merging
> all those main.cc signals and option flag globals into a single struct
> or public class with one static global? or at least make the new flags
> you need to add use one such structure which we can later migrate the
> old flags into?

I do not see much value in moving simple static variables into a common
structure. If others want it, I am happy to do it some time after the
code is merged. I really want to minimize the number of changes here
because I may have to port this to older Squid versions...

> * src/main.cc - prog = xstrdup(argv[0]); /* XXX: leak */
> prog should be a stack field of the function yes? which gets strcpy()
> filled out. POSIX limits on filename and path length can be used to
> define the variable size.

This is old code. I just added a possibly incorrect comment (can there
be a leak if we call execvp()?!).

If you want to fix it, I think we can just copy argv[0] pointer without
xstrdup-ing it:

    const char *prog = argv[0];
    argv[0] = kid.name().termedBuf();
    execvp(prog, argv);

Would that work?

> * src/protos.h - please do not add new stuff to this file.

The "new stuff" is from tools.cc, which does not have a dedicated header
other than protos.h, right? Which existing header file should the new
tools.cc code be declared in? It seems we may need to create tools.h or
rename protos.h, but that is outside this project's scope.

> * test-suite/testheaders.sh - what's with those bits as well?

Unrelated change, but I do not know how to exclude it from the merge
bundle :-(. I may have posted it as a separate [PATCH]. If not, I will.

> Apart from those small bits it looks good.

The attached patch contains all of the changes discussed above.

I also noticed that Descriptor put/get IPC messages in the patch are not
used. That temporary code existed to test descriptor passing between
processes and will be removed. Please ignore it.

> Big Picture bits...
>
> * How does the socket sharing mesh with the worker Comm::ListenStateData
> accept() classes so far in practice? particularly the AcceptLimiter.

After the shared socket is set up, the process does not do anything
special to share it. It is the OS's responsibility to handle accept race
conditions and such. Thus, there should be no effect on the accept limiter.
This is just a theory, though. Please let me know if I missed some broken
dependencies.

> * Do all workers share all http_port directives still?

Yes, by default. If you want worker-specific directives, use squid.conf
conditionals and/or macro substitutions. This is the same for all old
squid.conf options, not just http_port.
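For example, per-worker overrides using the process_number macro could look
like this (a sketch only; the exact conditional syntax is whatever the .pre
documentation in the patch specifies, and the directive values are
illustrative):

```
# All workers share http_port 3128; give kid 2 its own
# cache log and a distinct outgoing address.
http_port 3128

if ${process_number} = 2
cache_log /var/log/squid/cache-kid2.log
tcp_outgoing_address 192.0.2.2
endif
```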

> I'm contemplating ways to expand a client-contactable port for
> ERR_AGENT_CONFIGURE/ERR_AGENT_PACFILE page macros and correct internal
> request URLs. The best contender is walking the http_port config lines
> to find one without special-mode flags. But this depends on having
> access to that global info in the worker.

No changes there, as far as I can tell. If you apply your patches after
the SMP code is committed, one of the workers will happily serve the PAC
file or any other locally hosted file (by default). In other words, your
code will become SMP-scalable...

Thank you,

Alex.

Received on Thu Jul 01 2010 - 23:39:30 MDT
