Re: Squid-3.2 status update

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 04 Jul 2012 23:47:35 -0600

On 07/04/2012 05:34 PM, Amos Jeffries wrote:
>>> 3124 - Cache manager stops responding when multiple workers used
>>> ** requires implementing non-blocking IPC packets between workers and
>>> coordinator.
>>
>> Has this been discussed somewhere? IPC communication is already
>> non-blocking so I suspect some other issue is at play here. The specific
>> examples of mgr commands in the bug report (userhash, sourcehash,
>> client_list, and netdb) seem non-essential in most environments
>> and, hence, not justifying the "major" designation, but perhaps they
>> indicate some major implementation problem that must be fixed.

> UNIX sockets apparently guarantee the write() is blocked until recipient
> process has read() the packet.

That is not true in general. I just wrote a basic UDS client and server
to test this (attached), and I can send packets much faster than the
server reads them. Linux keeps a queue of messages. The I/O may become
blocking if the queue is full, but I suspect select(2) or equivalent
will not let us send a new message under that condition (or the send
will fail rather than block).

There is a sysctl option (net.unix.max_dgram_qlen in recent kernels)
that controls the number of messages that can be queued between the
client and server.
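
For readers without the attachment, the queueing behavior described above
can be approximated with a short Python sketch. This is not the attached
Perl client and server: the function names (read_max_dgram_qlen,
fill_queue_then_drain) are hypothetical, the /proc path is the usual Linux
location for the sysctl and may be absent elsewhere, and the number of
messages that fit in the queue depends on the kernel and socket buffer
sizes. It uses non-blocking I/O, so a full queue shows up as a failed send
rather than a blocked one.

```python
import errno
import os
import socket
import tempfile

# Errors a non-blocking send may report when the queue is full
# (EAGAIN/EWOULDBLOCK on Linux; ENOBUFS on some BSD-derived systems).
QUEUE_FULL = {errno.EAGAIN, errno.EWOULDBLOCK, errno.ENOBUFS}


def read_max_dgram_qlen():
    """Return net.unix.max_dgram_qlen, or None where /proc does not expose it."""
    try:
        with open("/proc/sys/net/unix/max_dgram_qlen") as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


def fill_queue_then_drain(limit=100000):
    """Send datagrams that nobody reads yet, then read them all back.

    Returns (sent, received).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "uds")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            server.bind(path)
            client.connect(path)
            client.setblocking(False)  # fail instead of blocking when full
            sent = 0
            while sent < limit:
                try:
                    client.send(b"msg #%d" % (sent + 1))
                except OSError as e:
                    if e.errno in QUEUE_FULL:
                        break  # queue full: the send was refused, not blocked
                    raise
                sent += 1

            # The "slow server" finally reads: everything queued is still there.
            server.setblocking(False)
            received = 0
            while True:
                try:
                    server.recv(64)
                except BlockingIOError:
                    break  # queue drained
                received += 1
            return sent, received
        finally:
            client.close()
            server.close()


if __name__ == "__main__":
    print("max_dgram_qlen:", read_max_dgram_qlen())
    print("sent, received:", fill_queue_then_drain())
```

The sender queues several messages before any read() happens, which is the
point being tested. On Linux I would expect the count to be bounded by the
queue length limit or the sender's socket buffer, whichever is hit first.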

It is possible that Unix domain sockets behave differently in some
environments that I have not tested, but I doubt it.

Why do you think that UNIX sockets block write() until recipient has
read() the packet?

> Last I
> looked the coordinator handling function also called component handler
> functions synchronously for them to create the response IPC packet.

Ipc::Coordinator::handleCacheMgrRequest() starts an async job to satisfy
the received cache manager request.

There are some Ipc::Coordinator::handle*() methods that create the final
response synchronously, but they should all be very fast and do not
warrant creating an async job.

Are you talking about some other coordinator handling functions that
block for a long time?

> AFAIK this is waiting on the Subscription and generic (immediate-ACK)
> IPC packets, which will free up the coordinator and workers for other
> async operations even if a large process is underway.

IIRC, subscription was needed to resolve IPC linking problems. It is
possible that it is needed for this bug as well, but since I cannot tell
what this bug is, I do not know whether subscription is the solution. I
thought you knew because of your "requires implementing non-blocking IPC
packets" solution summary. That is why I started asking questions...

Alex.
P.S. Output of the attached UDS server that sleeps to be slower than the
client. All sent messages are received, some after the client is gone:

> $ ./uds-server.pl /tmp/uds
> 1341466849 waiting for messages
> 1341466853 got msg #01 after 4.00 seconds ... sleeping for 3.00 seconds
> 1341466856 got msg #02 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466859 got msg #03 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466862 got msg #04 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466865 got msg #05 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466868 got msg #06 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466871 got msg #07 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466874 got msg #08 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466877 got msg #09 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466880 got msg #10 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466883 got msg #11 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466886 got msg #12 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466889 got msg #13 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466892 got msg #14 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466895 got msg #15 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466898 got msg #16 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466901 got msg #17 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466904 got msg #18 after 3.00 seconds ... sleeping for 3.00 seconds
> 1341466907 got msg #19 after 3.00 seconds ... sleeping for 3.00 seconds

UDS client that sends as fast as it can. Note the blocking after the
queue gets full around msg #12 (we are using blocking I/O here):

> $ ./uds-client.pl /tmp/uds
> 1341466853 sending with max queue length of 10 messages
> 1341466853 sent msg #01 after 0.00 seconds
> 1341466853 sent msg #02 after 0.00 seconds
> 1341466853 sent msg #03 after 0.00 seconds
> 1341466853 sent msg #04 after 0.00 seconds
> 1341466853 sent msg #05 after 0.00 seconds
> 1341466853 sent msg #06 after 0.00 seconds
> 1341466853 sent msg #07 after 0.00 seconds
> 1341466853 sent msg #08 after 0.00 seconds
> 1341466853 sent msg #09 after 0.00 seconds
> 1341466853 sent msg #10 after 0.00 seconds
> 1341466853 sent msg #11 after 0.00 seconds
> 1341466853 sent msg #12 after 0.00 seconds
> 1341466856 sent msg #13 after 3.00 seconds
> 1341466859 sent msg #14 after 3.00 seconds
> 1341466862 sent msg #15 after 3.00 seconds
> 1341466865 sent msg #16 after 3.00 seconds
> 1341466868 sent msg #17 after 3.00 seconds
> 1341466871 sent msg #18 after 3.00 seconds
> 1341466874 sent msg #19 after 3.00 seconds
> ^C

Received on Thu Jul 05 2012 - 05:47:39 MDT
