Re: [PATCH] CacheManager subscription to IPC

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 31 Aug 2011 21:32:08 +1200

On 31/08/11 18:02, Alex Rousskov wrote:
> On 08/13/2011 05:59 AM, Amos Jeffries wrote:
>> This migrates Cache Manager to using the new IPC subscription API for
>> handling message events.
>>
>> To do this the CacheManager is converted to an AsyncJob which runs from
>> the point of first action registration until shutdown. The existing
>> event handlers are move into its scope as-is.
>
>
>> === modified file 'src/CacheManager.h'
>
>> +class CacheManager : public AsyncJob
>
> If CacheManager is a job now, we should not communicate with it using
> Instance() and most public methods because it may not be there (or may
> not be ready to talk) when we want to talk to it. Jobs should
> communicate using async calls.
>
> Please note that I do think CacheManager should be a job [eventually],
> but we should change its usage accordingly or, at the very least,
> provide temporary controls that bad things do not happen (e.g.,
> CacheManager::Instance() returning a stale pointer to the job that has
> been destroyed).
>
>
>> === modified file 'src/cache_manager.cc'
>
>
>> +void
>> +CacheManager::swanSong()
>> +{
>> + Ipc::ClearListenFor(Ipc::mtCacheMgrRequest);
>> + Ipc::ClearListenFor(Ipc::mtCacheMgrResponse);
>> +}
>
>> +bool
>> +CacheManager::doneAll() const
>> +{
>> + // Once started, only exit on shutdown.
>> + return shutting_down != 0;
>> +}
>
> Please call the parent's method at the end of the above methods (even
> though the current parent's method may not do anything).
>
>
>
>> + Subscription::Pointer sub;
>> + RefCount<IpcMsgCall> call;
>
> It is better to avoid creating Pointer-like variables until they are
> needed as it saves CPU cycles and simplifies code.
>
> Also, If you can make them const, please do so.
>
>
>> + debugs(16, 1, "Startup: CacheManager: Accepting manager IPC responses");
>> + debugs(16, 1, "Startup: CacheManager: Accepting manager IPC requests");
>
> Use DBG_IMPORTANT if you think these debugging messages warrant level-1
> reporting. IMO, they should use level 2.
>
>
>> + // XXX: Just run the action? we are inside an async manager call already.
>> + Mgr::Action::Pointer action = createRequestedAction(request.params);
>> + if (Ipc::Coordinator::Instance())
>> + AsyncJob::Start(new Mgr::Inquirer(action, request, Ipc::Coordinator::Instance()->strands()));
>> + // else throw? coordinator scheduled a call and abandoned us.
>
>
> This feels very awkward or wrong, but perhaps I am just missing something:
>

Probably me who is missing something (or a lot). This is my first
attempt to make SMP messages work after all :)

> 1) Running the action in Coordinator requires a lot of messages. So the
> XXX comment cannot apply to the in-Coordinator code. That code does need
> a Mgr::Inquirer job!
>

K. XXX was to get your attention and it worked :). Will drop it now.

That code is in CacheManager::, at this point I'm not entirely sure what
actualy runs where.

   I am assuming that the request was scheduled by an Ipc::Coordinator
object in the current process. Otherwise what else would SMP use to
schedule Mgr::Request calls for CacheManager to pick up?

> 2) If Ipc::Coordinator::Instance() is nil, I assume the code is running
> inside a regular kid. Regular kids should not even know about
> Coordinator class and all the stuff that comes with it. The only thing
> they should know about Coordinator is how to send it an IPC message. It
> is wrong, IMO, for the CacheManager class code to depend on Coordinator
> class.

It has assumed the code snippet from src/ipc/Coordinator for handling of
a Mgr::Request. I am assuming that if Coordinator exists in the current
process to schedule *this call. If not, what did? I need to merge that
codes reply bits into here too.

>
> 3) The "else throw?" comment looks strange because regular kids do not
> instantiate a Coordinator object (I hope!). They should actually perform
> the request. I do not see the code that actually runs the action in the
> new CacheManager::handleIpcRequest(), which is exactly the place I would
> expect that code to be.
>

Aha! One more likely cause of my hung response problem.

>
> Overall, there seems to be some confusion (possibly on my part!) about
> what kind of process/class would run the above code. IIRC, it should be
> Coordinator process/class, not the general kid process and the general
> cache manager class. The cache manager class should have the code to
> actually run the action locally, at Coordinator request.
>
> I will stop reviewing here as it will probably be a waste of time
> whether I am confused or the code.

Okay. You have brought up a few things I need to ask about before going
further.

I was under the dubious assumption that each SMP process had an
Ipc::Coordinator _Job_ which read UDS packets and called the appropriate
function based on type. With the coordinator _process_ version of that
Job doing more shovelling around stuff in its actions than the workers.
   - my change #1 was to make that function an AsyncCall and schedule it
instead of calling synchronously. All fine. Nothing that could cross
processes there.

  - my questionable change (#2) was to make the CacheManager a Job which
was instantiated during reconfigure/startup. (that would be one for each
worker yes?)

    - during setup that Manager Job uses Ipc::Coordinator::Instance()
and registers its Mgr::Request/Mgr::Reply actions AsyncCalls to receive
mgr: packets. (that might start in Ipc::Coordinator in each worker?)

What I am trying to achieve is:
   - worker A received mgr:action URL. Sends Mgr::Request (UDS packet)
to ?1?.
   - ?2? schedules Manager:: subscription call to handle the
packet/TypedMsgHdr
   - Question: manager ACKs receipt with Mgr::Response immediately?
   - CacheManager handling it schedules UDS requests for all the workers
through ?2?
   - ?3? in the worker B-F receives UDS packet and produces a reply, and
passes it back to ?3?
   - ?2? passes the worker B-F replies to manager which merged and send
the reply via UDS to worker A.

Halfway sane? or batting off left field?

?1? and ?2? would be the same Ipc::Coordinator Job right? or is ?1? a
worker UDS handler and ?2? the coordinator process UDS handler?

(gah, looks like a mess, hope you can read that.)

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.15
   Beta testers wanted for 3.2.0.10
Received on Wed Aug 31 2011 - 09:32:31 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 31 2011 - 12:00:03 MDT