Re: [PATCH] Master transaction state object

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 10 Jun 2013 00:50:28 +1200

New patch attached for review.

This excludes the XXX/TODO and pipeline parts, most of which are already
merged and the remainder being deferred until a later update.

Amos

On 11/05/2013 4:31 p.m., Amos Jeffries wrote:
> On 11/05/2013 5:07 a.m., Alex Rousskov wrote:
>> On 05/10/2013 09:05 AM, Amos Jeffries wrote:
>>
>>> Bumping this old design conversation back into gear. I would like to
>>> start implementation and rollout of this master transaction code,
>>> effective immediately.
>> I think this is a good start, but I do not think the proposed patch
>> should go in immediately.
>>
>> First of all, pipelining changes are a totally separate can of worms
>> IMO. If you insist on merging MasterXact changes and pipelining change
>> in one patch, we will have to accept that, but it would be best to
>> separate the two issues into two different email threads/patches.
>
> I'm already separating it out. The SQUID_PIPELINE_MAX and associated
> XXX in a standalone patch will be submitted later today I think. The
> array stuff is just to kee the Client* pipeline objects list separate
> from eth Xaction at this point and I can drop easily.
>
>
>> I will focus on higher-level issues for pipelining and then provide a
>> more detailed review of MasterXact changes below.
>>
>>
>> Pipelining:
>>
>> * If you are going to allow more than two transactions, the depth of the
>> pipeline should probably be configurable via squid.conf, replacing or
>> deprecating the on/off switch we already have. You may still have a
>> hard-coded maximum, but only if there is a good reason to have it. I do
>> not see one.
>
> Agreed. However at this pint I'm not even happy making it generally
> configurable. Thus the developers-only setting while we figure out for
> certain where the flaws actually are.
>
>> * Please use an std::queue instead of C array for the
>> ConnStateData::xaction pipeline. Arrays are poor underlying structures
>> for FIFO queues.
>
> Ack. queue or the current list management does seem better. I will
> drop it from the initial patch for now, we will have to have a final
> decision by the time the parser updates start to happen.
>
>> * Please remove references to pipelining from MasterXaction.h. Nothing
>> in MasterXaction.h should expose users to details relevant to only a
>> small part of Squid.
>
> Done. It is left in client_side.h for now.
>
>> Master Transaction:
>>
>>> === modified file 'src/CommCalls.h'
>> ...
>>> + /// Transaction which this call is part of.
>>> + MasterXaction::Pointer xaction;
>>
>> It feels wrong to place that object inside every Comm call. A Comm call
>> is a very low-level event. What would it do with a Master Transaction?
>> It would not even know which part of the Master Transaction corresponds
>> to the call! In fact, there are Comm calls that are not associated with
>> a master transaction. Let's not try to cram master transactions
>> everywhere just because it is needed somewhere. It will force us to
>> either create fake master transactions or deal with possibly nil master
>> transaction pointers.
>
> Right now the TcpAcceptor AsyncJob is using it in AcceptCbParams to
> pass the new transaction out to its child.
> I am envisioning that the CloseCbParams will need to use it to pass it
> to the Comm layer close handlers, and that the IoCbParms will need to
> use it to pass to low level I/O handlers - for each of those
> components to write the stats accounting details into the xaction
> data. For now it is a NULL pointer on these parameters.
>
>>
>>> +#ifndef SQUID_SRC_MASTERXACTION_H
>>> +#define SQUID_SRC_MASTERXACTION_H
>>> +
>>> +#include "anyp/PortCfg.h"
>> Please avoid exposing every master transaction user to PortCfg.
>
> Will do. Planning to add anyp/forward.h to replace this.
> FWIW all the callers I can find *do* need the port details (and
> Comm::Connection details) in one way or another. But I agree
> consistency on this will be useful.
>
>>
>>> +/** Master transaction state object.
>> Do we really want this class to store the current state of the master
>> transaction? Or should it only contain history/stats? The difference is
>> critical for the design of this class IMO:
>>
>> If master transaction reflects the current state, it may have pointers
>> to client-side, server-side, and adaptation-side jobs (to name just a
>> few), facilitating their communication. It will be a powerful tool, but
>> will probably increase the dependencies among various "sides" and may
>> ultimately lead to messy and buggy code where one side is trying to
>> figure out what other sides are currently active and communicate with
>> them. We should be very careful if we decide to move down this path.
>>
>> If master transaction collects the history and stats, it becomes very
>> useful for annotations and logging, but would not help with inter-side
>> communication.
>
> Yes. I am thinking this be the stats and history up to the present. As
> we roll it out we need to make clear decisions about what state is
> shared and goes in here and what state is private to the AsyncJob.
> With an API on the Job (Calls? Subscription?) for accessing the
> private details.
>
> Communication between the "sides" should be done via client-streams
> API or whatever we adapt it into. I have alternative designs going
> through my head based on the HTTP/2 frame-based messaging which can
> pass blocks between components. For now that means the current methods
> of accessing other sides Jobs will need to be left in place alongside
> this MasterXaction state rollout.
> The only reasons we have today for HttpStateData for example needing
> to access something as far away as ConnStateData is to bypass all the
> store systems lack of 1xx support (broken store API), or to access the
> PortCfg details (lack of master transaction data in HttpStateData).
>
>> We will need to adjust the description of the MasterXaction class
>> accordingly. For now, I suggest adding three important pieces of
>> information:
>>
>> * What is a "client transaction"? Should we define that as anything
>> worth logging into access.log? Or does that extend to ICP clients? SNMP?
>>
>> * What event starts/creates a master transaction?
>
> I am thinking these conditions create a new transaction:
> 1. New TCP connection, New UDP connection, New SCTP connection, etc. etc.
> 2. Parsing a new request on an existing persistent connection of any
> above type.
> 3. starting a Squid internally generated request
>
> We get into some tricky semantics with ssl-bump and ESI that still
> need to be worked out.
>
>
>> * What event ends/destroys a master transaction?
>
> The completion of the logging of a transaction should destroy it. We
> may need to alter it to CbcPointer instead of RefCount at some point -
> for now RefCount is sufficient.
>
> I am thinking that we log at these points:
> * completed delivery of a HTTP "final status"
> * connection closure / abort.
>
> NP: which logs are used to log it will depend on the transport type as
> you mention below.
>
>> While working on this, please keep in mind that not all clients are
>> going to be HTTP clients. We already have ICP and SNMP clients, and the
>> FTP gateway project adds FTP clients. While we might want to exclude
>> things like SNMP from MasterXaction scope, I am not sure that is a good
>> idea, and I am sure this needs to be a conscious decision rather than an
>> accident.
>
> I agree, and this was one of the key deciding factors in me going with
> your proposal to have MasterXaction just a set of pointers to other
> state objects.
>
>> Once finalized, the above additions will clarify master transaction
>> scope and will have long-lasting effects. This is important, even though
>> the initial changes will not cover the entire scope.
>>
>>> +
>>> + if (xaction != NULL)
>>> + os << ", xaction.id=" << xaction->id;
>> The InstanceId object will print the right prefix. Please drop the
>> "xaction.id=" prefix if this code stays at all.
>>
>>
>>> ConnStateData();
>>> + ConnStateData(const MasterXaction::Pointer &xact);
>> Should the old constructor be removed now?
>>
>> The new constructor is missing an "explicit" keyword.
>
> Oops. right. fixed.
> The old constructor is there as a "// do not implement."
>
>>
>>> + MasterXaction::Pointer xaction[SQUID_PIPELINE_MAX];
>> The new member should be called "xactions" because there are many of
>> them.
>>
>>
>>> + params.xaction = new MasterXaction;
>>> + params.xaction->squidPort =
>>> static_cast<AnyP::PortCfg*>(params.data); // do this here or in the
>>> caller?
>> The question should be asked on the previous line: If we decide that
>> master transaction starts at accept time (which is what the current code
>> implies), then we must fill all the available details at that time. If
>> we decide that master transactions are started by ConnStateData, after
>> TcpAcceptor is done, then we should create and populate xaction there.
>>
>>
>>> Attached is a patch implementing a basic MasterXaction class. It is
>>> created by TcpAcceptor when a new client connects and gets passed out
>>> for use when constructing a ConnStateData object to manage the
>>> connection. The ConnStateData object does need to retain its own
>>> pointers to the data received in that initial transaction - so that
>>> when
>>> receiving pipelined requests on a persistent connection it can spawn
>>> new
>>> MasterXaction objects with the same TCP details as the original.
>> Right. The question is whether it would be better to start the master
>> transaction from ConnStateData, especially since you still need to keep
>> all the tcp-level info independent of the MasterXaction anyway. There
>> are several advantages to creating MasterXaction inside ConnStateData:
>>
>> * MasterXaction creation code becomes uniform. Currently, there are two
>> different cases: One in TCP acceptor and one in ConnStateData.
>
> There will be more creation point cases as other alternative to
> ConnStateData are added for non-TCP protocol ports and for squid
> internally generated requests/transations. So this is a non-gain IMO.
>
>> * MasterXaction relationship with ConnStateData becomes simpler.
>> Currently, we are first creating ConnStateData using MasterXaction and
>> then are creating MasterXactions using ConnStateData, which is rather
>> messy, and forces us to explain why we want to keep some TCP data, etc.
>
> See above. We will have a complex case anyway as ConnStateData creates
> xaction on parsing request and also creates a transaction immediately
> to log aborted connections, or continues to hide the aborted
> connections from everyone.
>
>
>> * TcpAcceptor may be accepting a connection for something that is not
>> related to MasterXaction.
>
> Such as? and why?
>
>> * Different protocol/port code may create different MasterXaction
>> objects (e.g., HTTP, HTTPS, and FTP acceptors might all want different
>> MasterXactions). I do not know whether we will need HttpMasterXaction
>> and FtpMasterTransaction in the future, but this design allows for it.
>
> That is the protocol-specific pointer in MasterXaction. TcpAcceptor
> Job scope is managing setup of the tcpClient and squidPort pointers.
>
> Another non-TCP protocol acceptor would setup squidPort and its own
> snmpClient/udpClient pointers for example.
>
> This was my thinking when I moved it from the initial location of
> ConnStateDate to TcpAcceptor. We also then have the consistency of
> new-connection == new transaction. ConnStateData is one of the sepcial
> cases which need their own copy of *some* details on the first xaction
> received for creating future ones - I'm expecting all Jobs
>
>
>> * There is no pressure to stuff MasterXact into all Comm callbacks where
>> it does not belong.
>
> If the state shared between all Jobs is going to be in here logically
> we will end up placing the transaction in/out buffers in here as well.
> In which case the Comm systems will all be using it for source/sink.
> And by intention the read/write at least will be adding their stats here.
>
>> What are the advantages of creating MasterXact earlier, at TCP accept
>> level?
>
> We gain the guarantee that a transaction object exists as soon as
> possible, and fully describes the history so far. It can therefore be
> logged as a transaction with no request (client error/abort?) using a
> single MasterXaction based API to logging and comm - even if
> ConnStateData setup fails for any reason.
>
> We can plug a TcpAcceptor into some form of TunnelStateData directly
> and have a SOCKS-like proxy, or a port-X to HTTP gateway. (port 1
> intercepted, converted to CONNECT and delivered via peers?).
>
> It also demonstrates to you and others the mechanism for passing
> between Jobs. Whether we move this back to ConnStateData for the final
> commit or not you can see here the model I'm proposing to roll through
> the rest of Squid. Like SBuf, once the core and some examplar code is
> in we can all work on continuing the rollout.
>
>
>>
>>> I've marked all the branching points where MasterXaction could extend
>>> out of ConnStateData with XXX right now instead of rolling
>>> MasterXaction
>>> down through the entire codebase. As you can see from this patch
>>> already
>>> simply rolling it fully into ConnStateData is causing a lot of
>>> polishing
>>> changes to happen. That is likely to be the norm rather than the
>>> exception, so I'm intending to submit each individual step as a
>>> separate
>>> patch on trunk with SourceLayout bundled alongside Xaction rollout.
>> Sounds good. Please do not commit all those XXXs/TODOs though. After the
>> MasterXact skeleton is finished and committed (this thread), just post
>> the new MasterXact-using code as you have time to work on it.
>
> Do you not think it better to mark the missing API alterations for
> others to be aware of and maybe chose to join in working on?
>
> Amos

Received on Sun Jun 09 2013 - 12:50:47 MDT

This archive was generated by hypermail 2.2.0 : Mon Jun 17 2013 - 12:00:13 MDT