Re: [PATCH] Master transaction state object from Amos Jeffries on 2013-05-10 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 11 May 2013 16:31:51 +1200

On 11/05/2013 5:07 a.m., Alex Rousskov wrote:
> On 05/10/2013 09:05 AM, Amos Jeffries wrote:
>
>> Bumping this old design conversation back into gear. I would like to
>> start implementation and rollout of this master transaction code,
>> effective immediately.
> I think this is a good start, but I do not think the proposed patch
> should go in immediately.
>
> First of all, pipelining changes are a totally separate can of worms
> IMO. If you insist on merging MasterXact changes and pipelining change
> in one patch, we will have to accept that, but it would be best to
> separate the two issues into two different email threads/patches.

I'm already separating it out. The SQUID_PIPELINE_MAX and associated XXX
in a standalone patch will be submitted later today I think. The array
stuff is just to kee the Client* pipeline objects list separate from eth
Xaction at this point and I can drop easily.

> I will focus on higher-level issues for pipelining and then provide a
> more detailed review of MasterXact changes below.
>
>
> Pipelining:
>
> * If you are going to allow more than two transactions, the depth of the
> pipeline should probably be configurable via squid.conf, replacing or
> deprecating the on/off switch we already have. You may still have a
> hard-coded maximum, but only if there is a good reason to have it. I do
> not see one.

Agreed. However at this pint I'm not even happy making it generally
configurable. Thus the developers-only setting while we figure out for
certain where the flaws actually are.

> * Please use an std::queue instead of C array for the
> ConnStateData::xaction pipeline. Arrays are poor underlying structures
> for FIFO queues.

Ack. queue or the current list management does seem better. I will drop
it from the initial patch for now, we will have to have a final decision
by the time the parser updates start to happen.

> * Please remove references to pipelining from MasterXaction.h. Nothing
> in MasterXaction.h should expose users to details relevant to only a
> small part of Squid.

Done. It is left in client_side.h for now.

> Master Transaction:
>
>> === modified file 'src/CommCalls.h'
> ...
>> + /// Transaction which this call is part of.
>> + MasterXaction::Pointer xaction;
>
> It feels wrong to place that object inside every Comm call. A Comm call
> is a very low-level event. What would it do with a Master Transaction?
> It would not even know which part of the Master Transaction corresponds
> to the call! In fact, there are Comm calls that are not associated with
> a master transaction. Let's not try to cram master transactions
> everywhere just because it is needed somewhere. It will force us to
> either create fake master transactions or deal with possibly nil master
> transaction pointers.

Right now the TcpAcceptor AsyncJob is using it in AcceptCbParams to pass
the new transaction out to its child.
I am envisioning that the CloseCbParams will need to use it to pass it
to the Comm layer close handlers, and that the IoCbParms will need to
use it to pass to low level I/O handlers - for each of those components
to write the stats accounting details into the xaction data. For now it
is a NULL pointer on these parameters.

>
>> +#ifndef SQUID_SRC_MASTERXACTION_H
>> +#define SQUID_SRC_MASTERXACTION_H
>> +
>> +#include "anyp/PortCfg.h"
> Please avoid exposing every master transaction user to PortCfg.

Will do. Planning to add anyp/forward.h to replace this.
FWIW all the callers I can find *do* need the port details (and
Comm::Connection details) in one way or another. But I agree consistency
on this will be useful.

>
>> +/** Master transaction state object.
> Do we really want this class to store the current state of the master
> transaction? Or should it only contain history/stats? The difference is
> critical for the design of this class IMO:
>
> If master transaction reflects the current state, it may have pointers
> to client-side, server-side, and adaptation-side jobs (to name just a
> few), facilitating their communication. It will be a powerful tool, but
> will probably increase the dependencies among various "sides" and may
> ultimately lead to messy and buggy code where one side is trying to
> figure out what other sides are currently active and communicate with
> them. We should be very careful if we decide to move down this path.
>
> If master transaction collects the history and stats, it becomes very
> useful for annotations and logging, but would not help with inter-side
> communication.

Yes. I am thinking this be the stats and history up to the present. As
we roll it out we need to make clear decisions about what state is
shared and goes in here and what state is private to the AsyncJob. With
an API on the Job (Calls? Subscription?) for accessing the private details.

Communication between the "sides" should be done via client-streams API
or whatever we adapt it into. I have alternative designs going through
my head based on the HTTP/2 frame-based messaging which can pass blocks
between components. For now that means the current methods of accessing
other sides Jobs will need to be left in place alongside this
MasterXaction state rollout.
The only reasons we have today for HttpStateData for example needing to
access something as far away as ConnStateData is to bypass all the store
systems lack of 1xx support (broken store API), or to access the PortCfg
details (lack of master transaction data in HttpStateData).

> We will need to adjust the description of the MasterXaction class
> accordingly. For now, I suggest adding three important pieces of
> information:
>
> * What is a "client transaction"? Should we define that as anything
> worth logging into access.log? Or does that extend to ICP clients? SNMP?
>
> * What event starts/creates a master transaction?

I am thinking these conditions create a new transaction:
1. New TCP connection, New UDP connection, New SCTP connection, etc. etc.
2. Parsing a new request on an existing persistent connection of any
above type.
3. starting a Squid internally generated request

We get into some tricky semantics with ssl-bump and ESI that still need
to be worked out.

> * What event ends/destroys a master transaction?

The completion of the logging of a transaction should destroy it. We may
need to alter it to CbcPointer instead of RefCount at some point - for
now RefCount is sufficient.

I am thinking that we log at these points:
* completed delivery of a HTTP "final status"
* connection closure / abort.

NP: which logs are used to log it will depend on the transport type as
you mention below.

> While working on this, please keep in mind that not all clients are
> going to be HTTP clients. We already have ICP and SNMP clients, and the
> FTP gateway project adds FTP clients. While we might want to exclude
> things like SNMP from MasterXaction scope, I am not sure that is a good
> idea, and I am sure this needs to be a conscious decision rather than an
> accident.

I agree, and this was one of the key deciding factors in me going with
your proposal to have MasterXaction just a set of pointers to other
state objects.

> Once finalized, the above additions will clarify master transaction
> scope and will have long-lasting effects. This is important, even though
> the initial changes will not cover the entire scope.
>
>> +
>> + if (xaction != NULL)
>> + os << ", xaction.id=" << xaction->id;
> The InstanceId object will print the right prefix. Please drop the
> "xaction.id=" prefix if this code stays at all.
>
>
>> ConnStateData();
>> + ConnStateData(const MasterXaction::Pointer &xact);
> Should the old constructor be removed now?
>
> The new constructor is missing an "explicit" keyword.

Oops. right. fixed.
The old constructor is there as a "// do not implement."

>
>> + MasterXaction::Pointer xaction[SQUID_PIPELINE_MAX];
> The new member should be called "xactions" because there are many of them.
>
>
>> + params.xaction = new MasterXaction;
>> + params.xaction->squidPort = static_cast<AnyP::PortCfg*>(params.data); // do this here or in the caller?
> The question should be asked on the previous line: If we decide that
> master transaction starts at accept time (which is what the current code
> implies), then we must fill all the available details at that time. If
> we decide that master transactions are started by ConnStateData, after
> TcpAcceptor is done, then we should create and populate xaction there.
>
>
>> Attached is a patch implementing a basic MasterXaction class. It is
>> created by TcpAcceptor when a new client connects and gets passed out
>> for use when constructing a ConnStateData object to manage the
>> connection. The ConnStateData object does need to retain its own
>> pointers to the data received in that initial transaction - so that when
>> receiving pipelined requests on a persistent connection it can spawn new
>> MasterXaction objects with the same TCP details as the original.
> Right. The question is whether it would be better to start the master
> transaction from ConnStateData, especially since you still need to keep
> all the tcp-level info independent of the MasterXaction anyway. There
> are several advantages to creating MasterXaction inside ConnStateData:
>
> * MasterXaction creation code becomes uniform. Currently, there are two
> different cases: One in TCP acceptor and one in ConnStateData.

There will be more creation point cases as other alternative to
ConnStateData are added for non-TCP protocol ports and for squid
internally generated requests/transations. So this is a non-gain IMO.

> * MasterXaction relationship with ConnStateData becomes simpler.
> Currently, we are first creating ConnStateData using MasterXaction and
> then are creating MasterXactions using ConnStateData, which is rather
> messy, and forces us to explain why we want to keep some TCP data, etc.

See above. We will have a complex case anyway as ConnStateData creates
xaction on parsing request and also creates a transaction immediately to
log aborted connections, or continues to hide the aborted connections
from everyone.

> * TcpAcceptor may be accepting a connection for something that is not
> related to MasterXaction.

Such as? and why?

> * Different protocol/port code may create different MasterXaction
> objects (e.g., HTTP, HTTPS, and FTP acceptors might all want different
> MasterXactions). I do not know whether we will need HttpMasterXaction
> and FtpMasterTransaction in the future, but this design allows for it.

That is the protocol-specific pointer in MasterXaction. TcpAcceptor Job
scope is managing setup of the tcpClient and squidPort pointers.

Another non-TCP protocol acceptor would setup squidPort and its own
snmpClient/udpClient pointers for example.

This was my thinking when I moved it from the initial location of
ConnStateDate to TcpAcceptor. We also then have the consistency of
new-connection == new transaction. ConnStateData is one of the sepcial
cases which need their own copy of *some* details on the first xaction
received for creating future ones - I'm expecting all Jobs

> * There is no pressure to stuff MasterXact into all Comm callbacks where
> it does not belong.

If the state shared between all Jobs is going to be in here logically we
will end up placing the transaction in/out buffers in here as well. In
which case the Comm systems will all be using it for source/sink. And by
intention the read/write at least will be adding their stats here.

> What are the advantages of creating MasterXact earlier, at TCP accept level?

We gain the guarantee that a transaction object exists as soon as
possible, and fully describes the history so far. It can therefore be
logged as a transaction with no request (client error/abort?) using a
single MasterXaction based API to logging and comm - even if
ConnStateData setup fails for any reason.

We can plug a TcpAcceptor into some form of TunnelStateData directly and
have a SOCKS-like proxy, or a port-X to HTTP gateway. (port 1
intercepted, converted to CONNECT and delivered via peers?).

It also demonstrates to you and others the mechanism for passing between
Jobs. Whether we move this back to ConnStateData for the final commit or
not you can see here the model I'm proposing to roll through the rest of
Squid. Like SBuf, once the core and some examplar code is in we can all
work on continuing the rollout.

>
>> I've marked all the branching points where MasterXaction could extend
>> out of ConnStateData with XXX right now instead of rolling MasterXaction
>> down through the entire codebase. As you can see from this patch already
>> simply rolling it fully into ConnStateData is causing a lot of polishing
>> changes to happen. That is likely to be the norm rather than the
>> exception, so I'm intending to submit each individual step as a separate
>> patch on trunk with SourceLayout bundled alongside Xaction rollout.
> Sounds good. Please do not commit all those XXXs/TODOs though. After the
> MasterXact skeleton is finished and committed (this thread), just post
> the new MasterXact-using code as you have time to work on it.

Do you not think it better to mark the missing API alterations for
others to be aware of and maybe chose to join in working on?

Amos
Received on Sat May 11 2013 - 04:32:03 MDT

This archive was generated by hypermail 2.2.0 : Sat May 11 2013 - 12:00:12 MDT