Re: [RFC] transaction state from Alex Rousskov on 2011-05-20 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 20 May 2011 11:16:06 -0600

On 05/20/2011 01:31 AM, Amos Jeffries wrote:

> For starters. This is probably 3.3. We can continue to hack our way
> around data passing limitations in 3.2. Although with Alex emphasis on
> optimization going into 3.2, some of this may help that.
>
>
> THE RANT:
>
> The more I've been looking at the client-side the more I see duplicated
> copies and re-copies of transaction details. I know you all agree there
> is too much of it.
>
> What I've seen myself...
>
> ACLChecklist - storing a copy of as many currently known transaction
> details as people have asked to be checked so far.
>
> HttpRequest - storing copy of *almost* all transaction details.
>
> ClientHttpRequest - copy of the transaction details needed by client-side
>
> ConnSateData - copy of the transaction TCP details and some
> HttpRequest details needed by pinning, 1xx and other weird state handling.
>
> AccessLogEntry - copy of all most known transaction details.

There is also the opposite side of the same coin -- code that does not
care about the HTTP request details but cares about some master
transaction information is currently forced to deal with the whole
HttpRequest structure, increasing the number of dependencies and
complications associated with changing the request.

> ... and by "copy" I mean complete duplicate. xstrdup() galore etc.
> With a bunch of code doing nothing but checking the local copy is up to
> date with whatever other copy or function parameter it sourced the
> detail from. A bunch of other code *assuming* that its getting the right
> details (sometimes wrongly).
>
> * Have not yet got a good look at the reply handling path in detail yet.
> Overall it seems to be using the request-path objects in reverse. So no
> worse or better.

Yes, the XXXs and TODOs associated with the lack of "master transaction"
class exist on all Squid sides.

> IDEAS:
>
> Note that ClientHttpRequest has a member copy of AccessLogEntry. This is
> *already* available and unique on a per-request basis from the very
> start of the HTTP request arrival and parsing. Persists across the whole
> transaction lifetime and is used for logging at the end.

I suspect it would be easier and overall better to implement and review
a good master transaction class from scratch than to modify and existing
class, especially such rotten class as AccessLogEntry.

The focus of the master transaction class is on persistence
(refcounting) and isolation of information (each module does not need to
know about all others). This is very different from AccessLogEntry.

> I propose that the first thing we do is clean up its internal structure
> design. To make sure it has all the fields we will need in the net step.
>
> I propose then to rename as a general-purpose transaction storage area
> (TransactionDetails?). To avoid people ignoring it as a "logging-only"
> thing.
>
> I propose then to roll each step/object along the transaction pathway
> to using it as their primary storage area for transaction details and
> history.
>
> - incremental so can be done in the background for low impact starting
> immediately.
> - will soon lead to removal of several useless copies.
> - will mean component/Jobs updated are guaranteed to have *all* details
> for the current state of the transaction available should they need it.

I would rather see a new MasterXact class added. It may have
AccessLogEntry as a member, but it will not immediately inherit all the
AccessLogEntry problems.

> NOTE: little fine-detail processing pathways like ident will only need
> a selected refcount/cbdata/locked sub-child of the whole slab object.
> This is fine and will help drop dependencies. Thus the proposed modular
> hierarchy structure below.

The proposed structure does not work for isolating the components from
each other. And one cannot refcount/cbdata/lock a sub-structure.

If we want to truly isolate components, we should probably follow this
sketch:

class Component1;
class Component2;
class Component3;
class Component4;

class MasterXaction: public RefCountable {
public:
    Component1 *component1;
    Component2 *component2;
    Component3 *component3;
    Component4 *component4;
    ...

const InstanceId<MasterXact> id;
};

The above can be compiled and linked without knowing anything about
individual Component classes. The code that needs access to componentN,
will include ComponentN.h and MasterXact.h. "ComponentN" is not a real
name; the actual name will be specific to the component purpose, of
course. For example, there may be AdaptationDetails, AuthenticationInfo,
and similar components.

Simple components that do not use complex data member types can be
inlined to avoid additional memory allocation, like you show in your
sketch below.

An important goal here would be to eventually move everything not
related to a single HTTP request from HttpRequest to MasterXaction and
change most of the code not related to handling HTTP requests to use
MasterXaction as the primary source of information.

With the above sketch, it is not necessary to lock individual
components. We can pass MasterXaction objects around instead.

I am not sure exactly how to make the master transaction both refcounted
and cbdata-safe, but it is probably possible.

> To kick-start things this is what I've been thinking we need its
> structure to look like:
>
> class TransactionDetails {
>
> class TimeDetails {
> // all the timing and wait stats we can dream up.
> // for the transaction as a whole.
> // specific times stay in their own component.
> } time;
>
> // Details about the TCP links used by this transaction.
> class TcpDetails {
> struct { FD, ip, port, eui, ident } client;
> struct s_ { FP, ip, port, eui, ident } server;
> vector<s_> serverHistory;
> // NP: not sure if we want a server-side history
> // if so it would go here listing all outbound attempts.
> } tcp;
>
> class SquidInternalDetails {
> // which worker/disker served this request?
> ext_acl; // details from external ACL tested
> auth; // details from proxy-auth helpers
> status; // status flags hit/miss/peer/aborted/timeout etc
> hier; // heirarchy details, HierarchyLogEntry
> } squid;
>
> // Details about the ICP used by this transaction.
> class IcpDetails {
> icp_opcode opcode;
> }
>
> // Details about the HTCP used by this transaction.
> class HtcpDetails {
> htcp_opcode? opcode;
> }
>
> // Details about the HTTP used by this transaction.
> class HttpDetails {
> // currently to be used request/reply.
> // points to the later specific objects
> HttpRequestPointer request;
> HttpReplyPointer reply;
>
> // specific state objects
> HttpRequestPointer original_request; // original received
> HttpRequestPointer adapted_request; // after adaptation
>
> HttpRequestPointer original_reply; // original received
> HttpRequestPointer adapted_reply; // after adaptation
> // NP: original reply may be nil if non-HTTP source.
> // in which case...
> HttpRequestPointer generated_reply; // pre-adaptation.
> } http;
>
> // Details about the adaptation used by this transaction.
> class AdaptationDetails {
> { ...} icap; // icap state and history, pretty much as-is
> {...} ecap; // ecap state if we find anythig to log.
> } adapt;
>
> // Details about the FTP used by this transaction.
> class FtpDetails {
> vector<String> protoLog; // FTP msgs used in this fetch.
> }
>
> // ... other entries similar to FTP for gopher, wais, etc.
> }
>
> NOTE that "headers", "private" and "cache" are gone.
> - "headers" blobs are part of HttpRequest (or should be)
> - "private" is duplicate of HttpRequest details
> - "cache" is split into whichever component is actually relevant for
> the particular field.

Individual parts sound like a good starting point to me, with the
exception of SquidInternalDetails which has too vague scope and should
probably be split or merged with others.

Some fields may need stricter descriptions (e.g., there can be many
adapted requests if there is an adaptation chain). And we will probably
add more (e.g., request_sent and response_sent?).

We should probably discuss specific fields later.

Cheers,

Alex.
Received on Fri May 20 2011 - 17:16:31 MDT

This archive was generated by hypermail 2.2.0 : Fri May 20 2011 - 12:00:04 MDT