[RFC] transaction state from Amos Jeffries on 2011-05-20 (squid-dev)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 20 May 2011 19:31:21 +1200

For starters, this is probably 3.3 material. We can continue to hack our way
around the data-passing limitations in 3.2, although with Alex's emphasis on
optimization going into 3.2, some of this may help there.

THE RANT:

The more I look at the client-side code, the more duplicated copies and
re-copies of transaction details I see. I know you all agree there is too
much of it.

What I've seen myself...

ACLChecklist - storing a copy of as many currently known transaction
details as people have asked to be checked so far.

HttpRequest - storing a copy of *almost* all transaction details.

ClientHttpRequest - copy of the transaction details needed by the client-side.

ConnStateData - copy of the transaction TCP details and some
HttpRequest details needed by pinning, 1xx and other weird state handling.

AccessLogEntry - copy of almost all known transaction details.

... and by "copy" I mean a complete duplicate: xstrdup() galore, etc.
On top of that there is a bunch of code doing nothing but checking that
its local copy is up to date with whatever other copy or function
parameter it sourced the detail from, and a bunch of other code
*assuming* that it's getting the right details (sometimes wrongly).

* I have not yet had a good look at the reply handling path in detail.
Overall it seems to be using the request-path objects in reverse, so it
is no worse or better.
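To make the copy-and-resync cost concrete, here is a minimal, hypothetical C++ illustration. None of these classes exist in Squid: std::string stands in for an xstrdup()'d buffer and std::shared_ptr for Squid's RefCount<>. One component keeps its own copy of a detail and has to be re-synced; the other just points at the single shared record.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical illustration only, not Squid classes.
// The single shared record for one transaction detail.
struct SharedDetails {
    std::string username; // e.g. filled in once proxy-auth completes
};

// Today's pattern: the component owns a private duplicate
// (the std::string copy stands in for an xstrdup()'d buffer).
struct CopyingComponent {
    std::string username;
    // Re-copy from the source to stay current; someone must remember to call it.
    void sync(const SharedDetails &src) { username = src.username; }
};

// Proposed pattern: the component holds a reference to the one record.
struct SharingComponent {
    std::shared_ptr<SharedDetails> details;
    const std::string &username() const {
        assert(details); // must be attached to a transaction
        return details->username;
    }
};
```

Once auth fills in the username, the copying component keeps reporting an empty name until something calls sync() again, which is exactly the kind of up-to-date checking code complained about above.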

IDEAS:

Note that ClientHttpRequest has a member copy of AccessLogEntry. This is
*already* available and unique on a per-request basis from the very
start of HTTP request arrival and parsing. It persists across the whole
transaction lifetime and is used for logging at the end.

I propose that the first thing we do is clean up its internal
structure design, to make sure it has all the fields we will need in the
next step.

I then propose renaming it as a general-purpose transaction storage area
(TransactionDetails?), to avoid people ignoring it as a "logging-only"
thing.

Then I propose rolling each step/object along the transaction pathway
over to using it as its primary storage area for transaction details and
history.

  - incremental, so it can be done in the background with low impact,
starting immediately.
  - will soon lead to the removal of several useless copies.
  - will mean the components/Jobs updated are guaranteed to have *all*
details for the current state of the transaction available should they
need them.
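A rough sketch of where the rollout could end up, assuming std::shared_ptr as a stand-in for Squid's RefCount<> and using hypothetical, heavily trimmed classes (ClientTransaction, AclCheck and formatLogLine are illustrative names, not Squid APIs). The details object is created at request arrival, updated components hold a pointer to it instead of copies, and logging at the end reads that same object.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch: std::shared_ptr stands in for Squid's RefCount<>,
// and every class here is a simplified stand-in, not the real Squid API.
struct TransactionDetails {
    std::string clientIp;
    std::string url;
    std::string authUser;
    int httpStatus = 0;
};
using TransactionDetailsPointer = std::shared_ptr<TransactionDetails>;

// Created when the request arrives and kept for the whole transaction,
// much like ClientHttpRequest's AccessLogEntry member today.
struct ClientTransaction {
    TransactionDetailsPointer details = std::make_shared<TransactionDetails>();
};

// A component that has been "rolled over": it keeps no copies of its own.
struct AclCheck {
    TransactionDetailsPointer details;
    bool userIs(const std::string &name) const {
        assert(details);
        return details->authUser == name;
    }
};

// Logging at the end reads the same object everyone else updated.
std::string formatLogLine(const TransactionDetails &d) {
    return d.clientIp + " " + d.url + " " + std::to_string(d.httpStatus) + " " +
           (d.authUser.empty() ? std::string("-") : d.authUser);
}
```

Because AclCheck only holds a pointer, a detail filled in after it was created (the auth user, say) is visible to it without any re-copying.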

NOTE: little fine-detail processing pathways like ident will only need
a selected refcount/cbdata/locked sub-child of the whole slab object.
This is fine and will help drop dependencies, hence the modular
hierarchy structure proposed below.
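One way to hand a job like ident just its sub-child while still keeping the whole slab locked is the aliasing constructor of std::shared_ptr. This is a sketch under that assumption only: Squid's own RefCount<>/cbdata would need an equivalent mechanism, and TcpDetails, IdentLookup and tcpSection() here are trimmed, hypothetical stand-ins.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch: the whole "slab" object, trimmed to two sections.
struct TcpDetails {
    std::string clientIp;
    unsigned short clientPort = 0;
    std::string ident; // filled in by the ident lookup
};
struct TransactionDetails {
    TcpDetails tcp;
    int httpStatus = 0;
};

// Hand a job only the section it needs. The aliasing constructor makes the
// returned pointer share ownership of the whole TransactionDetails, so the
// slab stays alive while the job runs, but the job can only see TcpDetails.
std::shared_ptr<TcpDetails> tcpSection(const std::shared_ptr<TransactionDetails> &whole) {
    assert(whole);
    return std::shared_ptr<TcpDetails>(whole, &whole->tcp);
}

// A stand-in for an ident lookup job that depends on TcpDetails alone.
struct IdentLookup {
    std::shared_ptr<TcpDetails> tcp;
    void finished(const std::string &user) { tcp->ident = user; }
};
```

The job never sees the rest of the transaction, which is where the dependency reduction comes from, yet the object cannot be freed out from under it.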

To kick-start things this is what I've been thinking we need its
structure to look like:

class TransactionDetails {

  class TimeDetails {
    // all the timing and wait stats we can dream up.
    // for the transaction as a whole.
    // specific times stay in their own component.
  } time;

  // Details about the TCP links used by this transaction.
  class TcpDetails {
    struct { FD, ip, port, eui, ident } client;
    struct s_ { FD, ip, port, eui, ident } server;
    vector<s_> serverHistory;
    // NP: not sure if we want a server-side history
    // if so it would go here listing all outbound attempts.
  } tcp;

  class SquidInternalDetails {
     // which worker/disker served this request?
     ext_acl; // details from external ACL tested
     auth; // details from proxy-auth helpers
     status; // status flags hit/miss/peer/aborted/timeout etc
     hier; // hierarchy details, HierarchyLogEntry
  } squid;

  // Details about the ICP used by this transaction.
  class IcpDetails {
     icp_opcode opcode;
  } icp;

  // Details about the HTCP used by this transaction.
  class HtcpDetails {
     htcp_opcode? opcode;
  } htcp;

  // Details about the HTTP used by this transaction.
  class HttpDetails {
    // the request/reply currently in use;
    // each points to one of the specific objects below.
    HttpRequestPointer request;
    HttpReplyPointer reply;

    // specific state objects
    HttpRequestPointer original_request; // original received
    HttpRequestPointer adapted_request; // after adaptation

    HttpReplyPointer original_reply; // original received
    HttpReplyPointer adapted_reply; // after adaptation
    // NP: original reply may be nil if non-HTTP source.
    // in which case...
    HttpReplyPointer generated_reply; // pre-adaptation.
  } http;

  // Details about the adaptation used by this transaction.
  class AdaptationDetails {
     { ...} icap; // icap state and history, pretty much as-is
     {...} ecap; // ecap state if we find anything to log.
  } adapt;

  // Details about the FTP used by this transaction.
  class FtpDetails {
    vector<String> protoLog; // FTP msgs used in this fetch.
  } ftp;

// ... other entries similar to FTP for gopher, wais, etc.
};

NOTE that the "headers", "private" and "cache" sections of the current
AccessLogEntry are gone.
  - "headers" blobs are part of HttpRequest (or should be).
  - "private" is a duplicate of HttpRequest details.
  - "cache" is split into whichever component is actually relevant for
the particular field.
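A trimmed, compilable take on the HttpDetails section of the structure above. It assumes std::shared_ptr typedefs in place of Squid's real HttpRequestPointer/HttpReplyPointer, and the setter names (requestReceived() and friends) are hypothetical, not part of the proposal. It shows the intended relationship: request/reply always point at one of the specific state objects.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for Squid's HttpRequest/HttpReply and their
// RefCount pointer typedefs; only what the sketch needs.
struct HttpRequest { /* method, URL, headers ... */ };
struct HttpReply { /* status line, headers ... */ };
using HttpRequestPointer = std::shared_ptr<HttpRequest>;
using HttpReplyPointer = std::shared_ptr<HttpReply>;

class HttpDetails {
public:
    // The objects currently in use: always one of the specific ones below.
    HttpRequestPointer request;
    HttpReplyPointer reply;

    // Specific state objects; each is set once at its stage.
    HttpRequestPointer original_request;
    HttpRequestPointer adapted_request;
    HttpReplyPointer original_reply;  // nil if the source was not HTTP
    HttpReplyPointer generated_reply; // Squid-built reply, pre-adaptation
    HttpReplyPointer adapted_reply;

    void requestReceived(const HttpRequestPointer &r) { original_request = request = r; }
    void requestAdapted(const HttpRequestPointer &r) {
        assert(original_request); // adaptation needs a received request first
        adapted_request = request = r;
    }
    void replyReceived(const HttpReplyPointer &r) { original_reply = reply = r; }
    void replyGenerated(const HttpReplyPointer &r) { generated_reply = reply = r; }
    void replyAdapted(const HttpReplyPointer &r) {
        assert(original_reply || generated_reply); // something must have been adapted
        adapted_reply = reply = r;
    }
};
```

With this layout a component that only needs "the current reply" never has to know whether adaptation ran, while logging can still report each stage separately.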

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.12
   Beta testers wanted for 3.2.0.7 and 3.1.12.1

Received on Fri May 20 2011 - 07:31:28 MDT
