Re: [PATCH] Comm::TcpReceiver - part 1, 2, and 3 from Alex Rousskov on 2014-02-08 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 08 Feb 2014 00:05:39 -0700

On 02/07/2014 09:26 PM, Amos Jeffries wrote:
> On 8/02/2014 1:54 p.m., Alex Rousskov wrote:
>> On 02/05/2014 07:55 AM, Amos Jeffries wrote:
>>> * Kept the TcpReceiver name for now since Alex is disagreeing with
>>> callign the interface an "agent".

>> I do not recall objecting to Agent.

> You suggested Agent. Then went on to describe an object which was
> self-contained. TcpReceiver has pure virtuals so cannot be what you seem
> to describe as Agent (its childs would be that).

I do not recall describing Agent as a self-contained object either.
Agent will have pure virtual functions, of course. Event the existing
ClientAgent (i.e., ServerStateData) does!

>>> + MemBuf inBuf;
>>
>> The above undocumented TcpReceiver data member may be critical to
>> defining TcpReceiver and related classes correctly. What does that
>> buffer contain? Raw TCP bytes? I do not think so. Do we read raw TCP
>> bytes into that buffer? Not quite. Those bytes did not necessarily got
>> there from a TCP connection. They may have been through an SSL decoding
>> layer using ssl_read_method() and Comm I/O API (at least).
>>
>
> The intention is to have a separate TlsWhatever class with same API as
> this one for the SSL connections reading and writing.

I still think that would be a wrong design. The complex pieces of code
you have moved to TcpReceiver have nothing to do with TCP or SSL. If you
implement your plan, those pieces would be identical in TcpWatever and
TlsWhatever. That's wrong.

Let other layers handle the TCP and SSL specifics. The class you are
writing now does not have any TCP- or SSL-specific code (unless I missed
some minor bits) and that is good! Why make it TCP specific?

> As you know one cannot safely or cleanly mix randomly read(2) and
> ssl_read_method() on one connection. Any solution we need to do soo
> would be a third interface class.

The actual read(2) and ssl_read_method() calls will be done by other
layers and classes, orthogonal to the class you are working on now. The
existing Comm API is already doing that (for now; Peek and Splice needs
will require more work on those layers, but that is out of scope here).

>> That buffer actually contains "final protocol" bytes, for whatever
>> protocol the kid agent is implementing (HTTP, Gopher, etc.). The whole
>> TcpReceiver class is not about TCP at all!
>
> It's about bytes going to/from a TCP connection.

Nope. The transport and any encodings/encryptions on top of it are
pretty much irrelevant to the current (and future) TcpReceiver code. If
you disagree, please show me important TCP-specific code in your current
TcpReceiver implementation.

> The bufer has nothing
> to do with the format of the bytes inside it.

Actually, the data format or protocol are critical. Virtually no code
would be able to use that buffer for something useful without knowing
what it contains! More on that below, with examples.

> '''Common base for classes reading (and writing) from (and to) a single
> TCP Comm Connection. Contains basic connection management code and
> complex algorithms for coordinating I/O activity with message receiving
> (and sending) states.'''

Please show me the TCP specific code in your TcpReceiver implementation.

> "protocol messages" are nothing to do with this class (they are the
> child business and may not even exist for cases like tunnel.cc, or may
> be several abstraction levels away).

I agree that we _can_ remove messages from the specs of the current
TcpReceiver class. The current implementation deals with a stream of
bytes only. I see no important code that deals with message boundaries
and such.

I doubt that the class will stay that way because all the child classes
I know about do deal with messages and connection persistency (in a
general sense). I would expect the notion of a message to trickle down
to TcpReceiver class sooner or later because the kids will want to share
that code.

Please note that supporting a concept of a message and connection
persistency across messages is different from knowing what the message
protocol is. I am not advocating for adding HTTP or FTP code to the
TcpReceiver, of course.

In other words (and simplifying a bit):

/// raw TCP bytes (WRONG: could be SSL-decrypted TCP)
MemBuf inBuf;

/// opaque received bytes (OK but a little vague)
MemBuf inBuf;

  /// received bytes that are yet-unprocessed by our kids;
  /// opaque for us, but kids know the protocol these bytes
  /// are using (e.g., HTTP, FTP, etc) (Better)
  MemBuf inBuf;

The actual final description may also mention what we expect the kids to
do with the buffer (copy, leave just the unparsed bytes, empty on exit,
etc.).

>>> +bool
>>> +Comm::TcpReceiver::doneAll() const
>>> +{
>>> + return stoppedSending() && stoppedReceiving() && !inBuf.hasContent() && AsyncJob::doneAll();
>>> +}
>>
>> If we are done sending and receiving why would having buffered content
>> (the third guard above) matter? What can we do with that content (in the
>> scope of this class) that does not involve sending or receiving?
>
> Because until the child class has cleared/consumed the content out of
> our buffer we cannot guarantee that it is finished with.

That is kids' problem/scope, not ours. Just like we cannot guarantee
that kids are finished with the buffer, we cannot guarantee that they
are _not_ finished with it.

> I am trying to
> make the buffer private, so the child class doneAll() does not have
> direct access to it except when passed in.

I think it would be a mistake to deprive kids from input buffer access.
They may need to have access to the buffer during events not triggered
by I/O that TcpReceiver controls or knows about. For example, Store,
ICAP, or even an async eCAP adapter need to be able to get more body
bytes without any network I/O happening.

> - For now I've added a note about that and put a matching inBuf.clean()
> in keepaliveNextRequest() to clear it once the active requests queue has
> been drained.

This is just one bad side effect of the layering violation. Let the kids
determine whether they are done with the data or not, without forcing
them to clear the buffer to indicate that they are done (a pretty
meaningless action for them since an empty buffer on its own often does
not mean that they are done; just that they do not have any unprocessed
input right now).

>>> +void
>>> +Comm::TcpReceiver::releaseConnection(const char *reason)
>>> +{
>>> + // Used by clients to release the connection before
>>> + // storing it in a Pconn pool for reuse.
>>> + comm_remove_close_handler(tcp->fd, closed_);
>>> + closed_->cancel(reason);
>>
>> s/clients/callers/ or s/clients/kids/ to avoid the impression that this
>> code is specific to Client agents. It is not.
>>
>> "release" is kind of undefined in this context. I think you are talking
>> about the need to stop monitoring connection for I/O. Consider
>> rephrasing the comment to be more precise.
>
> I have been considering making this return the Comm::Connection object
> and set the tcp member to NULL. Somewhat like pop() semantics. Since it
> is used to *move* control of the TCP connection from this object to
> somewhere else (ie to a pconn pool or maybe a TlsWhatever agent on
> ssl-bump or Upgrade).
>
> Does that sound good?

Does not sound bad, but I am not sure that fully resetting the tcp
member itself is a good idea, unfortunately. Since your Comm::Connection
class has two roles (storage of endpoints with other info AND,
optionally, an active socket descriptor), deleting access to that first
"info" part may not work well, even if we no longer control the second
"socket" part. However, you can try and see what happens if you prefer,
of course. Perhaps that info is already copied elsewhere...

>>> +bool
>>> +Comm::TcpReceiver::injectFakeRequestXXX(MemBuf &fake)
>>> +{
>>> + if (inBuf.hasContent())
>>> + fake.append(inBuf.content(), inBuf.contentSize());
>>> +
>>> + inBuf.reset();
>>> + inBuf.append(fake.content(), fake.contentSize());
>>> +
>>> + return processReadBuffer(inBuf);
>>> +}
>>
>> Let's not combine injection with processing. Let the caller call
>> processReadBuffer() if/when they are ready for that. It would make the
>> method more powerful but simpler. And you would not have to fix its
>> documentation to describe the return value, etc.
>>
>> Another layer of problems with this method are two similar asserts: One
>> when the "fake" buffer is too small and the other one is when the inBuf
>> is too small to accommodate the combined content. I will focus on the
>> former.
>>
>> We should not assume that the fake MemBuf can accommodate the entire
>> inBuf. MemBufs have limited capacity and it is too much to ask the
>> caller to give us the right buffer. If you disagree, please adjust the
>> method description to explain the requirements on the buffer parameter.
>> If you agree, make the buffer parameter constant and use a temporary
>> buffer (easy) or some kind of new MemBuf method with memmove() to inject
>> content.
>
> I was trying to reduce data copies here.

Note that you are already increasing the number of data copies: The
official code does not have any copies in this context AFAICT, but
suffers from many other problems. Some copying may be necessary to fix
those problems. It will go away when/if we start supporting multi-buffer
parsing and I/O.

> Oh well we will have to go with
> 3x data copy and two re-allocations on the common case for intercepted
> HTTPS.

You do not have to use three copies if you do not want to. You can add a
method to MemBuf to insert a prefix with one memmove() followed by one
memcpy().

On the other hand, I suspect these copies are all minor compared to
other SSL overheads so it may make sense to keep this code simple rather
than make it fast.

>> The code should not assert (because I might be able to trigger such an
>> assert by stuffing the input buffer with a lot of content). A throwing
>> Must() would be OK in this hypothetical case IMO because it will just
>> terminate a single transaction.
>
> Dangerous to call Must() as these are possibly outside of AsyncJob
> scope. IDENT and WHOIS lookup etc will likely not be AsyncJob by the
> time this could be rolled out there.

Neither IDENT nor WHOIS are going to use fake message injection, I hope.

More importantly, Must() is never more dangerous than assert. More
inconvenient if triggered outside of job protections (no stack trace),
yes, but not more dangerous.

>> Overall, I am not sure this method is needed. The caller can do the
>> manipulations on its own, right? It may be easier for the caller to add
>> the required capacity checks than implement this general code correctly.
>>
>
> Ideally not. I am trying to move the inBuf to a private member so the
> childs should only be playing with the buffer they get passed by
> processReadBuffer(MemBuf &)

As you know by now, I do not think that hiding input buffer from kids is
going to work unless you force kids to copy it. TcpReceiver does not
control all the events in a transaction life and, hence, cannot prohibit
access to the buffered message which may be waiting for those events.

Cheers,

Alex.
Received on Sat Feb 08 2014 - 07:05:47 MST

This archive was generated by hypermail 2.2.0 : Sat Feb 08 2014 - 12:00:11 MST