Re: [DRAFT][MERGE] Cleanup comm outgoing connections in trunk

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sun, 23 May 2010 14:36:26 -0600

Hi Amos,

    Here are a few comments regarding comm outgoing connections cleanup.
There are several high-level issues that need to be addressed and a few
low-level bugs that must be fixed. Please read the whole thing first
because many issues are related.

* Comm::Connection class is not documented. We need to be very clear
about that class scope and purpose, IMO. Please see below for related
problems and specific suggestions.

* Comm::Connection(Connection &c) copy constructor lacks const for c.

* Comm::Connection lacks an assignment operator declaration. The current
default/generated ones are wrong. If Vector<> is using them, we are in
trouble.

* Comm::Connection copy constructor does not copy peer correctly OR
Comm::Connection destructor does not clean peer correctly. It may be a
good idea to hide peer_ from public use to ensure it is always
locked/unlocked properly.

* Where do we set Comm::Connection::peer_ to something useful? (Just a
question; I could not find any assignment but I probably missed it).

* Please remove the "Create with a list of possible links" comment
because it is not very accurate and because it just duplicates
documentation of _one_ of the constructors.

* I am seriously worried about the raw paths pointer being passed to
ConnectStateData, async callbacks, and being manipulated in several
concurrent objects without any protection. For example, how do we
guarantee that FwdState does not get destructed while ConnectStateData
or the callback is alive and manipulating FwdState::paths? Paths must
have cbdata locking or refcounting unless you can clearly proof that no
protection is needed.

* Please remove ConnectStateData solo connections as
exceptional/different code path. It has already bitten you (see below).
One alternative is to create paths with a single connection instead
(inside the solo constructor which will be the only solo-specific code
then). Another alternative is to add path/solo access methods that hide
the difference from the rest of the ConnectStateData code without the
Paths creation overhead.

* Paths are not really initialized for solo ConnectStateData:
 + paths(paths),

* When the above is fixed, the following will fail for solo connections.
 In general, do not assert something you until you actually use it.

+ assert(paths != NULL);

* Whenever possible, please declare Pointer parameters and Pointer
return types as const references to avoid unnecessary locking/unlocking.

* The following few comments are about the ConnectStateData class that
you did not create. Normally, I do not ask for bug fixes in moved code.
However, in your particular case, you made a private comm.cc-only class
public and started using it in high-level code. This makes you liable
for the now-public class. Sorry.

- ConnectStateData lacks copy constructor and assignment operator
declarations. You do not need to implement them if they are not needed,
but the current default/generated ones are wrong.

- ConnectStateData needs a destructor. Move host cleanup there.

- Please make static ConnectStateData member wrappers private.

- Please move static ConnectStateData member implementation to .cc.

- Can we replace ConnectStateData new/delete definitions and
declarations with an existing macro?

* We now have ConnectStateData and ConnStateData public classes.
StateData suffix is already wonderfully descriptive, but this brings
confusion to the next level.

Moreover, the relationship between new Comm::Connection and old
ConnectStateData is rather odd: In general terms, connection is what
connect(2) (or accept) creates. In the current code, ConnectStateData
uses an unconnected Connection to make a connected Connection.

Furthermore, FwdState keeps an array of unconnected Connections. And
both FwdState and ConnectStateData (and others?) use the first item of
that array specially because it is (usually) a real connection while the
rest are just future connection detail holders.

Meanwhile, each Connection object has the potential of holding such
sensitive information as descriptor and peer pointer. I suspect more
details will be added later.

I think the primary reason for most of these problems is that the new
Connection class has dual personality. It is both holds the information
necessary to establish a connection _and_ the established connection
information. I suggest to split Connection into two classes:

1) Create a PeerPort or PeerAddress class that will hold information
necessary to connect to a peer. This can be a cache_peer, an origin
server, and ICAP server, and [in SMP branch] another Squid process.
FwdState and others that need to cycle through peers will create and
pass PeerPorts or PeerAddresses arrays to Comm::Connector class (see below).

2) Connection object will have a peer member of type PeerPort or
PeerAddress. It will also have fd and other fields necessary for an
_established_ connection.

3) Rename ConnStateData to Connector and document it as "creates a
Connection to [one of] the supplied PeerPorts". This class will iterate
peer details, try opening connections, and will eventually succeed or
fail. When it succeeds, it will create the corresponding Connection
object and return it to the caller.

* Will Comm::Connection be used for incoming connections as well? If
not, the name is probably too generic. Should be something like
PeerConnection. Or perhaps we do not need this class at all?! If it is
only used as an information holder during callback (after the above
PeerPort changes) then the callback data should be modified to carry
that information instead of adding a new class.

* You have changed the semantics of the Icap::Xaction::connection
member. It used to be set during connect and now it is only set if
connect was successful. doneWithIo(), bypassFailure() and possibly other
code relies on the old semantics which is no longer true.

My understanding is that you cannot set the connection member where it
was set because the descriptor is not available at that time. The best
solution is probably to change the connection member type from
int/descriptor to Comm::Connection pointer (if we still have it after
the above changes).

* The new code below sets the connect timeout handler after a successful
connect. That does not make sense to me.

+ // TODO: do we still need the timeout handler set?
+ // there was no mention of un-setting it on success.
+
+ // TODO: service bypass status may differ from that of a
+ typedef CommCbMemFunT<Adaptation::Icap::Xaction, ...
+ AsyncCall::Pointer timeoutCall = asyncCall(93, 5, ...

+
+ commSetTimeout(io.conn->fd, ...

I do not think you need the above at all. The timeout handler will be
set in updateTimeout, which will be called from scheduleWrite(). This
logic is not very clear. If you want to improve it, you may call
updateTimeout() instead of doing the above (it will clear the timeout
then) but doing so is outside your work scope.

Thank you,

Alex.

On 05/22/2010 12:53 AM, Amos Jeffries wrote:
> Alex Rousskov wrote:
>> On 05/19/2010 07:05 AM, Amos Jeffries wrote:
>>> Henrik Nordström wrote:
>>>> tis 2010-05-18 klockan 23:34 +0000 skrev Amos Jeffries:
>>>>
>>>>> I've discovered the VC connections in DNS will need a re-working to
>>>>> handle
>>>>> the new TCP connection setup handling. I've left that for now since it
>>>>> appears that you are working on redesigning that area anyway. The new
>>>>> setup
>>>>> routines will play VERY nicely with persistent TCP links to the
>>>>> nameservers.
>>>> I have not started on the DNS rewrite yet.
>>>>
>>>>> I took some extra time last night and broke the back of the selection
>>>>> and
>>>>> forwarding rewrite. I'm now down to the fine detail build errors. When
>>>>> those are done I'll push the branch to LP for you to do the DNS
>>>>> fixes on
>>>>> top of.
>>>> Ok.
>>>>
>>> Pushed to launchpad: lp:~yadi/squid/cleanup-comm
>>
>> How can I review the changes as one patch without checking out your
>> branch?
>>
>
> Oh. Attached is my latest code.
>
> This has now been run tested and I've been using it for personal
> browsing for a short while now no problems.
>
>
>
>> Thank you,
>>
>> Alex.
>>
>>
>>> This builds, but has not yet been run tested.
>>>
>>> What has changed:
>>>
>>> ConnectionDetails objects have been renamed Comm::Connection and been
>>> extended to hold the FD and Squids' socket flags.
>>>
>>> Peer selection has been extended to do DNS lookups on the peers chosen
>>> for forwarding to and produce a vector<> of possible connection
>>> endpoints (squid local IP via tcp_outgoing_address or tproxy) and remote
>>> server.
>>>
>>> Various connection openers have been converted to use the new
>>> ConnectStateData API and CommCalls (function based so far).
>>>
>>>
>>> ConnectStateData has been moved into src/comm/ (not yet namespaced) and
>>> had all its DNS lookup operations dropped. To be replaced by a looping
>>> process of attempting to open a socket and join a link as described by
>>> some Comm::Connection or vector<> of same.
>>>
>>> ConnectStateData::connect() will go away and do some async work. Will
>>> come back at some point by calling the handler with COMM_OK,
>>> COMM_ERR_CONNECT, COMM_TIMEOUT and ptrs to the Comm::Connection or
>>> vector (whichever were passed in).
>>> On COMM_OK the Comm::Connection pointer or the first entry of the
>>> vector will be an open conn which we can now use.
>>> On COMM_ERR_CONNECT the vector will be empty (all tried and
>>> discarded), the single ptr will be closed if not NULL.
>>> On COMM_TIMEOUT their content is as per COMM_ERR_CONNECT but the vector
>>> may have untried paths still present but closed.
>>>
>>> FD opening, FD problems, connection errors, timeouts, early remote
>>> TCP_RST or NACK closure during the setup are all now wrapped out of
>>> sight inside ConnectStateData.
>>>
>>> The main-level component may set FD handlers as needed for read/write
>>> and closure of the link in their connection-done handler where the FD
>>> first becomes visible to them.
>>>
>>>
>>> Besides the testing there is some work to:
>>> * make it obey squid.conf limits on retries and paths looked up.
>>> * make DNS TCP links ('VC') work again.
>
> The ref-counting helped a bit here not having to juggle the new/delete
> locations. I think it goes, but have not had a conclusive test yet.
> Still very inefficient for what it needs to do.
>
>>> * make the CommCalls proper AsynCalls and not function handler based.
>>> * make Comm::Connection ref-counted so we can have them stored
>>> in the peer details and further reduce the DNS steps.
>
> done.
>
>>> * make ICAP do DNS lookups to set its server Comm::Connection properly.
>>> For now it's stuck with the gethostbyname() blocking lookup.
>>>
>>>
>>> Future work once this is stable is to:
>>> a) push the IDENT, NAT, EUI and TLS operations down into the Comm layer
>>> with simple flags for other layers to turn them on/off as desired.
>>> b) make the general code pass Comm::Connection around so everything
>>> like ACLs can access the client and server conn when they need to.
>>>
>>> Amos
>>
>
> Amos
>
Received on Sun May 23 2010 - 20:36:42 MDT

This archive was generated by hypermail 2.2.0 : Mon May 24 2010 - 12:00:11 MDT