Re: [AUDIT PATCH] comm cleanup - pconn changes

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 05 Oct 2010 00:47:08 +0000

On Mon, 04 Oct 2010 14:31:25 -0600, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 10/02/2010 10:55 AM, Amos Jeffries wrote:
>> Implements the pconn and IdleConnList as storage lists of
>> Comm::Connection.
>>
>> With Comm::Connection the key definition has to change. The old keying
>> based solely on domain would produce a Connection object whose FD did
>> not exactly match the remote IP:port selected by the new routing
logics.
>>
>> I've kept the concept that the pconn key is the details to describe the
>> remote end of the link used to fetch any given domain. As such its
>> become much simpler now. Just the remote IP:port from the TCP details
>> and requested domain (post re-writing) or peer hostname for peer pconn.
>>
>> The change also removes the client address from relevance (enhancement
>> bug fix). Instead we scan the list of pconn going to our desired
>> remote-end for the local-end details (tcp_outgoing_addr or tproxy IPs)
>> to pick an exact pconn match (bug fix).
>
> .
>
>> - list->removeFD(fd); /* might delete list */
>> - comm_close(fd);
>> + /* might delete list */
>> + if (list && list->remove(conn)) {
>> + Comm::ConnectionPointer nonConst = conn;
>> + nonConst->close();
>> + }
>
> The list member cannot be NULL until the remove is called. The nonConst
> conversion is also weird but I will discuss that later. Should probably
> be rewritten as:
>
> Comm::ConnectionPointer deletedConn =
> list->remove(conn); // might delete list
> if (deletedConn != NULL)
> deletedConn->close();
>
> However, this is still not equivalent to the old code because the old
> code was closing the connection regardless of whether it was found in
> the list. Is that an intentional change? Perhaps add a comment about it?
>
>> - list->removeFD(fd); /* might delete list */
>> - comm_close(fd);
>> + Comm::ConnectionPointer temp = new Comm::Connection; // XXX:
>> transition. make timeouts pass conn in
>> + temp->fd = fd;
>> + if (list->remove(temp)) {
>> + temp->close();
>> + } else
>> + temp->fd = -1; // XXX: transition. prevent temp erasure
>> double-closing FD until timeout CB passess conn in.
>
> Lost that critical, IMO, "might delete list" comment.
>
> Also, the creation of a temporary connection seems unnecessary when we
> have a findIndex method. Use findIndex or, better, add a findConn method

> if you want to close the found connection.
>
> The same concerns about not-closing of the not-found connection apply
> here as well.
>
>
>> + for (int index = 0; index < nfds; index++)
>> + theList[index] = oldList[index];
>
> This makes IdleConnList::push slower that the old code that used
> xmemcpy(). The shifting overhead is one of the reasons we should not use

> [C] arrays here, IMO.
>
>
>> -IdleConnList::read(int fd, char *buf, size_t len, comm_err_t flag, int
>> xerrno, void *data)
>> +IdleConnList::read(const Comm::ConnectionPointer &conn, char *buf,
>> size_t len, comm_err_t flag, int xerrno, void *data)
>
> Please capitalize static Read() if you are changing its profile anyway.
>
>
>> + int used = strlen(buf);
>
> Please use const.
>
>
>> + Comm::ConnectionPointer nonConst = conn;
>> + Comm::ConnectionPointer nonConst = conn;
>> + Comm::ConnectionPointer nonConst = conn;
> ...
>
> This needs to be fixed somehow as it is only going to get worse. There
> are two questions here:
>
> 1) Does "const Pointer" always mean that we cannot change the class the
> Pointer points to? No, it does not. It only means that we can never
> change the Pointer itself. Our RefCount is rather confused about
> const-correctness, but we can (and should) fix that.
>
> 2) Does Comm have a right to change Connection or similar classes
> submitted to it for I/O management? Here, the decision is up to us, but
> I think it is pretty clear that a lot of our Comm code already fiddles
> with FD state (via fd_table and such) and, hence, with the Connection
> state.
>
> If you agree, then calling conn->close() for "const Connection::Pointer
> &conn" passed to a Comm I/O handler is just fine. If you need help
> fixing RefCount to support that, please let me know. Otherwise, just
> follow auto_ptr or any semi-standard smart pointer layout.
>

Completely agree. All my uses of "nonConst" are to call ->close() past
that brain-damage.

>
>> + /**
>> + * Search the list for an existing connection. Matches by FD.
>> + *
>> + * \retval false The connection does not currently exist in the
>> list.
>> + * We seem to have hit and lost a race condition.
>> + * Nevermind, but MUST NOT do anything with the raw
>> FD.
>> + */
>> + bool remove(const Comm::ConnectionPointer &conn);
>
> You forgot to mention that remove() removes :-).
>
> As discussed above, the "MUST NOT do anything" caveat seems to
> contradict the current code which does something. Can you describe a
> race condition that would make the Comm callback state essentially
> invalid? Perhaps we can avoid spreading this FUD throughout the handlers

> code...

I can indeed. I hit this a while back before coming in to the conversion.
The bool and close() changes were needed when working with raw FD.

1) Job1 performs an HTTP request. It finishes nicely with keep-alive and
adds it's serverConnection (S) to the pconn pool. Alls good.
2) Job2 arrives, parsed, and server start is scheduled with a conn.
3) The server on (S) FIN's and IdleConnList::read is scheduled.
4) Job2 forwarding selection pop(S) from the IdleConnList for use.
5) the IdleConnList::read gets dialed and removes the fd/conn from pconn.
Not there any more!.
   When (3)/(5) was dialed the read handler was removed, can't cancel
self!.
   Closing as it goes regardless of whether it was still listed there.
6) Job2 goes boom.

Hmm, thinking the conversion over I'm not sure why this (4) happened. (3)
should have make comm_io_callback_pending() true.

>
>> + /**
>> + * Updates destLink to point at an existing open connection if
>> available and retriable.
>> + * Otherwise, return false.
>> + *
>> + * We close available persistent connection if the caller
>> transaction is not
>> + * retriable to avoid having a growing number of open connections
>> when many
>> + * transactions create persistent connections but are not
retriable.
>> + */
>> + bool pop(Comm::ConnectionPointer &serverConn, const char *domain,
>> bool retriable);
>
> Why not always return the [possibly nil] popped connection instead?
> Seems like a simpler and safer interface then separating the serverConn
> value from the pop() status value.

Hmm, will think it over. It started that way, I can't recall the details
why I changed it.

>
>> - int *fds;
>> + Comm::ConnectionPointer *theList;
>
>
> Please use
>
> * std::vector if you want expensive extractions occasional expensive
> insertions due to memory [re]allocations; or
> * std::list if you want small memory allocations/deletions on every
> insertion/extraction.
>
> I would probably use std::list and later either make an intrusive
> Connection list to avoid all allocations or try to MemPool list item
> wrappers (not sure whether such MemPooling is possible).
>

Neither matches what we want exactly, though the order of elements makes
vector appropriate as currently done manually provided a tail extraction
does not deallocate and compact.
The requirement here is fast extraction + search (impacts request service
time) and permitted slow insertions + compaction scans.

The necessity of matching conn::remote IP now has slowed search more than
I'd like.

>
> This container member should also be documented, including the order of
> the elements.

Okay.

>
>
>> + * A pool of persistent connections for a particular service type.
>> + * HTTP servers being one such pool type, ICAP services another etc.
>
> Just like HTTP, ICAP connections are to servers, not services. They are
> identified by the IP:port address and not service URIs. Technically, we
> can maintain one pool for both HTTP and ICAP servers/services and
> everything would work. In some (most?!) environments, it may actually be

> the right thing to do because it will keep the total number of idle
> pconns constant.

I agree, its possible and possibly desirable to combine them. particularly
after this key change in HTTP.

The detail blocking merge is impact on search time for both protocols if
the hash were combined, or one destination server hosted both HTTP and
ICAP. Some high-performance testing may be in order before actioning that.

>
> Besides, PconnPool does not exactly pool individual connections but
> rather IdleConnLists.

It does both. PconnPool is the pool of persistent destinations
IP:port:domain. IdleConnList is the specific set of links that may satisfy
requests for that destination.
 My only change has been to make IdleConnList sensitive to local-end
details for supporting TPROXY and tcp_outgoing_addr properly.

>
> Consider:
> * Manages idle persistent connections to a caller-defined set of
> * servers (e.g., all HTTP servers). Uses a collection of IdleConnLists
> * internally. Controls lists existence and limits the total number of
> * idle connections across the collection.
>
> It would also be nice to document that IdleConnLists assumes that their
> PconnPool parent lives forever, but that is outside your patch scope.

Might as well add a one-liner to the effect while I'm documenting the
members.

Will fix.
Amos
Received on Tue Oct 05 2010 - 00:47:16 MDT

This archive was generated by hypermail 2.2.0 : Tue Oct 05 2010 - 12:00:04 MDT