Re: /bzr/squid3/trunk/ r11740: Do not let cache manager requests kill SMP Squid using isOpen() assertion.

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 17 Sep 2011 10:42:15 -0600

On 09/17/2011 02:01 AM, Amos Jeffries wrote:
> On 16/09/11 16:29, Alex Rousskov wrote:
>> On 09/15/2011 08:24 PM, Amos Jeffries wrote:
>>> On 16/09/11 06:16, Alex Rousskov wrote:
>>>> ------------------------------------------------------------
>>>> revno: 11740
>>>> committer: Alex Rousskov<rousskov_at_measurement-factory.com>
>>>> branch nick: trunk
>>>> timestamp: Thu 2011-09-15 12:16:59 -0600
>>>> message:
>>>> Do not let cache manager requests kill SMP Squid using isOpen()
>>>> assertion.
>>>>
>>>> As the comment above the close call implies, we have not imported
>>>> the foreign socket descriptor into our fd_table yet. We must use
>>>> raw close(2), just like the corresponding
>>>> Mgr::Request::Request(msg) code that allocates request.conn, uses
>>>> raw assignment to give that half-baked connection a descriptor.
>>>>
>>>> TODO: This direct manipulation of Connection::fd is ugly, and this
>>>> half-baked connection will most likely cause more [hidden] problems
>>>> down the road. For example, Mgr::Request destructor will assert in
>>>> a similar way if the request object is destroyed before
>>>> Action::respond() is called.>> modified:
>>>> src/mgr/Action.cc
>>
>>
>>> Problem is: why is the FD getting outside the IPC Coordinator/Strand
>>> scope when it is not "valid" according to fd_table?
>>
>>
>> It is not 100% clear where that IPC scope ends in this case, but the
>> problem is not really with the scope borders. The bug was due to the
>> [old] action creator saying "we are not going to import this raw fd so
>> that each specific action can decide whether it actually needs an fd"
>> and the [new] default action code saying "we are going to close this
>> Comm connection" even though the Connection object was in that
>> half-baked state.
>>
>>
>>> - If it can't be read/write/close, it should not really be passed
>>> round
>>> at all.
>>> - If it *can* be read/write/close, the ->close() is the right way to
>>> close and remove the possible handlers.
>>
>> It can be read/written/closed, but it can be read/written/closed using
>> Comm only after the fd is imported into Comm space. The old code did not
>> import and used raw close(2) by default. The new code still did not
>> import but used Comm close by default.
>>
>>
>>> Correlation of these is that the code doing conn->fd = msg.getFd() when
>>> unpacking the TypedMsgHdr should be (a) not, or (b) setting up fd_table
>>> state sufficiently to match the Comm::Connection state.
>>
>
>
> Option (1)
>
>> Yes, that is why I left a TODO note: We can change the old code to
>> import the fd into Comm [and create a full Connection object] as soon as
>> possible. It would be a waste of cycles in most cases, but it will
>> prevent half-baked Connection objects from lingering around and causing
>> more asserts or fd leaks.
>>
>
> option (2)
>> The alternative is to use a "raw fd holder" class/object that would use
>> raw close(2) unless a fully-baked Connection object is created. SSL code
>> uses that approach for many temporary SSL things to avoid leaks.
>
> option (3):
> Comm::Connection::close() could detect the fd_table flag which is
> asserting right? then it could use ::close() instead of comm_close().

I do not like option (3) because it will prevent us from detecting bugs
where a regular fd is closed twice (from two Connection objects) and
other code logic problems.

> option (4):
> Or probably, add another Comm::Connection flag to indicate "partial
> import" for the opening TCP flags. Which triggers the particular closing
> method if still set on ->close().

>> Since this is not a performance-sensitive area of the code, the former
>> (i.e., create a Connection object ASAP) may be better/simpler.
>
> Well, like you keep reminding me. Everything is performance sensitive
> indirectly.

I think that would be an exaggeration. It is difficult to imagine a
realistic scenario where we need to super-optimize cache manager
queries, but if such a need does surface, we can always migrate from
simpler to more complex (option #1 to another option).

> So I'm thinking option (4), with option (1) in specific cases would be
> best.
> Option (3) whenever we fiddle with the fd directly. Option (1)
> whenever its known that the FD will be used. Possibly some time after
> option (3) was done, so unsetting the half-imported flag if set earlier.
> This way we avoid fd_table and related CPU consumptions entirely for
> most of these. And still roughly follow the semantics of what the flags
> mean: things done when opening this FD, so we can repeat the sequence in
> comm_reset() as needed.

I think we should keep it simple and use Option #1 for all cases. The
fewer half-baked things we have, the better (this is not limited to
asserts when closing a descriptor). If you insist, I can live with
option #4.

BTW, I think Ipc::ImportFdIntoComm() is now broken as well. If things go
wrong, it calls conn->close() for a half-baked connection. If you work
on this, please try to find all cases where Connection code changed
close(2) to conn->close(). There may be more such bugs lurking there.

Thank you,

Alex.
Received on Sat Sep 17 2011 - 16:42:57 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 21 2011 - 12:00:03 MDT