Re: ACLFilledChecklist::checkCallback() not needed? from Alex Rousskov on 2012-06-27 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 27 Jun 2012 09:17:26 -0600

On 06/26/2012 11:02 PM, Amos Jeffries wrote:
> On 27/06/2012 4:17 p.m., Alex Rousskov wrote:
>> Does anybody want to keep ACLFilledChecklist::checkCallback()? Long
>> time ago, that method was used to unlock auth_user_request(s) via
>> AUTHUSERREQUESTUNLOCK() to avoid memory leaks. Today, we use refcounting
>> pointers for auth_user_requests so they should not leak even without an
>> explicit NULL assignment.
>>
>> The method, specifically its questionable conn()->auth_user_request
>> part, may be in the way of authentication because it results in
>> forgetting connection's authentication state under some conditions. I
>> would like to remove it. Any objections?

> As I understand it this is about race conditions between validating the
> credentials and reconfiguring Squid. For example; starting an
> authentication lookup, then reconfiguring without auth helpers, then
> resuming the checklist operation. Stateful auth lookups could hit this
> badly as the validation period is so long and twisted.

Hi Amos,

Judging by the old lock/unlock code and comments, it is more about
memory leaks (when the caller disappears during an async check), but
those concerns are no longer valid because we refcount.

The method is executed immediately before calling the callback of an
async ACL list check. It does just two things currently:

1. sets this->auth_user_request member to NULL. This should be
irrelevant because we "delete this" the object at the end of the method.

2. sets conn()->auth_user_request to NULL. This does have long-term
effects but the ones I know about are all bad. And the comments in that
specific code indicate that other developers have been concerned that it
is doing something wrong (connection state should not change just
because we finished an async ACL list check!).

Neither #1 nor #2 is related to race conditions you describe above, as
far as I can tell, because the code does not use anything other than
those two refcounted auth_user_requests, and with those, it just
decrements the reference counter by setting the pointers to NULL.

IMO, the only danger here is that some other code relies on
conn()->auth_user_request to be NULL after the check. I know we have
code that checks for that member to determine NTLM authentication state,
for example (that is how I found the bug I am trying to fix). I think
any such code should either continue to work correctly OR is buggy on
its own, but I am not an expert on that.

> That said, the UserRequest object is probably the better place to handle
> such races by emitting a failed validation result if the auth system
> goes away or breaks before completion.

Moreover, this method is executed "after completion" so we should not
care about auth system state at that point. The races you talk about are
very real, but I do not think they are relevant here.

> IMO remove it and check/update UserRequest can handle the auth race
> cleanly by itself. If we find other bugs later they need fixing via
> other means.

If I do not hear otherwise, I will remove this code. UserRequest will
not be affected by such removal AFAICT.

Thank you,

Alex.

>> The ACLChecklist::checkCallback() will stay, of course, but it will no
>> longer be virtual.
>>
>> I quoted current and the "original" code below for your convenience.
>>
>>
>> Thank you,
>>
>> Alex.
>>
>> current code (that I would like to remove):
>>
>>> void
>>> ACLFilledChecklist::checkCallback(allow_t answer)
>>> {
>>> debugs(28, 5, HERE << this << " answer=" << answer);
>>>
>>> #if USE_AUTH
>>> /* During reconfigure, we can end up not finishing call
>>> * sequences into the auth code */
>>>
>>> if (auth_user_request != NULL) {
>>> /* the filled_checklist lock */
>>> auth_user_request = NULL;
>>> // It might have been connection based
>>> // In the case of sslBump we need to preserve authentication
>>> info
>>> // XXX: need to re-evaluate this. ACL tests should not be
>>> playing with
>>> // XXX: wider scoped TCP connection state, even if the
>>> helper lookup i!
>>> if (conn() && !conn()->switchedToHttps()) {
>>> conn()->auth_user_request = NULL;
>>> }
>>> }
>>> #endif
>>>
>>> ACLChecklist::checkCallback(answer); // may delete us
>>> }
>>
>> which originated from the middle/unlocking part of this code (see trunk
>> r9509.1.29):
>>
>>> void
>>> ACLChecklist::checkCallback(allow_t answer)
>>> {
>>> PF *callback_;
>>> void *cbdata_;
>>> debugs(28, 3, "ACLChecklist::checkCallback: " << this << "
>>> answer=" << answe
>>>
>>> /* During reconfigure, we can end up not finishing call
>>> * sequences into the auth code */
>>>
>>> if (auth_user_request) {
>>> /* the checklist lock */
>>> AUTHUSERREQUESTUNLOCK(auth_user_request, "ACLChecklist");
>>> /* it might have been connection based */
>>> assert(conn() != NULL);
>>> /*
>>> * DPW 2007-05-08
>>> * yuck, this make me uncomfortable. why do this here?
>>> * ConnStateData will do its own unlocking.
>>> */
>>> AUTHUSERREQUESTUNLOCK(conn()->auth_user_request, "conn via
>>> ACLChecklist"
>>> conn()->auth_type = AUTH_BROKEN;
>>> }
>>>
>>> callback_ = callback;
>>> callback = NULL;
>>>
>>> if (cbdataReferenceValidDone(callback_data, &cbdata_))
>>> callback_(answer, cbdata_);
>>>
>>> delete this;
>>> }
>
Received on Wed Jun 27 2012 - 15:17:50 MDT

This archive was generated by hypermail 2.2.0 : Wed Jun 27 2012 - 12:00:07 MDT