Re: [PATCH] Prevent external_acl.cc "inBackground" assertion on queue overloads

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 12 Jan 2013 17:42:49 +1300

On 12/01/2013 1:39 p.m., Alex Rousskov wrote:
> Hello,
>
> I think there is an off-by-one error in the external_acl.cc code
> that kills Squid:
>
>> 2013/01/11 15:13:18.037| external_acl.cc(843) aclMatchExternal: "http://example-3.com/req=debug,uniq4,abort_at=AfterReqHs/resp=debug/": queueing a call.
>> 2013/01/11 15:13:18.037| external_acl.cc(845) aclMatchExternal: "http://example-3.com/req=debug,uniq4,abort_at=AfterReqHs/resp=debug/": return -1.
>> 2013/01/11 15:13:18.037| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'dummyfilter' is -1
>> 2013/01/11 15:13:18.037| Acl.cc(339) matches: ACLList::matches: result is false
>> 2013/01/11 15:13:18.037| Checklist.cc(275) matchNode: 0x99fe8a8 matched=0 async=1 finished=0
>> 2013/01/11 15:13:18.037| Checklist.cc(312) matchNode: 0x99fe8a8 going async
>> 2013/01/11 15:13:18.037| Acl.cc(61) FindByName: ACL::FindByName 'dummyfilter'
>> 2013/01/11 15:13:18.037| Checklist.cc(131) asyncInProgress: ACLChecklist::asyncInProgress: 0x99fe8a8 async set to 1
>> 2013/01/11 15:13:18.037| external_acl.cc(1407) Start: fg lookup in 'dummyfilter' for 'http://example-3.com/req=debug,uniq4,abort_at=AfterReqHs/resp=debug/'
>> 2013/01/11 15:13:18.037| external_acl.cc(1450) Start: 'dummyfilter' queue is too long
>> 2013/01/11 15:13:18.037| assertion failed: external_acl.cc:1451: "inBackground"
>
> The enqueue check for external ACL lookups is inconsistent with the
> final queue length check in ExternalACLLookup::Start(). The former
> allows adding to the already full (but not yet overflowing) queue while
> the latter rightfully(?) asserts that the queue should not overflow.
>
> Should the queue_size check be relaxed to use 2*n_running logic, used by
> some other seemingly similar checks?
>
> And should we use n_running or n_active for these checks?

I think we should get away from the 2*N logic entirely and do something
a bit safer.
As it stands n_active AFAIK is the intended measure. n_running includes
helpers being shutdown, so when they disappear the queue magically
overflows. However if any one of the n_active helpers has a problem they
get removed from n_active and the queue again magically overflows.

Perhapse something like 2*n_max would be better, at least that is a
fixed lengh of queue independent of what is happening with the helpers.

> TODO: There are approximately ten places where we check queue sizes
> against n_running or n_active. Many of those checks are slightly
> different. This inconsistency should be removed, probably by adding a
> few standard methods for a few types of checks that are actually needed.
> Patches welcome!

+1 on that idea.

Amos
Received on Sat Jan 12 2013 - 04:43:06 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 12 2013 - 12:00:10 MST