Re: [MERGE] wordlist refactoring

From: Kinkie <gkinkie_at_gmail.com>
Date: Tue, 19 Aug 2008 17:17:01 +0200

On Fri, Aug 15, 2008 at 9:13 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> On Wed, Aug 13, 2008 at 9:11 AM, Amos Jeffries <squid3_at_treenet.co.nz>
>> wrote:
>>> Kinkie wrote:
>>>>
>>>> Hi all,
>>>> this patch is an attempt at streamlining the wordlist implementation
>>>> in
>>>> 3.HEAD, by:
>>>> - OO-izing it [1]
>>>> - providing an unit-test for it
>>>> - giving it a couple of convenience functions
>>>> - documenting it
>>>>
>>>> It's still a work-in-progress, but in the end a wordlist would probably
>>>> be
>>>> better implemented by using a set of Strings anyways.
>>>> It's however a step forward already, since it gives wordlist callers
>>>> better c++ semantics than the current implementation in HEAD.
>>>>
>>>> The patch has been compile-tested, unit-tested, test-suite'd and
>>>> run-tested.
>>>>
>>>> Please review and hopefully apply.
>>>>
>>>> Kinkie
>>>>
>>>
>>> NP: all I've done is read the patch and mention things that stand out.
>>> Some
>>> of the below need fixing, others just explaining why.
>>
>> Sure. Thanks for taking the time to check it.
>>
>>>
>>> 1)
>>>
>>> in src/Makefile.am
>>> @@ -1188,6 +1188,7 @@
>>> tests/testURL \
>>> + tests/testWordlist
>>> @STORE_TESTS@
>>>
>>> a) Missing a '\' on that line.
>>> b) How did this pass build tests? particularly the new testbed ones.
>>
>> You're right. Maybe the tests failed silently...
>>
>> Fixed; re-ran the tests, seems to be fine.. Unfortunately the
>> significant output gets drowned in huge noise from libtool and gcc. Is
>> there any way to silence it, like for instance is done in the Linux
>> kernel? I'm not an automake expert..
>
> I'm looking at it. Theres output being sprayed in all directions from
> sub-scripts.
>
>>
>>> 2)
>>> if (authenticate) {
>>> delete authenticate;
>>> authenticate = NULL;
>>> }
>>> is equivalent to:
>>> safe_delete(authenticate);
>>> provided by the shared squid compat library headers.
>>
>> Are you sure? I can't find any safe_delete string in the whole
>> sources; maybe you mean safe_free? If so, it's probably not a good
>> choice to mix different allocators and destructors (new / xxfree),
>> especially on platforms with very complex heap managers such as
>> Windows.
>
> I meant safe_delete, as you say its not good to mix alloc/deallocators.
> My brain going again I think. I'm sure I saw it defined somewhere.
> Oh well, its maybe needed anyway.
>
>>
>>> same for: Config.Program.redirect
>>> same for: dnsservers->cmdline
>>> same for: err->ftp.server_msg
>>> same for: p->cmdline and p->arguments in external_acl line 194 & 574
>>>
>>> in fact everywhere you've gone delete. Whether you are using if()
>>> already or
>>> not. (The non-use of if() before delete is unsafe anyway).
>>
>> I tried using it only where I felt it was sensible. In the other cases
>> the deleted member was expected to be there, and by not testing I'd
>> prefer to expose bugs in the initialization code paths.
>>
>> In other words, there's two choices:
>> where pointer is optionally initialized:
>>
>> if (pointer)
>> delete pointer;
>>
>> where pointer is mandatorily pointing to something meaningful as part
>> of the constructor:
>> assert(pointer);
>> delete pointer;
>>
>> I just implicitly assert()ed by not specifying it, it'd bomb anyways.
>
> Understood. Just be sure it bombs with a good trace.
>
>>
>> It's a matter of style, so I'd like to have your feedback on this. My
>> personal take is that I'd rather have squid bomb on broken object
>> initialization rather than hiding the bug every time a member is
>> accessed (after all in the end it's more efficient to make sure that
>> something is there rather than continuously check whether it is, at
>> least in rather straightforward cases such as these)
>
> My personal preference is to always check. If something goes badly wrong
> it will pass the destruct check and bomb anyways. Having a must-exist
> requirement places a lot of burden on the callers.
>
> The issue is a matter of efficiency, whether the pre-condition causes more
> OPs allocating data for these objects than it saves from elimination of
> duplicate data validation.
>
> (I'm hoping we can come up with a good benchmark tool for this type of
> thing at the meet).
>
>>
>>> 3)
>>> There is a lot of 'new' allocation going on thats followed by 'delete'
>>> at
>>> the end of the parent scope.
>>> ie RemovalPolicySettings member pointer, adaptation ServiceGroup
>>> functions
>>> On the face of the patch without digging into the usage code, they
>>> appear to
>>> be one-use memory
>>> Why are these not able to be plain non-pointer member or local variable
>>> now?
>>
>> Should be doable in some cases, but it exposes some warts, i.e. in
>> parse_wordlist. In fact, I'd rather attack those from that angle.
>>
>
> Understood. The mess will need removing at some point. From your later
> talk it looks like that is your long-term aim.
> The few cases that don't have warts should be fixed in this patch anyway
> to bump the performance gains sooner rather than later.
>
>>> 4)
>>> I'm sure the ACL Walkee's don't need wordlist**, now that you are
>>> pre-allocating the wordlist.
>>
>> yup. See comment on parse_wordlist.
>>
>>>
>>> 5)
>>> === modified file 'src/client_side_request.cc'
>>> --- src/client_side_request.cc 2008-06-09 01:58:19 +0000
>>> +++ src/client_side_request.cc 2008-08-09 16:13:04 +0000
>>> @@ -666,7 +666,7 @@
>>> const char *url = http->uri;
>>> HttpRequest *request = http->request;
>>> HttpRequestMethod method = request->method;
>>> - const wordlist *p = NULL;
>>> + const wordlist *p;
>>>
>>>
>>> Why is this small piece of pointer-safety being removed? '*p' is not
>>> guaranteed to be initialized correctly outside the defining function
>>> scope.
>>
>> p is assigned to 40-some lines later, before being dereferenced anywhere.
>>
>> Removing this initialization is, however, intentional: calling member
>> functions on NULL is going to fail anyways, and that's what would be
>> done to manipulate an uninitialized object. On the other hand not
>> assigning the variable gives a chance to the compiler's static code
>> path checks to detect uninitialized variables usage,
>
> Okay.
>
>>
>>> 6)
>>> === modified file 'src/tests/stub_tools.cc'
>>> --- src/tests/stub_tools.cc 2006-09-15 02:13:23 +0000
>>> +++ src/tests/stub_tools.cc 2008-08-11 14:54:43 +0000
>>> @@ -45,3 +45,4 @@
>>> {
>>> fatal ("Not implemented");
>>> }
>>> +
>>>
>>> can be reverted from the merge.
>>
>> Done.
>>
>>> 7)
>>> for the documentation you don't need all the: "\ingroup wordlist"
>>> class methods are automatically grouped.
>>
>> Ok, done. Not an expert in doxygen, and I just copied from elsewhere.
>>
>>> 8) as mentioned the nastiness in wordlist old behavior. That should go
>>> if
>>> possible during this cleanup. IIRC it was mainly the O(N+1)
>>> insertion/append
>>> algorithm, a simple tail pointer should speed that up to O(1).
>>> Adrian may have other bits he noticed, or you might have seen others
>>> already
>>> yourself.
>>
>> There's problems there, due to the fact that the current wordlist
>> object is a C-ism, filling in both roles of a wordlist and
>> wordlistNode. adding a tail-pointer would raise each node's memory use
>> by 50%. On the other hand, splitting the class into wordlist,
>> wordlistNode and wordlistIterator is quite a huge coding effort for
>> something which should at the end of it all become something like
>>
>> typedef boost::intrusive::slist<String> wordlist;
>>
>> Getting to that point (not just for wordlist, but for all
>> genericizable classes) is my long-term goal.
>
> Ah, understood.
>
>>
>> The focus of my refactoring effort was (and unless there's consensus
>> it should change, remains) to work on the callers' side rather than
>> trying to be too clever on the callees' side.
>>
>>
>> I'm attaching a new version of the patch, addressing most concerns. It
>> obsoletes the previous one.

I all,
  I've decided to retire this merge request as development is
continuing with different goals.
Due to the change in scope, I've also renamed the branch; it's now
accessible at

https://code.launchpad.net/~kinkie/squid/wordlist-refactor

I'll launch a new merge proposal as soon as I'm satisfied with the
state of things.

-- 
    /kinkie
Received on Tue Aug 19 2008 - 15:17:08 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 20 2008 - 12:00:05 MDT