Re: [MERGE] wordlist refactoring

From: Adrian Chadd <adrian_at_freebsd.org>
Date: Wed, 20 Aug 2008 07:01:31 +0800

I'll be revisiting the wordlist and intlist code soon in Cacheboy;
perhaps you may get some inspiration from that.

(Hm, can I invent a macro-based templating language to implement
<type>list's in C.. hm.)

Adrian

2008/8/19 Kinkie <gkinkie_at_gmail.com>:
> On Fri, Aug 15, 2008 at 9:13 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>>> On Wed, Aug 13, 2008 at 9:11 AM, Amos Jeffries <squid3_at_treenet.co.nz>
>>> wrote:
>>>> Kinkie wrote:
>>>>>
>>>>> Hi all,
>>>>> this patch is an attempt at streamlining the wordlist implementation
>>>>> in
>>>>> 3.HEAD, by:
>>>>> - OO-izing it [1]
>>>>> - providing an unit-test for it
>>>>> - giving it a couple of convenience functions
>>>>> - documenting it
>>>>>
>>>>> It's still a work-in-progress, but in the end a wordlist would probably
>>>>> be
>>>>> better implemented by using a set of Strings anyways.
>>>>> It's however a step forward already, since it gives wordlist callers
>>>>> better c++ semantics than the current implementation in HEAD.
>>>>>
>>>>> The patch has been compile-tested, unit-tested, test-suite'd and
>>>>> run-tested.
>>>>>
>>>>> Please review and hopefully apply.
>>>>>
>>>>> Kinkie
>>>>>
>>>>
>>>> NP: all I've done is read the patch and mention things that stand out.
>>>> Some
>>>> of the below need fixing, others just explaining why.
>>>
>>> Sure. Thanks for taking the time to check it.
>>>
>>>>
>>>> 1)
>>>>
>>>> in src/Makefile.am
>>>> @@ -1188,6 +1188,7 @@
>>>> tests/testURL \
>>>> + tests/testWordlist
>>>> @STORE_TESTS@
>>>>
>>>> a) Missing a '\' on that line.
>>>> b) How did this pass build tests? particularly the new testbed ones.
>>>
>>> You're right. Maybe the tests failed silently...
>>>
>>> Fixed; re-ran the tests, seems to be fine.. Unfortunately the
>>> significant output gets drowned in huge noise from libtool and gcc. Is
>>> there any way to silence it, like for instance is done in the Linux
>>> kernel? I'm not an automake expert..
>>
>> I'm looking at it. Theres output being sprayed in all directions from
>> sub-scripts.
>>
>>>
>>>> 2)
>>>> if (authenticate) {
>>>> delete authenticate;
>>>> authenticate = NULL;
>>>> }
>>>> is equivalent to:
>>>> safe_delete(authenticate);
>>>> provided by the shared squid compat library headers.
>>>
>>> Are you sure? I can't find any safe_delete string in the whole
>>> sources; maybe you mean safe_free? If so, it's probably not a good
>>> choice to mix different allocators and destructors (new / xxfree),
>>> especially on platforms with very complex heap managers such as
>>> Windows.
>>
>> I meant safe_delete, as you say its not good to mix alloc/deallocators.
>> My brain going again I think. I'm sure I saw it defined somewhere.
>> Oh well, its maybe needed anyway.
>>
>>>
>>>> same for: Config.Program.redirect
>>>> same for: dnsservers->cmdline
>>>> same for: err->ftp.server_msg
>>>> same for: p->cmdline and p->arguments in external_acl line 194 & 574
>>>>
>>>> in fact everywhere you've gone delete. Whether you are using if()
>>>> already or
>>>> not. (The non-use of if() before delete is unsafe anyway).
>>>
>>> I tried using it only where I felt it was sensible. In the other cases
>>> the deleted member was expected to be there, and by not testing I'd
>>> prefer to expose bugs in the initialization code paths.
>>>
>>> In other words, there's two choices:
>>> where pointer is optionally initialized:
>>>
>>> if (pointer)
>>> delete pointer;
>>>
>>> where pointer is mandatorily pointing to something meaningful as part
>>> of the constructor:
>>> assert(pointer);
>>> delete pointer;
>>>
>>> I just implicitly assert()ed by not specifying it, it'd bomb anyways.
>>
>> Understood. Just be sure it bombs with a good trace.
>>
>>>
>>> It's a matter of style, so I'd like to have your feedback on this. My
>>> personal take is that I'd rather have squid bomb on broken object
>>> initialization rather than hiding the bug every time a member is
>>> accessed (after all in the end it's more efficient to make sure that
>>> something is there rather than continuously check whether it is, at
>>> least in rather straightforward cases such as these)
>>
>> My personal preference is to always check. If something goes badly wrong
>> it will pass the destruct check and bomb anyways. Having a must-exist
>> requirement places a lot of burden on the callers.
>>
>> The issue is a matter of efficiency, whether the pre-condition causes more
>> OPs allocating data for these objects than it saves from elimination of
>> duplicate data validation.
>>
>> (I'm hoping we can come up with a good benchmark tool for this type of
>> thing at the meet).
>>
>>>
>>>> 3)
>>>> There is a lot of 'new' allocation going on thats followed by 'delete'
>>>> at
>>>> the end of the parent scope.
>>>> ie RemovalPolicySettings member pointer, adaptation ServiceGroup
>>>> functions
>>>> On the face of the patch without digging into the usage code, they
>>>> appear to
>>>> be one-use memory
>>>> Why are these not able to be plain non-pointer member or local variable
>>>> now?
>>>
>>> Should be doable in some cases, but it exposes some warts, i.e. in
>>> parse_wordlist. In fact, I'd rather attack those from that angle.
>>>
>>
>> Understood. The mess will need removing at some point. From your later
>> talk it looks like that is your long-term aim.
>> The few cases that don't have warts should be fixed in this patch anyway
>> to bump the performance gains sooner rather than later.
>>
>>>> 4)
>>>> I'm sure the ACL Walkee's don't need wordlist**, now that you are
>>>> pre-allocating the wordlist.
>>>
>>> yup. See comment on parse_wordlist.
>>>
>>>>
>>>> 5)
>>>> === modified file 'src/client_side_request.cc'
>>>> --- src/client_side_request.cc 2008-06-09 01:58:19 +0000
>>>> +++ src/client_side_request.cc 2008-08-09 16:13:04 +0000
>>>> @@ -666,7 +666,7 @@
>>>> const char *url = http->uri;
>>>> HttpRequest *request = http->request;
>>>> HttpRequestMethod method = request->method;
>>>> - const wordlist *p = NULL;
>>>> + const wordlist *p;
>>>>
>>>>
>>>> Why is this small piece of pointer-safety being removed? '*p' is not
>>>> guaranteed to be initialized correctly outside the defining function
>>>> scope.
>>>
>>> p is assigned to 40-some lines later, before being dereferenced anywhere.
>>>
>>> Removing this initialization is, however, intentional: calling member
>>> functions on NULL is going to fail anyways, and that's what would be
>>> done to manipulate an uninitialized object. On the other hand not
>>> assigning the variable gives a chance to the compiler's static code
>>> path checks to detect uninitialized variables usage,
>>
>> Okay.
>>
>>>
>>>> 6)
>>>> === modified file 'src/tests/stub_tools.cc'
>>>> --- src/tests/stub_tools.cc 2006-09-15 02:13:23 +0000
>>>> +++ src/tests/stub_tools.cc 2008-08-11 14:54:43 +0000
>>>> @@ -45,3 +45,4 @@
>>>> {
>>>> fatal ("Not implemented");
>>>> }
>>>> +
>>>>
>>>> can be reverted from the merge.
>>>
>>> Done.
>>>
>>>> 7)
>>>> for the documentation you don't need all the: "\ingroup wordlist"
>>>> class methods are automatically grouped.
>>>
>>> Ok, done. Not an expert in doxygen, and I just copied from elsewhere.
>>>
>>>> 8) as mentioned the nastiness in wordlist old behavior. That should go
>>>> if
>>>> possible during this cleanup. IIRC it was mainly the O(N+1)
>>>> insertion/append
>>>> algorithm, a simple tail pointer should speed that up to O(1).
>>>> Adrian may have other bits he noticed, or you might have seen others
>>>> already
>>>> yourself.
>>>
>>> There's problems there, due to the fact that the current wordlist
>>> object is a C-ism, filling in both roles of a wordlist and
>>> wordlistNode. adding a tail-pointer would raise each node's memory use
>>> by 50%. On the other hand, splitting the class into wordlist,
>>> wordlistNode and wordlistIterator is quite a huge coding effort for
>>> something which should at the end of it all become something like
>>>
>>> typedef boost::intrusive::slist<String> wordlist;
>>>
>>> Getting to that point (not just for wordlist, but for all
>>> genericizable classes) is my long-term goal.
>>
>> Ah, understood.
>>
>>>
>>> The focus of my refactoring effort was (and unless there's consensus
>>> it should change, remains) to work on the callers' side rather than
>>> trying to be too clever on the callees' side.
>>>
>>>
>>> I'm attaching a new version of the patch, addressing most concerns. It
>>> obsoletes the previous one.
>
>
> I all,
> I've decided to retire this merge request as development is
> continuing with different goals.
> Due to the change in scope, I've also renamed the branch; it's now
> accessible at
>
> https://code.launchpad.net/~kinkie/squid/wordlist-refactor
>
> I'll launch a new merge proposal as soon as I'm satisfied with the
> state of things.
>
> --
> /kinkie
>
>
Received on Tue Aug 19 2008 - 23:01:35 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 20 2008 - 12:00:05 MDT