Re: [MERGE] wordlist refactoring

From: Kinkie <gkinkie_at_gmail.com>
Date: Wed, 13 Aug 2008 17:59:55 +0200

On Wed, Aug 13, 2008 at 9:11 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> Kinkie wrote:
>>
>> Hi all,
>> this patch is an attempt at streamlining the wordlist implementation in
>> 3.HEAD, by:
>> - OO-izing it [1]
>> - providing an unit-test for it
>> - giving it a couple of convenience functions
>> - documenting it
>>
>> It's still a work-in-progress, but in the end a wordlist would probably be
>> better implemented by using a set of Strings anyways.
>> It's however a step forward already, since it gives wordlist callers
>> better c++ semantics than the current implementation in HEAD.
>>
>> The patch has been compile-tested, unit-tested, test-suite'd and
>> run-tested.
>>
>> Please review and hopefully apply.
>>
>> Kinkie
>>
>
> NP: all I've done is read the patch and mention things that stand out. Some
> of the below need fixing, others just explaining why.

Sure. Thanks for taking the time to check it.

>
> 1)
>
> in src/Makefile.am
> @@ -1188,6 +1188,7 @@
> tests/testURL \
> + tests/testWordlist
> @STORE_TESTS@
>
> a) Missing a '\' on that line.
> b) How did this pass build tests? particularly the new testbed ones.

You're right. Maybe the tests failed silently...

Fixed; re-ran the tests, seems to be fine.. Unfortunately the
significant output gets drowned in huge noise from libtool and gcc. Is
there any way to silence it, like for instance is done in the Linux
kernel? I'm not an automake expert..

> 2)
> if (authenticate) {
> delete authenticate;
> authenticate = NULL;
> }
> is equivalent to:
> safe_delete(authenticate);
> provided by the shared squid compat library headers.

Are you sure? I can't find any safe_delete string in the whole
sources; maybe you mean safe_free? If so, it's probably not a good
choice to mix different allocators and destructors (new / xxfree),
especially on platforms with very complex heap managers such as
Windows.

> same for: Config.Program.redirect
> same for: dnsservers->cmdline
> same for: err->ftp.server_msg
> same for: p->cmdline and p->arguments in external_acl line 194 & 574
>
> in fact everywhere you've gone delete. Whether you are using if() already or
> not. (The non-use of if() before delete is unsafe anyway).

I tried using it only where I felt it was sensible. In the other cases
the deleted member was expected to be there, and by not testing I'd
prefer to expose bugs in the initialization code paths.

In other words, there's two choices:
where pointer is optionally initialized:

if (pointer)
  delete pointer;

where pointer is mandatorily pointing to something meaningful as part
of the constructor:
assert(pointer);
delete pointer;

I just implicitly assert()ed by not specifying it, it'd bomb anyways.

It's a matter of style, so I'd like to have your feedback on this. My
personal take is that I'd rather have squid bomb on broken object
initialization rather than hiding the bug every time a member is
accessed (after all in the end it's more efficient to make sure that
something is there rather than continuously check whether it is, at
least in rather straightforward cases such as these)

> 3)
> There is a lot of 'new' allocation going on thats followed by 'delete' at
> the end of the parent scope.
> ie RemovalPolicySettings member pointer, adaptation ServiceGroup functions
> On the face of the patch without digging into the usage code, they appear to
> be one-use memory
> Why are these not able to be plain non-pointer member or local variable now?

Should be doable in some cases, but it exposes some warts, i.e. in
parse_wordlist. In fact, I'd rather attack those from that angle.

> 4)
> I'm sure the ACL Walkee's don't need wordlist**, now that you are
> pre-allocating the wordlist.

yup. See comment on parse_wordlist.

>
> 5)
> === modified file 'src/client_side_request.cc'
> --- src/client_side_request.cc 2008-06-09 01:58:19 +0000
> +++ src/client_side_request.cc 2008-08-09 16:13:04 +0000
> @@ -666,7 +666,7 @@
> const char *url = http->uri;
> HttpRequest *request = http->request;
> HttpRequestMethod method = request->method;
> - const wordlist *p = NULL;
> + const wordlist *p;
>
>
> Why is this small piece of pointer-safety being removed? '*p' is not
> guaranteed to be initialized correctly outside the defining function scope.

p is assigned to 40-some lines later, before being dereferenced anywhere.

Removing this initialization is, however, intentional: calling member
functions on NULL is going to fail anyways, and that's what would be
done to manipulate an uninitialized object. On the other hand not
assigning the variable gives a chance to the compiler's static code
path checks to detect uninitialized variables usage,

> 6)
> === modified file 'src/tests/stub_tools.cc'
> --- src/tests/stub_tools.cc 2006-09-15 02:13:23 +0000
> +++ src/tests/stub_tools.cc 2008-08-11 14:54:43 +0000
> @@ -45,3 +45,4 @@
> {
> fatal ("Not implemented");
> }
> +
>
> can be reverted from the merge.

Done.

> 7)
> for the documentation you don't need all the: "\ingroup wordlist"
> class methods are automatically grouped.

Ok, done. Not an expert in doxygen, and I just copied from elsewhere.

> 8) as mentioned the nastiness in wordlist old behavior. That should go if
> possible during this cleanup. IIRC it was mainly the O(N+1) insertion/append
> algorithm, a simple tail pointer should speed that up to O(1).
> Adrian may have other bits he noticed, or you might have seen others already
> yourself.

There's problems there, due to the fact that the current wordlist
object is a C-ism, filling in both roles of a wordlist and
wordlistNode. adding a tail-pointer would raise each node's memory use
by 50%. On the other hand, splitting the class into wordlist,
wordlistNode and wordlistIterator is quite a huge coding effort for
something which should at the end of it all become something like

typedef boost::intrusive::slist<String> wordlist;

Getting to that point (not just for wordlist, but for all
genericizable classes) is my long-term goal.

The focus of my refactoring effort was (and unless there's consensus
it should change, remains) to work on the callers' side rather than
trying to be too clever on the callees' side.

I'm attaching a new version of the patch, addressing most concerns. It
obsoletes the previous one.

-- 
 /kinkie

Received on Wed Aug 13 2008 - 16:24:17 MDT

This archive was generated by hypermail 2.2.0 : Sat Aug 23 2008 - 12:00:06 MDT