Re: [MERGE] c++-refactor HttpHdrCc

From: Kinkie <gkinkie_at_gmail.com>
Date: Wed, 28 Sep 2011 15:23:46 +0200

On Tue, Sep 27, 2011 at 6:24 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 09/27/2011 12:23 AM, Kinkie wrote:
>
>>> The "may not be null-terminated" part can be removed, but it would be
>>> useful to preserve it as it will allow us to use this class for many
>>> other things. Please note that if you do preserve it, you would need to
>>> fix comparison functions and add a stream output operator (among other,
>>> minor changes).
>>
>> Preserved.
>> The current iteration of the class is intentionally write-only (can be
>> only initialized and compared, but not read, and its data members are
>> private.
>> This is consistent with the function it needs to perform now.
>> Should I lift this limitation?
>
> I do not consider private members a limitation if we do not need access
> to them. We can always make them public later if needed.
>
> You should document those data members though.

Ok, done.

>> +    bool operator< ( const StringArea &s) const { return theLen < s.theLen || strncmp(theStart,s.theStart,theLen) < 0; }
>
> This is still broken, on two counts:
>
> A) The strncmp call may segfault while accessing wrong memory if theLen
> is greater than s.theLen.
>
> B) You need to make sure that if s1 is less than s2, then s2 is greater
> than or equal to s1. Consider this test case with the current code:
>
>  "bb" < "aaa" (because of the length test)
>
> but the reverse is also true:
>
>  "aaa" < "bb" (because of the strncmp test)
>
> Your searches will not work correctly given the above conflict.
>
> I suspect you want to do this instead:
>
>    return theLen < s.theLen || (theLen == s.theLen && strcmp(...) < 0);

Thanks, fixed.

> I would also replace strncmp() with memcmp() so that the same class can
> be used for any buffer area. Once the less operator is fixed, memcmp()
> would produce the same result as strncmp() and might be a tad faster.

Yes, you're right. Thanks

>>>> +class strblob {
>>>
>>> The class name must be changed to follow Squid style and, hopefully have
>>> more meaning. Consider "StringArea". You may want to move it to
>>> src/base/StringArea.h.
>>
>> Done.
>
> Please note that v5 of your patch is missing StringArea.h

Artifact of the process to create the diff with more context, I suspect.

>>>> +    bool operator< ( const strblob &s2) const { return strncmp(thePtr,s2.thePtr,theLen) < 0; }
>>>
>>> This will crash if s2.theLen is smaller than this->theLen. Needs more
>>> len-specific checks. Note that for your current purposes, you can
>>> declare shorter strings smaller even if they are not so in the dictionary.
>>
>> Done.
>>
>>> This is one reason why such comparison may need to be moved outside the
>>> object. Another reason is that we will probably want case-insensitive
>>> comparison in other contexts.
>>
>> Can't be done without making the pointed-to string readable. Will do
>> if you confirm that you think that's the best course forward.
>
>
> Your call. This can be done later if we need different comparison operators.

Let's do it later.

>>>> +        const strblob tmpstr(item,nlen);
>>>> +        const CcNameToIdMap_t::iterator i=CcNameToIdMap.find(tmpstr);
>>>
>>> Consider s/tmpstr/lookup/ or some such. You can avoid this named object
>>> altogether if you just create strblob when calling find(). The iterator
>>> should be constant, I think. Here is a sketch:
>>>
>>>    const CcNameToIdMap_t::const_iterator match =
>>>        CcNameToIdMap.find(strblob(item, nlen));
>>
>> std::map.find takes a const reference argument. With an unnamed
>> temporary it wouldn't compile.
>
> Why not? It does compile fine for me, but perhaps your compiler is
> better. What error do you get?

It works for me too. I don't remember the error I got.

>>>> +                cc->setMaxAge(MAX_AGE_UNSET);
>>>
>>> So it is set or unset? If you do not want to add an explicit clear
>>> MaxAge() method, how about s/MAX_AGE_UNSET/MAX_AGE_UNKNOWN/?
>>
>> Sure.
>
> I still see UNSET in v5 of the patch.

Changed to UNKNOWN. Also changed all tests that matched against it to
mask-based tests, and added an assert if someone tries to use the
set() method on types with more specific setters. I don't like this
though, as it is not detectable at compile time and imposes a burden
on the caller (which is so far only one).

>>> Similar for S_MAXAGE_UNSET, MIN_FRESH_UNSET, MAX_STALE_ALWAYS, etc. (see
>>> the blob below for the full list).
>>>
>>>
>>>> +     static const int32_t MAX_AGE_UNSET=-1; //max-age is unset
>>>> +     static const int32_t S_MAXAGE_UNSET=-1; //s-maxage is unset
>>>> +     static const int32_t MAX_STALE_UNSET=-1; //max-stale is unset
>>>> +     static const int32_t MAX_STALE_ALWAYS=-2; //max-stale is set to no value
>>>> +     static const int32_t STALE_IF_ERROR_UNSET=-1; //stale_if_error is unset
>>>> +     static const int32_t MIN_FRESH_UNSET=-1; //min_fresh is unset
>>>
>>> The old types were "int". You are making them more specific now with
>>> "int32_t". I am not sure such a change is withing this patch scope, but
>>> even if it is, perhaps you should add a HttpCcDirective typedef or
>>> similar to avoid spreading int32_t throughout the code.
>>
>> I'm simply refactoring what's in there.
>
> No, you are also changing their type from "int" to "int32_t", exposing
> yourself to more criticism about the type choice, typedefs, etc.

In other words, I haven't managed to avoid overdoing it have I?

>> They used to be initialized to
>> -1, as a duplicate validity check to mask (see below for more on
>> mask). even worse, they are (in trunk) tested inconsistently:
>> sometimes with EBIT_TEST on mask, sometimes with checks on >=0 on the
>> value.
>> trunk's HttpHdrCc::parse has the clearest picture on the current conventions:
>> for bool (present/not present) header values, EBIT_SET(mask) is used
>> for integer header values, EBIT_SET(mask) + assigning value is used.
>> In case of parse error on the value, both are cleared
>> for optional-integer header values, EBIT_SET(mask) + assigning value
>> is used. The special "-1" value means both "not set" and "set but not
>> specified", to tell the difference the mask is used.
>>
>> What I've done is to define explicit constants to tell the three cases
>> apart without needing to look up the map, and used getters/setters for
>> those headers in order to enfoce consistency. Other headers are still
>> set by looking at mask (although I'm inclined to define a generic
>> get/set combo for bool values)
>
> It is a shame that most of the above knowledge is not present in your
> changes. Neither the static constant names nor the getters/setters relay
> the above conventions. This will most likely lead to more bugs where
> somebody will copy a mask-testing code when they should have copied the
> value-checking code or vice versa. More on that below.

>
>
>>>> +        if (cc && minFresh!=HttpHdrCc::MIN_FRESH_UNSET) {
>>> ...
>>>> +             entry->mem_obj->getReply()->cache_control->getStaleIfError() != HttpHdrCc::STALE_IF_ERROR_UNSET &&
>>> ...
>>>> +            if (cc->getMaxAge() != HttpHdrCc::MAX_AGE_UNSET) {
>>>
>>> These and other similar comparisons now look even worse than they were.
>>> How about adding HttpHdrCc::hasMaxAge() and such?
>>
>> Could be done, maybe reintroducing the mask tests?
>>
>> API would then tentatively be
>> class HttpHdrCc {
>>   //generic getters/setters, for bool header values
>>   set(http_hdr_enum);
>>   clear(http_hdr_enum);
>>   bool get ((http_hdr_enum);
>>
>>   //specific getters/setters, for integer values. Some special values
>> need to remain though.
>>   getMinFresh()
>>   setMinFresh()
>>   clearMinFresh()
>
> Fine with me, although I would probably use explicit names for all
> directives, add dedicated has*() tests, and drop "get" (and probably
> "set") prefixes:
>
> public:
>    /* boolean (present, not present) directives */
>    bool hasFoo() const { return hasDirective(ccFoo); }
>    bool foo() const { return foo_; }
>    void foo(bool beSet) { setDirective(ccFoo, foo_, beSet); }
>    void clearFoo() { clearDirective(ccFoo, foo_); }
>
>    /* integer-valued directives */
>    bool hasBar() const { return hasDirective(ccBar); }
>    int32_t bar() const { return bar_; }
>    void bar(int32_t value) { setDirective(ccBar, bar_, beSet); }
>    void clearBar() { clearDirective(ccBar, bar_); }
>
> protected:
>    /* these know that all directives are stored as integers */
>    bool hasDirective(int id) const { return EBIT_TEST(mask, id); }
>    void setDirective(int id, int32_t &member, int32_t value) { ... }
>    void clearDirective(int id, int32_t &member) { ... }

Done. (set/clear are a single method with argument, but it's private anyways)

> This adds a few more lines, but the added lines are simple, there is no
> code duplication, and the whole thing makes the caller's code easier to
> read (and may even detect usage bugs based on your "current usage"
> findings discussed above!).

Done.
Attached is version 6; run-tested.
Includes StringArea.h this time.

Thanks

-- 
    /kinkie

Received on Wed Sep 28 2011 - 13:23:54 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 28 2011 - 12:00:04 MDT