Re: [MERGE] c++-refactor HttpHdrCc

From: Kinkie <gkinkie_at_gmail.com>
Date: Tue, 27 Sep 2011 08:23:31 +0200

Hi Alex,

[...]

>>> On the other hand, you now deallocate and allocate a new SquidString for
>>> every CC directive just to find that directive ID so the total cost may
>>> be not be lower. I think we can do much better, and even if we cannot or
>>> will not, I do not think we should commit the "optimization" part of
>>> this change until proper performance regression testing.
>>
>> Point taken. I've implemented a small helper class (basically a const
>> char*+length) to support the lookup in a very cheap way. I'd like to
>> get your feedback on it before trying to extend its use (it's
>> currently an in-cc-file definition).
>
> You are on the right track with this class. It can be useful for a lot
> of things (we use similar classes in other projects). The new class
> needs some polishing though:
[...]
> The class data is copied. Consider:
>
> /** A char* plus length combination. Useful for temporary storing
>  * and quickly looking up strings.
>  *
>  * The pointed-to string may not be null-terminated.
>  *
>  * Not meant for stand-alone storage. Validity of the
>  * pointed-to string is responsibility of the caller.
>  **/

Done with an additional explicit statement that pointed-to data is not copied.

> The "may not be null-terminated" part can be removed, but it would be
> useful to preserve it as it will allow us to use this class for many
> other things. Please note that if you do preserve it, you would need to
> fix comparison functions and add a stream output operator (among other,
> minor changes).

Preserved.
The current iteration of the class is intentionally write-only (can be
only initialized and compared, but not read, and its data members are
private.
This is consistent with the function it needs to perform now.
Should I lift this limitation?

>> +class strblob {
>
> The class name must be changed to follow Squid style and, hopefully have
> more meaning. Consider "StringArea". You may want to move it to
> src/base/StringArea.h.

Done.

>> +    const char *thePtr;
>> +    size_t theLen;
>
> I would s/thePtr/theStart/ but it is your call.

Done.

>> +    bool operator==(strblob &s) const { return theLen==s.theLen && 0==strncmp(thePtr,s.thePtr,theLen); }
>
> The parameter "s" should be declared const.

Done.

> Value result specific before the computation putting avoid please:
>    strcmp() == 0.
>
> Please add != operator, implemented using the == operator.

Done.

>> +    bool operator< ( const strblob &s2) const { return strncmp(thePtr,s2.thePtr,theLen) < 0; }
>
> This will crash if s2.theLen is smaller than this->theLen. Needs more
> len-specific checks. Note that for your current purposes, you can
> declare shorter strings smaller even if they are not so in the dictionary.

Done.

> This is one reason why such comparison may need to be moved outside the
> object. Another reason is that we will probably want case-insensitive
> comparison in other contexts.

Can't be done without making the pointed-to string readable. Will do
if you confirm that you think that's the best course forward.

> Rename "s2" to "s" or similar.

Done.

>> +        const HttpHeaderCcFields *f=&CcAttrs[i];
>
> "f" can be a [const] reference to avoid taking the address and then
> dereferencing it all the time:
>
>    const HttpHeaderCcFields &f = CcAttrs[i];

Done.

>> +        const strblob tmpstr(item,nlen);
>> +        const CcNameToIdMap_t::iterator i=CcNameToIdMap.find(tmpstr);
>
> Consider s/tmpstr/lookup/ or some such. You can avoid this named object
> altogether if you just create strblob when calling find(). The iterator
> should be constant, I think. Here is a sketch:
>
>    const CcNameToIdMap_t::const_iterator match =
>        CcNameToIdMap.find(strblob(item, nlen));

std::map.find takes a const reference argument. With an unnamed
temporary it wouldn't compile.

>> +                cc->setMaxAge(MAX_AGE_UNSET);
>
> So it is set or unset? If you do not want to add an explicit clear
> MaxAge() method, how about s/MAX_AGE_UNSET/MAX_AGE_UNKNOWN/?

Sure.

> Similar for S_MAXAGE_UNSET, MIN_FRESH_UNSET, MAX_STALE_ALWAYS, etc. (see
> the blob below for the full list).
>
>
>> +     static const int32_t MAX_AGE_UNSET=-1; //max-age is unset
>> +     static const int32_t S_MAXAGE_UNSET=-1; //s-maxage is unset
>> +     static const int32_t MAX_STALE_UNSET=-1; //max-stale is unset
>> +     static const int32_t MAX_STALE_ALWAYS=-2; //max-stale is set to no value
>> +     static const int32_t STALE_IF_ERROR_UNSET=-1; //stale_if_error is unset
>> +     static const int32_t MIN_FRESH_UNSET=-1; //min_fresh is unset
>
> The old types were "int". You are making them more specific now with
> "int32_t". I am not sure such a change is withing this patch scope, but
> even if it is, perhaps you should add a HttpCcDirective typedef or
> similar to avoid spreading int32_t throughout the code.

I'm simply refactoring what's in there. They used to be initialized to
-1, as a duplicate validity check to mask (see below for more on
mask). even worse, they are (in trunk) tested inconsistently:
sometimes with EBIT_TEST on mask, sometimes with checks on >=0 on the
value.
trunk's HttpHdrCc::parse has the clearest picture on the current conventions:
for bool (present/not present) header values, EBIT_SET(mask) is used
for integer header values, EBIT_SET(mask) + assigning value is used.
In case of parse error on the value, both are cleared
for optional-integer header values, EBIT_SET(mask) + assigning value
is used. The special "-1" value means both "not set" and "set but not
specified", to tell the difference the mask is used.

What I've done is to define explicit constants to tell the three cases
apart without needing to look up the map, and used getters/setters for
those headers in order to enfoce consistency. Other headers are still
set by looking at mask (although I'm inclined to define a generic
get/set combo for bool values)

> Can the above constants be converted into an enum?

I don't think they should. But the code allows for that now.

>> +        EBIT_CLR(mask, CC_MIN_FRESH);
>> +        this->min_fresh=STALE_IF_ERROR_UNSET;
>
> Consistency typo.

Thanks

>> +public:
> ...
>> +    int32_t mask;
>> +private:
>> +    int32_t max_age;
>> +    int32_t s_maxage;
> ...
>
> Seems wrong for the mask to remain public if we hid most of the values.

Discussed above.

>
>> +    int32_t mask;
>> +private:
>> +    int32_t max_age;
>> +    int32_t s_maxage;
>> +    int32_t max_stale;
>> +    int32_t stale_if_error;
>> +    int32_t min_fresh;
>> +public:
>> +    String other;
>
> It would be nice to document at least the mask and Other fields.

Done.

>> +    bool gotList;
>
> Please declare when first use and add const if possible.

Done.

>> -## acl needs wordlist. wordlist needs MemBug
>> +## acl needs wordlist. wordlist needs MemBuf
>
> An unrelated change?

Typo-fix. Reverted.

>> -        if (cc && cc->min_fresh > 0) {
>> +        const int32_t minFresh=cc->getMinFresh();
>> +        if (cc && minFresh!=HttpHdrCc::MIN_FRESH_UNSET) {
>
> This change smells like a nil pointer dereference to me, but perhaps the
> missing context makes it OK?

No, it does not. Fixed.

>> +        if (cc && minFresh!=HttpHdrCc::MIN_FRESH_UNSET) {
> ...
>> +             entry->mem_obj->getReply()->cache_control->getStaleIfError() != HttpHdrCc::STALE_IF_ERROR_UNSET &&
> ...
>> +            if (cc->getMaxAge() != HttpHdrCc::MAX_AGE_UNSET) {
>
> These and other similar comparisons now look even worse than they were.
> How about adding HttpHdrCc::hasMaxAge() and such?

Could be done, maybe reintroducing the mask tests?

API would then tentatively be
class HttpHdrCc {
  //generic getters/setters, for bool header values
  set(http_hdr_enum);
  clear(http_hdr_enum);
  bool get ((http_hdr_enum);

  //specific getters/setters, for integer values. Some special values
need to remain though.
  getMinFresh()
  setMinFresh()
  clearMinFresh()

  String other;
}

>>> And if you want to mix optimization with cleanup, here is one place
>>> where you can save quite a few cycles for many messages by not ignoring
>>> the return value of getList():
>>>
>>>>      getList(HDR_CACHE_CONTROL, &s);
>>>>
>>>> -    cc = httpHdrCcParseCreate(&s);
>>>> +    cc=new HttpHdrCc();
>>>> +    if (!cc->parseInit(s)) {
>>>> +        delete cc;
>>>> +        cc = NULL;
>>>> +    }
>
>> Done.
>
>> +    bool gotList;
>>
>>      if (!CBIT_TEST(mask, HDR_CACHE_CONTROL))
>>          return NULL;
>>      PROF_start(HttpHeader_getCc);
>>
>> -    getList(HDR_CACHE_CONTROL, &s);
>> -
>> -    cc = httpHdrCcParseCreate(&s);
>> +    gotList=getList(HDR_CACHE_CONTROL, &s);
>> +
>> +    cc=new HttpHdrCc();
>> +    if (!gotList)
>> +        return cc;
>> +
>> +    if (!cc->parse(s)) {
>> +        delete cc;
>> +        cc = NULL;
>> +    }
>
>
> Hmm.. I did not see that we already have a HDR_CACHE_CONTROL test above
> that code. The new code now optimizes a rare case of an empty but
> present Cache-Control header. This can only lead to bugs, with no
> positive performance changes. We should remove this exception. Sorry.
>
> (One more reason to provide more than 3 lines of context during review!)

Unfortunately "bzr send" doesn't allow to specify it, that I know of :(
I've checked web and documentation, but I can't find of any
alternative way to sanely produce the patch.

>>>> -HttpHeaderFieldInfo *CcFieldsInfo = NULL;
>>>> +/// Map an header name to its type, to expedite parsing
>>>> +typedef std::map<String,http_hdr_cc_type> HdrCcNameToIdMap_t;
>>>> +static HdrCcNameToIdMap_t HdrCcNameToIdMap;
>>>
>>> HdrCcNameToIdMap should be constant so that we do not accidentally
>>> modify it. You can make it a pointer and use similar-to-current
>>> initialization code or try to initialize statically.
>>
>> Are you sure we run such a risk? I've now changed it so that it is a
>> static data member with const key. Do you think it could be enough?
>
> I was not worried about somebody modifying the key specifically. I was
> more worried about somebody modifying the map (adding, removing, or
> changing its elements). Thus, a constant key does not help much. This is
> not a big deal though.

Ok

>>>> +class HttpHdrCc
>>>> +{
>>>> +
>>>> +public:
>>>> +    int32_t mask;
>>>> +    int32_t max_age;
>>>> +    int32_t s_maxage;
>>>> +    int32_t max_stale;
>>>> +    int32_t stale_if_error;
>>>> +    int32_t min_fresh;
>>>> +    String other;
>>>> +
>>>> +    HttpHdrCc(int32_t max_age_=-1, int32_t s_maxage_=-1,
>>>
>>> Please move public data member after public methods. It would be nice to
>>> document them too, especially those that do not correspond to CC
>>> directives. Also, check whether we really need all of the above,
>>> especially mask, as public fields (more on that later below).
>>
>> Done. Documented "mask" and "other", the others should be straightforward.
>> public mask could be avoided if we do full getters/setters, but doing
>> that elegantly would be scope-balooning.
>
> Not done? And you did "balloon the scope" by adding full
> getters/setters, removing your excuse to leave the mask exposed, right?

Mask is widely used for all non-integer header values.
See proposed API above.

>>>> +    HttpHdrCc(int32_t max_age_=-1, int32_t s_maxage_=-1,
>>>> +            int32_t max_stale_=-1, int32_t min_fresh_=-1) :
>>>
>>> Do we really all five of these constructors, including the one that
>>> allows for implicit conversion from int32_t to HttpHdrCc? I do not think
>>> so. Please leave just what we need and use "explicit" to prevent
>>> implicit conversions as needed.
>>
>> Overengineering. Changed to only the default contstructor (we need it
>> to set various fields to "-1" by default).
>> Ok for explicit.
>
> An explicit keyword does not make sense for a default constructor, does
> it? Not sure why it compiles for you, but I would recommend removing
> that keyword because your current code does not allow implicit type
> conversions.

Not allowing implicit conversions was the objective, and it compiles.
But I'm removing it anyways.

>>>> +    /** reset the structure to a clear state.
>>>> +     *
>>>> +     */
>>>> +    void clear();
>>>
>>> The "clear state" is an undefined concept. Consider a more specific
>>> description such as:
>>> /// reset to the after-default-construction state
>>>
>>> (and there is probably a more elegant way of saying that).
>>
>> How about "reset all data members to default state"? Otherwise I'll
>> use your suggestion.
>
> I can leave with either one, but "reset to default state" is probably best.
>
> BTW, the latest patch lost documentation for this (and some other?) members.

Re-added.
Attached is iteration 4.
Branch tree is at lp:~kinkie/squid/playground

Thanks,

-- 
    /kinkie

Received on Tue Sep 27 2011 - 06:23:39 MDT

This archive was generated by hypermail 2.2.0 : Tue Sep 27 2011 - 12:00:03 MDT