Re: StringNG merge?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 12 Nov 2012 13:05:07 +1300

I've run the maintenance script on this branch and it found quite a few
bits of bad code formatting. Please apply the attached patch. It is all
whitespace corrections except one bad include of squid.h and some #if
protection macro name problems.

On 12.11.2012 07:53, Kinkie wrote:
>> I found the thread discussing these problems. It looks like the
>> conclusion was that StringNG cannot support BodyPipe or similar I/O
>> patterns efficiently, but can still be useful elsewhere.
>
> It can probably be used as the foundation for that (e.g. via a
> circular list of SBufs). But that will come later; worst case, we can
> still use MemBuf for that.
>
>>> +SBuf&
>>> +SBuf::append(const char * S, size_type pos, size_type n)
>>> +{
>>> + Must(pos==npos || pos >= 0);
>>> + Must(n==npos || n >= 0);
>>> +
>>> + //acts as "the number of bytes to copy"
>>> +
>>
>> I do not understand what that comment is about. Is it out of date or
>> misplaced?
>
> Out of date. Removed.
>
>>> +SBufList::SBufList()
>>> +{
>>> + //nothing to do really..
>>> +}
>>> +
>>> +SBufList::~SBufList()
>>> +{
>>> + //nothing to do really..
>>> +}
>>> +
>>
>> Please remove these if they do nothing, especially since your
>> destructor
>> adds a virtual table to this class.
>
> Ok, removed.
>
>>> +/** split a SBuf into a list
>>> + *
>>> + * split a SBuf into a list, using any of the supplied delimiters
>>> as a
>>> + * separator.
>>> + * The separators are left at the BEGINNING of the split stings.
>>> + * For example: SBufSplit(SBuf("Foo Bar,Gazonk"),SBuf(" ,")) will
>>> return a list
>>> + * containing
>>> + * {SBuf("Foo"), SBuf(" Bar"), SBuf(",Gazonk")}
>>> + */
>>> +SBufList SBufSplit (SBuf tosplit, const SBuf & delimiters);
>>
>> AFAICT, this does not split "tosplit". It creates a list while
>> leaving
>> caller;s tosplit intact.
>> Why are we adding this strange function? I cannot find where it is
>> used.
>
> it's similar to PERL's split() function in purpose, and it can be
> used
> to create simple tokenizers.
>

It seems useful functionality by itself "But where?" is the big
question.

The big parsers this appears useful on at first thought are HTTP
headers field values, config lines, helper I/O. However on closer
inspection we have a few problems with the split() semantics:

* split(..., " ") is too simple for handling the " toggling optional
ignore for SP and will tokenize inside a quoted-string. Meaning we
cannot use split() on any string which *might* contain even one
quoted-string value.

  - config file lines need to be converted to accept quoted-string
tokens. This is an old TODO and we should not be adding functionality to
the config parser which makes it more difficult to achieve.

  - HTTP header have mandatory token / quoted-string for unknown field
values. Any header containing field-value extensibility cannot be parsed
with split().

  - even the helper protocol is being converted to accept quoted-string
now. So that usage is going to disappear as well before SBuf becomes
very useful.

* split() processes the whole line, rather than just the next single
token. Meaning we cannot use it for tokenizing just one field-value
unless that field-value has already been through 1+ pass of parsing
already to find its StringArea. Which in turn means it is useless for
making a single-pass parser like we want to.

IMO: The SBufTokenizer is likely to be all we need and far more useful.

>>> +class StrList
>>> +{
>>> +public:
>>> + StrList() { }
>>
>> This class is missing a description.
>
> It's an intermediate step, its purpose is to provide a path for
> disambiguating String-used-as-a-list by the various strList*
> functions.
> It can be omitted from the merge.
>
>>> + StrList(SBufList s) {
>>> + const SBuf sep(", ");
>>> + SBuf b=SBufListJoin(s,sep);
>>> + data_=b.toString();
>>> + }
>>> + StrList(const char *s) {
>>> + data_=s;
>>> + }
>>> + StrList (SBuf s) {
>>
>> These single-parameter constructors are probably missing "explicit"
>> keyword to prevent accidental (and expensive!) conversions. Please
>> also
>> check whether all of them need to copy their actual arguments
>> instead of
>> using references.
>
> Added.
> They do. StrList an intermediate step, to find and collecr all the
> client code which does implicit string-to-strlist conversions.
> It's a side project, which can be skipped for now.

Even a note to say that is useful to prevent us trying to use it for
something before completion. Also, experiments should not be left in the
code submitted for audit unless it is to be audited - it wastes
everybodies time reading the diff. If you want it to be audited and
committed fine - but then it cannot be skipped. Or are you saying you
are happy with dropping it out of the branch?

Asside: One of the things about merge is that it pulls in *all* of the
revisions from the merged branch, or a sequence from N..M.
Cherry-picking is going to be difficult on this project merge I think.
The depends on how you were going to cherrypick it though. (cd trunk ;
bzr merge src/SBuf* ../StringNG) is possible - I'm not sure that picks
up the revision history properly, but I have not done it on a branch
using the 2a format repositories yet. The other merge forms will be
rather messy unless you revert the experimental bits like this and
shelve them for later re-commit to the branch after the trunk merges.

Back to the audit...

All over:

* The HERE changes are in trunk. Please remove all the new code
debugs() mentioning of what function name they are being called from.
  + SBuf.cc and SBufExtras.cc seems to be threaded with "MYNAME" and
scope, the other files with hard-coded function names.

SBufTokenizer.*

* can we inline most of these methods? (as proper inline definitions
rather than using .cci)

src/base/TextException.cc:
  * the xstrdup() crash protection change here is a bug fix. Please
apply to trunk asap and separately.

That is all from me for now.

Amos

Received on Mon Nov 12 2012 - 00:05:12 MST

This archive was generated by hypermail 2.2.0 : Wed Nov 14 2012 - 12:00:07 MST