Re: [PATCH] Update SBuf::trim

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 04 Jun 2014 10:24:01 -0600

On 06/04/2014 09:26 AM, Amos Jeffries wrote:
> On 4/06/2014 2:48 a.m., Alex Rousskov wrote:
>> On 06/03/2014 08:22 AM, Amos Jeffries wrote:
>>> On 4/06/2014 1:08 a.m., Alex Rousskov wrote:
>>>> On 06/03/2014 04:46 AM, Amos Jeffries wrote:
>>>>
>>>>> This changes SBuf::trim() to use a CharacterSet instead of an SBuf
>>>>> list of characters and memchr().
>>>>>
>>>>> It seems that CharacterSet lookup is faster than repeated memchr()
>>>>> calls, but I'm not certain of that. It certainly makes for simpler
>>>>> parser code to trim with a predefined CharacterSet than with a
>>>>> static SBuf set of characters.
>>>>
>>>> I agree that CharacterSet membership test should be faster than repeated
>>>> memchr() calls.
>>>>
>>>> No objections to this patch, although I suspect that any code calling
>>>> SBuf::trim() should actually use Tokenizer instead; we may be optimizing
>>>> code that should not be used in the first place.
>>>
>>> The use-case that brought this up is scanning mime header lines.
>>>
>>> Tokenizer tok(...);
>>> SBuf line;
>>> while (tok.prefix(line, CharacterSet::LF)) {
>>>     // drop optional trailing CR* sequence
>>>     line.trim(CharacterSet::CR, false, true);
>>
>> The above does not make sense to me: After tok.prefix(), "line" will
>> contain LF characters only. There will be no CR characters to trim.
>
> Sorry, that is inverted. CharacterSet(not-LF)

The code makes sense then. If you are trying to optimize parsing, please
note that the above implies parsing the same "line" at least twice,
first to find LF and then to split the line into field name and field
value. It is possible to optimize further and eliminate the double
parsing (and the trim() call), but that requires more Tokenizer work.

Alex.
Received on Wed Jun 04 2014 - 16:24:08 MDT
