Re: [RFC] Tokenizer API

From: Kinkie <gkinkie_at_gmail.com>
Date: Tue, 10 Dec 2013 11:07:57 +0100

I'd need to check.
To bring the others up to speed on our IRC discussion: I'm fine with the
design you and Alex are suggesting.
My only suggestions are to have remaining() return SBuf instead of
const SBuf &, and to have a few predefined CharacterSets.
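
A rough sketch of what I mean (names and include paths are only
illustrative, not a proposal for the final API; it assumes SBuf copies
are cheap because they share the underlying storage):

    #include <vector>
    #include "SBuf.h"   // illustrative include; the real path may differ

    // a set of characters with O(1) membership tests, backed by a
    // 256-entry lookup table
    class CharacterSet
    {
    public:
        explicit CharacterSet(const char *chars) : table_(256, false) {
            for (; *chars; ++chars)
                table_[static_cast<unsigned char>(*chars)] = true;
        }
        bool operator[](char c) const {
            return table_[static_cast<unsigned char>(c)];
        }
    private:
        std::vector<bool> table_;
    };

    // a few predefined sets callers are likely to need (names illustrative)
    static const CharacterSet WSP(" \t");
    static const CharacterSet CRLF("\r\n");

    class Tokenizer
    {
    public:
        explicit Tokenizer(const SBuf &input) : buf_(input) {}

        // returned by value: an SBuf copy shares storage, so this is cheap,
        // and the caller's copy stays valid even as the Tokenizer keeps
        // consuming its own buffer
        SBuf remaining() const { return buf_; }

    private:
        SBuf buf_;
    };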

On Tue, Dec 10, 2013 at 10:52 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On 10/12/2013 9:38 p.m., Robert Collins wrote:
>> On 10 December 2013 19:13, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>>
>>> The problem with comparing input strings to an SBuf of characters is that
>>> parsing an input of length N against a charset of size M takes O(N*M) time.
>>
>> Huh? There are linear time parsers with PEGs. Or maybe I don't
>> understand one of your preconditions to come up with an N*M complexity
>> here.
>
> The brute-force way is to scan each of the N positions for the M delimiters
> individually.
> --> SBuf uses while(0..N) {memchr(M)} ... O(N*M)
>
>
> PS. I suspect the linear-time parsers you know of either convert the PEG
> into a bool-array first before doing a linear-time scan with it, or use
> multiple CPU threads to do a parallel linear scan on a small M.
>
> Amos
>
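
For reference, the trade-off Amos describes boils down to something like
this (just a sketch of the two strategies, not SBuf's actual code):

    #include <cstddef>
    #include <cstring>

    // brute force: for each of the N input positions, test all M
    // delimiters with memchr() -- O(N*M)
    static size_t
    findFirstOfNaive(const char *buf, size_t n, const char *delims, size_t m)
    {
        for (size_t i = 0; i < n; ++i)
            if (std::memchr(delims, buf[i], m))
                return i;
        return n; // no delimiter found
    }

    // precompute a 256-entry bool array from the delimiter set (O(M)),
    // then do a single linear scan of the input (O(N))
    static size_t
    findFirstOfTable(const char *buf, size_t n, const char *delims, size_t m)
    {
        bool isDelim[256] = { false };
        for (size_t j = 0; j < m; ++j)
            isDelim[static_cast<unsigned char>(delims[j])] = true;

        for (size_t i = 0; i < n; ++i)
            if (isDelim[static_cast<unsigned char>(buf[i])])
                return i;
        return n; // no delimiter found
    }

With predefined CharacterSets the O(M) table setup is paid once, so
repeated tokenizing calls only pay the O(N) scan.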

-- 
    /kinkie