Re: [RFC] Tokenizer API

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 10 Dec 2013 22:52:27 +1300

On 10/12/2013 9:38 p.m., Robert Collins wrote:
> On 10 December 2013 19:13, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>
>> The problem with comparing input strings to a SBuf of characters is that
>> parsing a input of length N againt charset of size M takes O(N*M) time.
>
> Huh? There are linear time parsers with PEGs. Or maybe I don't
> understand one of your preconditions to come up with an N*M complexity
> here.

The brute-force way is to scan each N position for the M delimiters
individually.
 --> SBuf uses while(0..N) {memchr(M)} ... O(N*M)

PS. I suspect the linear time parsers you know of are converting the PEG
into bool-array first before doing linear time scan with it or using
multiple CPU threads to do a parallel linear scan on small M.

Amos
Received on Tue Dec 10 2013 - 09:52:35 MST

This archive was generated by hypermail 2.2.0 : Tue Dec 10 2013 - 12:00:10 MST