#include <Tokenizer.h>

Public Member Functions

 Tokenizer (const SBuf &inBuf)
 
SBuf buf () const
 yet unparsed data
 
SBuf::size_type parsedSize () const
 number of parsed bytes, including skipped ones
 
bool atEnd () const
 whether the end of the buffer has been reached
 
const SBuf & remaining () const
 the remaining unprocessed section of buffer
 
void reset (const SBuf &newBuf)
 reinitialize processing for a new buffer
 
bool token (SBuf &returnedToken, const CharacterSet &delimiters)
 
bool prefix (SBuf &returnedToken, const CharacterSet &tokenChars, SBuf::size_type limit=SBuf::npos)
 
bool suffix (SBuf &returnedToken, const CharacterSet &tokenChars, SBuf::size_type limit=SBuf::npos)
 
bool skipSuffix (const SBuf &tokenToSkip)
 
bool skip (const SBuf &tokenToSkip)
 
bool skip (const char tokenChar)
 
bool skipOne (const CharacterSet &discardables)
 
SBuf::size_type skipAll (const CharacterSet &discardables)
 
bool skipOneTrailing (const CharacterSet &discardables)
 
SBuf::size_type skipAllTrailing (const CharacterSet &discardables)
 
bool int64 (int64_t &result, int base=0, bool allowSign=true, SBuf::size_type limit=SBuf::npos)
 
SBuf prefix (const char *description, const CharacterSet &tokenChars, SBuf::size_type limit=SBuf::npos)
 
int64_t udec64 (const char *description, SBuf::size_type limit=SBuf::npos)
 int64() wrapper but limited to unsigned decimal integers (for now)
 

Protected Member Functions

SBuf consume (const SBuf::size_type n)
 convenience method: consumes up to n bytes, counts, and returns them
 
SBuf::size_type success (const SBuf::size_type n)
 convenience method: consume()s up to n bytes and returns their count
 
SBuf consumeTrailing (const SBuf::size_type n)
 convenience method: consumes up to n last bytes and returns them
 
SBuf::size_type successTrailing (const SBuf::size_type n)
 convenience method: consumes up to n last bytes and returns their count
 
void undoParse (const SBuf &newBuf, SBuf::size_type cParsed)
 reset the buffer and parsed stats to a saved checkpoint
 

Private Attributes

SBuf buf_
 yet unparsed input
 
SBuf::size_type parsed_
 bytes successfully parsed, including skipped
 

Detailed Description

Lexical processor to tokenize a buffer.

Allows arbitrary delimiters and token character sets to be provided by callers.

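Unless documented otherwise, methods start from the beginning of the input buffer; suffix(), skipSuffix(), skipOneTrailing(), and skipAllTrailing() operate on its trailing end. Methods returning true consume bytes from the buffer. Methods returning false have no side effects.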
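
The consume-on-success contract described above can be illustrated with a minimal standalone sketch. It is not Squid code: SketchTokenizer is a hypothetical stand-in, std::string_view plays the role of SBuf, and only a literal-matching skip() is modeled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for Parser::Tokenizer (assumption: std::string_view replaces SBuf).
struct SketchTokenizer {
    std::string_view buf;    // yet unparsed input
    std::size_t parsed = 0;  // bytes consumed so far, including skipped ones

    // Skips `literal` if the buffer starts with it.
    bool skip(std::string_view literal) {
        if (buf.substr(0, literal.size()) != literal)
            return false;                   // failure: buf and parsed stay untouched
        buf.remove_prefix(literal.size());  // success: bytes are consumed...
        parsed += literal.size();           // ...and counted
        return true;
    }
};
```

A failed skip("POST") on the input "GET /index.html" leaves both members unchanged, while a successful skip("GET") shortens buf by three bytes and raises parsed to 3.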

Definition at line 29 of file Tokenizer.h.

Constructor & Destructor Documentation

◆ Tokenizer()

Parser::Tokenizer::Tokenizer ( const SBuf & inBuf )
inline explicit

Definition at line 32 of file Tokenizer.h.

Member Function Documentation

◆ atEnd()

◆ buf()

SBuf Parser::Tokenizer::buf ( ) const
inline

Definition at line 35 of file Tokenizer.h.

References buf_.

Referenced by getNfmark(), and testTokenizer::testTokenizerInt64().

◆ consume()

SBuf Parser::Tokenizer::consume ( const SBuf::size_type  n)
protected

Definition at line 24 of file Tokenizer.cc.

References buf_, SBuf::consume(), debugs, SBuf::length(), and parsed_.

Referenced by prefix(), reset(), success(), and token().

◆ consumeTrailing()

SBuf Parser::Tokenizer::consumeTrailing ( const SBuf::size_type  n)
protected

Definition at line 42 of file Tokenizer.cc.

References buf_, SBuf::consume(), debugs, SBuf::length(), SBuf::npos, and parsed_.

Referenced by reset(), successTrailing(), and suffix().

◆ int64()

bool Parser::Tokenizer::int64 ( int64_t &  result,
int  base = 0,
bool  allowSign = true,
SBuf::size_type  limit = SBuf::npos 
)

Extracts an int64_t at the beginning of the buffer.

strtoll(3)-alike function: tries to parse a 64-bit integer (optionally signed, see allowSign) at the beginning of the parse buffer, in the base specified by the caller or guessed from the input; consumes the parsed characters.

Parameters
result     Output value. Not touched if parsing is unsuccessful.
base       Specify base to do the parsing in, with the same restrictions as strtoll. Defaults to 0 (meaning guess).
allowSign  Whether to accept a '+' or '-' sign prefix.
limit      Maximum count of characters to convert.
Returns
whether the parsing was successful
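
The base guess is documented only by reference to strtoll. As a reminder of that convention, here is a small sketch that calls std::strtoll directly rather than Tokenizer; parseLikeStrtoll is a hypothetical helper introduced for illustration, and whether int64() matches strtoll in every corner case is an assumption.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: shows the strtoll(3) base convention, not Tokenizer itself.
// With base 0, a "0x" prefix selects hexadecimal, a leading "0" selects octal,
// and anything else is read as decimal.
std::int64_t parseLikeStrtoll(const char *text, int base = 0) {
    char *end = nullptr;  // receives the first unparsed character; unused here
    return std::strtoll(text, &end, base);
}
```

Under this convention "0x1F" yields 31 and "017" yields 15 when the base is guessed, while "017" with an explicit base of 10 yields 17.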

Definition at line 228 of file Tokenizer.cc.

References atEnd(), buf_, INT64_MAX, INT64_MIN, SBuf::rawContent(), SBuf::substr(), success(), xisalpha, xisdigit, and xisupper.

Referenced by ProxyProtocol::One::ExtractPort(), getNfmark(), GetOtherPid(), ProxyProtocol::IntegerToFieldType(), Security::PeerOptions::parseOptions(), reset(), testTokenizer::testTokenizerInt64(), udec64(), and Security::PeerOptions::updateTlsVersionLimits().

◆ parsedSize()

SBuf::size_type Parser::Tokenizer::parsedSize ( ) const
inline

Definition at line 38 of file Tokenizer.h.

References parsed_.

Referenced by ProxyProtocol::Parse(), ProxyProtocol::One::Parse(), and Ftp::Server::parseOneRequest().

◆ prefix() [1/2]

bool Parser::Tokenizer::prefix ( SBuf & returnedToken,
const CharacterSet & tokenChars,
SBuf::size_type  limit = SBuf::npos 
)

Extracts all sequential permitted characters up to an optional length limit.

Note that Tokenizer cannot tell whether the prefix will continue when/if more input data becomes available later.

Return values
true   one or more characters were found, the sequence (string) is placed in returnedToken
false  no characters from the permitted set were found
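
A minimal sketch of these semantics, written as a free function over std::string_view. The name sketchPrefix is hypothetical, and a plain string of permitted characters stands in for CharacterSet; this models the documented behavior and is not the Tokenizer.cc implementation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical model of prefix(): `allowed` stands in for CharacterSet.
bool sketchPrefix(std::string_view &buf, std::string_view &token,
                  std::string_view allowed,
                  std::size_t limit = std::string_view::npos) {
    // Only the first `limit` bytes are eligible for the token.
    const std::string_view window = buf.substr(0, limit);
    std::size_t n = window.find_first_not_of(allowed);
    if (n == std::string_view::npos)
        n = window.size();  // every eligible byte is permitted
    if (n == 0)
        return false;       // nothing matched: leave buf and token alone
    token = buf.substr(0, n);
    buf.remove_prefix(n);   // consume the extracted bytes
    return true;
}
```

With the input "12345abc", a digit set, and a limit of 3, the first call extracts "123" and leaves "45abc"; an unlimited second call extracts "45"; a third call returns false and changes nothing.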

Definition at line 81 of file Tokenizer.cc.

References atEnd(), buf_, consume(), debugs, SBuf::findFirstNotOf(), CharacterSet::name, SBuf::npos, and SBuf::substr().

Referenced by ProxyProtocol::One::ExtractIp(), Ftp::Server::handleFeatReply(), mainHandleCommandLineOption(), ProxyProtocol::One::Parse(), ProxyProtocol::One::ParseAddresses(), Ftp::Server::parseOneRequest(), Security::PeerOptions::parseOptions(), parseQuotedStringSuffix(), prefix(), reset(), testTokenizer::testTokenizerPrefix(), testTokenizer::testTokenizerSkip(), and Http::One::tokenOrQuotedString().

◆ prefix() [2/2]

SBuf Parser::Tokenizer::prefix ( const char *  description,
const CharacterSet & tokenChars,
SBuf::size_type  limit = SBuf::npos 
)

prefix() wrapper but throws InsufficientInput if input contains nothing but the prefix (i.e. if the prefix is not "terminated")
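
The "terminated prefix" idea can be sketched as follows. NeedMoreInput is a hypothetical stand-in for InsufficientInput, sketchRequiredPrefix is a hypothetical name, and the error thrown when no prefix is present at all is an assumption, since the description above does not cover that case.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for the InsufficientInput exception.
struct NeedMoreInput : std::runtime_error {
    NeedMoreInput() : std::runtime_error("need more input") {}
};

// Returns the leading run of `allowed` characters, or throws.
std::string_view sketchRequiredPrefix(std::string_view &buf, const char *description,
                                      std::string_view allowed) {
    const std::size_t n = buf.find_first_not_of(allowed);
    if (n == std::string_view::npos)
        throw NeedMoreInput();  // the run reaches the end of input, so later bytes may extend it
    if (n == 0)                 // assumption: a missing prefix is reported as a parse error
        throw std::runtime_error(std::string("cannot parse ") + description);
    const std::string_view token = buf.substr(0, n);
    buf.remove_prefix(n);
    return token;
}
```

For "200 OK" and a digit set the sketch returns "200" because the space terminates the run; for the partial input "20" it throws NeedMoreInput and leaves the buffer intact.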

Definition at line 102 of file Tokenizer.cc.

References atEnd(), prefix(), TexcHere, and ToSBuf().

◆ remaining()

◆ reset()

void Parser::Tokenizer::reset ( const SBuf & newBuf )
inline

◆ skip() [1/2]

◆ skip() [2/2]

bool Parser::Tokenizer::skip ( const char  tokenChar)

Skips a given single character.

Returns
whether the character was skipped

Definition at line 190 of file Tokenizer.cc.

References buf_, debugs, SBuf::isEmpty(), and success().

◆ skipAll()

SBuf::size_type Parser::Tokenizer::skipAll ( const CharacterSet & discardables )

Skips all sequential characters from the set, in any order.

Returns
the number of skipped characters

Definition at line 139 of file Tokenizer.cc.

References buf_, debugs, SBuf::findFirstNotOf(), CharacterSet::name, and success().

Referenced by Ftp::Server::calcUri(), Ftp::Server::handleFeatReply(), Http::One::ParseBws(), Ftp::Server::parseOneRequest(), Security::PeerOptions::parseOptions(), prepareAcceleratedURL(), reset(), testTokenizer::testTokenizerSkip(), and token().

◆ skipAllTrailing()

SBuf::size_type Parser::Tokenizer::skipAllTrailing ( const CharacterSet & discardables )

Removes all sequential trailing characters from the set, in any order.

Returns
the number of characters removed

Definition at line 212 of file Tokenizer.cc.

References buf_, debugs, SBuf::findLastNotOf(), SBuf::length(), CharacterSet::name, SBuf::npos, and successTrailing().

Referenced by Auth::SchemesConfig::expand(), and reset().

◆ skipOne()

bool Parser::Tokenizer::skipOne ( const CharacterSet & discardables )

Skips a single character from the set.

Returns
whether a character was skipped

Definition at line 151 of file Tokenizer.cc.

References buf_, debugs, SBuf::isEmpty(), CharacterSet::name, and success().

Referenced by ProxyProtocol::FieldNameToFieldType(), GetOtherPid(), Security::PeerOptions::parseFlags(), reset(), and testTokenizer::testTokenizerSkip().

◆ skipOneTrailing()

bool Parser::Tokenizer::skipOneTrailing ( const CharacterSet & discardables )

Removes a single trailing character from the set.

Returns
whether a character was removed

Definition at line 201 of file Tokenizer.cc.

References buf_, debugs, SBuf::isEmpty(), SBuf::length(), CharacterSet::name, and successTrailing().

Referenced by reset(), and testTokenizer::testTokenizerSuffix().

◆ skipSuffix()

bool Parser::Tokenizer::skipSuffix ( const SBuf & tokenToSkip )

Skips a given suffix character sequence (string). Operates on the trailing end of the buffer.

Note that Tokenizer cannot tell whether the buffer will gain more data when/if more input becomes available later.

Returns
whether the exact character sequence was found and skipped
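
A minimal sketch of exact-suffix skipping, assuming std::string_view in place of SBuf; sketchSkipSuffix is a hypothetical name and the code models only the documented behavior.

```cpp
#include <string_view>

// Hypothetical model of skipSuffix(): removes `suffix` only on an exact match at the end.
bool sketchSkipSuffix(std::string_view &buf, std::string_view suffix) {
    if (suffix.size() > buf.size())
        return false;  // buffer too short to end with the suffix
    if (buf.substr(buf.size() - suffix.size()) != suffix)
        return false;  // trailing bytes differ: no side effects
    buf.remove_suffix(suffix.size());
    return true;
}
```

Applied to "squid-1.pid" with the suffix ".pid", the first call succeeds and leaves "squid-1"; a second call with the same suffix returns false.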

Definition at line 162 of file Tokenizer.cc.

References buf_, SBuf::cmp(), debugs, SBuf::length(), SBuf::npos, SBuf::substr(), and successTrailing().

Referenced by ConfigureCurrentKid(), reset(), and testTokenizer::testTokenizerSuffix().

◆ success()

SBuf::size_type Parser::Tokenizer::success ( const SBuf::size_type  n)
protected

Definition at line 35 of file Tokenizer.cc.

References consume(), and SBuf::length().

Referenced by int64(), reset(), skip(), skipAll(), and skipOne().

◆ successTrailing()

SBuf::size_type Parser::Tokenizer::successTrailing ( const SBuf::size_type  n)
protected

Definition at line 57 of file Tokenizer.cc.

References consumeTrailing(), and SBuf::length().

Referenced by reset(), skipAllTrailing(), skipOneTrailing(), and skipSuffix().

◆ suffix()

bool Parser::Tokenizer::suffix ( SBuf & returnedToken,
const CharacterSet & tokenChars,
SBuf::size_type  limit = SBuf::npos 
)

Extracts all sequential permitted characters up to an optional length limit. Operates on the trailing end of the buffer.

Note that Tokenizer cannot tell whether the buffer will gain more data when/if more input becomes available later.

Return values
true   one or more characters were found, the sequence (string) is placed in returnedToken
false  no characters from the permitted set were found
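
The trailing-end counterpart of prefix() can be sketched the same way. sketchSuffix is a hypothetical name, std::string_view replaces SBuf, and a string of permitted characters replaces CharacterSet; the loop is an illustration, not the Tokenizer.cc code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical model of suffix(): extracts the trailing run of permitted characters.
bool sketchSuffix(std::string_view &buf, std::string_view &token,
                  std::string_view allowed,
                  std::size_t limit = std::string_view::npos) {
    std::size_t n = 0;
    // Walk backwards from the last byte while bytes are permitted and the limit allows.
    while (n < buf.size() && n < limit &&
           allowed.find(buf[buf.size() - 1 - n]) != std::string_view::npos)
        ++n;
    if (n == 0)
        return false;  // last byte is not permitted (or buf is empty)
    token = buf.substr(buf.size() - n);
    buf.remove_suffix(n);
    return true;
}
```

With "kid12345", a digit set, and a limit of 2, the sketch extracts "45" and leaves "kid123"; an unlimited call then extracts "123" and leaves "kid".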

Definition at line 119 of file Tokenizer.cc.

References buf_, SBuf::consume(), consumeTrailing(), i, SBuf::length(), SBuf::rbegin(), and SBuf::rend().

Referenced by ConfigureCurrentKid(), reset(), and testTokenizer::testTokenizerSuffix().

◆ token()

bool Parser::Tokenizer::token ( SBuf & returnedToken,
const CharacterSet & delimiters 
)

Basic strtok(3): Skips all leading delimiters (if any), extracts all characters up to the next delimiter (a token), and skips all trailing delimiters (at least one must be present).

Want to extract delimiters? Use prefix() instead.

Note that Tokenizer cannot tell whether the trailing delimiters will continue when/if more input data becomes available later.

Returns
true if found a non-empty token followed by a delimiter
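
The three steps described above (skip leading delimiters, extract the token, skip trailing delimiters) can be modeled in a short standalone sketch. sketchToken is a hypothetical name, std::string_view replaces SBuf, and a string of delimiter characters replaces CharacterSet.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical model of token(): succeeds only for a non-empty token followed by a delimiter.
bool sketchToken(std::string_view &buf, std::string_view &token,
                 std::string_view delims) {
    constexpr std::size_t npos = std::string_view::npos;
    std::string_view rest = buf;  // work on a copy so that failure changes nothing
    const std::size_t start = rest.find_first_not_of(delims);
    if (start == npos)
        return false;             // input is empty or holds only delimiters
    rest.remove_prefix(start);    // skip leading delimiters
    const std::size_t len = rest.find_first_of(delims);
    if (len == npos)
        return false;             // no trailing delimiter yet: the token may still grow
    token = rest.substr(0, len);
    rest.remove_prefix(len);
    const std::size_t next = rest.find_first_not_of(delims);
    rest.remove_prefix(next == npos ? rest.size() : next);  // skip trailing delimiters
    buf = rest;                   // commit only on success
    return true;
}
```

On "  foo  bar" with a space delimiter, the first call yields "foo" and leaves "bar"; the second call returns false because "bar" is not yet followed by a delimiter.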

Definition at line 63 of file Tokenizer.cc.

References buf_, consume(), DBG_DATA, debugs, SBuf::findFirstOf(), CharacterSet::name, SBuf::npos, and skipAll().

Referenced by AppendTokens(), Auth::SchemesConfig::expand(), reset(), and testTokenizer::testTokenizerToken().

◆ udec64()

int64_t Parser::Tokenizer::udec64 ( const char *  description,
SBuf::size_type  limit = SBuf::npos 
)

Definition at line 306 of file Tokenizer.cc.

References atEnd(), int64(), TexcHere, and ToSBuf().

Referenced by reset().

◆ undoParse()

void Parser::Tokenizer::undoParse ( const SBuf & newBuf,
SBuf::size_type  cParsed 
)
inline protected

Definition at line 166 of file Tokenizer.h.

References buf_, and parsed_.

Referenced by reset().

Member Data Documentation

◆ buf_

◆ parsed_

SBuf::size_type Parser::Tokenizer::parsed_
private

Definition at line 170 of file Tokenizer.h.

Referenced by consume(), consumeTrailing(), parsedSize(), and undoParse().


The documentation for this class was generated from the following files:

 
