stator
A math, geometry, and utility library
Classes | Public Member Functions | Public Attributes | List of all members
sym::detail::ExprTokenizer Class Reference

Implementation of expression tokenization and parsing into Expr types. More...

#include <parser.hpp>

Classes

struct  BinaryOpToken
 
struct  HaltToken
 
struct  LeftOperatorBase
 
struct  ParenthesisToken
 
struct  RightOperatorBase
 
struct  SkipToken
 
struct  UnaryNegative
 
struct  UnaryOpToken
 

Public Member Functions

 ExprTokenizer (const std::string &str)
 
void consume ()
 
void consumeFloat ()
 
bool empty () const
 
void expect (std::string token)
 
std::string next ()
 
Expr parseExpression (int minLBP=0)
 Main parsing entry function. More...
 
std::string parserLoc ()
 
Expr parseToken ()
 Parses a single token (unary/prefix op, variable, or number), where precedence issues do not arise. More...
 

Public Attributes

std::map< std::string, shared_ptr< LeftOperatorBase > > _left_operators
 
std::map< std::string, shared_ptr< RightOperatorBase > > _right_operators
 

Detailed Description

The structure of the code:

Finally, the binary operator precedence and parsing takes place in .

The implementation is largely based around the pseudocode implementations of Theodore Norvell from here and in particular here.

The algorithm is implemented over four key functions:

  1. Initialisation of all operator definitions takes place in the constructor ExprTokenizer::ExprTokenizer.
  2. Expression strings are first broken down into tokens. Tokens are substrings such as "2", "*", "sin", or "(". ExprTokenizer::next yields the current token, and the system is moved onto the next using ExprTokenizer::consume (where the actual tokenization takes place).

Parsing of "leaves" of the Abstract Syntax Tree (AST), such as variables, numbers, including functions/prefix-operators are handled via ExprTokenizer::parseToken. Functions and prefix-operators may contain sub-trees and these are parsed recursively.

  1. Full expression strings/trees are parsed via ExprTokenizer::parseExpression. Its main purpose is to resolve binary operator precedence.

    The hardest part for me to understand from the work of Theodore Norvell was how the ExprTokenizer::parseExpression function created the AST via recursion and "precedence climbing". My notes on this are available in the documentation for ExprTokenizer::parseExpression.

Definition at line 56 of file parser.hpp.

Constructor & Destructor Documentation

◆ ExprTokenizer()

sym::detail::ExprTokenizer::ExprTokenizer ( const std::string &  str)
inline

Definition at line 58 of file parser.hpp.

Member Function Documentation

◆ consume()

void sym::detail::ExprTokenizer::consume ( )
inline

Definition at line 129 of file parser.hpp.

◆ consumeFloat()

void sym::detail::ExprTokenizer::consumeFloat ( )
inline

Definition at line 167 of file parser.hpp.

◆ empty()

bool sym::detail::ExprTokenizer::empty ( ) const
inline

Definition at line 125 of file parser.hpp.

◆ expect()

void sym::detail::ExprTokenizer::expect ( std::string  token)
inline

Definition at line 119 of file parser.hpp.

◆ next()

std::string sym::detail::ExprTokenizer::next ( )
inline

Definition at line 115 of file parser.hpp.

◆ parseExpression()

Expr sym::detail::ExprTokenizer::parseExpression ( int  minLBP = 0)
inline
  • minLBP The minimum binding power of an operator that this call can collect. A value of zero collects all operators.

This function handles all cases where operator precedence arises.

Expressions are streams of tokens. As the tokens are read left-to-right, we always begin with the leftmost leaf of the AST. The first call to parseExpression must then climb UP the AST to its top. The variable maxLBP ensures that each call to parseExpression only climbs up by ensuring that operators with higher precendence than p are handled by a recursion, possibly to become left leafs of a lower-precedence operator higher up the AST (low precedence operators always end up high up the AST). The variable minLBP is used to make sure that each recursive call to parseExpression does not climb above its parent parseExpression call.

There are three types of precedence used.

Left Binding Precedence (LBP) and Right Binding Precedence (RBP) are used to determine which operator binds to an argument between two operators. For example, in the expression 2*3+4, the multiply binds to the 3 as its RBP is higher than the addition's LBP. If the LBP and RBP are equal, the right operator "wins". Thus operators who are left-associative must have their RBP > LBP, and right-associative operators must have RBP <= LBP.

The last argument is the Next Binding Precedence (NBP). As parseExpression climbs up the AST, the NBP of each operator is used to determine the maximum LBP which will cause the parseExpression to replace its current root with a higher level. Generally, an operator must have NBP<=LBP. For left-associative operators we have NBP=LBP, and for right associative we have NBP<LBP, as we are moving from left to right along the tree and thus only left-associative operators allow us to ascend (which we do at each repeated left-associative operator).

Definition at line 388 of file parser.hpp.

◆ parserLoc()

std::string sym::detail::ExprTokenizer::parserLoc ( )
inline

Definition at line 215 of file parser.hpp.

◆ parseToken()

Expr sym::detail::ExprTokenizer::parseToken ( )
inline

Definition at line 315 of file parser.hpp.

Member Data Documentation

◆ _left_operators

std::map<std::string, shared_ptr<LeftOperatorBase> > sym::detail::ExprTokenizer::_left_operators

Definition at line 309 of file parser.hpp.

◆ _right_operators

std::map<std::string, shared_ptr<RightOperatorBase> > sym::detail::ExprTokenizer::_right_operators

Definition at line 310 of file parser.hpp.


The documentation for this class was generated from the following file: