there’s a saying that all problems in computer science / programming can be solved by another level of indirection. in this chapter the pratt parser is a great example of that when it comes to parsing expressions such as
- simple numeric literals i.e 1 or 2
- single operand / prefix expressions like -1
- binary expressions like 1 * 2 involving numeric, equality, comparison, or logical operators
- any complex combination of the above with groupings
back in jlox, expression parsing was based on recursive descent expressions recursive descent. in this chapter, the parse sequence is driven by a special function called parsePrecedence
. two new abstractions (the parse rule table and the rule lookup function) come together in the parsePrecedence
function which is going to be the new entry point to expression parsing
static void parsePrecedence(Precedence precedence) {
advance();
ParseFn prefixRule = getRule(parser.previous.type)->prefix;
if (prefixRule == NULL) {
error("Expect expression.");
return;
}
bool canAssign = precedence <= PREC_ASSIGNMENT;
prefixRule(canAssign);
while (precedence <= getRule(parser.current.type)->precedence) {
advance();
ParseFn infixRule = getRule(parser.previous.type)->infix;
infixRule(canAssign);
}
if (canAssign && match(TOKEN_EQUAL)) {
error("Invalid assignment target.");
}
}
here’s a truncated example of some parse rules in our parse table. it’s a mapping of token types to a group of metadata (prefix parser, infix parser, and precedence level)
ParseRule rules[] = {
[TOKEN_LEFT_PAREN] = {grouping, call, PREC_CALL},
[TOKEN_RIGHT_PAREN] = {NULL, NULL, PREC_NONE},
[TOKEN_MINUS] = {unary, binary, PREC_TERM},
[TOKEN_PLUS] = {NULL, binary, PREC_TERM},
[TOKEN_NUMBER] = {number, NULL, PREC_NONE},
};
unary
is the prefix parsing function for the minus token. binary
is the binary parsing function, and the precedence level of PREC_TERM
. this is the getRule
function that, given a token type, can retrieve that metadata
static ParseRule* getRule(TokenType type) {
return &rules[type];
}
what’s unique about this approach is
- the relevant parse function for a given token consumed via
advance
is fetched dynamically from the parse rule table. so given a token type ofNUMBER
forparser.previous.type
, the first thingparsePrecedence
attempts to do is locate theprefix
function for that token- other prefix functions may themselves call back to
parsePrecedence
such asgrouping
if a left parenthesis is encountered
- other prefix functions may themselves call back to
- for chained expressions involving infix operators i.e
1 + 2 + 3
, the current precedence level is used to continue consuming the following expressions in a left-associative manner. so parsing 1 + 2 + 3 becomes ((1 + 2) + 3) - addition of new tokens involves setting a new token rule for those tokens and their metadata (prefix operator, infix operator if it applies, and precedence level). the
parsePrecedence
function automatically obeys the precedence levels during parsing. in jlox, parsing precedence has to be carefully managed by ensuring that it’s reflected in the call sequence (top down execution where lower precedence parse functions calling higher precedence ones)
unlike recursive descent top down parsers where the syntax reflects both the grammar and precedence order (lower precedence parse targets always invoke higher precedence ones), it’s harder to visualize the call sequence in a pratt parser because the exact call sequence is only apparent during runtime through calls to parsePrecedence
(which decides how far to parse on the current precedence). nevertheless this seems like a more extensible / configurable way to manage expression rules