Jon Ross-Perkins ef0fa81a58 Upgrade clang-format version (#3471) hace 2 años
..
BUILD 8c28a0494e Add size="small" to test targets where advised (#3326) hace 2 años
README.md 309ec35f95 Rename executable_semantics to explorer (#1188) hace 4 años
bison_wrap.h c172487395 Support for declaring functions within a namespace (#2569) hace 3 años
format_grammar.py 309ec35f95 Rename executable_semantics to explorer (#1188) hace 4 años
format_grammar_test.py 309ec35f95 Rename executable_semantics to explorer (#1188) hace 4 años
lex_helper.h 8d0f3364d8 Initial implementation of raw string literals (#1304) hace 3 años
lex_scan_helper.cpp 8d0f3364d8 Initial implementation of raw string literals (#1304) hace 3 años
lex_scan_helper.h 8d0f3364d8 Initial implementation of raw string literals (#1304) hace 3 años
lexer.lpp 1505a634a4 Implement syntax changes from #2760 in Explorer (#2906) hace 2 años
parse.cpp 357baaeef8 Rename //explorer/common to base (#3103) hace 2 años
parse.h 357baaeef8 Rename //explorer/common to base (#3103) hace 2 años
parse_and_lex_context.cpp 357baaeef8 Rename //explorer/common to base (#3103) hace 2 años
parse_and_lex_context.h ef0fa81a58 Upgrade clang-format version (#3471) hace 2 años
parse_test.cpp e3d3122f1d Move tests to the namespace of the code under test (#3244) hace 2 años
parse_test_matchers.h 20728dbd3a CARBON_ header guards (#1261) hace 4 años
parse_test_matchers_internal.h 20728dbd3a CARBON_ header guards (#1261) hace 4 años
parser.ypp 357baaeef8 Rename //explorer/common to base (#3103) hace 2 años
prelude.cpp 9e03d5efe2 Change explorer's file_test to support argument passing to main. (#3032) hace 2 años
prelude.h 357baaeef8 Rename //explorer/common to base (#3103) hace 2 años
unimplemented_example_test.cpp e3d3122f1d Move tests to the namespace of the code under test (#3244) hace 2 años

README.md

The code in this directory is responsible for translating Carbon source code to the AST defined in ast. It consists primarily of a Flex lexer defined in lexer.lpp and a Bison grammar defined in parser.ypp.

It is possible to define and test a new expression syntax without defining its semantics by using the UnimplementedExpression AST node type and the same techniques can be applied to other kinds of AST nodes as needed. See the handling of the UNIMPL_EXAMPLE token for an example of how this is done, and see unimplemented_example_test.cpp for an example of how to test it.

Precedence and associativity

The Bison expression grammar uses the precedence climbing method to model precedence and associativity, suitably modified to handle Carbon's partial precedence order without grammar ambiguities.

Consider this example precedence diagram:

graph BT
    %%{init: {'themeVariables': {'fontFamily': 'monospace'}}}%%
    minus["minus<br>-x"]
    mul>"mul<br>x * y"]
    add>"add<br>x + y"]
    mod["mod<br>x % y"]
    eq["eq<br>x = y"]

    eq --> add & mod
    add --> mul
    mul & mod --> minus

For each precedence level, we have up to three grammar productions:

  • foo_expression represents an expression at that precedence level or higher, and includes as productions all of the expression kinds that are immediately higher in the precedence graph:

    add_expression:
      mul_expression | add_lhs '+' add_operand ;
    
  • foo_operand represents an operand of a foo_expression that is not itself a foo_expression.

    eq_operand:
      add_expression | mod_expression ;
    
  • For left-associative operators, foo_lhs represents either a foo_operand or a foo_expression.

    add_lhs:
      add_operand | add_expression ;
    

The above approach leads to (benign) reduce-reduce conflicts. In our example precedence diagram, the expression -x == y has two different parses:

  • _eqexpression
    • _eqoperand
      • _addexpression
        • _mulexpression
          • _minusexpression
            • -
            • x
    • ==
    • _eqoperand
      • ...
        • y

and

  • _eqexpression
    • _eqoperand
      • _modexpression
        • _minusexpression
          • -
          • x
    • ==
    • _eqoperand
      • ...
        • y

These would invoke the same parsing actions, so the states can be combined, but Bison isn't smart enough to see that.

In order to eliminate these conflicts, if there are multiple paths through the precedence graph between a higher-precedence level foo and some lower precedence level bar -- that is, if there's a diamond in the precedence graph with foo at the top and bar at the bottom -- foo_expressions are excluded from all intermediate _expression productions on the diamond between foo and bar, and are added back in the downstream _operand productions in the diamond instead:

minus_expression:
  identifier | '-' identifier ;

// In the real grammar, trivial productions like this are inlined.
mul_operand:
  minus_expression ;
mul_lhs:
  mul_operand | mul_expression ;
// A minus_expression is not a mul_expression, even though it's a
// higher-precedence expression, because there are multiple paths from
// eq_expression to minus_expression, and this production is on such a path.
mul_expression:
  mul_lhs '*' mul_operand

// minus_expression is listed here because it is excluded from mul_expression.
add_operand:
  minus_expression | mul_expression ;
// This is notionally
//   add_operand | add_expression
// but that introduces another kind of reduce-reduce conflict, because there
// would be two ways to interpret a mul_expression as an add_lhs.
add_lhs:
  minus_expression | add_expression ;
// A mul_expression is an add_expression, because multiplication is
// higher-precedence, and mul is not at the top of a diamond in the precedence
// graph. minus_expression is excluded because we are within a diamond with it
// at the top.
add_expression:
  mul_expression | add_lhs '+' add_operand ;

mod_operand:
  minus_expression ;
mod_expression:
  mod_operand '%' mod_operand ;

// We add back minus_expression here because it was excluded from add_expression
// and mod_expression.
eq_operand:
  minus_expression | add_expression | mod_expression ;
// We also include minus_expression here because this is the bottom of the
// precedence diamond.
eq_expression:
  minus_expression | add_expression | mod_expression | eq_operand '=' eq_operand ;