# Parse ## Table of contents - [Overview](#overview) - [Parse stack](#parse-stack) - [Postorder tree](#postorder-tree) - [Bracketing inside the tree](#bracketing-inside-the-tree) - [Visual example](#visual-example) - [Handling invalid parses](#handling-invalid-parses) - [How is this accomplished?](#how-is-this-accomplished) - [Introducer](#introducer) - [Optional modifiers before an introducer](#optional-modifiers-before-an-introducer) - [Something required in context](#something-required-in-context) - [Optional clauses](#optional-clauses) - [Case 1: introducer to optional clause is used as parent node](#case-1-introducer-to-optional-clause-is-used-as-parent-node) - [Case 2: parent node is required token after optional clause, with different parent node kinds for different options](#case-2-parent-node-is-required-token-after-optional-clause-with-different-parent-node-kinds-for-different-options) - [Case 3: optional sibling](#case-3-optional-sibling) - [Operators](#operators) - [Alternatives considered](#alternatives-considered) - [Restrictive parsing](#restrictive-parsing) ## Overview Parsing uses tokens to produce a parse tree that faithfully represents the tree structure of the source program, interpreted according to the Carbon grammar. No semantics are associated with the tree structure at this level, and no name lookup is performed. The parse tree's structure corresponds to the grammar of the Carbon language. On valid input, there will be a 1:1 correspondence between parse tree nodes and tokens. A parse tree is considered _structurally valid_ if all nodes have the number of children that their node kind requires. On invalid input, nodes may be added that don't correspond to a token to maintain a structurally valid parse tree. When a parse tree node is marked as having an error, it will still be structurally valid, but its children may not match a valid grammar. Code trying to handle children of erroneous nodes must be prepared to handle atypical structures, but it may still be helpful for tools such as syntax highlighters or refactoring tools. In general, we favor doing the checking for whether something is allowed _in a particular context_ in [the check stage](check) instead of the parse stage, unless the context is very local. This is for a few reasons: - We anticipate that the parse stage will be used to operate on invalid code while still preserving as much of the intent of the author as possible, for example in an IDE or a code formatter. - To keep as much code out of the parse stage as possible, so it is simple and fast. - We are building all the infrastructure to keep track of context in the check stage. These reasons explain what local context is okay: where we already have the contextual information at hand so there is no performance cost, and we can output a parse tree that still captures faithfully what the user wrote. Examples: - All declaration modifiers are allowed in any order on any declaration in the parse stage. Diagnosing duplicated modifiers, modifiers that conflict with other modifiers, or modifiers that can't be used on a particular declaration is postponed until the check stage. - Rejecting a keyword after `fn` where a name is expected is done at the parse stage. ## Parse stack The core parser loop is `Parse::Tree::Parse`. In the loop, it pops the next state off the stack, and dispatches to the appropriate `Handle` function. A typical handler function pops the state first, leaving the stack ready for the next state. It may add nodes to the parse tree, based on the current code. If it needs to trigger other states, it will push them onto the stack; because it's a stack, the _next_ state is always pushed _last_. Operator expressions store information about current operator precedence in the stack as well. While this isn't necessary for most parser states, and could be stored separately, it's currently together because it has no impact on the size of a stack entry and is thus more efficient to store in one place. ## Postorder tree The parse tree's storage layout is in postorder. For example, given the code: ```carbon fn foo() -> f64 { return 42; } ``` The node order is (with indentation to indicate nesting): ```yaml [ {kind: 'FileStart', text: ''}, {kind: 'FunctionIntroducer', text: 'fn'}, {kind: 'Name', text: 'foo'}, {kind: 'ParamListStart', text: '('}, {kind: 'ParamList', text: ')', subtree_size: 2}, {kind: 'Literal', text: 'f64'}, {kind: 'ReturnType', text: '->', subtree_size: 2}, {kind: 'FunctionDefinitionStart', text: '{', subtree_size: 7}, {kind: 'ReturnStatementStart', text: 'return'}, {kind: 'Literal', text: '42'}, {kind: 'ReturnStatement', text: ';', subtree_size: 3}, {kind: 'FunctionDefinition', text: '}', subtree_size: 11}, {kind: 'FileEnd', text: ''}, ] ``` In this example, `FileStart`, `FunctionDefinition`, and `FileEnd` are "root" nodes for the tree. Function components are children of `FunctionDefinition`. It's produced in this way because it's an efficient layout to produce with vectorized storage, requiring little context to be maintained during parsing. Because it's stored in postorder, it's also most efficient to process the parsed output in postorder; this affects checking. The parse tree is printed in postorder by default because it matches how the parse tree is expected to be processed within the toolchain , and so can make it easier to reason about. However, the `--preorder` flag may be used in contexts where a preorder representation would be easier to handle. ## Bracketing inside the tree The parse tree is designed to be walked in postorder by checking, allowing checking to be more efficient. To support this, checking sometimes requires context on the meaning of a node when it is encountered. Each `ParseNodeKind` has either a bracketing node, or a specific child count. This helps document and enforce the expected tree structure. When a bracketing node is indicated, it is the opening bracket: it will always be the first child of the parent, and that will be the only time it occurs in the parent's children (it may still occur in children of children). When checking encounters the opening bracket, this means it can make contextual decisions for the later children of the node. Nodes can also have a specific child count, for example, infix operators always have two children: the lhs and rhs expressions. Many nodes have a child count of 0; this just means they're leaf nodes, and will never have children. Because the tree structure is always valid, these are treated as contracts. Some nodes exist only to be used to construct valid tree structures for invalid input, such as `InvalidParse`. Although each subtree's size is also tracked as part of the node, we're currently trying to avoid relying on it and may eliminate it if it turns out to be unnecessary and a meaningful cost for the compiler. ## Visual example To try to explain the transition from code to Parse Tree, consider the statement: ```carbon var x: i32 = y + 1; ``` Lexing creates distinct tokens for each syntactic element, which will form the basis of the parse tree:
Tokens:

+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
First the `var` keyword is used as a "bracketing" node (VariableIntroducer). When this is seen in a postorder traversal, it tells us to expect the basics of a variable declaration structure.
Tokens:

        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
        |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+

Parse tree:







+-----+
| var |
+-----+






Next, we can consider the pattern binding. Here, `x` is the identifier and `i32` is the type expression. The `:` provides a parent node that must always contain two children, the name and type expression. Because it always has two direct children, it doesn't need to be bracketed.
Tokens:

                                +-----+ +-----+ +-----+ +-----+ +-----+
                                |  =  | |  y  | |  +  | |  1  | |  ;  |
                                +-----+ +-----+ +-----+ +-----+ +-----+

Parse tree:

        +-----+ +-----+
        |  x  | | i32 |
        +-----+ +-----+
           |       |
           +-------+-------+
                           |
+-----+                 +-----+
| var |                 |  :  |
+-----+                 +-----+






We use the `=` as a separator (instead of a node with children like `:`) to help indicate the transition from binding to assignment expression, which is important for expression parsing during checking.
Tokens:

                                        +-----+ +-----+ +-----+ +-----+
                                        |  y  | |  +  | |  1  | |  ;  |
                                        +-----+ +-----+ +-----+ +-----+

Parse tree:

        +-----+ +-----+
        |  x  | | i32 |
        +-----+ +-----+
           |       |
           +-------+-------+
                           |
+-----+                 +-----+ +-----+
| var |                 |  :  | |  =  |
+-----+                 +-----+ +-----+






The expression is a subtree with `+` as the parent, and the two operands as child nodes.
Tokens:

                                                                +-----+
                                                                |  ;  |
                                                                +-----+

Parse tree:

        +-----+ +-----+                 +-----+ +-----+
        |  x  | | i32 |                 |  y  | |  1  |
        +-----+ +-----+                 +-----+ +-----+
           |       |                       |       |
           +-------+-------+               +-------+-------+
                           |                               |
+-----+                 +-----+ +-----+                 +-----+
| var |                 |  :  | |  =  |                 |  +  |
+-----+                 +-----+ +-----+                 +-----+






Finally, the `;` is used as the "root" of the variable declaration. It's explicitly tracked as the `;` for a variable declaration so that it's unambiguously bracketed by `var`.
Tokens:





Parse tree:

        +-----+ +-----+                 +-----+ +-----+
        |  x  | | i32 |                 |  y  | |  1  |
        +-----+ +-----+                 +-----+ +-----+
           |       |                       |       |
           +-------+-------+               +-------+-------+
                           |                               |
+-----+                 +-----+ +-----+                 +-----+
| var |                 |  :  | |  =  |                 |  +  |
+-----+                 +-----+ +-----+                 +-----+
   |                       |       |                       |
   +-----------------------+-------+-----------------------+-------+
                                                                   |
                                                                +-----+
                                                                |  ;  |
                                                                +-----+
This is the completed parse tree. In storage, this tree will be flat and in postorder. Because the order hasn't changed much from the original code, we can do the reordering for postorder with a minimal number of nodes being delayed for later output: it will be linear with respect to the depth of the parse tree.
Tokens:

+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+

Parse tree:

        +-----+ +-----+                 +-----+ +-----+
        |  x  | | i32 |                 |  y  | |  1  |
        +-----+ +-----+                 +-----+ +-----+
           |       |                       |       |
           +-------+-------+               +-------+-------+
                           |                               |
+-----+                 +-----+ +-----+                 +-----+
| var |                 |  :  | |  =  |                 |  +  |
+-----+                 +-----+ +-----+                 +-----+
   |                       |       |                       |
   +-----------------------+-------+-----------------------+-------+
                                                                   |
                                                                +-----+
                                                                |  ;  |
                                                                +-----+

Flattened for storage:

+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
| var | |  x  | | i32 | |  :  | |  =  | |  y  | |  1  | |  +  | |  ;  |
+-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
The structural concepts of bracketing nodes (`var` and `;`) and parent nodes with a known child count (`:` and `+` with 2 children, but also `=` with 0 children) will allow checking to reconstruct the tree as it encounters nodes during the postorder. There are other structures that could have been used here, such as `=` being parent of the `var` and pattern nodes, and `;` being the parent of the `=` and assignment expression nodes. In that example alternative, the storage order would be the same; it would only change the tree representation. The current structure is influenced by choices in checking. ## Handling invalid parses On an invalid parse, the output tree should still try to mirror the intended tree structure when possible. There's a balance here, and it's not expected to try too hard to make things correct, but outputting nodes is preferred. There are `InvalidParse` nodes which may be used to provide a node when the planned node kind is too difficult to get correct child counts (bracketed subtrees may not need an `InvalidParse` node). When marking a child node with `has_error=true`, parent nodes may also be marked with `has_error=true`, but try to be conservative about this. As a rule of thumb, if checking could continue on a parent node without needing the child node to be fully checked (possibly with incomplete information), then the parent node should not be marked as `has_error=true`. The goal remains providing something similar to a well-formed parse tree. In general, a parent node must have the immediate children described in [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h), unless it is marked `has_error=true`. If this is violated for a particular parse tree, an error will be raised in `Tree::Verify`. Note that an `InvalidParse` node is allowed as a declaration or expression, and an `InvalidParseSubtree` is allowed as a declaration. These invalid nodes can be added to more node categories as needed. Child states may indicate an error to their parent using `ReturnErrorOnState`. This is particularly intended for when a child state emits a diagnostic, to prevent the parent state from emitting redundant diagnostics; for example, an invalid expression might have more invalid tokens following it, and the parent might skip those without emitting diagnostics. ## How is this accomplished? The specific approach to producing the desired tree depends on the kind of grammar rule being implemented, as well as the desired output tree structure. ### Introducer **Example:** `if (c) { ... }` Here `if` is the introducer. Many other possible introducers could occur in that position, such as `while` or `var`, and we want to dispatch based on which token is present. See [parse/handle_statement.cpp](/toolchain/parse/handle_statement.cpp). The first step is to identify the introducer token, typically using a `switch` or `if` on the `Lex::TokenKind` at the current position: ```cpp switch (context.PositionKind()) { case Lex::TokenKind::___: { ... break; } ... } ``` There should be a `default:` (or `else`) case so every kind of token is handled. This may be an error, in which case: - A [diagnostic](diagnostics.md) should be emitted. - An invalid parse node should be added, using something like: ```cpp context.AddLeafNode(NodeKind::InvalidParse, context.Consume(), /*has_error=*/true); ``` - At least one node should be consumed, particularly if it will continue with this state at this position, to avoid an infinite loop. The default case may also be delegated to another state. For example, in the state where a statement is expected, if no keyword introducer is recognized, it switches to the expression-statement state. Depending on the introducer, different actions can be taken. The most common case is to: - Call `context.PushState(StateKind::___);` to mark the beginning of the statement or declaration and indicate the state that will handle the tokens after the introducer. - Call `context.AddLeafNode(NodeKind::___, context.Consume());` to output a bracketing node for this introducer. The next state can then add sibling nodes until it gets to the end of the declaration or statement. The last token, often a semicolon `;`, is used as a parent node to match the bracketing node of the introducer. If the introducer token won't be used as a bracketing node, it can be temporarily skipped after `context.PushState` by calling `context.ConsumeAndDiscard()` instead of `context.AddLeafNode`. It must be added to the output tree as a node by some later state, unless an error occurs. For example, a `for` statement uses the `for` token as the root of the tree -- it doesn't need a bracketing node since it has a fixed child count. Note that the token was saved when the state was pushed, and can be retrieved when adding a node as in this example: ```cpp auto state = context.PopState(); context.AddNode(NodeKind::ForStatement, state.token, state.subtree_start, state.has_error); ``` If this state is for an element of a scope like the statements in a code block, most introducer tokens indicate that the current state should be repeated, to handle the next statement, but some other token, like a close curly brace (`}`) means that the state should be exited. ### Optional modifiers before an introducer **Example:** `virtual fn Foo();` Here `fn` is the introducer, and `virtual` is an optional modifier that appears before. See [parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp). Use this pattern when the goal is to produce a subtree that starts with the introducer as a bracketing node, as in the previous case, followed by nodes for any modifiers. Note that bracketing is needed here, since the optional modifier nodes mean that there is not a fixed child count for the parent node. This means shuffling the introducer node before an unknown number of modifier nodes. This is accomplished by emitting a placeholder node for the introducer, processing all the modifiers until reaching the introducer, filling in the placeholder with the information about the introducer, and then finishing the rest of the declaration or statement. - **Step 1**: Save the current value of `context.tree().size()`. This could be accomplished by calling `context.PushState()`, which saves that value in the `subtree_start` field of `Context::State`; or by constructing a `Context::State` value directly, as is done in [parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp). This marks the position of the placeholder node we are going to replace, as well as the beginning of the subtree we are eventually going to emit for this declaration or statement. - **Step 2**: Emit the placeholder node using `context.AddLeafNode(NodeKind::Placeholder, *context.position());`. The `NodeKind` and `Lex::TokenIndex` values will be overwritten later. - **Step 3**: Process tokens until we hit the introducer. All of the nodes we emit at this point will appear as siblings after the introducer token in the output tree. - **Step 4 - success**: If an introducer token is found, replace the placeholder node using something like: ```cpp context.ReplacePlaceholderNode(state.subtree_start, introducer_kind, context.Consume()); ``` - `state.subtree_start` is the value of `context.tree().size()` saved in step 1, which marks the position of the placeholder node in the output parse tree. - `introducer_kind` is the `NodeKind` for the introducer of this declaration or statement, a leaf node that will act as a bracketing node at the beginning of the subtree for this declaration or statement - **Step 4 - error**: If we run into something other than a modifier or introducer before finding an introducer, we need to do error handling: ```cpp context.ReplacePlaceholderNode(subtree_start, NodeKind::InvalidParseStart, *context.position(), /*has_error=*/true); ``` - Emit a [diagnostic](diagnostics.md). - Replace the placeholder node (similar to step 4) with an `InvalidParseStart` node. It will be associated with the unexpected token that triggered this error. - Consume input token up to the likely end of the end of the current statement or declaration. For example, we might consume up to a `;` or a token at a lesser indent level using `context.SkipPastLikelyEnd(...)`. It is important that we consume at least one token in the error case, otherwise we could have an infinite loop of generating the same error on the same token. - Emit a `InvalidParseSubtree` node. This will be the parent of any emitted modifier nodes, and will be bracketed by the `InvalidParseStart` node emitted above. It should be associated with the last token consumed. ```cpp // Set `iter` to the last token consumed, one before the current position. auto iter = context.position(); --iter; context.AddNode(NodeKind::InvalidParseSubtree, *iter, subtree_start, /*has_error=*/true); ``` - **Step 5**: (If success at step 4) Push whatever states are to be used to parse the rest of the declaration. The first state pushed (the last state to be processed) will handle the end of this declaration. That pushed state should have a `subtree_start` field set to the value of `context.tree().size()` saved in step 1. - **Step 6**: When handling the state for the end of the declaration, emit the root node of subtree: ```cpp state = context.PopState(); context.AddNode(NodeKind::___, context.Consume(), state.subtree_start, state.has_error); ``` - This `state.subtree_start` will mark everything since the bracketing introducer node as the children of this node. ### Something required in context TODO Example: name after introducer [parse/handle_decl_name_and_params.cpp](/toolchain/parse/handle_decl_name_and_params.cpp) Example: "`[` _implicit parameter list_ `]`" after `impl forall` [parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp) ### Optional clauses #### Case 1: introducer to optional clause is used as parent node **Example:** The optional `-> ` in a function signature uses this pattern, so `fn foo() -> u32;` is transformed to: ```yaml {kind: 'FunctionIntroducer', text: 'fn'}, {kind: 'IdentifierName', text: 'foo'}, {kind: 'TuplePatternStart', text: '('}, {kind: 'TuplePattern', text: ')', subtree_size: 2}, {kind: 'UnsignedIntTypeLiteral', text: 'u32'}, {kind: 'ReturnType', text: '->', subtree_size: 2}, {kind: 'FunctionDecl', text: ';', subtree_size: 7}, ``` Note how the `->` token becomes a `ReturnType` node in the output tree, and is moved after the `u32` type expression that becomes its child. Compare with the parse tree output for `fn foo();` which has no `ReturnType` node: ```yaml {kind: 'FunctionIntroducer', text: 'fn'}, {kind: 'IdentifierName', text: 'foo'}, {kind: 'TuplePatternStart', text: '('}, {kind: 'TuplePattern', text: ')', subtree_size: 2}, {kind: 'FunctionDecl', text: ';', subtree_size: 5}, ``` Here is the code from [parse/handle_function.cpp](/toolchain/parse/handle_function.cpp) that does this: ```cpp auto HandleFunctionAfterParams(Context& context) -> void { ... // If there is a return type, parse the expression before adding the return // type node. if (context.PositionIs(Lex::TokenKind::MinusGreater)) { context.PushState(State::FunctionReturnTypeFinish); context.ConsumeAndDiscard(); context.PushStateForExpr(PrecedenceGroup::ForType()); } } auto HandleFunctionReturnTypeFinish(Context& context) -> void { auto state = context.PopState(); context.AddNode(NodeKind::ReturnType, state.token, state.subtree_start, state.has_error); } ``` The `->` token is saved by `context.PushState(`...`)`, so it is available as `state.token` when calling `context.AddNode(NodeKind::ReturnType, state.token,`...`)` later in `HandleFunctionReturnTypeFinish`. Also see how the optional initializer is handled on `var`, treating the `=` as its introducer in `HandleVarAfterPattern` and `HandleVarInitializer` in [parse/handle_var.cpp](/toolchain/parse/handle_var.cpp). #### Case 2: parent node is required token after optional clause, with different parent node kinds for different options **Example:** The optional type expression before `as` in `impl as` is represented by producing two different output parse nodes for `as`. It outputs a `DefaultSelfImplAs` node with no children when the type expression is absent, and otherwise a `TypeImplAs` parse node with the type expression as its child. So `impl bool as Interface;` is transformed to: ```yaml {kind: 'ImplIntroducer', text: 'impl'}, {kind: 'BoolTypeLiteral', text: 'bool'}, {kind: 'TypeImplAs', text: 'as', subtree_size: 2}, {kind: 'IdentifierNameExpr', text: 'Interface'}, {kind: 'ImplDecl', text: ';', subtree_size: 5}, ``` while `impl as Interface;` is transformed to: ```yaml {kind: 'ImplIntroducer', text: 'impl'}, {kind: 'DefaultSelfImplAs', text: 'as'}, {kind: 'IdentifierNameExpr', text: 'Interface'}, {kind: 'ImplDecl', text: ';', subtree_size: 4}, ``` This is handled by the `ExpectAsOrTypeExpression` code from [parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp): ```cpp if (context.PositionIs(Lex::TokenKind::As)) { // as ... context.AddLeafNode(NodeKind::DefaultSelfImplAs, context.Consume()); context.PushState(StateKind::Expr); } else { // as ... context.PushState(StateKind::ImplBeforeAs); context.PushStateForExpr(PrecedenceGroup::ForImplAs()); } ``` and then `HandleImplBeforeAs` creates the parent node in the second case: ```cpp auto state = context.PopState(); if (auto as = context.ConsumeIf(Lex::TokenKind::As)) { context.AddNode(NodeKind::TypeImplAs, *as, state.subtree_start, state.has_error); context.PushState(StateKind::Expr); } else { if (!state.has_error) { CARBON_DIAGNOSTIC(ImplExpectedAs, Error, "expected `as` in `impl` declaration"); context.emitter().Emit(*context.position(), ImplExpectedAs); } context.ReturnErrorOnState(); } ``` Note (1) that the `state.subtree_start` value comes from the `context.PushState(StateKind::ImplBeforeAs);` before parsing the type expression, and that is how that type expression ends up as the child of the created `TypeImplAs` node. Unlike [the previous case 1](#case-1-introducer-to-optional-clause-is-used-as-parent-node), though, the parent node uses the token after the optional expression, rather than an introducer token for the optional clause. Note (2) how `HandleImplBeforeAs` handles three cases of errors: - `as` present but an error in the child type expression -> error on the output `TypeImplAs` node, but not propagated to the parent. - Error from no `as` present but the type expression was okay -> create a new error. - There was error from the child type expression and no `as` present -> no new diagnostic, we suppress errors once one is emitted until we can recover. If there is no `as` token, we don't output either a `TypeImplAs` or a `DefaultSelfImplAs` node, as required by the parent node, so in those cases we mark the parent as having an error. #### Case 3: optional sibling > TODO: This was changed by > [#3678](https://github.com/carbon-language/carbon-lang/pull/3678) and needs to > be updated. **Example:** The optional type expression before `as` in `impl as` is output as an optional sibling subtree between the `ImplIntroducer` node for the `impl` introducer and the `ImplAs` node for the required `as` keyword. `impl bool as Interface;` is transformed to: ```yaml {kind: 'ImplIntroducer', text: 'impl'}, {kind: 'BoolTypeLiteral', text: 'bool'}, {kind: 'ImplAs', text: 'as'}, {kind: 'IdentifierNameExpr', text: 'Interface'}, {kind: 'ImplDecl', text: ';', subtree_size: 5}, ``` while `impl as Interface;` is transformed to: ```yaml {kind: 'ImplIntroducer', text: 'impl'}, {kind: 'ImplAs', text: 'as'}, {kind: 'IdentifierNameExpr', text: 'Interface'}, {kind: 'ImplDecl', text: ';', subtree_size: 4}, ``` This is handled by the `ExpectAsOrTypeExpression` code from [parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp): ```cpp if (context.PositionIs(Lex::TokenKind::As)) { // as ... context.AddLeafNode(NodeKind::ImplAs, context.Consume()); context.PushState(StateKind::Expr); } else { // as ... context.PushState(StateKind::ImplBeforeAs); context.PushStateForExpr(PrecedenceGroup::ForImplAs()); } ``` and then `HandleImplBeforeAs` follows [the "something required in context" pattern](#something-required-in-context) to deal with the `as` that follows when the type expression is present. ### Operators TODO An independent description of our approach: ["Better operator precedence" on scattered-thoughts.net](https://www.scattered-thoughts.net/writing/better-operator-precedence/) ## Alternatives considered ### Restrictive parsing The toolchain will often parse code that could theoretically be rejected, instead allowing the check phase to reject incorrect structures. For example, consider the code `abstract var x: i32 = 0;`. When parsing the `abstract` modifier, parse could do single-token lookahead to see `var`, and error in the parse (`abstract var` is never valid). Instead, we save the modifier and diagnose it during check. The problem is that code isn't always this simple. Considering the above example, there could be other modifiers, such as `abstract private returned var x: i32 = 0;`, so single-token lookahead isn't a general solution. Some modifiers are also contextually valid; for example, `abstract fn` is only valid inside an `abstract class` scope. As a consequence, a form of either arbitrary lookahead or additional context would be necessary in parse in order to reliably diagnose incorrect uses of `abstract`. In contrast with parse, check will have that additional context. Rejecting incorrect code during parsing can also have negative consequences for diagnostics. The additional information that check has about semantics may produce better diagnostics. Alternately, sometimes check will produce diagnostics equivalent to what parse could, but with less work overall. As a consequence, at times we will defer to the check phase to produce diagnostics instead of trying to produce those same diagnostics during parse. Some examples of why we might diagnose in check instead of parse are: - To issue better diagnostics based on semantic information. - To diagnose similar invalid uses in one place, versus partly in check and partly in parse. - To support syntax highlighting for IDEs in near-correct code, still being typed. Some examples of why we might diagnose in parse are: - When it's important to distinguish between multiple possible syntaxes. - When permitting the syntax would require more work than rejecting it. A few examples of parse designs to avoid are: - Using arbitrary lookahead. - Looking ahead one or two tokens is okay. However, we should never have arbitrary lookahead. - This includes approaches which would require using the mapping of opening brackets to closing brackets that is produced by `TokenizedBuffer`. Those are helpful for error recovery. - Building complex context. - We want parsing to be faster and lighter weight than check. - Duplicating diagnostics between parse and check. - When there are closely related invalid variants of syntax, only some of which can be diagnosed during parse, consider diagnosing all variants during check. This is a balance. We don't want to unnecessarily shift costs from parse onto check, and we don't try to allow clearly invalid constructs. Parse still tries to produce a reasonable parse tree. However, parse leans more towards a permissive parse, and an error-free parse tree does not mean the code is grammatically correct.