1 year ago · a24816a1f4
--- a/toolchain/README.md
+++ b/toolchain/README.md
@@ -6,6 +6,4 @@ Exceptions. See /LICENSE for license information.
 
															 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															 -->
														
 
															-A design is currently maintained in
														
 
															-[Google Drive](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit?resourcekey=0-kHyqOESbOHmzZphUbtLrTw).
														
 
															-It'll be migrated to markdown once we are confident in its stability.
														
 
															+See [docs](docs/).
														
--- a/toolchain/docs/README.md
+++ b/toolchain/docs/README.md
@@ -0,0 +1,94 @@
 
															+# Toolchain architecture
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Goals](#goals)
														
 
															+-   [High-level architecture](#high-level-architecture)
														
 
															+    -   [Design patterns](#design-patterns)
														
 
															+-   [Adding features](#adding-features)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Goals
														
 
															+
														
 
															+The toolchain represents the production portion of Carbon. At a high level, the
														
 
															+toolchain's top priorities are:
														
 
															+
														
 
															+-   Correctness.
														
 
															+-   Quality of generated code, including performance.
														
 
															+-   Compilation performance.
														
 
															+-   Quality of diagnostics for incorrect or questionable code.
														
 
															+
														
 
															+TODO: Add an expanded document that details the goals and priorities and link to
														
 
															+it here.
														
 
															+
														
 
															+## High-level architecture
														
 
															+
														
 
															+The main components are:
														
 
															+
														
 
															+-   [Driver](driver.md): Provides commands and ties together compilation flow.
														
 
															+-   [Diagnostics](diagnostics.md): Produces diagnostic output.
														
 
															+-   Compilation flow:
														
 
															+
														
 
															+    1. Source: Load the file into a
														
 
															+       [SourceBuffer](/toolchain/source/source_buffer.h).
														
 
															+    2. [Lex](lex.md): Transform a SourceBuffer into a
														
 
															+       [Lex::TokenizedBuffer](/toolchain/lex/tokenized_buffer.h).
														
 
															+    3. [Parse](parse.md): Transform a TokenizedBuffer into a
														
 
															+       [Parse::Tree](/toolchain/parse/tree.h).
														
 
															+    4. [Check](check.md): Transform a Tree to produce
														
 
															+       [SemIR::File](/toolchain/sem_ir/file.h).
														
 
															+    5. [Lower](lower.md): Transform the SemIR to an
														
 
															+       [LLVM Module](https://llvm.org/doxygen/classllvm_1_1Module.html).
														
 
															+    6. CodeGen: Transform the LLVM Module into an Object File.
														
 
															+
														
 
															+### Design patterns
														
 
															+
														
 
															+A few common design patterns are:
														
 
															+
														
 
															+-   Distinct steps: Each step of processing produces an output structure,
														
 
															+    avoiding callbacks passing data between structures.
														
 
															+
														
 
															+    -   For example, the parser takes a `Lex::TokenizedBuffer` as input and
														
 
															+        produces a `Parse::Tree` as output.
														
 
															+
														
 
															+    -   Performance: It should yield better locality versus a callback approach.
														
 
															+
														
 
															+    -   Understandability: Each step has a clear input and output, versus
														
 
															+        callbacks which obscure the flow of data.
														
 
															+
														
 
															+-   Vectorized storage: Data is stored in vectors and flyweights are passed
														
 
															+    around, avoiding more typical heap allocation with pointers.
														
 
															+
														
 
															+    -   For example, the parse tree is stored as a
														
 
															+        `llvm::SmallVector<Parse::Tree::NodeImpl>` indexed by `Parse::Node`
														
 
															+        which wraps an `int32_t`.
														
 
															+
														
 
															+    -   Performance: Vectorization both minimizes memory allocation overhead and
														
 
															+        enables better read caching because adjacent entries will be cached
														
 
															+        together.
														
 
															+
														
 
															+-   Iterative processing: We rely on state stacks and iterative loops for
														
 
															+    parsing, avoiding recursive function calls.
														
 
															+
														
 
															+    -   For example, the parser has a `Parse::State` enum tracked in
														
 
															+        `state_stack_`, and loops in `Parse::Tree::Parse`.
														
 
															+
														
 
															+    -   Scalability: Complex code must not cause recursion issues. We have
														
 
															+        experience in Clang seeing stack frame recursion limits being hit in
														
 
															+        unexpected ways, and non-recursive approaches largely avoid that risk.
														
 
															+
														
 
															+See also [Idioms](idioms.md) for abbreviations and more implementation
														
 
															+techniques.
														
 
															+
														
 
															+## Adding features
														
 
															+
														
 
															+We have a [walkthrough for adding features](adding_features.md).
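
The three design patterns described in the new `docs/README.md` can be seen together in a small sketch. It is not part of this change and is not the toolchain's code: `NodeId`, `NodeImpl`, `ParseToTree`, and `Get` are hypothetical names, and the "language" is just nested parentheses. It assumes only what the text says: a step that returns a complete output structure, nodes stored in a vector and referenced by an integer id, and an explicit stack in place of recursion.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical flyweight id: a thin wrapper over an index into the node vector.
struct NodeId {
  int32_t index;
};

// Hypothetical node record, stored contiguously instead of heap-allocated.
struct NodeImpl {
  char kind;             // '(' for a group, 'x' for a leaf.
  int32_t subtree_size;  // Nodes in this subtree, including the node itself.
};

// Distinct step: consumes the whole input and returns a finished structure.
// Nodes are appended in post-order, so a group follows its children.
auto ParseToTree(std::string_view text) -> std::vector<NodeImpl> {
  std::vector<NodeImpl> nodes;
  // Iterative processing: an explicit stack replaces recursive calls. Each
  // entry records how many nodes existed when a group was opened.
  std::vector<int32_t> open_groups;
  for (char c : text) {
    if (c == '(') {
      open_groups.push_back(static_cast<int32_t>(nodes.size()));
    } else if (c == ')' && !open_groups.empty()) {
      int32_t start = open_groups.back();
      open_groups.pop_back();
      int32_t size = static_cast<int32_t>(nodes.size()) - start + 1;
      nodes.push_back({'(', size});
    } else if (c != ')') {
      nodes.push_back({'x', 1});
    }
    // An unmatched ')' is ignored in this sketch.
  }
  return nodes;
}

// Vectorized storage: callers pass ids around and look nodes up by index.
auto Get(const std::vector<NodeImpl>& nodes, NodeId id) -> const NodeImpl& {
  assert(id.index >= 0 && id.index < static_cast<int32_t>(nodes.size()));
  return nodes[id.index];
}
```

For the input `(a(b))`, `ParseToTree` returns four nodes, and the last one is the outer group whose subtree covers all four. Deeply nested input only grows `open_groups`, not the call stack.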
														
--- a/toolchain/docs/adding_features.md
+++ b/toolchain/docs/adding_features.md
@@ -0,0 +1,433 @@
 
															+# Adding features
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Lex](#lex)
														
 
															+-   [Parse](#parse)
														
 
															+    -   [Typed parse node metadata implementation](#typed-parse-node-metadata-implementation)
														
 
															+-   [Check](#check)
														
 
															+    -   [SemIR typed instruction metadata implementation](#semir-typed-instruction-metadata-implementation)
														
 
															+-   [Lower](#lower)
														
 
															+-   [Tests and debugging](#tests-and-debugging)
														
 
															+    -   [Running tests](#running-tests)
														
 
															+    -   [Updating tests](#updating-tests)
														
 
															+        -   [Reviewing test deltas](#reviewing-test-deltas)
														
 
															+    -   [Verbose output](#verbose-output)
														
 
															+    -   [Stack traces](#stack-traces)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Lex
														
 
															+
														
 
															+New lexed tokens must be added to
														
 
															+[token_kind.def](/toolchain/lex/token_kind.def). `CARBON_SYMBOL_TOKEN` and
														
 
															+`CARBON_KEYWORD_TOKEN` both provide some built-in lexing logic, while
														
 
															+`CARBON_TOKEN` requires custom lexing support.
														
 
															+
														
 
															+[TokenizedBuffer::Lex](/toolchain/lex/tokenized_buffer.h) is the main dispatch
														
 
															+for lexing, and calls that need to do custom lexing will be dispatched there.
														
 
															+
														
 
															+## Parse
														
 
															+
														
 
															+A parser feature will have state transitions that produce new parse nodes.
														
 
															+
														
 
															+The resulting parse nodes are in
														
 
															+[parse/node_kind.def](/toolchain/parse/node_kind.def) and
														
 
															+[typed_nodes.h](/toolchain/parse/typed_nodes.h). When choosing node structure,
														
 
															+consider how semantics will process it in post-order; this will rule out some
														
 
															+designs. Adding a parse node kind will also require a handler in the `Check`
														
 
															+step.
														
 
															+
														
 
															+The state transitions are in [parse/state.def](/toolchain/parse/state.def). Each
														
 
															+`CARBON_PARSER_STATE` defines a distinct state and has comments for state
														
 
															+transitions. If several states should share handling, name them
														
 
															+`FeatureAsVariant`.
														
 
															+
														
 
															+Adding a state requires adding a `Handle<name>` function in an appropriate
														
 
															+`parse/handle_*.cpp` file, possibly a new file. The macros are used to generate
														
 
															+declarations in the header, so only extra helper functions should be added
														
 
															+there. Every state handler pops the state from the stack before any other
														
 
															+processing.
														
 
															+
														
 
															+### Typed parse node metadata implementation
														
 
															+
														
 
															+As of [#3534](https://github.com/carbon-language/carbon-lang/pull/3534):
														
 
															+
														
 
															+![parse](parse.svg)
														
 
															+
														
 
															+> TODO: Convert this chart to Mermaid.
														
 
															+
														
 
															+-   [common/enum_base.h](/common/enum_base.h) defines the `EnumBase`
														
 
															+    [CRTP](idioms.md#crtp-or-curiously-recurring-template-pattern) class
														
 
															+    extending `Printable` from [common/ostream.h](/common/ostream.h), along with
														
 
															+    `CARBON_ENUM` macros for making enumerations
														
 
															+
														
 
															+-   [parse/node_kind.h](/toolchain/parse/node_kind.h) includes
														
 
															+    [common/enum_base.h](/common/enum_base.h) and defines an enumeration
														
 
															+    `NodeKind`, along with bitmask enum `NodeCategory`.
														
 
															+
														
 
															+    -   The `NodeKind` enumeration is populated with the list of all parse node
														
 
															+        kinds using [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
														
 
															+        [the .def file idiom](idioms.md#def-files)) _declared_ in this file
														
 
															+        using a macro from [common/enum_base.h](/common/enum_base.h)
														
 
															+
														
 
															+    -   `NodeKind` has a member type `NodeKind::Definition` that extends
														
 
															+        `NodeKind` and adds a `NodeCategory` field (and others in the future).
														
 
															+
														
 
															+    -   `NodeKind` has a method `Define` for creating a `NodeKind::Definition`
														
 
															+        with the same enumerant value, plus values for the other fields.
														
 
															+
														
 
															+    -   `HasKindMember<T>` at the bottom of
														
 
															+        [parse/node_kind.h](/toolchain/parse/node_kind.h) uses
														
 
															+        [field detection](idioms.md#field-detection) to determine if the type
														
 
															+        `T` has a `NodeKind::Definition Kind` static constant member.
														
 
															+
														
 
															+        -   Note: both the type and name of these fields must match exactly.
														
 
															+
														
 
															+    -   Note that additional information is needed to define the `category()`
														
 
															+        method (and other methods in the future) of `NodeKind`. This information
														
 
															+        comes from the typed parse node definitions in
														
 
															+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) (described below).
														
 
															+
														
 
															+-   [parse/node_ids.h](/toolchain/parse/node_ids.h) defines a number of types
														
 
															+    that store a _node id_ that identifies a node in the parse tree
														
 
															+
														
 
															+    -   `NodeId` stores a node id with no restrictions
														
 
															+
														
 
															+    -   `NodeIdForKind<Kind>` inherits from `NodeId` and stores the id of a node
														
 
															+        that must have the specified `NodeKind` "`Kind`". Note that this is not
														
 
															+        used directly, instead aliases `FooId` for
														
 
															+        `NodeIdForKind<NodeKind::Foo>` are defined for every node kind using
														
 
															+        [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
														
 
															+        [the .def file idiom](idioms.md#def-files)).
														
 
															+
														
 
															+    -   `NodeIdInCategory<Category>` inherits from `NodeId` and stores the id of
														
 
															+        a node that must overlap the specified `NodeCategory` "`Category`". Note
														
 
															+        that this is not typically used directly, instead this file defines
														
 
															+        aliases `AnyDeclId`, `AnyExprId`, ..., `AnyStatementId`.
														
 
															+
														
 
															+    -   Similarly `NodeIdOneOf<T, U>` and `NodeIdNot<V>` inherit from `NodeId`
														
 
															+        and store the id of a node restricted to either matching `T::Kind` or
														
 
															+        `U::Kind` or not matching `V::Kind`.
														
 
															+    -   In addition to the node id type definitions above, the struct
														
 
															+        `NodeForId<T>` is declared but not defined.
														
 
															+
														
 
															+-   [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) defines a typed parse
														
 
															+    node struct type for each kind of parse node.
														
 
															+
														
 
															+    -   Each one defines a static constant named `Kind` that is set using a call
														
 
															+        to `Define()` on the corresponding enumerant member of `NodeKind` from
														
 
															+        [parse/node_kind.h](/toolchain/parse/node_kind.h) (which is included by
														
 
															+        this file).
														
 
															+    -   The fields of these types specify the children of the parse node using
														
 
															+        the types from [parse/node_ids.h](/toolchain/parse/node_ids.h).
														
 
															+
														
 
															+    -   The struct `NodeForId<T>` that is declared in
														
 
															+        [parse/node_ids.h](/toolchain/parse/node_ids.h) is defined in this file
														
 
															+        such that `NodeForId<FooId>::TypedNode` is the `Foo` typed parse node
														
 
															+        struct type.
														
 
															+
														
 
															+    -   This file will fail to compile unless every parse node kind
														
 
															+        defined in [parse/node_kind.def](/toolchain/parse/node_kind.def) has a
														
 
															+        corresponding struct type in this file.
														
 
															+
														
 
															+-   [parse/node_kind.cpp](/toolchain/parse/node_kind.cpp) includes both
														
 
															+    [parse/node_kind.h](/toolchain/parse/node_kind.h) and
														
 
															+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h)
														
 
															+
														
 
															+    -   Using the macro from [common/enum_base.h](/common/enum_base.h), the
														
 
															+        enumerants of `NodeKind` are _defined_ using the list of parse node
														
 
															+        kinds from [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
														
 
															+        [the .def file idiom](idioms.md#def-files)).
														
 
															+
														
 
															+    -   `NodeKind::definition()` is defined. It has a static table of
														
 
															+        `const NodeKind::Definition*` indexed by the enum value, populated by
														
 
															+        taking the address of the `Kind` member of each typed parse node struct
														
 
															+        type, using the list from
														
 
															+        [parse/node_kind.def](/toolchain/parse/node_kind.def).
														
 
															+
														
 
															+    -   `NodeKind::category()` is defined using `NodeKind::definition()`.
														
 
															+
														
 
															+    -   Tested assumption: the tables built in this file are indexed by the enum
														
 
															+        values. We rely on the fact that we get the parse node kinds in the same
														
 
															+        order by consistently using
														
 
															+        [parse/node_kind.def](/toolchain/parse/node_kind.def).
														
 
															+
														
 
															+-   [parse/tree.h](/toolchain/parse/tree.h) includes
														
 
															+    [parse/node_ids.h](/toolchain/parse/node_ids.h). It does not depend on
														
 
															+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) to reduce compilation
														
 
															+    time in those files that don't use the typed parse node struct types.
														
 
															+
														
 
															+    -   Defines `Tree::Extract`... functions that take a node id and return a
														
 
															+        typed parse node struct type from
														
 
															+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h).
														
 
															+
														
 
															+    -   Uses `HasKindMember<T>` to allow calling `ExtractAs` only on typed
														
 
															+        nodes defined in [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h).
														
 
															+
														
 
															+    -   `Tree::Extract` uses `NodeForId<T>` to get the corresponding typed parse
														
 
															+        node struct type for a `FooId` type defined in
														
 
															+        [parse/node_ids.h](/toolchain/parse/node_ids.h).
														
 
															+
														
 
															+        -   Note that this is done without a dependency on the typed parse node
														
 
															+            struct types by using the forward declaration of `NodeForId<T>` from
														
 
															+            [parse/node_ids.h](/toolchain/parse/node_ids.h).
														
 
															+
														
 
															+    -   The `Tree::Extract`... functions ultimately call
														
 
															+        `Tree::TryExtractNodeFromChildren<T>`, which is a templated function
														
 
															+        only declared in this file. Its definition is in
														
 
															+        [parse/extract.cpp](/toolchain/parse/extract.cpp).
														
 
															+
														
 
															+-   [parse/extract.cpp](/toolchain/parse/extract.cpp) includes
														
 
															+    [parse/tree.h](/toolchain/parse/tree.h) and
														
 
															+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h)
														
 
															+
														
 
															+    -   Defines struct `Extractable<T>` that defines how to extract a field of
														
 
															+        type `T` from a `Tree::SiblingIterator` pointing at the corresponding
														
 
															+        child node.
														
 
															+
														
 
															+    -   `Extractable<T>` is defined for the node id types defined in
														
 
															+        [parse/node_ids.h](/toolchain/parse/node_ids.h).
														
 
															+
														
 
															+    -   In addition, `Extractable<T>` is defined for standard types
														
 
															+        `std::optional<U>` and `llvm::SmallVector<V>`, to support optional and
														
 
															+        repeated children.
														
 
															+
														
 
															+    -   Uses [struct reflection](idioms.md#struct-reflection) to support
														
 
															+        aggregate struct types containing extractable fields. This is used to
														
 
															+        support typed parse node struct types as well as struct fields that they
														
 
															+        contain.
														
 
															+
														
 
															+    -   Uses `HasKindMember<Foo>` to detect accidental uses of a parse node type
														
 
															+        directly as fields of typed parse node struct types -- in those places
														
 
															+        `FooId` should be used instead.
														
 
															+
														
 
															+    -   Defines `Tree::TryExtractNodeFromChildren<T>` and explicitly
														
 
															+        instantiates it for every typed parse node struct type defined in
														
 
															+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) using
														
 
															+        [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
														
 
															+        [the .def file idiom](idioms.md#def-files)). By explicitly instantiating
														
 
															+        this function only in this file, we avoid redundant compilation work,
														
 
															+        which reduces build times, and can keep all the extraction
														
 
															+        machinery as a private implementation detail of this file.
														
 
															+
														
 
															+-   [parse/typed_nodes_test.cpp](/toolchain/parse/typed_nodes_test.cpp)
														
 
															+    validates that each typed parse node struct type has a static `Kind` member
														
 
															+    that defines the correct corresponding `NodeKind`, and that the `category()`
														
 
															+    function agrees between the `NodeKind` and `NodeKind::Definition`.
														
 
															+
														
 
															+Note: this is broadly similar to
														
 
															+[SemIR typed instruction metadata implementation](#semir-typed-instruction-metadata-implementation).
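
The interplay between the kind list, the per-struct `Kind` member, and the definition table can be sketched in a few lines. This is an illustration and not part of this change: the `SKETCH_NODE_KINDS` macro stands in for a `.def` file (a real `.def` file is pulled in with `#include`), and every name in `namespace sketch` is hypothetical. It assumes only the structure described above: one list expanded several times, a static `Kind` on each typed struct, and a table indexed by enum value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string_view>

namespace sketch {

// Stand-in for a .def file: one list, expanded several times below.
#define SKETCH_NODE_KINDS(X) \
  X(FileStart)               \
  X(Name)                    \
  X(FileEnd)

// Expansion 1: declare the enumerants.
enum class SketchKind : uint8_t {
#define SKETCH_ENUMERANT(K) K,
  SKETCH_NODE_KINDS(SKETCH_ENUMERANT)
#undef SKETCH_ENUMERANT
};

// Extra per-kind data, analogous in spirit to a `Definition`.
struct SketchDefinition {
  SketchKind kind;
  std::string_view name;
};

// Hand-written typed structs, each carrying a static `Kind` member.
struct FileStart {
  static constexpr SketchDefinition Kind = {SketchKind::FileStart, "FileStart"};
};
struct Name {
  static constexpr SketchDefinition Kind = {SketchKind::Name, "Name"};
};
struct FileEnd {
  static constexpr SketchDefinition Kind = {SketchKind::FileEnd, "FileEnd"};
};

// Expansion 2: a table of pointers to each struct's `Kind`, in list order.
// A kind in the list without a matching struct fails to compile here.
constexpr const SketchDefinition* DefinitionTable[] = {
#define SKETCH_TABLE_ENTRY(K) &K::Kind,
    SKETCH_NODE_KINDS(SKETCH_TABLE_ENTRY)
#undef SKETCH_TABLE_ENTRY
};

constexpr auto GetDefinition(SketchKind kind) -> const SketchDefinition& {
  return *DefinitionTable[static_cast<uint8_t>(kind)];
}

// Checks the assumption that table position equals enum value.
inline auto VerifyTableOrder() -> void {
  for (std::size_t i = 0; i < std::size(DefinitionTable); ++i) {
    assert(static_cast<std::size_t>(DefinitionTable[i]->kind) == i);
  }
}

}  // namespace sketch
```

Because both expansions walk the same list, the table order matches the enum order by construction; `VerifyTableOrder` makes that assumption explicit in the same way a test would.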
														
 
															+
														
 
															+## Check
														
 
															+
														
 
															+Each parse node kind requires adding a `Handle<kind>` function in a
														
 
															+`check/handle_*.cpp` file.
														
 
															+
														
 
															+If the resulting SemIR needs a new instruction:
														
 
															+
														
 
															+-   Add a new kind to [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
														
 
															+    -   Add a `CARBON_SEM_IR_INST_KIND(NewInstKindName)` line in alphabetical
														
 
															+        order
														
 
															+-   Add a new struct definition to
														
 
															+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h), such as:
														
 
															+
														
 
															+    ```cpp
														
 
															+    struct NewInstKindName {
														
 
															+        static constexpr auto Kind = InstKind::NewInstKindName.Define(
														
 
															+            // the name used in textual IR
														
 
															+            "new_inst_kind_name"
														
 
															+            // Optional: , TerminatorKind::KindOfTerminator
														
 
															+            );
														
 
															+
														
 
															+        // Optional: omit if not associated with a parse node.
														
 
															+        Parse::Node parse_node;
														
 
															+
														
 
															+        // Optional: omit if this sem_ir instruction does not produce a value.
														
 
															+        TypeId type_id;
														
 
															+
														
 
															+        // 0-2 id fields, with types from sem_ir/ids.h or sem_ir/builtin_kind.h
														
 
															+        // For example, fields would look like:
														
 
															+        StringId name_id;
														
 
															+        InstId value_id;
														
 
															+    };
														
 
															+    ```
														
 
															+
														
 
															+Adding an instruction will also require a handler in the Lower step.
														
 
															+
														
 
															+Most new instructions will automatically be formatted reasonably by the SemIR
														
 
															+formatter.
														
 
															+
														
 
															+If the resulting SemIR needs a new built-in, add it to
														
 
															+[builtin_inst_kind.def](/toolchain/sem_ir/builtin_inst_kind.def).
														
 
															+
														
 
															+### SemIR typed instruction metadata implementation
														
 
															+
														
 
															+How does this work? As of
														
 
															+[#3310](https://github.com/carbon-language/carbon-lang/pull/3310):
														
 
															+
														
 
															+![check](check.svg)
														
 
															+
														
 
															+> TODO: Convert this chart to Mermaid.
														
 
															+
														
 
															+-   [common/enum_base.h](/common/enum_base.h) defines the `EnumBase`
														
 
															+    [CRTP](idioms.md#crtp-or-curiously-recurring-template-pattern) class
														
 
															+    extending `Printable` from [common/ostream.h](/common/ostream.h), along with
														
 
															+    `CARBON_ENUM` macros for making enumerations
														
 
															+
														
 
															+-   [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) includes
														
 
															+    [common/enum_base.h](/common/enum_base.h) and defines an enumeration
														
 
															+    `InstKind`, along with `InstValueKind` and `TerminatorKind`.
														
 
															+
														
 
															+    -   The `InstKind` enumeration is populated with the list of all instruction
														
 
															+        kinds using [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
														
 
															+        (using [the .def file idiom](idioms.md#def-files)) _declared_ in this
														
 
															+        file using a macro from [common/enum_base.h](/common/enum_base.h)
														
 
															+
														
 
															+    -   `InstKind` has a member type `InstKind::Definition` that extends
														
 
															+        `InstKind` and adds the `ir_name` string field, and a `TerminatorKind`
														
 
															+        field.
														
 
															+
														
 
															+    -   `InstKind` has a method `Define` for creating an `InstKind::Definition`
														
 
															+        with the same enumerant value, plus values for the other fields.
														
 
															+
														
 
															+-   Note that additional information is needed to define the `ir_name()`,
														
 
															+    `value_kind()`, and `terminator_kind()` methods of `InstKind`. This
														
 
															+    information comes from the typed instruction definitions in
														
 
															+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h).
														
 
															+
														
 
															+-   [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) defines a typed
														
 
															+    instruction struct type for each kind of SemIR instruction, as described
														
 
															+    above.
														
 
															+
														
 
															+    -   Each one defines a static constant named `Kind` that is set using a call
														
 
															+        to `Define()` on the corresponding enumerant member of `InstKind` from
														
 
															+        [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) (which is included
														
 
															+        by this file).
														
 
															+
														
 
															+-   `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>` at the
														
 
															+    bottom of [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) use
														
 
															+    [field detection](idioms.md#field-detection) to determine if `TypedInst` has
														
 
															+    a `Parse::Node parse_node` or a `TypeId type_id` field respectively.
														
 
															+
														
 
															+    -   Note: both the type and name of these fields must match exactly.
														
 
															+
														
 
															+-   [sem_ir/inst_kind.cpp](/toolchain/sem_ir/inst_kind.cpp) includes both
														
 
															+    [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) and
														
 
															+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h)
														
 
															+
														
 
															+    -   Using the macro from [common/enum_base.h](/common/enum_base.h), the
														
 
															+        enumerants of `InstKind` are _defined_ using the list of instruction
														
 
															+        kinds from [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
														
 
															+        (using [the .def file idiom](idioms.md#def-files))
														
 
															+
														
 
															+    -   `InstKind::value_kind()` is defined. It has a static table of
														
 
															+        `InstValueKind` values indexed by the enum value, populated by applying
														
 
															+        `HasTypeIdMember` from
														
 
															+        [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) to every
														
 
															+        instruction kind by using the list from
														
 
															+        [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
														
 
															+    -   `InstKind::definition()` is defined. It has a static table of
														
 
															+        `const InstKind::Definition*` indexed by the enum value, populated by
														
 
															+        taking the address of the `Kind` member of each `TypedInst`, using the
														
 
															+        list from [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
														
 
															+
														
 
															+    -   `InstKind::ir_name()` and `InstKind::terminator_kind()` are defined
														
 
															+        using `InstKind::definition()`.
														
 
															+    -   Tested assumption: the tables built in this file are indexed by the enum
														
 
															+        values. We rely on the fact that we get the instruction kinds in the
														
 
															+        same order by consistently using
														
 
															+        [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
														
 
															+
														
 
															+    -   This file will fail to compile unless every kind of SemIR instruction
														
 
															+        defined in [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def) has a
														
 
															+        corresponding struct type in
														
 
															+        [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h).
														
 
															+
														
 
															+-   `TypedInstArgsInfo<TypedInst>` defined in
														
 
															+    [sem_ir/inst.h](/toolchain/sem_ir/inst.h) uses
														
 
															+    [struct reflection](idioms.md#struct-reflection) to determine the other
														
 
															+    fields from `TypedInst`. It skips the `parse_node` and `type_id` fields
														
 
															+    using `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>`.
														
 
															+
														
 
															+    -   Tested assumption: the `parse_node` and `type_id` are the first fields
														
 
															+        in `TypedInst`, and there are at most two more fields.
														
 
															+
														
 
															+-   [sem_ir/inst.h](/toolchain/sem_ir/inst.h) defines templated conversions
														
 
															+    between `Inst` and each of the typed instruction structs:
														
 
															+
														
 
															+    -   Uses `TypedInstArgsInfo<TypedInst>`, `HasParseNodeMember<TypedInst>`,
														
 
															+        and `HasTypeIdMember<TypedInst>`, and
														
 
															+        a [local lambda](idioms.md#local-lambdas-to-reduce-duplicate-code).
														
 
															+
														
 
															+    -   Defines a templated `ToRaw` function that converts the various id field
														
 
															+        types to an `int32_t`.
														
 
															+    -   Defines a templated `FromRaw<T>` function that converts an `int32_t` to
														
 
															+        `T` to perform the opposite conversion.
														
 
															+    -   Tested assumption: The `parse_node` field is first, when present, and
														
 
															+        the `type_id` is next, when present, in each `TypedInst` struct type.
														
 
															+
														
 
															+-   The "tested assumptions" above are all tested by
														
 
															+    [sem_ir/typed_insts_test.cpp](/toolchain/sem_ir/typed_insts_test.cpp)
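The way these pieces fit together can be sketched in a single file. This is a simplified illustration, not the toolchain's code: the macro `SKETCH_INST_KINDS` stands in for `inst_kind.def`, and `SketchKind`, `SketchDefinition`, `Definitions`, `HasTypeId`, `IrName`, and `ProducesValue` are hypothetical names. Field detection is written here with `std::void_t`, which may differ from the technique the toolchain uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for the list of kinds in inst_kind.def.
#define SKETCH_INST_KINDS(X) X(Assign) X(PointerType)

// Enumerants are generated from the list, so their values follow list order.
enum class SketchKind : std::uint8_t {
#define SKETCH_ENUMERANT(Name) Name,
  SKETCH_INST_KINDS(SKETCH_ENUMERANT)
#undef SKETCH_ENUMERANT
};

// Plays the role of `InstKind::Definition`: the enumerant plus extra data.
struct SketchDefinition {
  SketchKind kind;
  std::string_view ir_name;
};

struct TypeId { std::int32_t index; };
struct InstId { std::int32_t index; };

// One typed struct per list entry, each carrying a static `Kind` definition.
struct Assign {
  static constexpr SketchDefinition Kind = {SketchKind::Assign, "assign"};
  InstId lhs_id;
  InstId rhs_id;
};
struct PointerType {
  static constexpr SketchDefinition Kind = {SketchKind::PointerType, "ptr_type"};
  TypeId type_id;
  InstId pointee_id;
};

// Field detection: true only if `T::type_id` exists and has type `TypeId`.
template <typename T, typename = void>
inline constexpr bool HasTypeIdMember = false;
template <typename T>
inline constexpr bool HasTypeIdMember<T, std::void_t<decltype(&T::type_id)>> =
    std::is_same_v<decltype(&T::type_id), TypeId T::*>;

#define SKETCH_COUNT(Name) +1
inline constexpr std::size_t NumKinds = 0 SKETCH_INST_KINDS(SKETCH_COUNT);
#undef SKETCH_COUNT

// Tables indexed by enum value, built from the same list as the enum.
inline constexpr std::array<const SketchDefinition*, NumKinds> Definitions = {
#define SKETCH_DEFINITION(Name) &Name::Kind,
    SKETCH_INST_KINDS(SKETCH_DEFINITION)
#undef SKETCH_DEFINITION
};
inline constexpr std::array<bool, NumKinds> HasTypeId = {
#define SKETCH_HAS_TYPE_ID(Name) HasTypeIdMember<Name>,
    SKETCH_INST_KINDS(SKETCH_HAS_TYPE_ID)
#undef SKETCH_HAS_TYPE_ID
};

constexpr auto IrName(SketchKind kind) -> std::string_view {
  return Definitions[static_cast<std::size_t>(kind)]->ir_name;
}
constexpr auto ProducesValue(SketchKind kind) -> bool {
  return HasTypeId[static_cast<std::size_t>(kind)];
}

// The "tested assumption": table order matches enumerant order.
constexpr auto TablesMatchEnumOrder() -> bool {
  for (std::size_t i = 0; i < NumKinds; ++i) {
    if (Definitions[i]->kind != static_cast<SketchKind>(i)) return false;
  }
  return true;
}
static_assert(TablesMatchEnumOrder());

// A field with the right name but the wrong type is not detected.
struct WrongFieldType { int type_id; };
static_assert(!HasTypeIdMember<WrongFieldType>);
```

In this sketch, adding a kind to the list without a matching struct makes the table definitions fail to compile, which mirrors the compile-time failure described above.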
														
 
															+
														
 
															+## Lower
														
 
															+
														
 
															+Each SemIR instruction requires adding a `Handle<kind>` function in a
														
 
															+`lower/handle_*.cpp` file.
														
 
															+
														
 
															+## Tests and debugging
														
 
															+
														
 
															+### Running tests
														
 
															+
														
 
															+Tests are run in bulk as `bazel test //toolchain/...`. Many tests use the
														
 
															+file_test infrastructure; see
														
 
															+[testing/file_test/README.md](/testing/file_test/README.md) for information.
														
 
															+
														
 
															+There are several supported ways to run Carbon on a given test file. For
														
 
															+example, with `toolchain/parse/testdata/basics/empty.carbon`:
														
 
															+
														
 
															+-   `bazel test //toolchain/testing:file_test --test_arg=--file_tests=toolchain/parse/testdata/basics/empty.carbon`
														
 
															+    -   Executes an individual test.
														
 
															+-   `bazel run //toolchain/parse:testdata/basics/empty.carbon.run`
														
 
															+    -   Runs `carbon` on the file with standard arguments, printing output to
														
 
															+        console.
														
 
															+    -   This form will often be most useful when iterating over a specific test.
														
 
															+-   `bazel run //toolchain/parse:testdata/basics/empty.carbon.verbose`
														
 
															+    -   Similar to the previous command, but with the `-v` flag implied.
														
 
															+-   `bazel run //toolchain/driver:carbon -- compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
														
 
															+    -   Explicitly runs `carbon` with the provided arguments.
														
 
															+-   `bazel-bin/toolchain/driver/carbon compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
														
 
															+    -   Similar to the previous command, but without using `bazel`.
														
 
															+
														
 
															+### Updating tests
														
 
															+
														
 
															+The `toolchain/autoupdate_testdata.py` script can be used to update output. It
														
 
															+invokes the `file_test` autoupdate support. See
														
 
															+[testing/file_test/README.md](/testing/file_test/README.md) for file syntax.
														
 
															+
														
 
															+#### Reviewing test deltas
														
 
															+
														
 
															+Using `autoupdate_testdata.py` can be useful to produce deltas during the
														
 
															+development process because it allows `git status` and `git diff` to be used to
														
 
															+examine what changed.
														
 
															+
														
 
															+### Verbose output
														
 
															+
														
 
															+The `-v` flag can be passed to trace state, and should be specified before the
														
 
															+subcommand name: `carbon -v compile ...`. `CARBON_VLOG` is used to print output
														
 
															+in this mode. There is currently no control over the degree of verbosity.
														
 
															+
														
 
															+### Stack traces
														
 
															+
														
 
															+While the iterative processing pattern means function stack traces will have
														
 
															+minimal context for how the current function is reached, we use LLVM's
														
 
															+`PrettyStackTrace` to include details about the state stack. The state stack
														
 
															+will be above the function stack in crash output.
														
--- a/toolchain/docs/check.md
+++ b/toolchain/docs/check.md
@@ -0,0 +1,616 @@
 
															+# Check
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Overview](#overview)
														
 
															+-   [Postorder processing](#postorder-processing)
														
 
															+-   [Key IR concepts](#key-ir-concepts)
														
 
															+    -   [Parameters and arguments](#parameters-and-arguments)
														
 
															+-   [SemIR textual format](#semir-textual-format)
														
 
															+    -   [Raw form](#raw-form)
														
 
															+    -   [Formatted IR](#formatted-ir)
														
 
															+        -   [Instructions](#instructions)
														
 
															+        -   [Top-level entities](#top-level-entities)
														
 
															+-   [Core loop](#core-loop)
														
 
															+    -   [Node stack](#node-stack)
														
 
															+    -   [Delayed evaluation (not yet implemented)](#delayed-evaluation-not-yet-implemented)
														
 
															+    -   [Templates (not yet implemented)](#templates-not-yet-implemented)
														
 
															+    -   [Rewrites](#rewrites)
														
 
															+-   [Types](#types)
														
 
															+    -   [Type printing (not yet implemented)](#type-printing-not-yet-implemented)
														
 
															+-   [Expression categories](#expression-categories)
														
 
															+    -   [ExprCategory::NotExpression](#exprcategorynotexpression)
														
 
															+    -   [ExprCategory::Value](#exprcategoryvalue)
														
 
															+    -   [ExprCategory::DurableReference and ExprCategory::EphemeralReference](#exprcategorydurablereference-and-exprcategoryephemeralreference)
														
 
															+    -   [ExprCategory::Initializing](#exprcategoryinitializing)
														
 
															+    -   [ExprCategory::Mixed](#exprcategorymixed)
														
 
															+    -   [Value bindings](#value-bindings)
														
 
															+-   [Handling Parse::Tree errors (not yet implemented)](#handling-parsetree-errors-not-yet-implemented)
														
 
															+-   [Alternatives considered](#alternatives-considered)
														
 
															+    -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+Check takes the parse tree and generates a semantic intermediate representation,
														
 
															+or SemIR. This will look closer to a series of instructions, in preparation for
														
 
															+transformation to LLVM IR. Semantic analysis and type checking occur during the
														
 
															+production of SemIR. It also does any validation that requires context.
														
 
															+
														
 
															+## Postorder processing
														
 
															+
														
 
															+The checking step is oriented around postorder processing of the `Parse::Tree` to
														
 
															+iterate through the `Parse::NodeImpl` vectorized storage once, in order, as much
														
 
															+as possible. This is primarily for performance, but also relies on the
														
 
															+[information accumulation principle](/docs/project/principles/information_accumulation.md):
														
 
															+that is, when that principle applies, we should be able to generate IR
														
 
															+immediately because we can rely on the principle that when a line is processed,
														
 
															+the information necessary to semantically check that line is already available.
														
 
															+
														
 
															+Indirectly, what this really means is that we should be able to go from a
														
 
															+`Parse::Tree` (which cannot be used for name lookups) to a SemIR with name lookups
														
 
															+completed in a single pass. The SemIR should not need to be re-processed to add
														
 
															+more information outside of templates. By doing this, we avoid an additional
														
 
															+processing pass with associated storage needs.
														
 
															+
														
 
															+This single-pass approach also means that the checking step does not make use of
														
 
															+the tree structure of the `Parse::Tree`. In cases where the actions performed
														
 
															+for a parse tree node depend on the context in which that node appears, a node
														
 
															+that is visited earlier in the postorder traversal, such as a bracketing node,
														
 
															+needs to establish the necessary context. In this respect, the sequence of
														
 
															+`Parse::Node`s can be thought of as a byte code input that the check step
														
 
															+interprets to build the `SemIR`.
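The "byte code" view can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions: `Node`, `NodeKind`, and `CheckPostorder` are hypothetical, the emitted strings are placeholders rather than real SemIR, and only literals and two binary operators are modeled. It walks the postorder sequence once and never follows tree links. Operands are found on a stack because they were visited earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical node kinds for a tiny expression language.
enum class NodeKind { IntLiteral, Add, Mul };

struct Node {
  NodeKind kind;
  std::int32_t value;  // Only meaningful for IntLiteral.
};

// Walks the nodes once, in postorder, emitting one placeholder instruction
// per node. The stack holds the index of the instruction produced for each
// pending operand.
auto CheckPostorder(const std::vector<Node>& postorder)
    -> std::vector<std::string> {
  std::vector<std::string> insts;
  std::vector<std::size_t> node_stack;
  for (const Node& node : postorder) {
    switch (node.kind) {
      case NodeKind::IntLiteral:
        insts.push_back("int_literal " + std::to_string(node.value));
        break;
      case NodeKind::Add:
      case NodeKind::Mul: {
        // Postorder guarantees both operands were already processed.
        assert(node_stack.size() >= 2);
        std::size_t rhs = node_stack.back();
        node_stack.pop_back();
        std::size_t lhs = node_stack.back();
        node_stack.pop_back();
        std::string op = node.kind == NodeKind::Add ? "add" : "mul";
        insts.push_back(op + " %" + std::to_string(lhs) + ", %" +
                        std::to_string(rhs));
        break;
      }
    }
    // The instruction just emitted is the result of this node.
    node_stack.push_back(insts.size() - 1);
  }
  return insts;
}
```

For `(1 + 2) * 3`, the postorder sequence is literal, literal, add, literal, mul, and each instruction can be emitted as soon as its node is reached.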
														
 
															+
														
 
															+## Key IR concepts
														
 
															+
														
 
															+A `SemIR::Inst` is the basic building block that represents a simple
														
 
															+instruction, such as an operator or declaring a literal. For each kind of
														
 
															+instruction, a typedef for that specific kind of instruction is provided in the
														
 
															+`SemIR` namespace. For example, `SemIR::Assign` represents an assignment
														
 
															+instruction, and `SemIR::PointerType` represents a pointer type instruction.
														
 
															+
														
 
															+Each instruction class has up to four public data members describing the
														
 
															+instruction, as described in
														
 
															+[sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) (also see
														
 
															+[adding features for Check](adding_features.md#check)):
														
 
															+
														
 
															+-   A `Parse::Node parse_node;` member that tracks its location is present on
														
 
															+    almost all instructions, except instructions like `SemIR::Builtin` that
														
 
															+    don't have an associated location.
														
 
															+
														
 
															+-   A `SemIR::TypeId type_id;` member that describes the type of the instruction
														
 
															+    is present on all instructions that produce a value. This includes namespace
														
 
															+    instructions, which are modeled as producing a value of "namespace" type,
														
 
															+    even though they can't be used as a first-class value in Carbon expressions.
														
 
															+
														
 
															+-   Up to two additional, kind-specific members. For example `SemIR::Assign` has
														
 
															+    members `InstId lhs_id` and `InstId rhs_id`.
														
 
															+
														
 
															+Instructions are stored as type-erased `SemIR::Inst` objects, which store the
														
 
															+instruction kind and the (up to) four fields described above. This balances the
														
 
															+size of `SemIR::Inst` against the overhead of indirection.
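A rough sketch of the relationship between typed structs and the type-erased form follows. It is illustrative only: every ID is simplified to `std::int32_t`, the `NoValue` sentinel and the names `SketchInst`, `Erase`, `TryAsAssign`, and `TryAsPointerType` are assumptions, and the conversions are written by hand rather than generated.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class SketchKind : std::uint8_t { Assign, PointerType };

// Sentinel for a field that a given kind does not have (an assumption).
inline constexpr std::int32_t NoValue = -1;

// Type-erased form: the kind plus up to four fields, stored inline so that
// reading an instruction needs no extra indirection.
struct SketchInst {
  SketchKind kind;
  std::int32_t parse_node;
  std::int32_t type_id;
  std::int32_t arg0;
  std::int32_t arg1;
};

// Typed views. `Assign` produces no value, so it has no `type_id`.
struct Assign {
  std::int32_t parse_node;
  std::int32_t lhs_id;
  std::int32_t rhs_id;
};
struct PointerType {
  std::int32_t parse_node;
  std::int32_t type_id;
  std::int32_t pointee_id;
};

constexpr auto Erase(const Assign& inst) -> SketchInst {
  return {SketchKind::Assign, inst.parse_node, NoValue, inst.lhs_id,
          inst.rhs_id};
}
constexpr auto Erase(const PointerType& inst) -> SketchInst {
  return {SketchKind::PointerType, inst.parse_node, inst.type_id,
          inst.pointee_id, NoValue};
}

// Converting back succeeds only when the stored kind matches.
constexpr auto TryAsAssign(const SketchInst& inst) -> std::optional<Assign> {
  if (inst.kind != SketchKind::Assign) return std::nullopt;
  return Assign{inst.parse_node, inst.arg0, inst.arg1};
}
constexpr auto TryAsPointerType(const SketchInst& inst)
    -> std::optional<PointerType> {
  if (inst.kind != SketchKind::PointerType) return std::nullopt;
  return PointerType{inst.parse_node, inst.type_id, inst.arg0};
}
```

Every instruction has the same fixed size in this layout, which is the trade-off the paragraph above describes: unused slots cost a little space, but no instruction needs a separately allocated payload.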
														
 
															+
														
 
															+A `SemIR::InstBlock` can represent a code block. However, it can also be created
														
 
															+when a series of instructions needs to be closely associated, such as a
														
 
															+parameter list.
														
 
															+
														
 
															+A `SemIR::Builtin` represents a language built-in, such as the unconstrained
														
 
															+facet type `type`. We will also have built-in functions which would need to form
														
 
															+the implementation of some library types, such as `i32`. Built-ins have a
														
 
															+stable index across `SemIR` instances.
														
 
															+
														
 
															+### Parameters and arguments
														
 
															+
														
 
															+Parameters and arguments will be stored as two `SemIR::InstBlock`s each. The
														
 
															+first will contain the full IR, while the second will contain references to the
														
 
															+last instruction for each parameter or argument. The references block will have
														
 
															+a size equal to the number of parameters or arguments, allowing for quick size
														
 
															+comparisons and indexed access.
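The two-block layout can be sketched as follows. The names `ParamBlocks`, `AddParam`, and `SameArity` are hypothetical, and instructions are represented as placeholder strings rather than real `SemIR::Inst` values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Two parallel blocks: the full IR, and one reference per parameter.
struct ParamBlocks {
  std::vector<std::string> full_ir;  // Every instruction, in order.
  std::vector<std::size_t> refs;     // Index of each parameter's last inst.
};

// Appends the instructions for one parameter (or argument) and records a
// reference to the last one, which stands for the parameter as a whole.
auto AddParam(ParamBlocks& blocks, const std::vector<std::string>& insts)
    -> void {
  assert(!insts.empty());
  blocks.full_ir.insert(blocks.full_ir.end(), insts.begin(), insts.end());
  blocks.refs.push_back(blocks.full_ir.size() - 1);
}

// The references block has one entry per parameter, so comparing arity is a
// size comparison and does not scan the full IR.
auto SameArity(const ParamBlocks& params, const ParamBlocks& args) -> bool {
  return params.refs.size() == args.refs.size();
}
```

Indexed access works the same way: in this sketch, `blocks.full_ir[blocks.refs[i]]` is the final instruction for parameter `i`, however many instructions were needed to build it.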
														
 
															+
														
 
															+## SemIR textual format
														
 
															+
														
 
															+There are two textual ways to view `SemIR`.
														
 
															+
														
 
															+### Raw form
														
 
															+
														
 
															+The raw form of SemIR shows the details of the representation, such as numeric
														
 
															+instruction and block IDs. The representation is intended to very closely match
														
 
															+the `SemIR::File` and `SemIR::Inst` representations. This can be useful when
														
 
															+debugging low-level issues with the `SemIR` representation.
														
 
															+
														
 
															+The driver will print this when passed `--dump-raw-sem-ir`.
														
 
															+
														
 
															+### Formatted IR
														
 
															+
														
 
															+In addition to the raw form, there is a higher-level formatted IR that aims to
														
 
															+be human readable. This is used in most `check` tests to validate the output,
														
 
															+and also expected to be used regularly by toolchain developers to inspect the
														
 
															+result of checking the parse tree.
														
 
															+
														
 
															+The driver will print this when passed `--dump-sem-ir`.
														
 
															+
														
 
															+Unlike the raw form, certain representational choices in the `SemIR` data may
														
 
															+not be visible in this form. However, it is intended to be possible to parse the
														
 
															+`SemIR` output and form an equivalent – but not necessarily identical – `SemIR`
														
 
															+representation, although no such parser currently exists.
														
 
															+
														
 
															+As an example, given the program:
														
 
															+
														
 
															+```carbon
														
 
															+fn Cond() -> bool;
														
 
															+fn Run() -> i32 { return if Cond() then 1 else 2; }
														
 
															+```
														
 
															+
														
 
															+The formatted IR is currently:
														
 
															+
														
 
															+```
														
 
															+constants {
														
 
															+  %.1: i32 = int_literal 1 [template]
														
 
															+  %.2: i32 = int_literal 2 [template]
														
 
															+}
														
 
															+
														
 
															+file {
														
 
															+  package: <namespace> = namespace [template] {
														
 
															+    .Cond = %Cond
														
 
															+    .Run = %Run
														
 
															+  }
														
 
															+  %Cond: <function> = fn_decl @Cond [template] {
														
 
															+    %return.var.loc1: ref bool = var <return slot>
														
 
															+  }
														
 
															+  %Run: <function> = fn_decl @Run [template] {
														
 
															+    %return.var.loc2: ref i32 = var <return slot>
														
 
															+  }
														
 
															+}
														
 
															+
														
 
															+fn @Cond() -> bool;
														
 
															+
														
 
															+fn @Run() -> i32 {
														
 
															+!entry:
														
 
															+  %Cond.ref: <function> = name_ref Cond, file.%Cond [template = file.%Cond]
														
 
															+  %.loc2_33.1: init bool = call %Cond.ref()
														
 
															+  %.loc2_26.1: bool = value_of_initializer %.loc2_33.1
														
 
															+  %.loc2_33.2: bool = converted %.loc2_33.1, %.loc2_26.1
														
 
															+  if %.loc2_33.2 br !if.expr.then else br !if.expr.else
														
 
															+
														
 
															+!if.expr.then:
														
 
															+  %.loc2_41: i32 = int_literal 1 [template = constants.%.1]
														
 
															+  br !if.expr.result(%.loc2_41)
														
 
															+
														
 
															+!if.expr.else:
														
 
															+  %.loc2_48: i32 = int_literal 2 [template = constants.%.2]
														
 
															+  br !if.expr.result(%.loc2_48)
														
 
															+
														
 
															+!if.expr.result:
														
 
															+  %.loc2_26.2: i32 = block_arg !if.expr.result
														
 
															+  return %.loc2_26.2
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+There are three kinds of names in formatted IR, which are distinguished by their
														
 
															+leading sigils:
														
 
															+
														
 
															+-   `%name` denotes a value produced by an instruction. These names are
														
 
															+    introduced by a line of the form `%name: <category> <type> = <instruction>`,
														
 
															+    and are scoped to the enclosing top-level entity. `<category>` describes the
														
 
															+    [expression category](#expression-categories), which is `init` for an
														
 
															+    initializing expression, `ref` for a reference expression, or omitted for a
														
 
															+    value expression. Typically, values can only be referenced by instructions
														
 
															+    that their introduction
														
 
															+    [dominates](<https://en.wikipedia.org/wiki/Dominator_(graph_theory)>), but
														
 
															+    some kinds of instruction might have other rules. Names in the `file` block
														
 
															+    can be referenced as `file.%<name>`.
														
 
															+
														
 
															+-   `!name` denotes a label, and `!name:` appears as a prefix of each
														
 
															+    `InstBlock` in a `Function`. These names are scoped to their enclosing
														
 
															+    function, and can be referenced anywhere in that function, but not outside.
														
 
															+
														
 
															+-   `@name` denotes a top-level entity, such as a function, class, or interface.
														
 
															+    The SemIR view of these entities is flattened, so member functions are
														
 
															+    treated as top-level entities.
														
 
															+
														
 
															+Names in formatted IR are all invented by the formatter, and generally are of
														
 
															+the form `<base_name>[.loc<line>[_<col>[.<counter>]]]` where `<line>` and
														
 
															+`<col>` describe the location of the instruction, and `<counter>` is used as a
														
 
															+disambiguator if multiple instructions appear at the same location. Trailing
														
 
															+name components are only included if they are necessary to disambiguate the
														
 
															+name. `<base_name>` is a guessed good name for the instruction, often derived
														
 
															+from source-level identifiers, and is empty if no guess was made.
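The name shape can be illustrated with a small helper. This is a sketch, not the formatter's code: `NameParts`, `Detail`, and `FormatName` are hypothetical, and the caller decides how many trailing components are needed. The disambiguation logic itself is not modeled.

```cpp
#include <cassert>
#include <string>

struct NameParts {
  std::string base_name;  // May be empty if no good name was guessed.
  int line;
  int col;
  int counter;
};

// How many trailing components are required to make the name unique.
enum class Detail { BaseOnly, Line, LineCol, LineColCounter };

// Builds `<base_name>[.loc<line>[_<col>[.<counter>]]]`, including only the
// components requested by `detail`.
auto FormatName(const NameParts& parts, Detail detail) -> std::string {
  std::string name = parts.base_name;
  if (detail >= Detail::Line) {
    name += ".loc" + std::to_string(parts.line);
  }
  if (detail >= Detail::LineCol) {
    name += "_" + std::to_string(parts.col);
  }
  if (detail >= Detail::LineColCounter) {
    name += "." + std::to_string(parts.counter);
  }
  return name;
}
```

With these inputs the helper reproduces names of the shapes seen in the example above, such as `Cond`, `return.var.loc1`, `.loc2_41`, and `.loc2_33.1` (each shown without its `%` sigil).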
														
 
															+
														
 
															+#### Instructions
														
 
															+
														
 
															+There is usually one line in an `InstBlock` for each `Inst`. You can find the
														
 
															+documentation for the different kinds of instructions in
														
 
															+[toolchain/sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h). For example,
														
 
															+given a formatted SemIR line like:
														
 
															+
														
 
															+```
														
 
															+%N: i32 = assoc_const_decl N [template]
														
 
															+```
														
 
															+
														
 
															+you would look for a `struct` definition that uses `"assoc_const_decl"` as its
														
 
															+`ir_name`. In this case, this is the `AssociatedConstantDecl` instruction:
														
 
															+
														
 
															+```cpp
														
 
															+// An associated constant declaration in an interface, such as `let T:! type;`.
														
 
															+struct AssociatedConstantDecl {
														
 
															+  static constexpr auto Kind =
														
 
															+      InstKind::AssociatedConstantDecl.Define<Parse::NodeId>(
														
 
															+          {.ir_name = "assoc_const_decl", .is_lowered = false});
														
 
															+
														
 
															+  TypeId type_id;
														
 
															+  NameId name_id;
														
 
															+};
														
 
															+```
														
 
															+
														
 
															+Since this instruction produces a value, it has a `TypeId type_id` field, which
														
 
															+corresponds to the type written between the `:` and the `=`. In the example
														
 
															+above, that type is `i32`. The other arguments to the instruction are written
														
 
															+after the `ir_name` -- in this example the `name_id` is `N`. From this we find
														
 
															+that the instruction corresponds to an associated constant declaration in an
														
 
															+interface like `let N:! i32;`.
														
 
															+
														
 
															+Instructions producing a constant value, like `assoc_const_decl` above, are
														
 
															+followed by their phase, either `[symbolic]` or `[template]`, and then `=` the
														
 
															+value if it is the value of a different instruction.
														
 
															+
														
 
															+Instructions that do not produce a value, such as the `br` and `return`
														
 
															+instructions above, omit the leading `%name: ... =` prefix, as they cannot be
														
 
															+named by other instructions. These instructions do not have a `TypeId type_id`
														
 
															+field, like the `AdaptDecl` instruction:
														
 
															+
														
 
															+```cpp
														
 
															+// An adapted type declaration in a class, of the form `adapt T;`.
														
 
															+struct AdaptDecl {
														
 
															+  static constexpr auto Kind = InstKind::AdaptDecl.Define<Parse::AdaptDeclId>(
														
 
															+      {.ir_name = "adapt_decl", .is_lowered = false});
														
 
															+
														
 
															+  // No type_id; this is not a value.
														
 
															+  TypeId adapted_type_id;
														
 
															+};
														
 
															+```
														
 
															+
														
 
															+An `adapt SomeClass;` declaration would have the corresponding SemIR formatted
														
 
															+as:
														
 
															+
														
 
															+```
														
 
															+adapt_decl %SomeClass
														
 
															+```
														
 
															+
														
 
															+Some instructions have special argument handling. For example, some invalid
														
 
															+arguments will be omitted. Or an `InstBlockId` argument will be rendered inline,
														
 
															+commonly enclosed in braces `{`...`}` or parens `(`...`)`. In other cases, the
														
 
															+formatter will combine instructions together to make the IR more readable:
														
 
															+
														
 
															+-   A terminator sequence in a block, comprising a sequence of `BranchIf`
														
 
															+    instructions followed by a `Branch` or `BranchWithArg` instruction, is
														
 
															+    collapsed into a single
														
 
															+    `if %cond br !label1 else if ... else br !labelN(%arg)` line.
														
 
															+-   A struct type, formed by a sequence of `StructTypeField` instructions
														
 
															+    followed by a `StructType` instruction, is collapsed into a single
														
 
															+    `struct_type{.field1: %value1, ..., .fieldN: %valueN}` line.
														
 
															+
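The terminator collapsing described in the first bullet can be sketched as follows. This is a minimal illustration and not the code in `formatter.cpp`: `ToyBranch`, `ToyBranchKind`, and `FormatTerminatorSequence` are hypothetical names, and instructions are modeled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the three branch instruction kinds.
enum class ToyBranchKind { BranchIf, Branch, BranchWithArg };

// Hypothetical, simplified branch instruction.
struct ToyBranch {
  ToyBranchKind kind;
  std::string label;    // Target block label, such as "!then".
  std::string operand;  // Condition for BranchIf, argument for BranchWithArg.
};

// Renders a terminator sequence as a single line. Each instruction after the
// first is joined to the previous one with " else ".
std::string FormatTerminatorSequence(const std::vector<ToyBranch>& insts) {
  assert(!insts.empty());
  std::string out;
  for (const ToyBranch& inst : insts) {
    if (!out.empty()) {
      out += " else ";
    }
    switch (inst.kind) {
      case ToyBranchKind::BranchIf:
        out += "if " + inst.operand + " br " + inst.label;
        break;
      case ToyBranchKind::Branch:
        out += "br " + inst.label;
        break;
      case ToyBranchKind::BranchWithArg:
        out += "br " + inst.label + "(" + inst.operand + ")";
        break;
    }
  }
  return out;
}
```

With one `BranchIf` on `%cond` to `!then` followed by a `BranchWithArg` to `!done` carrying `%arg`, the sketch yields `if %cond br !then else br !done(%arg)`.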
														
 
															+These exceptions may be found in
														
 
															+[toolchain/sem_ir/formatter.cpp](/toolchain/sem_ir/formatter.cpp).
														
 
															+
														
 
															+#### Top-level entities
														
 
															+
														
 
															+**Question:** Are these too in flux to document at this time?
														
 
															+
														
 
															+-   `constants`: TODO
														
 
															+-   `imports`: TODO
														
 
															+-   `file`: TODO
														
 
															+-   entities
														
 
															+    -   TODO: may be preceded by `extern`.
														
 
															+    -   TODO: may be preceded by `generic`.
														
 
															+        -   These may have an optional `!definition:` section containing the
														
 
															+            generic's `definition_block_id`.
														
 
															+    -   `fn`: TODO; followed by `= "`...`"` for builtins
														
 
															+    -   `class`: TODO
														
 
															+    -   `interface`: TODO
														
 
															+    -   `impl`: TODO
														
 
															+-   `specific`: TODO
														
 
															+    -   body in braces `{`...`}` has a bunch of
														
 
															+        `<generic parameter> => <specific value>` assignment lines
														
 
															+    -   The first lines of the body describe the declaration
														
 
															+    -   If there is a valid definition, there are additional definition
														
 
															+        assignments after a `!definition:` line.
														
 
															+
														
 
															+## Core loop
														
 
															+
														
 
															+The core loop is `Check::CheckParseTree`. This loops through the `Parse::Tree`
														
 
															+and calls a `Handle`... function corresponding to the `NodeKind` of each node.
														
 
															+Communication between these functions for different nodes working together is
														
 
															+through the `Context` object defined in
														
 
															+[check/context.h](/toolchain/check/context.h), which stores things in a
														
 
															+collection of stacks. The common pattern is that the children of a node are
														
 
															+processed first. They produce information that is then consumed when processing
														
 
															+the parent node.
														
 
															+
														
 
															+One example of this pattern is expressions. Each subexpression outputs SemIR
														
 
															+instructions to compute the value of that subexpression to the current
														
 
															+instruction block, added to the top of the `InstBlockStack` stored in the
														
 
															+`Context` object. It leaves an instruction id on the top of the
														
 
															+[node stack](#node-stack) pointing to the instruction that produces the value of
														
 
															+that subexpression. Those are consumed by parent operations, like an
														
 
															+[RPN](https://en.wikipedia.org/wiki/Reverse_Polish_notation) calculator. For
														
 
															+example, the expression `1 * 2 + 3` corresponds to this parse tree:
														
 
															+
														
 
															+```yaml
														
 
															+    {kind: 'IntegerLiteral', text: '1'},
														
 
															+    {kind: 'IntegerLiteral', text: '2'},
														
 
															+  {kind: 'InfixOperator', text: '*', subtree_size: 3},
														
 
															+  {kind: 'IntegerLiteral', text: '3'},
														
 
															+{kind: 'InfixOperator', text: '+', subtree_size: 5},
														
 
															+```
														
 
															+
														
 
															+This parse tree is processed by one call to a `Handle` function per node:
														
 
															+
														
 
															+-   The first node is an integer literal, so the core loop calls
														
 
															+    `HandleIntegerLiteral`.
														
 
															+
														
 
															+    -   It calls `context::AddInstAndPush` to output a `SemIR::IntegerLiteral`
														
 
															+        instruction to the current instruction block, and pushes the parse node
														
 
															+        along with the instruction id to the [node stack](#node-stack).
														
 
															+
														
 
															+-   The second node is also an integer literal, which outputs a second
														
 
															+    instruction and pushes another entry onto the node stack.
														
 
															+
														
 
															+-   `HandleInfixOperator` pops the two entries off of the node stack, outputs
														
 
															+    any conversion instructions that are needed, and uses
														
 
															+    `context::AddInstAndPush` to create and push the instruction id representing
														
 
															+    the output of a multiplication instruction. That multiplication instruction
														
 
															+    takes the instruction ids it popped off the stack at the beginning as
														
 
															+    arguments.
														
 
															+
														
 
															+-   Another integer literal instruction is created for `3` and pushed onto the
														
 
															+    stack.
														
 
															+
														
 
															+-   `HandleInfixOperator` is called again. It pops the two instruction ids off
														
 
															+    the stack to use as the arguments to the addition instruction it
														
 
															+    creates and pushes.
														
 
															+
														
 
															+In this way, the handle functions coordinate producing their output using the
														
 
															+instruction block stack and node stack from the context.
														
 
															+
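The walk-through above can be modeled with a small, self-contained sketch. It is not the toolchain's code: `ToyContext`, `ToyInst`, and `CheckToyParseTree` are hypothetical, the node stack holds only instruction ids (the real one also stores the parse node), and conversions are omitted.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the parse tree types.
enum class ToyNodeKind { IntegerLiteral, InfixOperator };

struct ToyParseNode {
  ToyNodeKind kind;
  std::string text;
};

struct ToyInst {
  std::string op;  // "int_literal", "mul", or "add".
  int lhs;         // Left operand's instruction id, or the literal's value.
  int rhs;         // Right operand's instruction id, or -1 for literals.
};

struct ToyContext {
  std::vector<ToyInst> inst_block;  // The current instruction block.
  std::vector<int> node_stack;      // Instruction ids awaiting a parent node.

  // Appends an instruction to the block and pushes its id.
  void AddInstAndPush(ToyInst inst) {
    inst_block.push_back(inst);
    node_stack.push_back(static_cast<int>(inst_block.size()) - 1);
  }

  int Pop() {
    assert(!node_stack.empty());
    int id = node_stack.back();
    node_stack.pop_back();
    return id;
  }
};

void HandleIntegerLiteral(ToyContext& context, const ToyParseNode& node) {
  context.AddInstAndPush({"int_literal", std::stoi(node.text), -1});
}

void HandleInfixOperator(ToyContext& context, const ToyParseNode& node) {
  // Children were pushed left to right, so the right operand is on top.
  int rhs = context.Pop();
  int lhs = context.Pop();
  context.AddInstAndPush({node.text == "*" ? "mul" : "add", lhs, rhs});
}

// Visits nodes in postorder, calling one handler per node.
void CheckToyParseTree(ToyContext& context,
                       const std::vector<ToyParseNode>& nodes) {
  for (const ToyParseNode& node : nodes) {
    switch (node.kind) {
      case ToyNodeKind::IntegerLiteral:
        HandleIntegerLiteral(context, node);
        break;
      case ToyNodeKind::InfixOperator:
        HandleInfixOperator(context, node);
        break;
    }
  }
}
```

Feeding the five nodes of `1 * 2 + 3` through `CheckToyParseTree` leaves five instructions in the block and a single id on the stack, which refers to the final addition.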
														
 
															+A similar pattern uses bracketing nodes to support parent nodes that can have a
														
 
															+variable number of children. For example, a `return` statement can produce parse
														
 
															+trees following a few different patterns:
														
 
															+
														
 
															+-   `return;`
														
 
															+
														
 
															+    ```yaml
														
 
															+      {kind: 'ReturnStatementStart', text: 'return'},
														
 
															+    {kind: 'ReturnStatement', text: ';', subtree_size: 2},
														
 
															+    ```
														
 
															+
														
 
															+-   `return x;`
														
 
															+
														
 
															+    ```yaml
														
 
															+      {kind: 'ReturnStatementStart', text: 'return'},
														
 
															+      {kind: 'NameExpr', text: 'x'},
														
 
															+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
														
 
															+    ```
														
 
															+
														
 
															+-   `return var;`
														
 
															+
														
 
															+    ```yaml
														
 
															+      {kind: 'ReturnStatementStart', text: 'return'},
														
 
															+      {kind: 'ReturnVarModifier', text: 'var'},
														
 
															+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
														
 
															+    ```
														
 
															+
														
 
															+In all three cases, the introducer node `ReturnStatementStart` pushes an entry
														
 
															+on the [node stack](#node-stack) with just the parse node and no id, called a
														
 
															+_solo parse node_. The handler for the parent `ReturnStatement` node can pop and
														
 
															+process entries from the node stack until it finds that solo parse node from
														
 
															+`ReturnStatementStart` that indicates it is done.
														
 
															+
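A rough sketch of the bracketing pattern follows. It assumes a simplified stack entry in which a missing id marks a solo parse node; `ToyStackEntry`, `PopReturnStatementChildren`, and the `CodeBlockStart` kind in the usage note are hypothetical and do not come from the toolchain.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical node stack entry: a parse node kind plus an optional id.
// An entry without an id models a solo parse node.
struct ToyStackEntry {
  std::string kind;
  std::optional<int> id;
};

// Pops entries until the bracketing `ReturnStatementStart` entry is found.
// Returns the kinds of the children pushed after it, in source order.
std::vector<std::string> PopReturnStatementChildren(
    std::vector<ToyStackEntry>& node_stack) {
  std::vector<std::string> children;
  while (true) {
    assert(!node_stack.empty() && "missing ReturnStatementStart");
    ToyStackEntry entry = node_stack.back();
    node_stack.pop_back();
    if (entry.kind == "ReturnStatementStart" && !entry.id.has_value()) {
      break;
    }
    // Entries come off the stack in reverse, so prepend to restore order.
    children.insert(children.begin(), entry.kind);
  }
  return children;
}
```

For `return x;` the function returns just `NameExpr`, and for `return;` it returns an empty list. Entries pushed by siblings of ancestors, such as a hypothetical `CodeBlockStart` entry below the introducer, stay on the stack.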
														
 
															+Another pattern that arises is that state is set up by an introducer node, updated by
														
 
															+its siblings, and then consumed by the bracketing parent node. FIXME: example
														
 
															+
														
 
															+### Node stack
														
 
															+
														
 
															+The node stack, defined in [check/node_stack.h](/toolchain/check/node_stack.h),
														
 
															+stores pairs of a `Parse::Node` and an id. The type of the id is determined by
														
 
															+the `NodeKind` of the parse node. It is the default, general-purpose stack used
														
 
															+by `Handle`... functions in the check stage. Using a single stack is beneficial
														
 
															+since it improves locality of reference and reduces allocations. However,
														
 
															+additional stacks are used to ensure we never need to search through the stack
														
 
															+to find data -- we always want to be operating on the top of the stack (or a
														
 
															+fixed offset).
														
 
															+
														
 
															+The node stack contains any state pushed by siblings of the current
														
 
															+`Parse::Node` at the top, and state pushed by siblings of ancestors below. The
														
 
															+boundaries between what is a sibling of the current `Parse::Node` versus what is
														
 
															+a sibling of an ancestor are not explicitly determined. Instead, the handler for
														
 
															+the parent node knows how many nodes it must pop from the stack based either on
														
 
															+knowing the fixed number of children for that node kind or popping nodes until
														
 
															+it reaches a bracketing node. The arity or bracketing node kind for each parent
														
 
															+node is documented in [parse/node_kind.def](/toolchain/parse/node_kind.def).
														
 
															+
														
 
															+When each `Parse::Node` is evaluated, the SemIR for it is typically immediately
														
 
															+generated as `SemIR::Inst`s. To help generate the IR to an appropriate context,
														
 
															+scopes have separate `SemIR::InstBlock`s.
														
 
															+
														
 
															+### Delayed evaluation (not yet implemented)
														
 
															+
														
 
															+Sometimes, nodes will need to have delayed evaluation; for example, an inline
														
 
															+definition of a class member function needs to be evaluated after the class is
														
 
															+fully declared. The `SemIR::Inst`s cannot be immediately generated because they
														
 
															+may include name references to the class. We're likely to store a reference to
														
 
															+the relevant `Parse::Node` for each definition for re-evaluation after the class
														
 
															+scope completes. This means that nodes in a definition would be traversed twice,
														
 
															+once while determining that they're inline and without full checking or IR
														
 
															+generation, then again with full checking and IR generation.
														
 
															+
														
 
															+### Templates (not yet implemented)
														
 
															+
														
 
															+Templates need to have partial semantic checking when declared, but can't be
														
 
															+fully implemented before they're instantiated against a specific type.
														
 
															+
														
 
															+We are likely to generate a partial IR for templates, allowing for checking with
														
 
															+the incomplete information in the IR. Instantiation will likely use that IR and
														
 
															+fill in the missing information, but it could also reevaluate the original
														
 
															+`Parse::Node`s with the known template state.
														
 
															+
														
 
															+### Rewrites
														
 
															+
														
 
															+Carbon relies on rewrites of code, such as rewriting the destination of an
														
 
															+initializer to a specific target object once that object is known.
														
 
															+
														
 
															+We have two ways to achieve this. One is to track the IR location of a
														
 
															+placeholder instruction and, if it needs updating, replace it with a "rewrite"
														
 
															+`SemIR::Inst` that points to a new `SemIR::InstBlock` containing the required IR
														
 
															+and specifying which value is the result of that rewrite. This is expressed in
														
 
															+SemIR as a `splice_block` instruction. Another is to track the list of
														
 
															+instructions to be created separately from the instruction block stack, and merge those
														
 
															+instructions into the current block once we have decided on their contents.
														
 
															+
														
 
															+## Types
														
 
															+
														
 
															+Type expressions are treated like any other expression, and are modeled as
														
 
															+`SemIR::Inst`s. The types computed by type expressions are deduplicated,
														
 
															+resulting in a canonical `SemIR::TypeId` for each distinct type.
														
 
															+
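Deduplication into canonical ids can be illustrated with a small interning table. This is a sketch under assumptions: `ToyTypeStore` and `ToyTypeId` are hypothetical, and a string key stands in for whatever identity the toolchain actually uses to compare types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical canonical type id: equal ids mean the same type.
struct ToyTypeId {
  int index;
  bool operator==(const ToyTypeId& other) const {
    return index == other.index;
  }
};

// Hypothetical store that hands out one id per distinct type key.
class ToyTypeStore {
 public:
  // Returns the canonical id for a type, creating one on first use.
  ToyTypeId GetCanonicalId(const std::string& type_key) {
    auto [it, inserted] =
        ids_.try_emplace(type_key, static_cast<int>(keys_.size()));
    if (inserted) {
      keys_.push_back(type_key);
    }
    return ToyTypeId{it->second};
  }

  int size() const { return static_cast<int>(keys_.size()); }

 private:
  std::unordered_map<std::string, int> ids_;  // Key to canonical index.
  std::vector<std::string> keys_;             // Canonical index to key.
};
```

Requesting the same key twice returns equal ids, so later type comparisons reduce to comparing integers.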
														
 
															+### Type printing (not yet implemented)
														
 
															+
														
 
															+The `TypeId` preserves only the identity of the type, not its spelling, and so
														
 
															+printing it will produce a fully-resolved type name, which isn't a great user
														
 
															+experience as it doesn't reflect how the type was written in the source code.
														
 
															+
														
 
															+Instead, when printing a type name for use in a diagnostic, we will start with
														
 
															+one of two `InstId`s:
														
 
															+
														
 
															+-   An `InstId` for a type expression that describes the way the type was
														
 
															+    computed.
														
 
															+-   An `InstId` for an expression that has the given type.
														
 
															+
														
 
															+In the former case, the type is pretty-printed by walking the type expression
														
 
															+and printing it. In the latter case, the type of the expression is reconstructed
														
 
															+based on the form of the expression: for example, to print the type of `&x`, we
														
 
															+print the type of `x` and append a `*`, being careful to take potential
														
 
															+precedence issues into account.
														
 
															+
														
 
															+TODO: This requires being able to print the type of, for example,
														
 
															+`x.foo[0].bar`, by printing only the desired portion of the type of `x`, and
														
 
															+similarly may require handling the case where the type of an expression involves
														
 
															+generic parameters whose arguments are specified by that expression. In effect,
														
 
															+the type computation performed when checking an operation is duplicated into the
														
 
															+type printing logic, but is simpler because errors don't need to be detected.
														
 
															+
														
 
															+This approach means we don't need to preserve a fully-sugared type for each
														
 
															+expression instruction. Instead, we compute that type when we need to print it.
														
 
															+
														
 
															+## Expression categories
														
 
															+
														
 
															+Each `SemIR::Inst` that has an associated type also has an expression category,
														
 
															+which describes how it produces a value of that type. These
														
 
															+`SemIR::ExprCategory` values correspond to the Carbon expression categories
														
 
															+defined in proposal
														
 
															+[#2006](https://github.com/carbon-language/carbon-lang/pull/2006):
														
 
															+
														
 
															+### ExprCategory::NotExpression
														
 
															+
														
 
															+This instruction is not an expression instruction, and doesn't have an
														
 
															+expression category. This is used for namespaces, control flow instructions, and
														
 
															+other constructs that represent some non-expression-level semantics.
														
 
															+
														
 
															+### ExprCategory::Value
														
 
															+
														
 
															+This instruction produces a value using the type's value representation.
														
 
															+Lowering the instruction will produce an LLVM value using that value
														
 
															+representation.
														
 
															+
														
 
															+### ExprCategory::DurableReference and ExprCategory::EphemeralReference
														
 
															+
														
 
															+This instruction produces a reference to an object. Lowering will produce a
														
 
															+pointer to an object representation.
														
 
															+
														
 
															+### ExprCategory::Initializing
														
 
															+
														
 
															+This instruction represents the initialization of an object. Depending on the
														
 
															+initializing representation for the type, the initializing expression
														
 
															+instruction will do one of the following:
														
 
															+
														
 
															+-   For an in-place initializing representation, the instruction will store a
														
 
															+    value to the target of the initialization.
														
 
															+
														
 
															+-   For a by-copy initializing representation, the instruction will produce an
														
 
															+    object representation by value that can be stored into the target. This is
														
 
															+    currently only used in cases where the object representation and the value
														
 
															+    representation are the same.
														
 
															+
														
 
															+-   For a type with no initializing representation, such as an empty struct or
														
 
															+    tuple, it does neither of the above things.
														
 
															+
														
 
															+Regardless of the initializing representation, an initializing expression should
														
 
															+be consumed by another instruction that finishes the initialization. For a
														
 
															+by-copy initialization, this final instruction represents the store into the
														
 
															+target, whereas in the other cases it is only used to track in SemIR how the
														
 
															+initialization was used. When an in-place initializer uses a by-copy initializer
														
 
															+as a subexpression, an `initialize_from` instruction is inserted to perform this
														
 
															+final store.
														
 
															+
														
 
															+### ExprCategory::Mixed
														
 
															+
														
 
															+This instruction represents a language construct that doesn't have a single
														
 
															+expression category. This is used for struct and tuple literals, where the
														
 
															+elements of the literal can have different expression categories. Instructions
														
 
															+with a mixed expression category are treated as a special case in conversion,
														
 
															+which recurses into the elements of those instructions before performing
														
 
															+conversions.
														
 
															+
														
 
															+### Value bindings
														
 
															+
														
 
															+A value binding represents a conversion from a reference expression to the value
														
 
															+stored in that expression. There are three important cases here:
														
 
															+
														
 
															+-   For types with a by-copy value representation, such as `i32`, a value
														
 
															+    binding represents a load from the address indicated by the reference
														
 
															+    expression.
														
 
															+
														
 
															+-   For types with a by-pointer value representation, such as arrays and large
														
 
															+    structs and tuples, a value binding implicitly takes the address of the
														
 
															+    reference expression.
														
 
															+
														
 
															+-   For structs and tuples, the value representation is a struct or tuple of the
														
 
															+    elements' value representations, which is not necessarily the same as a
														
 
															+    struct or tuple of the elements' object representations. In the case where
														
 
															+    the value representation is not a copy of, or pointer to, the object
														
 
															+    representation, `value_binding` instructions are not used, and a
														
 
															+    `tuple_value` or `struct_value` instruction is used to construct a value
														
 
															+    representation instead. `value_binding` should still be used in the case
														
 
															+    where the value and object representation are the same, but this is not yet
														
 
															+    implemented.
														
 
															+
														
 
															+## Handling Parse::Tree errors (not yet implemented)
														
 
															+
														
 
															+`Parse::Tree` errors will typically indicate that checking would error for a
														
 
															+given context. We'll want to be careful about how this is handled, but we'll
														
 
															+likely want to generate diagnostics for valid child nodes, then reduce
														
 
															+diagnostics once invalid nodes are encountered. We should be able to reasonably
														
 
															+abandon generated IR of the valid children when we encounter an invalid parent,
														
 
															+without severe effects on surrounding checks.
														
 
															+
														
 
															+For example, an invalid line of code in a function might generate some
														
 
															+incomplete IR in the function's `SemIR::InstBlock`, but that IR won't negatively
														
 
															+interfere with checking later valid lines in the same function.
														
 
															+
														
 
															+## Alternatives considered
														
 
															+
														
 
															+### Using a traditional AST representation
														
 
															+
														
 
															+Clang creates an AST as part of compilation. In Carbon, it's something we could
														
 
															+do as a step between parsing and checking, possibly replacing the SemIR. It's
														
 
															+likely that doing so would be simpler, amongst other possible trade-offs.
														
 
															+However, we think the SemIR approach is going to yield higher performance,
														
 
															+enough so that it's the chosen approach.
														
--- a/toolchain/docs/check.svg
+++ b/toolchain/docs/check.svg
--- a/toolchain/docs/diagnostics.md
+++ b/toolchain/docs/diagnostics.md
@@ -0,0 +1,230 @@
 
															+# Diagnostics
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Overview](#overview)
														
 
															+-   [DiagnosticEmitter](#diagnosticemitter)
														
 
															+-   [DiagnosticConsumers](#diagnosticconsumers)
														
 
															+-   [Producing diagnostics](#producing-diagnostics)
														
 
															+-   [Diagnostic registry](#diagnostic-registry)
														
 
															+-   [CARBON_DIAGNOSTIC placement](#carbon_diagnostic-placement)
														
 
															+-   [Diagnostic context](#diagnostic-context)
														
 
															+-   [Diagnostic parameter types](#diagnostic-parameter-types)
														
 
															+-   [Diagnostic message style guide](#diagnostic-message-style-guide)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+The diagnostic code is used by the toolchain to produce output.
														
 
															+
														
 
															+## DiagnosticEmitter
														
 
															+
														
 
															+[DiagnosticEmitters](/toolchain/diagnostics/diagnostic_emitter.h) handle the
														
 
															+main formatting of a message. Each is parameterized on a location type, for which a
														
 
															+DiagnosticLocationTranslator must be provided that can translate the location
														
 
															+type into a standardized DiagnosticLocation of file, line, and column.
														
 
															+
														
 
															+When emitting, the resulting formatted message is passed to a
														
 
															+DiagnosticConsumer.
														
 
															+
														
 
															+## DiagnosticConsumers
														
 
															+
														
 
															+DiagnosticConsumers handle output of diagnostic messages after they've been
														
 
															+formatted by an Emitter. Important consumers are:
														
 
															+
														
 
															+-   [ConsoleDiagnosticConsumer](/toolchain/diagnostics/diagnostic_emitter.h):
														
 
															+    prints diagnostics to console.
														
 
															+
														
 
															+-   [ErrorTrackingDiagnosticConsumer](/toolchain/diagnostics/diagnostic_emitter.h):
														
 
															+    counts the number of errors produced, particularly so that it can be
														
 
															+    determined whether any errors were encountered.
														
 
															+
														
 
															+-   [SortingDiagnosticConsumer](/toolchain/diagnostics/sorting_diagnostic_consumer.h):
														
 
															+    sorts diagnostics by line so that diagnostics are seen in the terminal based on
														
 
															+    their order in the file rather than the order they were produced.
														
 
															+
														
 
															+-   [NullDiagnosticConsumer](/toolchain/diagnostics/null_diagnostics.h):
														
 
															+    suppresses diagnostics, particularly for tests.
														
 
															+
														
 
															+Note that `SortingDiagnosticConsumer` is used by default by `carbon compile`. In
														
 
															+cases where one error leads to another error at an earlier location, for example
														
 
															+if an error in a function call argument leads to an error in the function call,
														
 
															+this can result in confusing diagnostic output where a consequence of the error
														
 
															+is reported before the cause. Usually this should be handled by tracking that an
														
 
															+error occurred and suppressing the follow-on diagnostic. During toolchain
														
 
															+development, it can be useful to disable the sorting so that the diagnostic
														
 
															+order matches the order in which the file was processed. This can be done using
														
 
															+`carbon compile --stream-errors`.
														
 
															+
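The effect of sorting on output order can be seen in a small sketch of a buffering consumer. The classes below are hypothetical and much simpler than the real consumers: a diagnostic is reduced to a line number and a message, and sorting considers only the line.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced diagnostic: just a line and a message.
struct ToyDiagnostic {
  int line;
  std::string message;
};

class ToyConsumer {
 public:
  virtual ~ToyConsumer() = default;
  virtual void HandleDiagnostic(const ToyDiagnostic& diagnostic) = 0;
};

// Records messages in the order they arrive.
class ToyCollectingConsumer : public ToyConsumer {
 public:
  void HandleDiagnostic(const ToyDiagnostic& diagnostic) override {
    messages.push_back(diagnostic.message);
  }
  std::vector<std::string> messages;
};

// Buffers diagnostics and forwards them in line order on Flush.
class ToySortingConsumer : public ToyConsumer {
 public:
  explicit ToySortingConsumer(ToyConsumer& next) : next_(&next) {}

  void HandleDiagnostic(const ToyDiagnostic& diagnostic) override {
    buffered_.push_back(diagnostic);
  }

  void Flush() {
    // A stable sort keeps production order for diagnostics on the same line.
    std::stable_sort(buffered_.begin(), buffered_.end(),
                     [](const ToyDiagnostic& a, const ToyDiagnostic& b) {
                       return a.line < b.line;
                     });
    for (const ToyDiagnostic& diagnostic : buffered_) {
      next_->HandleDiagnostic(diagnostic);
    }
    buffered_.clear();
  }

 private:
  ToyConsumer* next_;
  std::vector<ToyDiagnostic> buffered_;
};
```

If an argument error on line 8 is produced before the resulting call error on line 7, the sorted output lists the call error first, which is the cause-after-consequence ordering described above.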
														
 
															+## Producing diagnostics
														
 
															+
														
 
															+Diagnostics are used to surface issues from compilation. A simple diagnostic
														
 
															+looks like:
														
 
															+
														
 
															+```cpp
														
 
															+CARBON_DIAGNOSTIC(InvalidCode, Error, "Code is invalid");
														
 
															+emitter.Emit(location, InvalidCode);
														
 
															+```
														
 
															+
														
 
															+Here, `CARBON_DIAGNOSTIC` defines a static instance of a diagnostic named
														
 
															+`InvalidCode` with the associated severity (`Error` or `Warning`).
														
 
															+
														
 
															+The `Emit` call produces a single instance of the diagnostic. When emitted,
														
 
															+`"Code is invalid"` will be the message used. The type of `location` depends on
														
 
															+the `DiagnosticEmitter`.
														
 
															+
														
 
															+A diagnostic with an argument looks like:
														
 
															+
														
 
															+```cpp
														
 
															+CARBON_DIAGNOSTIC(InvalidCharacter, Error, "Invalid character {0}.", char);
														
 
															+emitter.Emit(location, InvalidCharacter, invalid_char);
														
 
															+```
														
 
															+
														
 
															+Here, the additional `char` argument to `CARBON_DIAGNOSTIC` specifies the type
														
 
															+of an argument to expect for message formatting. The `invalid_char` argument to
														
 
															+`Emit` provides the matching value. It's then passed along with the diagnostic
														
 
															+message format to `llvm::formatv` to produce the final diagnostic message.
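The pairing of declared argument types with `Emit` arguments can be sketched without LLVM. This is a minimal illustration under stated assumptions, not the toolchain's implementation: `Diagnostic`, `FormatMessage`, and this free-function `Emit` are hypothetical names, and `FormatMessage` is a tiny stand-in for `llvm::formatv` that only understands single-digit `{N}` placeholders.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::formatv: replaces each "{N}" (single digit)
// with the N-th argument, rendered through operator<<.
template <typename... Args>
std::string FormatMessage(const std::string& format, const Args&... args) {
  std::vector<std::string> rendered;
  auto render = [&](const auto& arg) {
    std::ostringstream out;
    out << arg;
    rendered.push_back(out.str());
  };
  (render(args), ...);

  std::string result;
  for (std::size_t i = 0; i < format.size(); ++i) {
    bool is_placeholder = i + 2 < format.size() && format[i] == '{' &&
                          format[i + 1] >= '0' && format[i + 1] <= '9' &&
                          format[i + 2] == '}';
    if (is_placeholder) {
      std::size_t index = static_cast<std::size_t>(format[i + 1] - '0');
      if (index < rendered.size()) {
        result += rendered[index];
        i += 2;  // Skip the rest of the placeholder.
        continue;
      }
    }
    result += format[i];
  }
  return result;
}

// Hypothetical diagnostic descriptor. The template arguments record the
// argument types that a matching Emit call must supply.
template <typename... Args>
struct Diagnostic {
  const char* name;
  const char* format;
};

// Returns the formatted message. Mismatched argument types fail to compile
// because `Args` is deduced from both parameters.
template <typename... Args>
std::string Emit(const Diagnostic<Args...>& diagnostic, const Args&... args) {
  return FormatMessage(diagnostic.format, args...);
}

inline constexpr Diagnostic<char> InvalidCharacter{"InvalidCharacter",
                                                   "Invalid character {0}."};
```

In this sketch, `Emit(InvalidCharacter, '$')` yields `Invalid character $.`, while passing a `std::string` instead of a `char` would be rejected at compile time.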
														
 
															+
														
 
															+## Diagnostic registry
														
 
															+
														
 
															+There is a [registry](/toolchain/diagnostics/diagnostic_kind.def) which all
														
 
															+diagnostics must be added to. Each diagnostic has a line like:
														
 
															+
														
 
															+```cpp
														
 
															+CARBON_DIAGNOSTIC_KIND(InvalidCode)
														
 
															+```
														
 
															+
														
 
															+This produces a central enumeration of all diagnostics. The eventual intent is
														
 
															+to require tests for every diagnostic that can be produced, but that isn't
														
 
															+currently implemented.
														
 
															+
														
 
															+## CARBON_DIAGNOSTIC placement
														
 
															+
														
 
															+Idiomatically, `CARBON_DIAGNOSTIC` will be adjacent to the `Emit` call. However,
														
 
															+this is only because many diagnostics can only be produced in one code location.
														
 
															+If they can be produced in multiple locations, they will be at a higher scope so
														
 
															+that multiple `Emit` calls can reference them. When in a function,
														
 
															+`CARBON_DIAGNOSTIC` should be placed as close as possible to the usage so that
														
 
															+it's easier to see the associated output.
														
 
															+
														
 
															+## Diagnostic context
														
 
															+
														
 
															+Diagnostics can provide additional context for errors by attaching notes, which
														
 
															+have their own location information. A diagnostic with a note looks like:
														
 
															+
														
 
															+```cpp
														
 
															+CARBON_DIAGNOSTIC(CallArgCountMismatch, Error,
														
 
															+                  "{0} argument(s) passed to function expecting "
														
 
															+                  "{1} argument(s).",
														
 
															+                  int, int);
														
 
															+CARBON_DIAGNOSTIC(InCallToFunction, Note,
														
 
															+                  "Calling function declared here.");
														
 
															+context.emitter()
														
 
															+    .Build(call_parse_node, CallArgCountMismatch, arg_refs.size(),
														
 
															+           param_refs.size())
														
 
															+    .Note(param_parse_node, InCallToFunction)
														
 
															+    .Emit();
														
 
															+```
														
 
															+
														
 
															+The error and the note are registered as two separate diagnostics, but a single
														
 
															+overall diagnostic object is built and emitted, so that the error and the note
														
 
															+can be treated as a single unit.
														
 
															+
														
 
															+Diagnostic context information can also be registered in a scope, so that all
														
 
															+diagnostics produced in that scope attach a specific note. For example:
														
 
															+
														
 
															+```cpp
														
 
															+DiagnosticAnnotationScope annotate_diagnostics(
														
 
															+    &context.emitter(), [&](auto& builder) {
														
 
															+      CARBON_DIAGNOSTIC(
														
 
															+          InCallToFunctionParam, Note,
														
 
															+          "Initializing parameter {0} of function declared here.", int);
														
 
															+      builder.Note(param_parse_node, InCallToFunctionParam,
														
 
															+                   diag_param_index + 1);
														
 
															+    });
														
 
															+```
														
 
															+
														
 
															+This is useful when delegating to another part of Check that may produce many
														
 
															+different kinds of diagnostic.
														
 
															+
														
 
															+## Diagnostic parameter types
														
 
															+
														
 
															+Here are some types you might consider for the parameters to a diagnostic:
														
 
															+
														
 
															+-   `llvm::StringLiteral`. Note that we don't use `llvm::StringRef` to avoid
														
 
															+    lifetime issues.
														
 
															+-   `std::string`
														
 
															+-   Carbon types `T` that implement `llvm::format_provider<T>` like:
														
 
															+    -   `Lex::TokenKind`
														
 
															+    -   `Lex::NumericLiteral::Radix`
														
 
															+    -   `Parse::RelativeLocation`
														
 
															+-   integer types: `int`, `uint64_t`, `int64_t`, `size_t`
														
 
															+-   `char`
														
 
															+-   Other
														
 
															+    [types supported by llvm::formatv](https://llvm.org/doxygen/FormatVariadic_8h_source.html)
														
 
															+
														
 
															+## Diagnostic message style guide
														
 
															+
														
 
															+In order to provide a consistent experience, Carbon diagnostics should be
														
 
															+written in the following style:
														
 
															+
														
 
															+-   Start diagnostics with a capital letter or quoted code, and end them with a
														
 
															+    period.
														
 
															+
														
 
															+-   Quoted code should be enclosed in backticks, for example:
														
 
															+    ``"`{0}` is bad."``
														
 
															+
														
 
															+-   Phrase diagnostics as bullet points rather than full sentences. Leave out
														
 
															+    articles unless they're necessary for clarity.
														
 
															+
														
 
															+-   Diagnostics should describe the situation the toolchain observed and the
														
 
															+    language rule that was violated, although either can be omitted if it's
														
 
															+    clear from the other. For example:
														
 
															+
														
 
															+    -   `"Redeclaration of X."` describes the situation and implies that
														
 
															+        redeclarations are not permitted.
														
 
															+
														
 
															+    -   ``"`self` can only be declared in an implicit parameter list."``
														
 
															+        describes the language rule and implies that you declared `self`
														
 
															+        somewhere else.
														
 
															+
														
 
															+    -   It's OK for a diagnostic to guess at the developer's intent and provide
														
 
															+        a hint after explaining the situation and the rule, but not as a
														
 
															+        substitute for that. For example,
														
 
															+        ``"Add an `as String` cast to format this integer as a string."`` is not
														
 
															+        sufficient as an error message, but
														
 
															+        ``"Cannot add i32 to String. Add an `as String` cast to format this integer as a string."``
														
 
															+        could be acceptable.
														
 
															+
														
 
															+-   TODO: Should diagnostics be atemporal and non-sequential ("multiple
														
 
															+    declarations of X", "additional declaration here"), present tense but
														
 
															+    sequential ("redeclaration of X", "previous declaration is here"), or
														
 
															+    temporal ("redeclaration of X", "previous declaration was here")? We could
														
 
															+    try to sidestep the difference between the latter two by avoiding verbs with
														
 
															+    tense ("previously declared here", "Y declared here", with no is/was).
														
 
															+
														
 
															+-   TODO: Word choices:
														
 
															+
														
 
															+    -   For disallowed constructs, do we say they're not permitted / not allowed
														
 
															+        / not valid / not legal / illegal / ill-formed / disallowed? Do we say
														
 
															+        "X cannot be Y" or "X may not be Y" or "X must not be Y" or "X shall not
														
 
															+        be Y"?
														
 
															+
														
 
															+-   TODO: Is structuring diagnostics such that inputs can be parsed without
														
 
															+    string parsing important? That is, when is passing strings in as part of the
														
 
															+    message templating okay?
														
 
															+
														
 
															+-   TODO: When do we put identifiers or expressions in diagnostics, versus
														
 
															+    requiring notes pointing at relevant code? Is it only avoided for values, or
														
 
															+    only allowed for types?
														
 
															+
														
 
															+-   TODO: Lots more things to decide, give examples.
														
--- a/toolchain/docs/driver.md
+++ b/toolchain/docs/driver.md
@@ -0,0 +1,22 @@
 
															+# Driver
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Overview](#overview)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+The driver provides commands and ties together the toolchain's flow. Running a
														
 
															+command such as `carbon compile --phase=lower <file>` will run through the flow
														
 
															+and print output. Several dump flags, such as `--dump-parse-tree`, print output
														
 
															+in YAML format for easier parsing.
														
--- a/toolchain/docs/idioms.md
+++ b/toolchain/docs/idioms.md
@@ -0,0 +1,424 @@
 
															+# Idioms
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Overview](#overview)
														
 
															+-   [C++ dialect](#c-dialect)
														
 
															+-   [Abbreviations used in the code (AKA Carbon abbreviation decoder ring)](#abbreviations-used-in-the-code-aka-carbon-abbreviation-decoder-ring)
														
 
															+-   [`.def` files](#def-files)
														
 
															+    -   [EnumBase types](#enumbase-types)
														
 
															+-   [Index types](#index-types)
														
 
															+-   [ValueStore](#valuestore)
														
 
															+-   [Template metaprogramming](#template-metaprogramming)
														
 
															+    -   [Struct reflection](#struct-reflection)
														
 
															+    -   [Field detection](#field-detection)
														
 
															+-   [Local lambdas to reduce duplicate code](#local-lambdas-to-reduce-duplicate-code)
														
 
															+-   [Immediately invoked function expressions (IIFE)](#immediately-invoked-function-expressions-iife)
														
 
															+-   [Declarations in conditions](#declarations-in-conditions)
														
 
															+-   [CRTP or "Curiously recurring template pattern"](#crtp-or-curiously-recurring-template-pattern)
														
 
															+-   [Multiple inheritance](#multiple-inheritance)
														
 
															+-   [Defining constants usable in constexpr contexts](#defining-constants-usable-in-constexpr-contexts)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+The toolchain implementation uses some implementation techniques that may not be
														
 
															+commonly found in typical C++ code.
														
 
															+
														
 
															+## C++ dialect
														
 
															+
														
 
															+The toolchain implementation does not use some C++ features, following
														
 
															+[Google's C++ style guide](https://google.github.io/styleguide/cppguide.html):
														
 
															+
														
 
															+-   [Exceptions](https://google.github.io/styleguide/cppguide.html#Exceptions)
														
 
															+-   [Virtual base classes](https://google.github.io/styleguide/cppguide.html#Inheritance)
														
 
															+-   [RTTI](https://google.github.io/styleguide/cppguide.html#Run-Time_Type_Information__RTTI_)
														
 
															+
														
 
															+## Abbreviations used in the code (AKA Carbon abbreviation decoder ring)
														
 
															+
														
 
															+Note that abbreviations are typically only used in code, not comments (except
														
 
															+when referring to an entity from the code).
														
 
															+
														
 
															+-   **Addr**: "address"
														
 
															+-   **Arg**: "argument"
														
 
															+-   **Decl**: "declaration"
														
 
															+-   **Expr**: "expression"
														
 
															+    -   **SubExpr**: "subexpression"
														
 
															+-   **Float**: "floating point"
														
 
															+-   **Init**: "initialization"
														
 
															+-   **Inst**: "instruction"
														
 
															+-   **Int**: "integer"
														
 
															+-   **Loc**: "location"
														
 
															+-   **Param**: "parameter"
														
 
															+-   **Paren**: "parenthesis"
														
 
															+-   **Ref**: "reference"
														
 
															+    -   **Deref**: "dereference"
														
 
															+-   **Subst**: "substitute"
														
 
															+
														
 
															+Phrase abbreviations (an abbreviation for a whole phrase, used even where we
														
 
															+wouldn't abbreviate each of its words individually):
														
 
															+
														
 
															+-   **InitRepr**: "initializing representation"
														
 
															+-   **ObjectRepr**: "object representation"
														
 
															+-   **SemIR**: "semantics intermediate representation"
														
 
															+-   **ValueRepr**: "value representation"
														
 
															+
														
 
															+## `.def` files
														
 
															+
														
 
															+The Carbon toolchain uses a technique related to
														
 
															+[X-macros](https://en.wikipedia.org/wiki/X_macro) to generate code that operates
														
 
															+over a collection of types, enumerators, or another similar list of names. This
														
 
															+works as follows:
														
 
															+
														
 
															+-   A `.def` file is provided that is intended to be repeatedly included by way
														
 
															+    of `#include`.
														
 
															+-   The user of the `.def` defines a macro, with a name and a form specified by
														
 
															+    the `.def` file, for example
														
 
															+    `#define CARBON_EACH_WIDGET(Name) Scope::Name,`.
														
 
															+-   A `#include` of the `.def` file expands to `CARBON_EACH_WIDGET(Name1)`,
														
 
															+    `CARBON_EACH_WIDGET(Name2)`, ... for each widget name, and then `#undef`s
														
 
															+    the `CARBON_EACH_WIDGET` macro.
														
 
															+
														
 
															+For example:
														
 
															+
														
 
															+```cpp
														
 
															+enum Widgets {
														
 
															+#define CARBON_EACH_WIDGET(Name) Name,
														
 
															+#include "widgets.def"
														
 
															+};
														
 
															+```
														
 
															+
														
 
															+... would expand to an enumeration definition with one enumerator per widget
														
 
															+name.
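The same list can be expanded more than once with different macro definitions. The sketch below keeps everything in one file by using a list macro (`CARBON_WIDGET_LIST`, a hypothetical stand-in for the contents of `widgets.def`) instead of a real `#include`; the widget names are also invented for illustration.

```cpp
#include <string_view>

// Stand-in for a hypothetical widgets.def. A real .def file would be pulled in
// with #include and would #undef the macro itself.
#define CARBON_WIDGET_LIST(X) \
  X(Button)                   \
  X(Slider)                   \
  X(Label)

// First expansion: one enumerator per widget name.
enum class Widget {
#define CARBON_EACH_WIDGET(Name) Name,
  CARBON_WIDGET_LIST(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
};

// Second expansion: a parallel table of printable names.
constexpr std::string_view WidgetNames[] = {
#define CARBON_EACH_WIDGET(Name) #Name,
    CARBON_WIDGET_LIST(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
};

// Looks up the printable name for an enumerator.
constexpr std::string_view WidgetName(Widget widget) {
  return WidgetNames[static_cast<int>(widget)];
}
```

Because both expansions come from the same list, adding a widget name in one place keeps the enumeration and the name table in sync.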
														
 
															+
														
 
															+### EnumBase types
														
 
															+
														
 
															+Most `.def` files will have a corresponding [EnumBase](/common/enum_base.h)
														
 
															+child class (if `widgets.def` has X-macros, `widgets.h` and `widgets.cpp` have
														
 
															+the `EnumBase` child class). These work similarly to an `enum class`, with the
														
 
															+addition of a `name()` function and `<<` stream operator support. Many also have
														
 
															+further utility functions for information related to the enum value.
														
 
															+
														
 
															+In code, these types and values can be used directly in a `switch`. They will
														
 
															+convert to an internal _actual_ `enum class` for the `switch`, and receive
														
 
															+corresponding compiler safety checks that all enum values are handled.
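The `switch` behavior described above can be illustrated with a hand-written wrapper. This is only a sketch of the general shape, assuming a hypothetical `WidgetKind` type; the real `EnumBase` is generated differently and has a richer interface.

```cpp
#include <cstdint>
#include <ostream>
#include <string_view>

// Hypothetical enum-like wrapper, loosely modeled on the description above.
class WidgetKind {
 public:
  // The internal "actual" enum class.
  enum class RawEnum : std::uint8_t { Button, Slider };

  // Implicit so that enumerators can be passed where a WidgetKind is expected.
  constexpr WidgetKind(RawEnum value) : value_(value) {}

  // Implicit conversion used when a WidgetKind appears in a switch condition.
  constexpr operator RawEnum() const { return value_; }

  // Printable name of the value.
  constexpr std::string_view name() const {
    switch (value_) {
      case RawEnum::Button:
        return "Button";
      case RawEnum::Slider:
        return "Slider";
    }
    return "<invalid>";
  }

 private:
  RawEnum value_;
};

// Stream support prints the name.
inline std::ostream& operator<<(std::ostream& out, WidgetKind kind) {
  return out << kind.name();
}

// The switch operates on RawEnum, so the compiler can warn about unhandled
// enumerators.
inline int DefaultWidth(WidgetKind kind) {
  switch (kind) {
    case WidgetKind::RawEnum::Button:
      return 80;
    case WidgetKind::RawEnum::Slider:
      return 200;
  }
  return 0;
}
```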
														
 
															+
														
 
															+## Index types
														
 
															+
														
 
															+Carbon makes frequent use of
														
 
															+[IndexBase and IdBase](/toolchain/base/index_base.h). The `IndexBase` and
														
 
															+`IdBase` types are small wrappers around `int32_t` to provide a measure of
														
 
															+type-checking when passing around indices to vector-like storage types. The only
														
 
															+difference is that `IndexBase` supports all comparison operators, whereas
														
 
															+`IdBase` only supports equality comparison.
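A minimal sketch of the distinction, assuming simplified definitions: the struct layouts and operator templates below are illustrative and are not copied from `index_base.h`, and `InstId` and `TokenIndex` are used only as example derived types.

```cpp
#include <cstdint>
#include <type_traits>

// Simplified id wrapper: only equality is meaningful.
struct IdBase {
  constexpr explicit IdBase(std::int32_t index) : index(index) {}
  std::int32_t index;
};

// Simplified index wrapper: ordering is also meaningful.
struct IndexBase : IdBase {
  constexpr explicit IndexBase(std::int32_t index) : IdBase(index) {}
};

// Equality for any id type. Both operands must have the same type, so an
// InstId cannot be compared with a TokenIndex by accident.
template <typename IdT,
          std::enable_if_t<std::is_base_of_v<IdBase, IdT>, int> = 0>
constexpr bool operator==(IdT lhs, IdT rhs) {
  return lhs.index == rhs.index;
}
template <typename IdT,
          std::enable_if_t<std::is_base_of_v<IdBase, IdT>, int> = 0>
constexpr bool operator!=(IdT lhs, IdT rhs) {
  return lhs.index != rhs.index;
}

// Ordering only for index types.
template <typename IndexT,
          std::enable_if_t<std::is_base_of_v<IndexBase, IndexT>, int> = 0>
constexpr bool operator<(IndexT lhs, IndexT rhs) {
  return lhs.index < rhs.index;
}

// Example derived types for this sketch.
struct InstId : IdBase {
  constexpr explicit InstId(std::int32_t index) : IdBase(index) {}
};
struct TokenIndex : IndexBase {
  constexpr explicit TokenIndex(std::int32_t index) : IndexBase(index) {}
};
```

Here `InstId(1) < InstId(2)` does not compile, while `TokenIndex(1) < TokenIndex(2)` does.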
														
 
															+
														
 
															+Variable naming will often have `_id` at the end to indicate that it corresponds
														
 
															+to an `IdBase`. This may include the full type, as in `operand_inst_id` being an
														
 
															+`InstId` for an operand.
														
 
															+
														
 
															+A block is an array of ids. These will be indicated with either a `_block`
														
 
															+suffix or pluralization (for example, `param_refs` pluralizing `refs`).
														
 
															+
														
 
															+The `ref` concept in a name means that there is an underlying instruction block,
														
 
															+but only a subset of instructions are present in the `refs` block. For example,
														
 
															+function parameters have a sequence, and also have a `refs` block with one entry
														
 
															+per parameter. The `refs` block allows parameters to be counted and accessed
														
 
															+directly, rather than through vector iteration.
														
 
															+
														
 
															+## ValueStore
														
 
															+
														
 
															+Many of Carbon's data types are stored in a
														
 
															+[ValueStore](/toolchain/base/value_store.h) or related type with similar
														
 
															+semantics (`sem_ir` has [several such classes](/toolchain/base/value_store.h)).
														
 
															+`ValueStore` links an indexing type to a value type with vector-like storage.
														
 
															+The indices typically use `IdBase`.
														
 
															+
														
 
															+`ValueStore`'s APIs follow the shape of simple array access and mutation:
														
 
															+
														
 
															+-   `Add` which takes a value and returns the index.
														
 
															+-   `Set` which takes a value and index to modify.
														
 
															+-   `Get` which takes an index and returns a reference to the value (possibly a
														
 
															+    constant reference).
														
 
															+-   Other vector-like functionality, such as `size` and `Reserve`.
														
 
															+
														
 
															+ValueStores should be named after the type they contain. The index type used on
														
 
															+the value store should have a `using ValueType...` which indicates the stored
														
 
															+type. When taking a return of one of these functions, it's common to use `auto`
														
 
															+and rely on the name of the storage type to imply the returned type.
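The API shape above can be sketched as a thin wrapper over `std::vector`. This is an illustrative reduction, not the contents of `value_store.h`: `NameId` is a hypothetical index type, and its `ValueType` alias stands in for the `using ValueType...` convention just described.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index type. The ValueType alias names the stored type.
struct NameId {
  using ValueType = std::string;
  std::int32_t index;
};

// Simplified store linking an index type to vector-like storage.
template <typename IdT>
class ValueStore {
 public:
  using ValueType = typename IdT::ValueType;

  // Stores the value and returns its index.
  auto Add(ValueType value) -> IdT {
    IdT id{static_cast<std::int32_t>(values_.size())};
    values_.push_back(std::move(value));
    return id;
  }

  // Replaces the value at an existing index.
  auto Set(IdT id, ValueType value) -> void {
    values_[id.index] = std::move(value);
  }

  // Returns a reference to the value at an index.
  auto Get(IdT id) -> ValueType& { return values_[id.index]; }
  auto Get(IdT id) const -> const ValueType& { return values_[id.index]; }

  auto Reserve(std::size_t size) -> void { values_.reserve(size); }
  auto size() const -> std::size_t { return values_.size(); }

 private:
  std::vector<ValueType> values_;
};
```

With a store declared as `ValueStore<NameId> names;`, a call such as `auto id = names.Add("x");` relies on the store's name to signal that `id` is a `NameId`.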
														
 
															+
														
 
															+Some name mirroring examples are:
														
 
															+
														
 
															+-   `ints` is a `ValueStore<IntId>`, which has an index type of `IntId` and a
														
 
															+    value type of `llvm::APInt`.
														
 
															+
														
 
															+-   `functions` is a `ValueStore<SemIR::FunctionId>`, which has an index type of
														
 
															+    `SemIR::FunctionId` and a value type of `SemIR::Function`.
														
 
															+
														
 
															+-   `strings` is a `ValueStore<StringId>`, which has an index type of
														
 
															+    `StringId`, but for copy-related reasons, uses `llvm::StringRef` for values.
														
 
															+
														
 
															+A fairly complete list of `ValueStore` uses should be available on
														
 
															+[checking's Context class](https://github.com/search?q=repository%3Acarbon-language%2Fcarbon-lang%20path%3Acheck%2Fcontext.h%20symbol%3Aidentifiers&type=code).
														
 
															+
														
 
															+## Template metaprogramming
														
 
															+
														
 
															+FIXME: show example patterns
														
 
															+
														
 
															+-   TypedInstArgsInfo from toolchain/sem_ir/inst.h
														
 
															+-   templated using
														
 
															+-   std::declval
														
 
															+-   decltype
														
 
															+-   static_assert
														
 
															+-   if constexpr
														
 
															+-   template specialization, for example `Inst::FromRaw<T>` (maybe also type
														
 
															+    traits?)
														
 
															+
														
 
															+### Struct reflection
														
 
															+
														
 
															+The toolchain uses a primitive form of struct reflection to operate generically
														
 
															+over the fields in a typed `SemIR` instruction. This is implemented in
														
 
															+`common/struct_reflection.h`, and the interface to the functionality is
														
 
															+`StructReflection::AsTuple(your_struct)`, which converts the given struct into a
														
 
															+`std::tuple` containing the same fields in the same order.
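One way to build such a conversion is with structured bindings. The sketch below is an assumption about the general technique rather than a copy of `common/struct_reflection.h`: `AsTupleSketch` is a hypothetical name, it requires the caller to pass the field count explicitly, and it handles only one to three fields.

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical converter: copies the fields of an aggregate into a tuple, in
// declaration order. The field count is supplied by the caller in this sketch.
template <std::size_t FieldCount, typename T>
auto AsTupleSketch(const T& value) {
  if constexpr (FieldCount == 1) {
    const auto& [a] = value;
    return std::make_tuple(a);
  } else if constexpr (FieldCount == 2) {
    const auto& [a, b] = value;
    return std::make_tuple(a, b);
  } else {
    static_assert(FieldCount == 3, "sketch supports at most three fields");
    const auto& [a, b, c] = value;
    return std::make_tuple(a, b, c);
  }
}

// Hypothetical instruction-like struct used to exercise the sketch.
struct BinaryOp {
  int lhs_id;
  int rhs_id;
};
```

Once the fields are in a tuple, generic utilities such as `std::apply` and `std::tuple_element_t` can operate on them without knowing the struct's field names.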
														
 
															+
														
 
															+### Field detection
														
 
															+
														
 
															+The presence of specific fields in a struct with a specified type is detected
														
 
															+using the following idiom:
														
 
															+
														
 
															+```cpp
														
 
															+template <typename T, typename = FieldType T::*>
														
 
															+constexpr bool HasField = false;
														
 
															+template <typename T>
														
 
															+constexpr bool HasField<T, decltype(&T::field)> = true;
														
 
															+```
														
 
															+
														
 
															+This is intended to check the same property as the following concept, which we
														
 
															+can't use because we currently need to compile in C++17 mode:
														
 
															+
														
 
															+```cpp
														
 
															+template <typename T> concept HasField = requires (T x) {
														
 
															+  { x.field } -> std::same_as<FieldType>;
														
 
															+};
														
 
															+```
														
 
															+
														
 
															+To detect a field with a specific name with a type derived from a specified base
														
 
															+type, use this idiom:
														
 
															+
														
 
															+```cpp
														
 
															+// HasField<T> is true if T has a `U field` field,
														
 
															+// where `U` extends `BaseClass`.
														
 
															+template <typename T, bool Enabled = true>
														
 
															+inline constexpr bool HasField = false;
														
 
															+template <typename T>
														
 
															+inline constexpr bool HasField<
														
 
															+    T, bool(std::is_base_of_v<BaseClass, decltype(T::field)>)> = true;
														
 
															+```
														
 
															+
														
 
															+The equivalent concept is:
														
 
															+
														
 
															+```cpp
														
 
															+template <typename T> concept HasField = requires (T x) {
														
 
															+  { x.field } -> std::derived_from<BaseClass>;
														
 
															+};
														
 
															+```
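The first idiom can be exercised in a standalone file as follows. `FieldType` and the three test structs are hypothetical; the two `HasField` declarations are the idiom as shown above.

```cpp
// Hypothetical field type to look for.
struct TypeId {
  int index;
};
using FieldType = TypeId;

// Primary template: the defaulted second argument is the pointer-to-member
// type that a matching `field` would have.
template <typename T, typename = FieldType T::*>
constexpr bool HasField = false;
// Chosen only when `&T::field` is valid and has exactly that type.
template <typename T>
constexpr bool HasField<T, decltype(&T::field)> = true;

struct WithField {
  TypeId field;
};
struct WrongType {
  int field;
};
struct NoField {
  int other;
};

static_assert(HasField<WithField>, "field exists with the expected type");
static_assert(!HasField<WrongType>, "field exists but has a different type");
static_assert(!HasField<NoField>, "no member named field");
```

The `WrongType` case falls back to the primary template because `int WrongType::*` does not match the defaulted `FieldType WrongType::*` argument.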
														
 
															+
														
 
															+## Local lambdas to reduce duplicate code
														
 
															+
														
 
															+Sometimes code that would be repeated in a function is factored into a local
														
 
															+variable containing a
														
 
															+[lambda](https://en.cppreference.com/w/cpp/language/lambda):
														
 
															+
														
 
															+```cpp
														
 
															+auto common_code = [&](AType param1, AnotherType param2) {
														
 
															+  // code that would otherwise be repeated
														
 
															+  ...
														
 
															+};
														
 
															+if (something) {
														
 
															+  common_code(...);
														
 
															+}
														
 
															+if (something_else) {
														
 
															+  common_code(...);
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+Compared to defining a new function, this has the advantage of being able to be
														
 
															+declared in context and access the local variables of the enclosing function.
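A small compilable version of the pattern, using a hypothetical `DescribeRange` function invented for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical function that reports out-of-range bounds.
std::vector<std::string> DescribeRange(int low, int high) {
  std::vector<std::string> lines;

  // Local lambda: captures `lines` by reference, so both call sites append to
  // the same vector without passing it explicitly.
  auto add_line = [&](const std::string& label, int value) {
    lines.push_back(label + "=" + std::to_string(value));
  };

  if (low < 0) {
    add_line("negative_low", low);
  }
  if (high > 100) {
    add_line("large_high", high);
  }
  return lines;
}
```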
														
 
															+
														
 
															+## Immediately invoked function expressions (IIFE)
														
 
															+
														
 
															+Instead of creating a separate function with its own name that will be called
														
 
															+once to produce the initial value for a variable, the function can be declared
														
 
															+inline and then immediately called.
														
 
															+
														
 
															+This can be used for complex initialization, as in:
														
 
															+
														
 
															+```cpp
														
 
															+// variable declaration
														
 
															+static const llvm::ArrayRef<std::byte> entropy_bytes =
														
 
															+// initializer starts with a lambda
														
 
															+    []() -> llvm::ArrayRef<std::byte> {
														
 
															+  static llvm::SmallVector<std::byte> bytes;
														
 
															+
														
 
															+  // a bunch of code
														
 
															+
														
 
															+  // return the value to initialize the variable with
														
 
															+  return bytes;
														
 
															+
														
 
															+// finish defining the lambda, and then immediately invoke it
														
 
															+}();
														
 
															+```
														
 
															+
														
 
															+It can also be used inside a `CARBON_DCHECK` to avoid computation that is only
														
 
															+needed in debug builds:
														
 
															+
														
 
															+```cpp
														
 
															+CARBON_DCHECK([&] {
														
 
															+  // a bunch of code
														
 
															+
														
 
															+  // condition that will be tested by CARBON_DCHECK
														
 
															+  return complicated && multiple_parts;
														
 
															+
														
 
															+// finish defining the lambda, and then immediately invoke it
														
 
															+}()) << "Complicated things went wrong";
														
 
															+```
														
 
															+
														
 
															+See a description of this technique on
														
 
															+[Wikipedia](https://en.wikipedia.org/wiki/Immediately_invoked_function_expression).
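A complete, compilable instance of the initialization form, using a hypothetical lookup table rather than code from the toolchain:

```cpp
#include <array>
#include <cstddef>

// The table is filled by a lambda that is invoked immediately, so no named
// helper function is needed and the variable can stay const.
static const std::array<int, 8> Squares = [] {
  std::array<int, 8> table{};
  for (std::size_t i = 0; i < table.size(); ++i) {
    table[i] = static_cast<int>(i * i);
  }
  return table;
}();
```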
														
 
															+
														
 
															+## Declarations in conditions
														
 
															+
														
 
															+The condition part of an `if` statement may contain a declaration with an
														
 
															+initializer followed by a semicolon (`;`) and then the proper boolean condition
														
 
															+expression, as in:
														
 
															+
														
 
															+```cpp
														
 
															+if (auto verify = tree.Verify(); !verify.ok()) {
														
 
															+```
														
 
															+
														
 
															+The condition can be replaced by a declaration entirely, as in:
														
 
															+
														
 
															+```cpp
														
 
															+if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal)) {
														
 
															+// Equivalent to:
														
 
															+if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal); equals) {
														
 
															+```
														
 
															+
														
 
															+or
														
 
															+
														
 
															+```cpp
														
 
															+if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>()) {
														
 
															+// Equivalent to:
														
 
															+if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>(); literal) {
														
 
															+```
														
 
															+
														
 
															+This is a common way of handling a function that returns an optional value.
														
 
															+
														
 
															+See
														
 
															+[https://en.cppreference.com/w/cpp/language/if](https://en.cppreference.com/w/cpp/language/if)
														
 
															+
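Both forms can be tried outside the toolchain with only the standard library. In this sketch, `ParseIntSketch` and `DoubleOrZeroSketch` are hypothetical names; `std::optional` stands in for the optional-like return types used in the examples above.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical parser: succeeds only if all of `text` is a decimal integer.
inline auto ParseIntSketch(std::string_view text) -> std::optional<int> {
  int value = 0;
  const char* end = text.data() + text.size();
  // Declaration, then `;`, then the condition. `result` is scoped to the `if`.
  if (auto result = std::from_chars(text.data(), end, value);
      result.ec != std::errc() || result.ptr != end) {
    return std::nullopt;
  }
  return value;
}

inline auto DoubleOrZeroSketch(std::string_view text) -> int {
  // Declaration as the whole condition: `parsed` is tested through
  // `std::optional`'s explicit conversion to `bool`.
  if (auto parsed = ParseIntSketch(text)) {
    return *parsed * 2;
  }
  return 0;
}
```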
														
 
															+## CRTP or "Curiously recurring template pattern"
														
 
															+
														
 
															+[Curiously Recurring Template Pattern - cppreference.com](https://en.cppreference.com/w/cpp/language/crtp)
														
 
															+
														
 
															+[Curiously recurring template pattern - Wikipedia](https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern)
														
 
															+
														
 
															+[Google search](https://www.google.com/search?q=crtp+c%2B%2B)
														
 
															+
														
 
															+Examples:
														
 
															+
														
 
															+-   `template <typename DerivedT, ...>` in [enum_base.h](/common/enum_base.h)
														
 
															+-   `template <typename DerivedT>` in [ostream.h](/common/ostream.h)
														
 
															+
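For readers new to the pattern, here is a minimal sketch of CRTP. `PrintableSketch` and `PointSketch` are hypothetical and only loosely modeled on the idea of a printing mixin; they are not the definitions in `ostream.h`.

```cpp
#include <sstream>
#include <string>

// Hypothetical CRTP base: gives `ToString` to any derived type that defines
// `Print(std::ostream&)`. The base is parameterized by the derived type.
template <typename DerivedT>
class PrintableSketch {
 public:
  auto ToString() const -> std::string {
    std::ostringstream out;
    // The cast is valid because DerivedT inherits from
    // PrintableSketch<DerivedT>.
    static_cast<const DerivedT&>(*this).Print(out);
    return out.str();
  }
};

class PointSketch : public PrintableSketch<PointSketch> {
 public:
  PointSketch(int x, int y) : x_(x), y_(y) {}

  auto Print(std::ostream& out) const -> void {
    out << "(" << x_ << ", " << y_ << ")";
  }

 private:
  int x_;
  int y_;
};
```

The call to `Print` is resolved at compile time, so no virtual dispatch is involved.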
														
 
															+## Multiple inheritance
														
 
															+
														
 
															+We use multiple inheritance to support uses of
														
 
															+[CRTP](#crtp-or-curiously-recurring-template-pattern).
														
 
															+
														
 
															+Example:
														
 
															+
														
 
															+```cpp
														
 
															+struct NameScopeId : public IndexBase, public Printable<NameScopeId> {
														
 
															+```
														
 
															+
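A rough sketch of how a plain data base and a CRTP mixin combine is shown below. `IndexBaseSketch`, `LabeledSketch`, and `ScopeIdSketch` are hypothetical stand-ins, not the toolchain's `IndexBase`, `Printable`, or `NameScopeId`.

```cpp
#include <string>

// Hypothetical plain base that holds the data.
struct IndexBaseSketch {
  explicit constexpr IndexBaseSketch(int index) : index(index) {}
  int index;
};

// Hypothetical CRTP mixin that adds behavior using the derived type's members.
template <typename DerivedT>
struct LabeledSketch {
  auto ToString() const -> std::string {
    const auto& self = static_cast<const DerivedT&>(*this);
    return std::string(DerivedT::Label) + std::to_string(self.index);
  }
};

// One base contributes state, the other contributes behavior.
struct ScopeIdSketch : public IndexBaseSketch,
                       public LabeledSketch<ScopeIdSketch> {
  static constexpr const char* Label = "scope";
  using IndexBaseSketch::IndexBaseSketch;
};
```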
														
 
															+## Defining constants usable in constexpr contexts
														
 
															+
														
 
															+To declare a constant usable at compile time in `constexpr` contexts as a static
														
 
															+class member, we use this pattern:
														
 
															+
														
 
															+Declaration:
														
 
															+
														
 
															+```cpp
														
 
															+class Foo {
														
 
															+  // ...
														
 
															+  static const std::array<ElementType, ElementCount> MyTable;
														
 
															+  static constexpr auto ComputeMyTable()
														
 
															+      -> std::array<ElementType, ElementCount> { ... }
														
 
															+};
														
 
															+```
														
 
															+
														
 
															+Definition:
														
 
															+
														
 
															+```cpp
														
 
															+constexpr std::array<ElementType, ElementCount>
														
 
															+    Foo::MyTable = Foo::ComputeMyTable();
														
 
															+```
														
 
															+
														
 
															+Note the `const` on the declaration does not match the `constexpr` on the
														
 
															+definition, and that the definition is outside of the class body. This allows
														
 
															+the initializer to depend on the definition of the class.
														
 
															+
														
 
															+Further note that this only works with static members of classes, not static
														
 
															+variables in functions.
														
 
															+
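The pattern matters most when the constant has the type of the class itself, since the class is still incomplete inside its own body. The following is a minimal sketch with a hypothetical `ColorSketch` class, assuming C++17 and a single translation unit:

```cpp
class ColorSketch {
 public:
  constexpr ColorSketch(int r, int g, int b) : r_(r), g_(g), b_(b) {}
  constexpr auto red() const -> int { return r_; }

  // Declared `const` only: an in-class `static constexpr ColorSketch` with an
  // initializer would be rejected because `ColorSketch` is incomplete here.
  static const ColorSketch Black;
  static const ColorSketch White;

 private:
  int r_;
  int g_;
  int b_;
};

// Defined `constexpr` outside the class body, where the class is complete.
constexpr ColorSketch ColorSketch::Black = ColorSketch(0, 0, 0);
constexpr ColorSketch ColorSketch::White = ColorSketch(255, 255, 255);

// Usable in constant expressions after the definition.
static_assert(ColorSketch::White.red() == 255);
```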
														
 
															+Due to [a Clang bug](https://github.com/llvm/llvm-project/issues/85461), this
														
 
															+technique does not work in a class template. The following pattern can be used
														
 
															+instead:
														
 
															+
														
 
															+```cpp
														
 
															+template <typename T>
														
 
															+class Foo {
														
 
															+  // ...
														
 
															+  template <typename Self = Foo>
														
 
															+  static constexpr auto MyValueImpl = Self();
														
 
															+  static constexpr const Foo& MyValue = MyValueImpl<>;
														
 
															+  // ...
														
 
															+};
														
 
															+```
														
 
															+
														
 
															+The parameters of the variable template can be chosen to allow reuse of the same
														
 
															+variable template for multiple static data members.
														
 
															+
														
 
															+Examples:
														
 
															+
														
 
															+-   `NodeStack::IdKindTable` in
														
 
															+    [check/node_stack.h](/toolchain/check/node_stack.h)
														
 
															+-   `BuiltinKind::ValidCount` in
														
 
															+    [sem_ir/builtin_inst_kind.h](/toolchain/sem_ir/builtin_inst_kind.h)
														
 
															+
														
 
															+A global constant may use a single definition without a separate declaration:
														
 
															+
														
 
															+```cpp
														
 
															+static constexpr std::array<bool, 256> IsIdStartByteTable = [] {
														
 
															+  std::array<bool, 256> table = {};
														
 
															+  // ...
														
 
															+  return table;
														
 
															+}();
														
 
															+```
														
 
															+
														
 
															+Note this example is using an
														
 
															+[immediately invoked function expression](#immediately-invoked-function-expressions-iife)
														
 
															+to compute the initial value, which is common.
														
 
															+
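A filled-in variant of such a table might look like the sketch below. The name `IsIdStartSketch` and the choice of ASCII letters plus underscore are assumptions made for the example, not the rule the lexer actually applies.

```cpp
#include <array>

// Hypothetical table: true for bytes that are ASCII letters or '_'.
static constexpr std::array<bool, 256> IsIdStartSketch = [] {
  std::array<bool, 256> table = {};
  for (int c = 'a'; c <= 'z'; ++c) {
    table[c] = true;
  }
  for (int c = 'A'; c <= 'Z'; ++c) {
    table[c] = true;
  }
  table['_'] = true;
  return table;
}();

// The table is computed entirely at compile time.
static_assert(IsIdStartSketch['x']);
static_assert(!IsIdStartSketch['7']);
```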
														
 
															+Examples:
														
 
															+
														
 
															+-   [lex/lex.cpp](/toolchain/lex/lex.cpp)
														
--- a/toolchain/docs/lex.md
+++ b/toolchain/docs/lex.md
@@ -0,0 +1,44 @@
 
															+# Lex
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Overview](#overview)
														
 
															+-   [Bracket matching](#bracket-matching)
														
 
															+-   [Alternatives considered](#alternatives-considered)
														
 
															+    -   [Bracket matching in parser](#bracket-matching-in-parser)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+Lexing converts input source code into tokenized output. Literals, such as
														
 
															+string literals, have their value parsed and form a single token at this stage.
														
 
															+
														
 
															+## Bracket matching
														
 
															+
														
 
															+The lexer handles matching for `()`, `[]`, and `{}`. When a bracket lacks a
														
 
															+match, it will insert a "recovery" token to produce a match. As a consequence,
														
 
															+the lexer's output should always have matched brackets, even with invalid code.
														
 
															+
														
 
															+While bracket matching could use hints such as contextual clues from
														
 
															+indentation, that is not yet implemented.
														
 
															+
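The recovery behavior described above can be sketched with a simple stack. Everything here is hypothetical: the token type, the function names, and the recovery policy (insert a closer for each bracket left open, insert an opener before a stray closer) are assumptions for illustration and may differ from what the lexer does.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token: a bracket character, flagged if it was inserted.
struct BracketTokenSketch {
  char bracket;
  bool is_recovery;
};

inline auto ClosingFor(char open) -> char {
  return open == '(' ? ')' : open == '[' ? ']' : '}';
}

inline auto OpeningFor(char close) -> char {
  return close == ')' ? '(' : close == ']' ? '[' : '{';
}

inline auto MatchBracketsSketch(std::string_view source)
    -> std::vector<BracketTokenSketch> {
  std::vector<BracketTokenSketch> tokens;
  std::vector<char> open_stack;
  for (char c : source) {
    if (c == '(' || c == '[' || c == '{') {
      open_stack.push_back(c);
      tokens.push_back({c, false});
    } else if (c == ')' || c == ']' || c == '}') {
      bool has_opener = false;
      for (char open : open_stack) {
        has_opener = has_opener || ClosingFor(open) == c;
      }
      if (!has_opener) {
        // Stray closer: insert a recovery opener right before it.
        tokens.push_back({OpeningFor(c), true});
        tokens.push_back({c, false});
        continue;
      }
      // Close any inner brackets that were left open.
      while (ClosingFor(open_stack.back()) != c) {
        tokens.push_back({ClosingFor(open_stack.back()), true});
        open_stack.pop_back();
      }
      open_stack.pop_back();
      tokens.push_back({c, false});
    }
  }
  // Close whatever is still open at the end of the input.
  while (!open_stack.empty()) {
    tokens.push_back({ClosingFor(open_stack.back()), true});
    open_stack.pop_back();
  }
  return tokens;
}

// Renders only the bracket characters, for inspection.
inline auto RenderSketch(const std::vector<BracketTokenSketch>& tokens)
    -> std::string {
  std::string out;
  for (const auto& token : tokens) {
    out += token.bracket;
  }
  return out;
}
```

Under this policy, every output sequence is balanced regardless of the input, which is the property later stages rely on.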
														
 
															+## Alternatives considered
														
 
															+
														
 
															+### Bracket matching in parser
														
 
															+
														
 
															+Bracket matching could have also been implemented in the parser, with some
														
 
															+awareness of parse state. However, that would shift some of the complexity to
														
 
															+recovery in other error situations, such as where the parser searches for the
														
 
															+next comma in a list. That needs to skip over bracketed ranges. We don't think
														
 
															+the trade-offs would yield a net benefit, so any change in this direction would
														
 
															+need to show concrete improvement, for example better diagnostics for common
														
 
															+issues.
														
--- a/toolchain/docs/lower.md
+++ b/toolchain/docs/lower.md
@@ -0,0 +1,25 @@
 
															+# Lower
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Overview](#overview)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+Lowering takes the SemIR and produces LLVM IR. At present, this is done in a
														
 
															+single pass, although it's possible we may need to do a second pass so that we
														
 
															+can first generate type information for function arguments.
														
 
															+
														
 
															+Lowering is done per `SemIR::InstBlock`. This minimizes changes to the
														
 
															+`IRBuilder` insertion point, something that is both expensive and potentially
														
 
															+fragile.
														
--- a/toolchain/docs/parse.md
+++ b/toolchain/docs/parse.md
@@ -0,0 +1,802 @@
 
															+# Parse
														
 
															+
														
 
															+<!--
														
 
															+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
														
 
															+Exceptions. See /LICENSE for license information.
														
 
															+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
														
 
															+-->
														
 
															+
														
 
															+<!-- toc -->
														
 
															+
														
 
															+## Table of contents
														
 
															+
														
 
															+-   [Overview](#overview)
														
 
															+-   [Parse stack](#parse-stack)
														
 
															+-   [Postorder tree](#postorder-tree)
														
 
															+-   [Bracketing inside the tree](#bracketing-inside-the-tree)
														
 
															+-   [Visual example](#visual-example)
														
 
															+-   [Handling invalid parses](#handling-invalid-parses)
														
 
															+-   [How is this accomplished?](#how-is-this-accomplished)
														
 
															+    -   [Introducer](#introducer)
														
 
															+    -   [Optional modifiers before an introducer](#optional-modifiers-before-an-introducer)
														
 
															+    -   [Something required in context](#something-required-in-context)
														
 
															+    -   [Optional clauses](#optional-clauses)
														
 
															+        -   [Case 1: introducer to optional clause is used as parent node](#case-1-introducer-to-optional-clause-is-used-as-parent-node)
														
 
															+        -   [Case 2: parent node is required token after optional clause, with different parent node kinds for different options](#case-2-parent-node-is-required-token-after-optional-clause-with-different-parent-node-kinds-for-different-options)
														
 
															+        -   [Case 3: optional sibling](#case-3-optional-sibling)
														
 
															+    -   [Operators](#operators)
														
 
															+
														
 
															+<!-- tocstop -->
														
 
															+
														
 
															+## Overview
														
 
															+
														
 
															+Parsing uses tokens to produce a parse tree that faithfully represents the tree
														
 
															+structure of the source program, interpreted according to the Carbon grammar. No
														
 
															+semantics are associated with the tree structure at this level, and no name
														
 
															+lookup is performed.
														
 
															+
														
 
															+The parse tree's structure corresponds to the grammar of the Carbon language. On
														
 
															+valid input, there will be a 1:1 correspondence between parse tree nodes and
														
 
															+tokens.
														
 
															+
														
 
															+A parse tree is considered _structurally valid_ if all nodes have the number of
														
 
															+children that their node kind requires. On invalid input, nodes may be added
														
 
															+that don't correspond to a token to maintain a structurally valid parse tree.
														
 
															+When a parse tree node is marked as having an error, it will still be
														
 
															+structurally valid, but its children may not match a valid grammar. Code trying
														
 
															+to handle children of erroneous nodes must be prepared to handle atypical
														
 
															+structures, but it may still be helpful for tools such as syntax highlighters or
														
 
															+refactoring tools.
														
 
															+
														
 
															+In general, we favor doing the checking for whether something is allowed _in a
														
 
															+particular context_ in [the check stage](check.md) instead of the parse stage,
														
 
															+unless the context is very local. This is for a few reasons:
														
 
															+
														
 
															+-   We anticipate that the parse stage will be used to operate on invalid code
														
 
															+    while still preserving as much of the intent of the author as possible, for
														
 
															+    example in an IDE or a code formatter.
														
 
															+-   To keep as much code out of the parse stage as possible, so it is simple and
														
 
															+    fast.
														
 
															+-   We are building all the infrastructure to keep track of context in the check
														
 
															+    stage.
														
 
															+
														
 
															+These reasons explain what local context is okay: where we already have the
														
 
															+contextual information at hand so there is no performance cost, and we can
														
 
															+output a parse tree that still captures faithfully what the user wrote.
														
 
															+Examples:
														
 
															+
														
 
															+-   All declaration modifiers are allowed in any order on any declaration in the
														
 
															+    parse stage. Diagnosing duplicated modifiers, modifiers that conflict with
														
 
															+    other modifiers, or modifiers that can't be used on a particular declaration
														
 
															+    is postponed until the check stage.
														
 
															+-   Rejecting a keyword after `fn` where a name is expected is done at the parse
														
 
															+    stage.
														
 
															+
														
 
															+## Parse stack
														
 
															+
														
 
															+The core parser loop is `Parse::Tree::Parse`. In the loop, it pops the next
														
 
															+state off the stack, and dispatches to the appropriate `Handle` function.
														
 
															+
														
 
															+A typical handler function pops the state first, leaving the stack ready for the
														
 
															+next state. It may add nodes to the parse tree, based on the current code. If it
														
 
															+needs to trigger other states, it will push them onto the stack; because it's a
														
 
															+stack, the _next_ state is always pushed _last_.
														
 
															+
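The push order can be illustrated with a toy state machine. The states and the trace below are hypothetical and far simpler than the real parser states; the sketch only shows why the state that should run next is pushed last.

```cpp
#include <string>
#include <vector>

// Hypothetical parser states for a statement of the form `expression ;`.
enum class StateSketch { Statement, Expression, Semicolon };

// Runs the loop and returns the order in which states were handled.
inline auto RunParseLoopSketch() -> std::vector<std::string> {
  std::vector<StateSketch> stack = {StateSketch::Statement};
  std::vector<std::string> trace;
  while (!stack.empty()) {
    // Pop first, leaving the stack ready for whatever the handler pushes.
    StateSketch state = stack.back();
    stack.pop_back();
    switch (state) {
      case StateSketch::Statement:
        trace.push_back("Statement");
        // `Expression` must run before `Semicolon`, so it is pushed last.
        stack.push_back(StateSketch::Semicolon);
        stack.push_back(StateSketch::Expression);
        break;
      case StateSketch::Expression:
        trace.push_back("Expression");
        break;
      case StateSketch::Semicolon:
        trace.push_back("Semicolon");
        break;
    }
  }
  return trace;
}
```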
														
 
															+Operator expressions store information about current operator precedence in the
														
 
															+stack as well. While this isn't necessary for most parser states, and could be
														
 
															+stored separately, it's currently together because it has no impact on the size
														
 
															+of a stack entry and is thus more efficient to store in one place.
														
 
															+
														
 
															+## Postorder tree
														
 
															+
														
 
															+The parse tree's storage layout is in postorder. For example, given the code:
														
 
															+
														
 
															+```carbon
														
 
															+fn foo() -> f64 {
														
 
															+  return 42;
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+The node order is (with indentation to indicate nesting):
														
 
															+
														
 
															+<!-- Prevent prettier from changing indents. -->
														
 
															+<!-- prettier-ignore-start -->
														
 
															+
														
 
															+```yaml
														
 
															+[
														
 
															+  {kind: 'FileStart', text: ''},
														
 
															+      {kind: 'FunctionIntroducer', text: 'fn'},
														
 
															+      {kind: 'Name', text: 'foo'},
														
 
															+        {kind: 'ParamListStart', text: '('},
														
 
															+      {kind: 'ParamList', text: ')', subtree_size: 2},
														
 
															+        {kind: 'Literal', text: 'f64'},
														
 
															+      {kind: 'ReturnType', text: '->', subtree_size: 2},
														
 
															+    {kind: 'FunctionDefinitionStart', text: '{', subtree_size: 7},
														
 
															+      {kind: 'ReturnStatementStart', text: 'return'},
														
 
															+      {kind: 'Literal', text: '42'},
														
 
															+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
														
 
															+  {kind: 'FunctionDefinition', text: '}', subtree_size: 11},
														
 
															+  {kind: 'FileEnd', text: ''},
														
 
															+]
														
 
															+```
														
 
															+
														
 
															+<!-- prettier-ignore-end -->
														
 
															+
														
 
															+In this example, `FileStart`, `FunctionDefinition`, and `FileEnd` are "root"
														
 
															+nodes for the tree. Function components are children of `FunctionDefinition`.
														
 
															+
														
 
															+It's produced in this way because it's an efficient layout to produce with
														
 
															+vectorized storage, requiring little context to be maintained during parsing.
														
 
															+Because it's stored in postorder, it's also most efficient to process the parsed
														
 
															+output in postorder; this affects checking.
														
 
															+
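To make the layout concrete, the sketch below stores the example tree in a flat vector and recovers a node's direct children by walking backwards over subtree sizes. `NodeSketch` and `ChildrenOfSketch` are hypothetical; the real tree API differs, and the toolchain may not rely on subtree sizes this way.

```cpp
#include <string>
#include <vector>

// Hypothetical node: leaves have a subtree size of 1 (the node itself).
struct NodeSketch {
  std::string kind;
  int subtree_size = 1;
};

// Returns the indices of the direct children of `parent`, in source order.
inline auto ChildrenOfSketch(const std::vector<NodeSketch>& tree, int parent)
    -> std::vector<int> {
  std::vector<int> children;
  // Index of the first node that belongs to the parent's subtree.
  int first = parent - tree[parent].subtree_size + 1;
  // The last child sits just before the parent. Each step skips one whole
  // child subtree to land on the previous sibling.
  for (int i = parent - 1; i >= first; i -= tree[i].subtree_size) {
    children.insert(children.begin(), i);
  }
  return children;
}

// The postorder listing from the example above.
inline auto ExampleTreeSketch() -> std::vector<NodeSketch> {
  return {
      {"FileStart", 1},                // 0
      {"FunctionIntroducer", 1},       // 1
      {"Name", 1},                     // 2
      {"ParamListStart", 1},           // 3
      {"ParamList", 2},                // 4
      {"Literal", 1},                  // 5
      {"ReturnType", 2},               // 6
      {"FunctionDefinitionStart", 7},  // 7
      {"ReturnStatementStart", 1},     // 8
      {"Literal", 1},                  // 9
      {"ReturnStatement", 3},          // 10
      {"FunctionDefinition", 11},      // 11
      {"FileEnd", 1},                  // 12
  };
}
```

Children always precede their parent in the vector, so appending a parent never requires touching earlier entries.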
														
 
															+The parse tree is printed in postorder by default because it matches how the
														
 
															+parse tree is expected to be processed within the toolchain, and so can make it
														
 
															+easier to reason about. However, the `--preorder` flag may be used in contexts
														
 
															+where a preorder representation would be easier to handle.
														
 
															+
														
 
															+## Bracketing inside the tree
														
 
															+
														
 
															+The parse tree is designed to be walked in postorder by checking, allowing
														
 
															+checking to be more efficient. To support this, checking sometimes requires
														
 
															+context on the meaning of a node when it is encountered.
														
 
															+
														
 
															+Each `ParseNodeKind` has either a bracketing node, or a specific child count.
														
 
															+This helps document and enforce the expected tree structure.
														
 
															+
														
 
															+When a bracketing node is indicated, it is the opening bracket: it will always
														
 
															+be the first child of the parent, and that will be the only time it occurs in
														
 
															+the parent's children (it may still occur in children of children). When
														
 
															+checking encounters the opening bracket, this means it can make contextual
														
 
															+decisions for the later children of the node.
														
 
															+
														
 
															+Nodes can also have a specific child count, for example, infix operators always
														
 
															+have two children: the lhs and rhs expressions. Many nodes have a child count of
														
 
															+0; this just means they're leaf nodes, and will never have children.
														
 
															+
														
 
															+Because the tree structure is always valid, these are treated as contracts. Some
														
 
															+nodes exist only to be used to construct valid tree structures for invalid
														
 
															+input, such as `StructFieldUnknown`.
														
 
															+
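One way to picture these contracts is a postorder walk that keeps a stack of completed subtrees. In the sketch below, all names are hypothetical, and the node kinds other than `VariableIntroducer` are invented for the example. A kind either names its opening bracket or fixes its child count.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical contract: a non-empty `bracket` names the opening bracket
// kind; otherwise `child_count` is the exact number of children.
struct KindInfoSketch {
  std::string bracket;
  int child_count = 0;
};

using KindMapSketch = std::unordered_map<std::string, KindInfoSketch>;

// Returns the number of root nodes if every node meets its contract.
inline auto CheckStructureSketch(const std::vector<std::string>& postorder,
                                 const KindMapSketch& kinds)
    -> std::optional<int> {
  std::vector<std::string> stack;
  for (const auto& kind : postorder) {
    auto it = kinds.find(kind);
    if (it == kinds.end()) {
      return std::nullopt;
    }
    const KindInfoSketch& info = it->second;
    if (!info.bracket.empty()) {
      // Consume children back to the opening bracket, the first child.
      while (!stack.empty() && stack.back() != info.bracket) {
        stack.pop_back();
      }
      if (stack.empty()) {
        return std::nullopt;
      }
      stack.pop_back();
    } else {
      if (static_cast<int>(stack.size()) < info.child_count) {
        return std::nullopt;
      }
      stack.resize(stack.size() - info.child_count);
    }
    stack.push_back(kind);
  }
  return static_cast<int>(stack.size());
}

inline auto ExampleKindsSketch() -> KindMapSketch {
  return {
      {"VariableIntroducer", {"", 0}},
      {"Name", {"", 0}},
      {"Literal", {"", 0}},
      {"PatternBinding", {"", 2}},
      {"VariableInitializer", {"", 0}},
      {"InfixOperator", {"", 2}},
      {"VariableDeclaration", {"VariableIntroducer", 0}},
  };
}

// Postorder node kinds for `var x: i32 = y + 1;` under these assumptions.
inline auto VarExampleSketch() -> std::vector<std::string> {
  return {"VariableIntroducer", "Name",          "Literal",
          "PatternBinding",     "VariableInitializer", "Name",
          "Literal",            "InfixOperator", "VariableDeclaration"};
}
```

A nested subtree is collapsed to its parent kind before the outer parent is seen, so an inner bracket never confuses the outer match.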
														
 
															+Although each subtree's size is also tracked as part of the node, we're
														
 
															+currently trying to avoid relying on it and may eliminate it if it turns out to
														
 
															+be unnecessary and a meaningful cost for the compiler.
														
 
															+
														
 
															+## Visual example
														
 
															+
														
 
															+To illustrate the transition from code to parse tree, consider the
														
 
															+statement:
														
 
															+
														
 
															+```carbon
														
 
															+var x: i32 = y + 1;
														
 
															+```
														
 
															+
														
 
															+Lexing creates distinct tokens for each syntactic element, which will form the
														
 
															+basis of the parse tree:
														
 
															+
														
 
															+<pre>
														
 
															+<b>Tokens:</b>
														
 
															+
														
 
															++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
														
 
															++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+</pre>
														
 
															+
														
 
															+First the `var` keyword is used as a "bracketing" node (VariableIntroducer).
														
 
															+When this is seen in a postorder traversal, it tells us to expect the basics of
														
 
															+a variable declaration structure.
														
 
															+
														
 
															+<pre>
														
 
															+<b>Tokens:</b>
														
 
															+
														
 
															+        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+        |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
														
 
															+        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+
														
 
															+<b>Parse tree:</b>
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															++-----+
														
 
															+| var |
														
 
															++-----+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+</pre>
														
 
															+
														
 
															+Next, we can consider the pattern binding. Here, `x` is the identifier and `i32`
														
 
															+is the type expression. The `:` provides a parent node that must always contain
														
 
															+two children, the name and type expression. Because it always has two direct
														
 
															+children, it doesn't need to be bracketed.
														
 
															+
														
 
															+<pre>
														
 
															+<b>Tokens:</b>
														
 
															+
														
 
															+                                +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+                                |  =  | |  y  | |  +  | |  1  | |  ;  |
														
 
															+                                +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+
														
 
															+<b>Parse tree:</b>
														
 
															+
														
 
															+        +-----+ +-----+
														
 
															+        |  x  | | i32 |
														
 
															+        +-----+ +-----+
														
 
															+           |       |
														
 
															+           +-------+-------+
														
 
															+                           |
														
 
															++-----+                 +-----+
														
 
															+| var |                 |  :  |
														
 
															++-----+                 +-----+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+</pre>
														
 
															+
														
 
															+We use the `=` as a separator (instead of a node with children like `:`) to help
														
 
															+indicate the transition from binding to assignment expression, which is
														
 
															+important for expression parsing during checking.
														
 
															+
														
 
															+<pre>
														
 
															+<b>Tokens:</b>
														
 
															+
														
 
															+                                        +-----+ +-----+ +-----+ +-----+
														
 
															+                                        |  y  | |  +  | |  1  | |  ;  |
														
 
															+                                        +-----+ +-----+ +-----+ +-----+
														
 
															+
														
 
															+<b>Parse tree:</b>
														
 
															+
														
 
															+        +-----+ +-----+
														
 
															+        |  x  | | i32 |
														
 
															+        +-----+ +-----+
														
 
															+           |       |
														
 
															+           +-------+-------+
														
 
															+                           |
														
 
															++-----+                 +-----+ +-----+
														
 
															+| var |                 |  :  | |  =  |
														
 
															++-----+                 +-----+ +-----+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+</pre>
														
 
															+
														
 
															+The expression is a subtree with `+` as the parent, and the two operands as
														
 
															+child nodes.
														
 
															+
														
 
															+<pre>
														
 
															+<b>Tokens:</b>
														
 
															+
														
 
															+                                                                +-----+
														
 
															+                                                                |  ;  |
														
 
															+                                                                +-----+
														
 
															+
														
 
															+<b>Parse tree:</b>
														
 
															+
														
 
															+        +-----+ +-----+                 +-----+ +-----+
														
 
															+        |  x  | | i32 |                 |  y  | |  1  |
														
 
															+        +-----+ +-----+                 +-----+ +-----+
														
 
															+           |       |                       |       |
														
 
															+           +-------+-------+               +-------+-------+
														
 
															+                           |                               |
														
 
															++-----+                 +-----+ +-----+                 +-----+
														
 
															+| var |                 |  :  | |  =  |                 |  +  |
														
 
															++-----+                 +-----+ +-----+                 +-----+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+</pre>
														
 
															+
														
 
															+Finally, the `;` is used as the "root" of the variable declaration. It's
														
 
															+explicitly tracked as the `;` for a variable declaration so that it's
														
 
															+unambiguously bracketed by `var`.
														
 
															+
														
 
															+<pre>
														
 
															+<b>Tokens:</b>
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+
														
 
															+<b>Parse tree:</b>
														
 
															+
														
 
															+        +-----+ +-----+                 +-----+ +-----+
														
 
															+        |  x  | | i32 |                 |  y  | |  1  |
														
 
															+        +-----+ +-----+                 +-----+ +-----+
														
 
															+           |       |                       |       |
														
 
															+           +-------+-------+               +-------+-------+
														
 
															+                           |                               |
														
 
															++-----+                 +-----+ +-----+                 +-----+
														
 
															+| var |                 |  :  | |  =  |                 |  +  |
														
 
															++-----+                 +-----+ +-----+                 +-----+
														
 
															+   |                       |       |                       |
														
 
															+   +-----------------------+-------+-----------------------+-------+
														
 
															+                                                                   |
														
 
															+                                                                +-----+
														
 
															+                                                                |  ;  |
														
 
															+                                                                +-----+
														
 
															+</pre>
														
 
															+
														
 
															+This is the completed parse tree.
														
 
															+
														
 
															+In storage, this tree will be flat and in postorder. Because the order hasn't
														
 
															+changed much from the original code, we can do the reordering for postorder with
														
 
															+a minimal number of nodes being delayed for later output: it will be linear with
														
 
															+respect to the depth of the parse tree.
														
 
															+
														
 
															+<pre>
														
 
															+<b>Tokens:</b>
														
 
															+
														
 
															++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
														
 
															++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+
														
 
															+<b>Parse tree:</b>
														
 
															+
														
 
															+        +-----+ +-----+                 +-----+ +-----+
														
 
															+        |  x  | | i32 |                 |  y  | |  1  |
														
 
															+        +-----+ +-----+                 +-----+ +-----+
														
 
															+           |       |                       |       |
														
 
															+           +-------+-------+               +-------+-------+
														
 
															+                           |                               |
														
 
															++-----+                 +-----+ +-----+                 +-----+
														
 
															+| var |                 |  :  | |  =  |                 |  +  |
														
 
															++-----+                 +-----+ +-----+                 +-----+
														
 
															+   |                       |       |                       |
														
 
															+   +-----------------------+-------+-----------------------+-------+
														
 
															+                                                                   |
														
 
															+                                                                +-----+
														
 
															+                                                                |  ;  |
														
 
															+                                                                +-----+
														
 
															+
														
 
															+<b>Flattened for storage:</b>
														
 
															+
														
 
															++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+| var | |  x  | | i32 | |  :  | |  =  | |  y  | |  1  | |  +  | |  ;  |
														
 
															++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
														
 
															+</pre>
														
 
															+
														
 
															+The structural concepts of bracketing nodes (`var` and `;`) and parent nodes
														
 
															+with a known child count (`:` and `+` with 2 children, but also `=` with 0
														
 
															+children) will allow checking to reconstruct the tree as it encounters nodes
														
 
															+during the postorder.
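
The reconstruction described above can be sketched with a small stack-based pass. This is only an illustration under stated assumptions: `Shape`, `FlatNode`, and `ComputeSubtreeSizes` are hypothetical names invented for this sketch and are not the toolchain's types. It computes, for each node in the flattened postorder, the size of the subtree it roots.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node shapes for this sketch; not the toolchain's real types.
enum class Shape { Leaf, FixedChildren, BracketStart, BracketEnd };

struct FlatNode {
  std::string text;
  Shape shape;
  int child_count;  // Only meaningful for Shape::FixedChildren.
};

// Walks nodes in postorder and returns the size of the subtree rooted at each
// node, counting the node itself.
auto ComputeSubtreeSizes(const std::vector<FlatNode>& nodes)
    -> std::vector<int> {
  struct Pending {
    int size;
    bool is_bracket_start;
  };
  // Completed subtrees that are still waiting for a parent.
  std::vector<Pending> stack;
  std::vector<int> sizes;
  for (const FlatNode& node : nodes) {
    int size = 1;
    switch (node.shape) {
      case Shape::Leaf:
      case Shape::BracketStart:
        break;
      case Shape::FixedChildren:
        // A known child count says exactly how many subtrees to adopt.
        for (int i = 0; i < node.child_count; ++i) {
          assert(!stack.empty() && "missing child");
          size += stack.back().size;
          stack.pop_back();
        }
        break;
      case Shape::BracketEnd: {
        // Adopt everything back to, and including, the bracketing start node.
        bool found_start = false;
        while (!found_start) {
          assert(!stack.empty() && "unmatched bracket");
          size += stack.back().size;
          found_start = stack.back().is_bracket_start;
          stack.pop_back();
        }
        break;
      }
    }
    stack.push_back({size, node.shape == Shape::BracketStart});
    sizes.push_back(size);
  }
  return sizes;
}
```

For the flattened `var x i32 : = y 1 + ;` sequence, with `var` as the bracket start, `:` and `+` taking 2 children, `=` taking 0, and `;` as the bracket end, this yields sizes `1 1 1 3 1 1 1 3 9`.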
														
 
															+
														
 
															+There are other structures that could have been used here, such as `=` being
														
 
															+the parent of the `var` and pattern nodes, and `;` being the parent of the `=` and
														
 
															+assignment expression nodes. In that example alternative, the storage order
														
 
															+would be the same; it would only change the tree representation. The current
														
 
															+structure is influenced by choices in checking.
														
 
															+
														
 
															+## Handling invalid parses
														
 
															+
														
 
															+On an invalid parse, the output tree should still try to mirror the intended
														
 
															+tree structure when possible. There's a balance here, and it's not expected to
														
 
															+try too hard to make things correct, but outputting nodes is preferred. There
														
 
															+are `InvalidParse` nodes which may be used to provide a node when the planned
														
 
															+node kind is too difficult to get correct child counts (bracketed subtrees may
														
 
															+not need an `InvalidParse` node).
														
 
															+
														
 
															+When marking a child node with `has_error=true`, parent nodes may also be marked
														
 
															+with `has_error=true`, but try to be conservative about this. As a rule of
														
 
															+thumb, if checking could continue on a parent node without needing the child
														
 
															+node to be fully checked (possibly with incomplete information), then the parent
														
 
															+node should not be marked as `has_error=true`. The goal remains providing
														
 
															+something similar to a well-formed parse tree.
														
 
															+
														
 
															+In general, a parent node must have the immediate children described in
														
 
															+[parse/typed_nodes.h](/toolchain/parse/typed_nodes.h), unless it is marked
														
 
															+`has_error=true`. If this is violated for a particular parse tree, an error will
														
 
															+be raised in `Tree::Verify`. Note that an `InvalidParse` node is allowed as a
														
 
															+declaration or expression, and an `InvalidParseSubtree` is allowed as a
														
 
															+declaration. These invalid nodes can be added to more node categories as needed.
														
 
															+
														
 
															+Child states may indicate an error to their parent using `ReturnErrorOnState`.
														
 
															+This is particularly intended for when a child state emits a diagnostic, to
														
 
															+prevent the parent state from emitting redundant diagnostics; for example, an
														
 
															+invalid expression might have more invalid tokens following it, and the parent
														
 
															+might skip those without emitting diagnostics.
														
 
															+
														
 
															+## How is this accomplished?
														
 
															+
														
 
															+The specific approach to producing the desired tree depends on the kind of
														
 
															+grammar rule being implemented, as well as the desired output tree structure.
														
 
															+
														
 
															+### Introducer
														
 
															+
														
 
															+**Example:** `if (c) { ... }`
														
 
															+
														
 
															+Here `if` is the introducer. Many other possible introducers could occur in that
														
 
															+position, such as `while` or `var`, and we want to dispatch based on which token
														
 
															+is present. See
														
 
															+[parse/handle_statement.cpp](/toolchain/parse/handle_statement.cpp).
														
 
															+
														
 
															+The first step is to identify the introducer token, typically using a `switch`
														
 
															+or `if` on the `Lex::TokenKind` at the current position:
														
 
															+
														
 
															+```cpp
														
 
															+switch (context.PositionKind()) {
														
 
															+  case Lex::TokenKind::___: {
														
 
															+    ...
														
 
															+    break;
														
 
															+  }
														
 
															+  ...
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+There should be a `default:` (or `else`) case so every kind of token is handled.
														
 
															+This may be an error, in which case:
														
 
															+
														
 
															+-   A [diagnostic](diagnostics.md) should be emitted.
														
 
															+
														
 
															+-   An invalid parse node should be added, using something like:
														
 
															+
														
 
															+    ```cpp
														
 
															+    context.AddLeafNode(NodeKind::InvalidParse, context.Consume(),
														
 
															+                        /*has_error=*/true);
														
 
															+    ```
														
 
															+
														
 
															+-   At least one token should be consumed, particularly if it will continue with
														
 
															+    this state at this position, to avoid an infinite loop.
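
As a rough, self-contained illustration of why the error case must consume a token, here is a toy dispatcher. Everything in it (`TokenKind`, `ParseResult`, `ParseIntroducers`, and the node names) is hypothetical and only mimics the shape of the real handlers; it is not the toolchain's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch.
enum class TokenKind { If, While, Var, Unknown, FileEnd };

struct ParseResult {
  std::vector<std::string> nodes;
  int diagnostics = 0;
};

// Dispatches on the token at the current position until the end of the file.
auto ParseIntroducers(const std::vector<TokenKind>& tokens) -> ParseResult {
  assert(!tokens.empty() && tokens.back() == TokenKind::FileEnd);
  ParseResult result;
  std::size_t position = 0;
  while (tokens[position] != TokenKind::FileEnd) {
    switch (tokens[position]) {
      case TokenKind::If:
        result.nodes.push_back("IfIntroducer");
        break;
      case TokenKind::While:
        result.nodes.push_back("WhileIntroducer");
        break;
      case TokenKind::Var:
        result.nodes.push_back("VarIntroducer");
        break;
      default:
        // Error case: count a diagnostic and output an invalid node.
        ++result.diagnostics;
        result.nodes.push_back("InvalidParse");
        break;
    }
    // Every branch, including the error case, consumes exactly one token.
    // Without this, an unrecognized token would be revisited forever.
    ++position;
  }
  return result;
}
```

Calling `ParseIntroducers` on `If, Unknown, Var, FileEnd` terminates with three nodes and one diagnostic, because the default case advances past the unrecognized token.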
														
 
															+
														
 
															+The default case may also be delegated to another state. For example, in the
														
 
															+state where a statement is expected, if no keyword introducer is recognized, it
														
 
															+switches to the expression-statement state.
														
 
															+
														
 
															+Depending on the introducer, different actions can be taken. The most common
														
 
															+case is to:
														
 
															+
														
 
															+-   Call `context.PushState(State::___);` to mark the beginning of the statement
														
 
															+    or declaration and indicate the state that will handle the tokens after the
														
 
															+    introducer.
														
 
															+
														
 
															+-   Call `context.AddLeafNode(NodeKind::___, context.Consume());` to output a
														
 
															+    bracketing node for this introducer.
														
 
															+
														
 
															+The next state can then add sibling nodes until it gets to the end of the
														
 
															+declaration or statement. The last token, often a semicolon `;`, is used as a
														
 
															+parent node to match the bracketing node of the introducer.
														
 
															+
														
 
															+If the introducer token won't be used as a bracketing node, it can be
														
 
															+temporarily skipped after `context.PushState` by calling
														
 
															+`context.ConsumeAndDiscard()` instead of `context.AddLeafNode`. It must be added
														
 
															+to the output tree as a node by some later state, unless an error occurs. For
														
 
															+example, a `for` statement uses the `for` token as the root of the tree -- it
														
 
															+doesn't need a bracketing node since it has a fixed child count. Note that the
														
 
															+token was saved when the state was pushed, and can be retrieved when adding a
														
 
															+node as in this example:
														
 
															+
														
 
															+```cpp
														
 
															+auto state = context.PopState();
														
 
															+context.AddNode(NodeKind::ForStatement, state.token, state.subtree_start,
														
 
															+                state.has_error);
														
 
															+```
														
 
															+
														
 
															+If this state is for an element of a scope like the statements in a code block,
														
 
															+most introducer tokens indicate that the current state should be repeated, to
														
 
															+handle the next statement, but some other token, like a close curly brace (`}`)
														
 
															+means that the state should be exited.
														
 
															+
														
 
															+### Optional modifiers before an introducer
														
 
															+
														
 
															+**Example:** `virtual fn Foo();`
														
 
															+
														
 
															+Here `fn` is the introducer, and `virtual` is an optional modifier that appears
														
 
															+before. See
														
 
															+[parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp).
														
 
															+
														
 
															+Use this pattern when the goal is to produce a subtree that starts with the
														
 
															+introducer as a bracketing node, as in the previous case, followed by nodes for
														
 
															+any modifiers. Note that bracketing is needed here, since the optional modifier
														
 
															+nodes mean that there is not a fixed child count for the parent node. This means
														
 
															+shuffling the introducer node before an unknown number of modifier nodes. This
														
 
															+is accomplished by emitting a placeholder node for the introducer, processing
														
 
															+all the modifiers until reaching the introducer, filling in the placeholder with
														
 
															+the information about the introducer, and then finishing the rest of the
														
 
															+declaration or statement.
														
 
															+
														
 
															+-   **Step 1**: Save the current value of `context.tree().size()`. This could be
														
 
															+    accomplished by calling `context.PushState()`, which saves that value in the
														
 
															+    `subtree_start` field of `Context::StateStackEntry`; or by constructing a
														
 
															+    `Context::StateStackEntry` value directly, as is done in
														
 
															+    [parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp).
														
 
															+    This marks the position of the placeholder node we are going to replace, as
														
 
															+    well as the beginning of the subtree we are eventually going to emit for
														
 
															+    this declaration or statement.
														
 
															+
														
 
															+-   **Step 2**: Emit the placeholder node using
														
 
															+    `context.AddLeafNode(NodeKind::Placeholder, *context.position());`. The
														
 
															+    `NodeKind` and `Lex::TokenIndex` values will be overwritten later.
														
 
															+
														
 
															+-   **Step 3**: Process tokens until we hit the introducer. All of the nodes we
														
 
															+    emit at this point will appear as siblings after the introducer token in the
														
 
															+    output tree.
														
 
															+
														
 
															+-   **Step 4 - success**: If an introducer token is found, replace the
														
 
															+    placeholder node using something like:
														
 
															+
														
 
															+    ```cpp
														
 
															+    context.ReplacePlaceholderNode(state.subtree_start, introducer_kind,
														
 
															+                                   context.Consume());
														
 
															+    ```
														
 
															+
														
 
															+    -   `state.subtree_start` is the value of `context.tree().size()` saved in
														
 
															+        step 1, which marks the position of the placeholder node in the output
														
 
															+        parse tree.
														
 
															+
														
 
															+    -   `introducer_kind` is the `NodeKind` for the introducer of this
														
 
															+        declaration or statement, a leaf node that will act as a bracketing node
														
 
															+        at the beginning of the subtree for this declaration or statement
														
 
															+
														
 
															+-   **Step 4 - error**: If we run into something other than a modifier or
														
 
															+    introducer before finding an introducer, we need to do error handling:
														
 
															+
														
 
															+    ```cpp
														
 
															+    context.ReplacePlaceholderNode(subtree_start, NodeKind::InvalidParseStart,
														
 
															+                                   *context.position(), /*has_error=*/true);
														
 
															+    ```
														
 
															+
														
 
															+    -   Emit a [diagnostic](diagnostics.md).
														
 
															+
														
 
															+    -   Replace the placeholder node (as in the success case of step 4) with an
														
 
															+        `InvalidParseStart` node. It will be associated with the unexpected
														
 
															+        token that triggered this error.
														
 
															+
														
 
															+    -   Consume input tokens up to the likely end of the current
														
 
															+        statement or declaration. For example, we might consume up to a `;` or a
														
 
															+        token at a lesser indent level using `context.SkipPastLikelyEnd(...)`.
														
 
															+        It is important that we consume at least one token in the error case,
														
 
															+        otherwise we could have an infinite loop of generating the same error on
														
 
															+        the same token.
														
 
															+
														
 
															+    -   Emit an `InvalidParseSubtree` node. This will be the parent of any
														
 
															+        emitted modifier nodes, and will be bracketed by the `InvalidParseStart`
														
 
															+        node emitted above. It should be associated with the last token
														
 
															+        consumed.
														
 
															+
														
 
															+        ```cpp
														
 
															+        // Set `iter` to the last token consumed, one before the current position.
														
 
															+        auto iter = context.position();
														
 
															+        --iter;
														
 
															+        context.AddNode(NodeKind::InvalidParseSubtree, *iter, subtree_start,
														
 
															+                        /*has_error=*/true);
														
 
															+        ```
														
 
															+
														
 
															+-   **Step 5**: (If success at step 4) Push whatever states are to be used to
														
 
															+    parse the rest of the declaration. The first state pushed (the last state to
														
 
															+    be processed) will handle the end of this declaration. That pushed state
														
 
															+    should have a `subtree_start` field set to the value of
														
 
															+    `context.tree().size()` saved in step 1.
														
 
															+
														
 
															+-   **Step 6**: When handling the state for the end of the declaration, emit the
														
 
															+    root node of the subtree:
														
 
															+
														
 
															+    ```cpp
														
 
															+    auto state = context.PopState();
														
 
															+    context.AddNode(NodeKind::___, context.Consume(),
														
 
															+                    state.subtree_start, state.has_error);
														
 
															+    ```
														
 
															+
														
 
															+    -   This `state.subtree_start` will mark everything since the bracketing
														
 
															+        introducer node as the children of this node.
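
The six steps above can be modeled end to end on a toy grammar. The sketch below is self-contained and deliberately does not use the real `Context` API: `Node`, `IsModifier`, `ParseDecl`, the `Modifier` node kind, and the `<modifier>* fn <name> ;` grammar are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node record for this sketch; not the toolchain's real type.
struct Node {
  std::string kind;
  std::string text;
  int subtree_size;
  bool has_error;
};

auto IsModifier(const std::string& token) -> bool {
  return token == "virtual" || token == "private";
}

// Parses one `<modifier>* fn <name> ;` declaration from pre-split tokens.
auto ParseDecl(const std::vector<std::string>& tokens) -> std::vector<Node> {
  assert(!tokens.empty() && tokens.back() == ";");
  std::vector<Node> tree;
  std::size_t position = 0;

  // Step 1: remember where this declaration's subtree starts.
  const std::size_t subtree_start = tree.size();
  // Step 2: emit a placeholder; its kind and text are overwritten in step 4.
  tree.push_back({"Placeholder", tokens[position], 1, false});
  // Step 3: modifier nodes land after the placeholder.
  while (IsModifier(tokens[position])) {
    tree.push_back({"Modifier", tokens[position], 1, false});
    ++position;
  }

  if (tokens[position] == "fn") {
    // Step 4, success: fill in the placeholder with the introducer.
    tree[subtree_start] = {"FunctionIntroducer", tokens[position], 1, false};
    ++position;
    // Step 5: parse the rest of the declaration (only names in this sketch).
    while (tokens[position] != ";") {
      tree.push_back({"IdentifierName", tokens[position], 1, false});
      ++position;
    }
    // Step 6: the `;` becomes the root, covering everything since step 1.
    int size = static_cast<int>(tree.size() - subtree_start) + 1;
    tree.push_back({"FunctionDecl", tokens[position], size, false});
    return tree;
  }

  // Step 4, error: a diagnostic would be emitted here.
  tree[subtree_start] = {"InvalidParseStart", tokens[position], 1, true};
  // Consume at least one token, stopping after the next `;`.
  std::size_t last = position;
  do {
    last = position++;
  } while (tokens[last] != ";");
  int size = static_cast<int>(tree.size() - subtree_start) + 1;
  tree.push_back({"InvalidParseSubtree", tokens[last], size, true});
  return tree;
}
```

For `virtual fn Foo ;` the output order is introducer, modifier, name, then the `;` root with a subtree size of 4, even though `virtual` appears before `fn` in the source. For `virtual 42 ;` the placeholder becomes `InvalidParseStart` and the subtree is closed by an `InvalidParseSubtree` node on the `;`.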
														
 
															+
														
 
															+### Something required in context
														
 
															+
														
 
															+FIXME
														
 
															+
														
 
															+Example: name after introducer
														
 
															+[parse/handle_decl_name_and_params.cpp](/toolchain/parse/handle_decl_name_and_params.cpp)
														
 
															+
														
 
															+Example: "`[` _implicit parameter list_ `]`" after `impl forall`
														
 
															+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp)
														
 
															+
														
 
															+### Optional clauses
														
 
															+
														
 
															+#### Case 1: introducer to optional clause is used as parent node
														
 
															+
														
 
															+**Example:** The optional `-> <return type expression>` in a function signature
														
 
															+uses this pattern, so `fn foo() -> u32;` is transformed to:
														
 
															+
														
 
															+```yaml
														
 
															+  {kind: 'FunctionIntroducer', text: 'fn'},
														
 
															+  {kind: 'IdentifierName', text: 'foo'},
														
 
															+    {kind: 'TuplePatternStart', text: '('},
														
 
															+  {kind: 'TuplePattern', text: ')', subtree_size: 2},
														
 
															+    {kind: 'UnsignedIntTypeLiteral', text: 'u32'},
														
 
															+  {kind: 'ReturnType', text: '->', subtree_size: 2},
														
 
															+{kind: 'FunctionDecl', text: ';', subtree_size: 7},
														
 
															+```
														
 
															+
														
 
															+Note how the `->` token becomes a `ReturnType` node in the output tree, and is
														
 
															+moved after the `u32` type expression that becomes its child. Compare with the
														
 
															+parse tree output for `fn foo();` which has no `ReturnType` node:
														
 
															+
														
 
															+```yaml
														
 
															+  {kind: 'FunctionIntroducer', text: 'fn'},
														
 
															+  {kind: 'IdentifierName', text: 'foo'},
														
 
															+    {kind: 'TuplePatternStart', text: '('},
														
 
															+  {kind: 'TuplePattern', text: ')', subtree_size: 2},
														
 
															+{kind: 'FunctionDecl', text: ';', subtree_size: 5},
														
 
															+```
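
To read listings like these, it can help to see how `subtree_size` lets a consumer find a node's direct children. The helper below is a sketch, assuming nodes are stored in postorder and that `subtree_size` counts the node itself (leaves have size 1), as the listings suggest; `ChildrenOf` is a hypothetical name, not a toolchain function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of the direct children of `parent`, last child first.
// `subtree_sizes[i]` is the size of the subtree rooted at node `i`.
auto ChildrenOf(const std::vector<std::size_t>& subtree_sizes,
                std::size_t parent) -> std::vector<std::size_t> {
  assert(parent < subtree_sizes.size());
  const std::size_t size = subtree_sizes[parent];
  assert(size >= 1 && size <= parent + 1);
  // The parent's subtree occupies indices [first, parent].
  const std::size_t first = parent + 1 - size;
  std::vector<std::size_t> children;
  // `end` is one past the index of the next child to visit.
  std::size_t end = parent;
  while (end > first) {
    const std::size_t child = end - 1;
    children.push_back(child);
    // Skip the child's whole subtree to reach its previous sibling.
    end = child + 1 - subtree_sizes[child];
  }
  return children;
}
```

With the sizes from the `fn foo() -> u32;` listing (`1, 1, 1, 2, 1, 2, 7`), the children of the `FunctionDecl` node at index 6 are indices 5, 3, 1, and 0: `ReturnType`, `TuplePattern`, `IdentifierName`, and `FunctionIntroducer`.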
														
 
															+
														
 
															+Here is the code from
														
 
															+[parse/handle_function.cpp](/toolchain/parse/handle_function.cpp) that does
														
 
															+this:
														
 
															+
														
 
															+```cpp
														
 
															+auto HandleFunctionAfterParams(Context& context) -> void {
														
 
															+  ...
														
 
															+  // If there is a return type, parse the expression before adding the return
														
 
															+  // type node.
														
 
															+  if (context.PositionIs(Lex::TokenKind::MinusGreater)) {
														
 
															+    context.PushState(State::FunctionReturnTypeFinish);
														
 
															+    context.ConsumeAndDiscard();
														
 
															+    context.PushStateForExpr(PrecedenceGroup::ForType());
														
 
															+  }
														
 
															+}
														
 
															+
														
 
															+auto HandleFunctionReturnTypeFinish(Context& context) -> void {
														
 
															+  auto state = context.PopState();
														
 
															+
														
 
															+  context.AddNode(NodeKind::ReturnType, state.token, state.subtree_start,
														
 
															+                  state.has_error);
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+The `->` token is saved by `context.PushState(`...`)`, so it is available as
														
 
															+`state.token` when calling
														
 
															+`context.AddNode(NodeKind::ReturnType, state.token,`...`)` later in
														
 
															+`HandleFunctionReturnTypeFinish`.
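
The save-then-reuse behavior can be imitated in a few lines. The `MiniContext` below is a hypothetical stand-in that borrows method names from the real `Context` for readability, but its signatures and behavior are assumptions of this sketch, and a single `TypeLiteral` leaf stands in for full expression parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Node {
  std::string kind;
  std::string text;
  int subtree_size;
};

struct StateEntry {
  std::size_t token;          // Token at the position when the state was pushed.
  std::size_t subtree_start;  // Tree size when the state was pushed.
};

// Hypothetical miniature of a parse context; not the toolchain's class.
struct MiniContext {
  std::vector<std::string> tokens;
  std::size_t position = 0;
  std::vector<Node> tree;
  std::vector<StateEntry> state_stack;

  void PushState() { state_stack.push_back({position, tree.size()}); }
  auto PopState() -> StateEntry {
    assert(!state_stack.empty());
    StateEntry state = state_stack.back();
    state_stack.pop_back();
    return state;
  }
  auto Consume() -> std::size_t {
    assert(position < tokens.size());
    return position++;
  }
  void AddLeafNode(const std::string& kind, std::size_t token) {
    tree.push_back({kind, tokens[token], 1});
  }
  void AddNode(const std::string& kind, std::size_t token,
               std::size_t subtree_start) {
    int size = static_cast<int>(tree.size() - subtree_start) + 1;
    tree.push_back({kind, tokens[token], size});
  }
};

auto ParseOptionalReturnType(MiniContext& context) -> void {
  if (context.position >= context.tokens.size() ||
      context.tokens[context.position] != "->") {
    return;  // The clause is absent, so no node is added.
  }
  context.PushState();  // Saves the `->` token and the current tree size.
  context.Consume();    // Skip `->` for now; it is added as the parent later.
  context.AddLeafNode("TypeLiteral", context.Consume());
  StateEntry state = context.PopState();
  context.AddNode("ReturnType", state.token, state.subtree_start);
}
```

Running this on the tokens `-> u32 ;` produces a `TypeLiteral` node for `u32` followed by a `ReturnType` node for `->` with a subtree size of 2, matching the order in the listing above.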
														
 
															+
														
 
															+Also see how the optional initializer is handled on `var`, treating the `=` as
														
 
															+its introducer in `HandleVarAfterPattern` and `HandleVarInitializer` in
														
 
															+[parse/handle_var.cpp](/toolchain/parse/handle_var.cpp).
														
 
															+
														
 
															+#### Case 2: parent node is required token after optional clause, with different parent node kinds for different options
														
 
															+
														
 
															+**Example:** The optional type expression before `as` in `impl as` is
														
 
															+represented by producing two different output parse nodes for `as`. It outputs a
														
 
															+`DefaultSelfImplAs` node with no children when the type expression is absent,
														
 
															+and otherwise a `TypeImplAs` parse node with the type expression as its child.
														
 
															+
														
 
															+So `impl bool as Interface;` is transformed to:
														
 
															+
														
 
															+```yaml
														
 
															+  {kind: 'ImplIntroducer', text: 'impl'},
														
 
															+    {kind: 'BoolTypeLiteral', text: 'bool'},
														
 
															+  {kind: 'TypeImplAs', text: 'as', subtree_size: 2},
														
 
															+  {kind: 'IdentifierNameExpr', text: 'Interface'},
														
 
															+{kind: 'ImplDecl', text: ';', subtree_size: 5},
														
 
															+```
														
 
															+
														
 
															+while `impl as Interface;` is transformed to:
														
 
															+
														
 
															+```yaml
														
 
															+  {kind: 'ImplIntroducer', text: 'impl'},
														
 
															+  {kind: 'DefaultSelfImplAs', text: 'as'},
														
 
															+  {kind: 'IdentifierNameExpr', text: 'Interface'},
														
 
															+{kind: 'ImplDecl', text: ';', subtree_size: 4},
														
 
															+```
														
 
															+
														
 
															+This is handled by the `ExpectAsOrTypeExpression` code from
														
 
															+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp):
														
 
															+
														
 
															+```cpp
														
 
															+if (context.PositionIs(Lex::TokenKind::As)) {
														
 
															+  // as <expression> ...
														
 
															+  context.AddLeafNode(NodeKind::DefaultSelfImplAs, context.Consume());
														
 
															+  context.PushState(State::Expr);
														
 
															+} else {
														
 
															+  // <expression> as <expression>...
														
 
															+  context.PushState(State::ImplBeforeAs);
														
 
															+  context.PushStateForExpr(PrecedenceGroup::ForImplAs());
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+and then `HandleImplBeforeAs` creates the parent node in the second case:
														
 
															+
														
 
															+```cpp
														
 
															+auto state = context.PopState();
														
 
															+if (auto as = context.ConsumeIf(Lex::TokenKind::As)) {
														
 
															+  context.AddNode(NodeKind::TypeImplAs, *as, state.subtree_start,
														
 
															+                  state.has_error);
														
 
															+  context.PushState(State::Expr);
														
 
															+} else {
														
 
															+  if (!state.has_error) {
														
 
															+    CARBON_DIAGNOSTIC(ImplExpectedAs, Error,
														
 
															+                      "Expected `as` in `impl` declaration.");
														
 
															+    context.emitter().Emit(*context.position(), ImplExpectedAs);
														
 
															+  }
														
 
															+  context.ReturnErrorOnState();
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+Note (1) that the `state.subtree_start` value comes from the
														
 
															+`context.PushState(State::ImplBeforeAs);` before parsing the type expression,
														
 
															+and that is how that type expression ends up as the child of the created
														
 
															+`TypeImplAs` node. Unlike
														
 
															+[the previous case 1](#case-1-introducer-to-optional-clause-is-used-as-parent-node),
														
 
															+though, the parent node uses the token after the optional expression, rather
														
 
															+than an introducer token for the optional clause.
														
 
															+
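
As an illustration of note (1), the following is a minimal sketch of how a recorded subtree start turns previously added nodes into children of a later node. It is a toy model and not the toolchain's real code: `Node`, `ToyTree`, `SubtreeStart`, and `BuildTypeImpl` are hypothetical names, and the size computation is an assumption based on the postorder layout shown in the YAML output.

```cpp
#include <string>
#include <vector>

// Toy model (hypothetical, not the toolchain's types): a postorder node list
// where each node records how many nodes, itself included, its subtree spans.
struct Node {
  std::string kind;
  int subtree_size;
};

struct ToyTree {
  std::vector<Node> nodes;

  // What a pushed state would remember: the index of the next node to add.
  auto SubtreeStart() const -> int { return static_cast<int>(nodes.size()); }

  auto AddLeafNode(const std::string& kind) -> void {
    nodes.push_back({kind, 1});
  }

  // Every node added since `subtree_start` becomes part of the new subtree.
  auto AddNode(const std::string& kind, int subtree_start) -> void {
    int size = static_cast<int>(nodes.size()) - subtree_start + 1;
    nodes.push_back({kind, size});
  }
};

// Models the node order for `impl bool as Interface;`.
auto BuildTypeImpl() -> ToyTree {
  ToyTree tree;
  int decl_start = tree.SubtreeStart();
  tree.AddLeafNode("ImplIntroducer");
  // Recorded before the type expression is parsed, like the state pushed
  // for `ImplBeforeAs`.
  int before_as_start = tree.SubtreeStart();
  tree.AddLeafNode("BoolTypeLiteral");
  // The type expression ends up inside the `TypeImplAs` subtree.
  tree.AddNode("TypeImplAs", before_as_start);
  tree.AddLeafNode("IdentifierNameExpr");
  tree.AddNode("ImplDecl", decl_start);
  return tree;
}
```

In this model the `TypeImplAs` node covers itself plus the one type-expression node, because its start index was captured before that expression was added.
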
														
 
															+Note (2) how `HandleImplBeforeAs` handles three cases of errors:
														
 
															+
														
 
															+-   `as` present but an error in the child type expression -> error on the
														
 
															+    output `TypeImplAs` node, but not propagated to the parent.
														
 
															+-   No `as` present but the type expression was okay -> emit a new
														
 
															+    `ImplExpectedAs` diagnostic.
														
 
															+-   Error in the child type expression and no `as` present -> no new
														
 
															+    diagnostic; we suppress errors once one is emitted until we can recover.
														
 
															+
														
 
															+If there is no `as` token, we don't output either a `TypeImplAs` or a
														
 
															+`DefaultSelfImplAs` node, as required by the parent node, so in that case we
														
 
															+mark the parent as having an error.
														
 
															+
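
The error handling described in note (2) can be summarized as a small decision function. This is a hypothetical model for illustration only: `Outcome` and `ModelImplBeforeAs` are not toolchain names, and the booleans stand in for the presence of the `as` token and for `state.has_error`.

```cpp
// Hypothetical summary of what happens after the type expression is parsed.
struct Outcome {
  bool adds_type_impl_as;     // A `TypeImplAs` node is added.
  bool node_has_error;        // That node is marked as having an error.
  bool emits_new_diagnostic;  // A new "expected `as`" diagnostic is emitted.
  bool parent_has_error;      // The error is returned to the parent state.
};

auto ModelImplBeforeAs(bool as_present, bool child_has_error) -> Outcome {
  if (as_present) {
    // The child's error is recorded on the new node but is not propagated.
    return {true, child_has_error, false, false};
  }
  // No `as`: no node is added and the parent is marked as having an error.
  // A new diagnostic is only emitted if the child did not already emit one.
  return {false, false, !child_has_error, true};
}
```

The fourth combination, `as` present with no child error, is the success path and produces a clean `TypeImplAs` node.
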
														
 
															+#### Case 3: optional sibling
														
 
															+
														
 
															+> TODO: This was changed by
														
 
															+> [#3678](https://github.com/carbon-language/carbon-lang/pull/3678) and needs to
														
 
															+> be updated.
														
 
															+
														
 
															+**Example:** The optional type expression before `as` in `impl as` is output as
														
 
															+an optional sibling subtree between the `ImplIntroducer` node for the `impl`
														
 
															+introducer and the `ImplAs` node for the required `as` keyword.
														
 
															+
														
 
															+`impl bool as Interface;` is transformed to:
														
 
															+
														
 
															+```yaml
														
 
															+  {kind: 'ImplIntroducer', text: 'impl'},
														
 
															+  {kind: 'BoolTypeLiteral', text: 'bool'},
														
 
															+  {kind: 'ImplAs', text: 'as'},
														
 
															+  {kind: 'IdentifierNameExpr', text: 'Interface'},
														
 
															+{kind: 'ImplDecl', text: ';', subtree_size: 5},
														
 
															+```
														
 
															+
														
 
															+while `impl as Interface;` is transformed to:
														
 
															+
														
 
															+```yaml
														
 
															+  {kind: 'ImplIntroducer', text: 'impl'},
														
 
															+  {kind: 'ImplAs', text: 'as'},
														
 
															+  {kind: 'IdentifierNameExpr', text: 'Interface'},
														
 
															+{kind: 'ImplDecl', text: ';', subtree_size: 4},
														
 
															+```
														
 
															+
														
 
															+This is handled by the `ExpectAsOrTypeExpression` code from
														
 
															+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp):
														
 
															+
														
 
															+```cpp
														
 
															+if (context.PositionIs(Lex::TokenKind::As)) {
														
 
															+  // as <expression> ...
														
 
															+  context.AddLeafNode(NodeKind::ImplAs, context.Consume());
														
 
															+  context.PushState(State::Expr);
														
 
															+} else {
														
 
															+  // <expression> as <expression>...
														
 
															+  context.PushState(State::ImplBeforeAs);
														
 
															+  context.PushStateForExpr(PrecedenceGroup::ForImplAs());
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+and then `HandleImplBeforeAs` follows
														
 
															+[the "something required in context" pattern](#something-required-in-context) to
														
 
															+deal with the `as` that follows when the type expression is present.
														
 
															+
														
 
															+### Operators
														
 
															+
														
 
															+FIXME
														
 
															+
														
 
															+An independent description of our approach:
														
 
															+["Better operator precedence" on scattered-thoughts.net](https://www.scattered-thoughts.net/writing/better-operator-precedence/)
														
--- a/toolchain/docs/parse.svg
+++ b/toolchain/docs/parse.svg
--- a/website/prebuild.py
+++ b/website/prebuild.py
@@ -189,7 +189,9 @@ def main() -> None:
 
															     # Reset the order for the implementation children.
														
 
															     nav_order[0] = 0
														
 
															-    label_subdir("toolchain", next(nav_order), parent_title="Implementation")
														
 
															+    label_subdir(
														
 
															+        "toolchain/docs", next(nav_order), parent_title="Implementation"
														
 
															+    )
														
 
															     label_subdir("explorer", next(nav_order), parent_title="Implementation")
														
 
															     label_subdir("testing", next(nav_order), parent_title="Implementation")