1 year ago · a24816a1f4
--- a/toolchain/README.md
+++ b/toolchain/README.md
@@ -6,6 +6,4 @@ Exceptions. See /LICENSE for license information.
 
				 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				 -->
			
 
				 
			
 
				-A design is currently maintained in
			
 
				-[Google Drive](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit?resourcekey=0-kHyqOESbOHmzZphUbtLrTw).
			
 
				-It'll be migrated to markdown once we are confident in its stability.
			
 
				+See [docs](docs/).
			
--- a/toolchain/docs/README.md
+++ b/toolchain/docs/README.md
@@ -0,0 +1,94 @@
 
				+# Toolchain architecture
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Goals](#goals)
			
 
				+-   [High-level architecture](#high-level-architecture)
			
 
				+    -   [Design patterns](#design-patterns)
			
 
				+-   [Adding features](#adding-features)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Goals
			
 
				+
			
 
				+The toolchain represents the production portion of Carbon. At a high level, the
			
 
				+toolchain's top priorities are:
			
 
				+
			
 
				+-   Correctness.
			
 
				+-   Quality of generated code, including performance.
			
 
				+-   Compilation performance.
			
 
				+-   Quality of diagnostics for incorrect or questionable code.
			
 
				+
			
 
				+TODO: Add an expanded document that details the goals and priorities and link to
			
 
				+it here.
			
 
				+
			
 
				+## High-level architecture
			
 
				+
			
 
				+The main components are:
			
 
				+
			
 
				+-   [Driver](driver.md): Provides commands and ties together compilation flow.
			
 
				+-   [Diagnostics](diagnostics.md): Produces diagnostic output.
			
 
				+-   Compilation flow:
			
 
				+
			
 
				+    1. Source: Load the file into a
			
 
				+       [SourceBuffer](/toolchain/source/source_buffer.h).
			
 
				+    2. [Lex](lex.md): Transform a SourceBuffer into a
			
 
				+       [Lex::TokenizedBuffer](/toolchain/lex/tokenized_buffer.h).
			
 
				+    3. [Parse](parse.md): Transform a TokenizedBuffer into a
			
 
				+       [Parse::Tree](/toolchain/parse/tree.h).
			
 
				+    4. [Check](check.md): Transform a Tree to produce
			
 
				+       [SemIR::File](/toolchain/sem_ir/file.h).
			
 
				+    5. [Lower](lower.md): Transform the SemIR to an
			
 
				+       [LLVM Module](https://llvm.org/doxygen/classllvm_1_1Module.html).
			
 
				+    6. CodeGen: Transform the LLVM Module into an Object File.
			
 
				+
			
 
				+### Design patterns
			
 
				+
			
 
				+A few common design patterns are:
			
 
				+
			
 
				+-   Distinct steps: Each step of processing produces an output structure,
			
 
				+    avoiding callbacks passing data between structures.
			
 
				+
			
 
				+    -   For example, the parser takes a `Lex::TokenizedBuffer` as input and
			
 
				+        produces a `Parse::Tree` as output.
			
 
				+
			
 
				+    -   Performance: It should yield better locality versus a callback approach.
			
 
				+
			
 
				+    -   Understandability: Each step has a clear input and output, versus
			
 
				+        callbacks which obscure the flow of data.
			
 
				+
			
 
				+-   Vectorized storage: Data is stored in vectors and flyweights are passed
			
 
				+    around, avoiding more typical heap allocation with pointers.
			
 
				+
			
 
				+    -   For example, the parse tree is stored as a
			
 
				+        `llvm::SmallVector<Parse::Tree::NodeImpl>` indexed by `Parse::Node`
			
 
				+        which wraps an `int32_t`.
			
 
				+
			
 
				+    -   Performance: Vectorization both minimizes memory allocation overhead and
			
 
				+        enables better read caching because adjacent entries will be cached
			
 
				+        together.
			
 
				+
			
 
				+-   Iterative processing: We rely on state stacks and iterative loops for
			
 
				+    parsing, avoiding recursive function calls.
			
 
				+
			
 
				+    -   For example, the parser has a `Parse::State` enum tracked in
			
 
				+        `state_stack_`, and loops in `Parse::Tree::Parse`.
			
 
				+
			
 
				+    -   Scalability: Complex code must not cause recursion issues. We have
			
 
				+        experience in Clang seeing stack frame recursion limits being hit in
			
 
				+        unexpected ways, and non-recursive approaches largely avoid that risk.
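The vectorized storage pattern above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the toolchain's implementation: `ToyTree`, `ToyNodeId`, `ToyNodeKind`, and `ToyNodeImpl` are hypothetical names, and `std::vector` stands in for `llvm::SmallVector` so the example has no LLVM dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flyweight handle: a thin wrapper around an index into the
// tree's storage, cheap to copy and pass by value.
struct ToyNodeId {
  int32_t index;
};

// Hypothetical node kinds, for illustration only.
enum class ToyNodeKind : uint8_t { Name, Literal, Call };

// Hypothetical per-node payload. Nodes hold no pointers to other nodes.
struct ToyNodeImpl {
  ToyNodeKind kind;
  int32_t subtree_size;
};

class ToyTree {
 public:
  // Appends a node to the contiguous storage and returns its id.
  auto AddNode(ToyNodeKind kind, int32_t subtree_size) -> ToyNodeId {
    nodes_.push_back({kind, subtree_size});
    return ToyNodeId{static_cast<int32_t>(nodes_.size()) - 1};
  }

  // Looks up a node by id; adjacent ids are adjacent in memory.
  auto Get(ToyNodeId id) const -> const ToyNodeImpl& {
    assert(id.index >= 0 &&
           static_cast<std::size_t>(id.index) < nodes_.size());
    return nodes_[static_cast<std::size_t>(id.index)];
  }

 private:
  std::vector<ToyNodeImpl> nodes_;
};
```

Callers hold only `ToyNodeId` values, so the storage can grow or be scanned linearly without invalidating anything a caller keeps.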
			
 
				+
			
 
				+See also [Idioms](idioms.md) for abbreviations and more implementation
			
 
				+techniques.
			
 
				+
			
 
				+## Adding features
			
 
				+
			
 
				+We have a [walkthrough for adding features](adding_features.md).
			
--- a/toolchain/docs/adding_features.md
+++ b/toolchain/docs/adding_features.md
@@ -0,0 +1,433 @@
 
				+# Adding features
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Lex](#lex)
			
 
				+-   [Parse](#parse)
			
 
				+    -   [Typed parse node metadata implementation](#typed-parse-node-metadata-implementation)
			
 
				+-   [Check](#check)
			
 
				+    -   [SemIR typed instruction metadata implementation](#semir-typed-instruction-metadata-implementation)
			
 
				+-   [Lower](#lower)
			
 
				+-   [Tests and debugging](#tests-and-debugging)
			
 
				+    -   [Running tests](#running-tests)
			
 
				+    -   [Updating tests](#updating-tests)
			
 
				+        -   [Reviewing test deltas](#reviewing-test-deltas)
			
 
				+    -   [Verbose output](#verbose-output)
			
 
				+    -   [Stack traces](#stack-traces)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Lex
			
 
				+
			
 
				+New lexed tokens must be added to
			
 
				+[token_kind.def](/toolchain/lex/token_kind.def). `CARBON_SYMBOL_TOKEN` and
			
 
				+`CARBON_KEYWORD_TOKEN` both provide some built-in lexing logic, while
			
 
				+`CARBON_TOKEN` requires custom lexing support.
			
 
				+
			
 
				+[TokenizedBuffer::Lex](/toolchain/lex/tokenized_buffer.h) is the main dispatch
			
 
				+for lexing, and calls that need to do custom lexing will be dispatched there.
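As a rough illustration of how a `.def`-style list of token kinds can drive several pieces of code at once, here is a sketch of the general X-macro technique. It is not the contents of `token_kind.def`: the `TOY_TOKEN_KINDS` macro, the `ToyTokenKind` enum, and the three entries are assumptions made for the example, and the list lives in a macro only so the block is self-contained (a `.def` file would typically be expanded by defining a macro and then including the file).

```cpp
#include <cstdint>
#include <string_view>

// Stand-in for a .def file: one entry per token kind (hypothetical entries).
#define TOY_TOKEN_KINDS(X) \
  X(Identifier)            \
  X(Semicolon)             \
  X(FnKeyword)

// First expansion: one enumerator per entry.
enum class ToyTokenKind : uint8_t {
#define TOY_ENUMERATOR(Name) Name,
  TOY_TOKEN_KINDS(TOY_ENUMERATOR)
#undef TOY_ENUMERATOR
};

// Second expansion: one switch case per entry, so the name table cannot
// drift out of sync with the enum.
constexpr auto ToyTokenKindName(ToyTokenKind kind) -> std::string_view {
  switch (kind) {
#define TOY_CASE(Name)     \
  case ToyTokenKind::Name: \
    return #Name;
    TOY_TOKEN_KINDS(TOY_CASE)
#undef TOY_CASE
  }
  return "<invalid>";
}
```

Adding a token kind then means adding one entry to the list; every expansion picks it up automatically.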
			
 
				+
			
 
				+## Parse
			
 
				+
			
 
				+A parser feature will have state transitions that produce new parse nodes.
			
 
				+
			
 
				+The resulting parse nodes are in
			
 
				+[parse/node_kind.def](/toolchain/parse/node_kind.def) and
			
 
				+[typed_nodes.h](/toolchain/parse/typed_nodes.h). When choosing node structure,
			
 
				+consider how semantics will process it in post-order; this will rule out some
			
 
				+designs. Adding a parse node kind will also require a handler in the `Check`
			
 
				+step.
			
 
				+
			
 
				+The state transitions are in [parse/state.def](/toolchain/parse/state.def). Each
			
 
				+`CARBON_PARSER_STATE` defines a distinct state and has comments for state
			
 
				+transitions. If several states should share handling, name them
			
 
				+`FeatureAsVariant`.
			
 
				+
			
 
				+Adding a state requires adding a `Handle<name>` function in an appropriate
			
 
				+`parse/handle_*.cpp` file, possibly a new file. The macros are used to generate
			
 
				+declarations in the header, so only extra helper functions should be added
			
 
				+there. Every state handler pops the state from the stack before any other
			
 
				+processing.
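The handler convention described above (pop the state first, then push any follow-up states) can be sketched with a toy grammar of nested parentheses. Everything here is hypothetical and far simpler than the real parser: `ToyState`, `ToyContext`, and the handler names are assumptions for illustration, and no parse nodes are produced beyond a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical states for a toy grammar of nested parentheses.
enum class ToyState { Group, GroupFinish };

struct ToyContext {
  std::string_view input;
  std::size_t position = 0;
  std::vector<ToyState> state_stack;
  int groups = 0;
  bool has_error = false;
};

// Returns true if the next unread character is `c`.
inline auto NextIs(const ToyContext& context, char c) -> bool {
  return context.position < context.input.size() &&
         context.input[context.position] == c;
}

// Handles the start of an optional `( ... )` group.
inline void HandleGroup(ToyContext& context) {
  assert(!context.state_stack.empty());
  // Pop this handler's own state before any other processing.
  context.state_stack.pop_back();
  if (NextIs(context, '(')) {
    ++context.position;
    // Pushed in reverse order: the nested group runs before the finish state.
    context.state_stack.push_back(ToyState::GroupFinish);
    context.state_stack.push_back(ToyState::Group);
  }
}

// Handles the closing `)` of a group.
inline void HandleGroupFinish(ToyContext& context) {
  assert(!context.state_stack.empty());
  context.state_stack.pop_back();
  if (NextIs(context, ')')) {
    ++context.position;
    ++context.groups;
  } else {
    context.has_error = true;
  }
}

// Counts nested groups in `input`, or returns -1 if the input is malformed.
inline auto CountGroups(std::string_view input) -> int {
  ToyContext context;
  context.input = input;
  context.state_stack.push_back(ToyState::Group);
  // A loop over the explicit stack replaces recursive calls.
  while (!context.state_stack.empty() && !context.has_error) {
    switch (context.state_stack.back()) {
      case ToyState::Group:
        HandleGroup(context);
        break;
      case ToyState::GroupFinish:
        HandleGroupFinish(context);
        break;
    }
  }
  bool consumed_all = context.position == context.input.size();
  return (!context.has_error && consumed_all) ? context.groups : -1;
}
```

Because nesting depth only grows `state_stack`, deeply nested input consumes heap memory rather than function stack frames.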
			
 
				+
			
 
				+### Typed parse node metadata implementation
			
 
				+
			
 
				+As of [#3534](https://github.com/carbon-language/carbon-lang/pull/3534):
			
 
				+
			
 
				+![parse](parse.svg)
			
 
				+
			
 
				+> TODO: Convert this chart to Mermaid.
			
 
				+
			
 
				+-   [common/enum_base.h](/common/enum_base.h) defines the `EnumBase`
			
 
				+    [CRTP](idioms.md#crtp-or-curiously-recurring-template-pattern) class
			
 
				+    extending `Printable` from [common/ostream.h](/common/ostream.h), along with
			
 
				+    `CARBON_ENUM` macros for making enumerations
			
 
				+
			
 
				+-   [parse/node_kind.h](/toolchain/parse/node_kind.h) includes
			
 
				+    [common/enum_base.h](/common/enum_base.h) and defines an enumeration
			
 
				+    `NodeKind`, along with bitmask enum `NodeCategory`.
			
 
				+
			
 
				+    -   The `NodeKind` enumeration is populated with the list of all parse node
			
 
				+        kinds using [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
			
 
				+        [the .def file idiom](idioms.md#def-files)) _declared_ in this file
			
 
				+        using a macro from [common/enum_base.h](/common/enum_base.h)
			
 
				+
			
 
				+    -   `NodeKind` has a member type `NodeKind::Definition` that extends
			
 
				+        `NodeKind` and adds a `NodeCategory` field (and others in the future).
			
 
				+
			
 
				+    -   `NodeKind` has a method `Define` for creating a `NodeKind::Definition`
			
 
				+        with the same enumerant value, plus values for the other fields.
			
 
				+
			
 
				+    -   `HasKindMember<T>` at the bottom of
			
 
				+        [parse/node_kind.h](/toolchain/parse/node_kind.h) uses
			
 
				+        [field detection](idioms.md#field-detection) to determine if the type
			
 
				+        `T` has a `NodeKind::Definition Kind` static constant member.
			
 
				+
			
 
				+        -   Note: both the type and name of these fields must match exactly.
			
 
				+
			
 
				+    -   Note that additional information is needed to define the `category()`
			
 
				+        method (and other methods in the future) of `NodeKind`. This information
			
 
				+        comes from the typed parse node definitions in
			
 
				+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) (described below).
			
 
				+
			
 
				+-   [parse/node_ids.h](/toolchain/parse/node_ids.h) defines a number of types
			
 
				+    that store a _node id_ that identifies a node in the parse tree
			
 
				+
			
 
				+    -   `NodeId` stores a node id with no restrictions
			
 
				+
			
 
				+    -   `NodeIdForKind<Kind>` inherits from `NodeId` and stores the id of a node
			
 
				+        that must have the specified `NodeKind` "`Kind`". Note that this is not
			
 
				+        used directly; instead, aliases `FooId` for
			
 
				+        `NodeIdForKind<NodeKind::Foo>` are defined for every node kind using
			
 
				+        [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
			
 
				+        [the .def file idiom](idioms.md#def-files)).
			
 
				+
			
 
				+    -   `NodeIdInCategory<Category>` inherits from `NodeId` and stores the id of
			
 
				+        a node that must overlap the specified `NodeCategory` "`Category`". Note
			
 
				+        that this is not typically used directly; instead, this file defines
			
 
				+        aliases `AnyDeclId`, `AnyExprId`, ..., `AnyStatementId`.
			
 
				+
			
 
				+    -   Similarly `NodeIdOneOf<T, U>` and `NodeIdNot<V>` inherit from `NodeId`
			
 
				+        and store the id of a node restricted to either matching `T::Kind` or
			
 
				+        `U::Kind` or not matching `V::Kind`.
			
 
				+    -   In addition to the node id type definitions above, the struct
			
 
				+        `NodeForId<T>` is declared but not defined.
			
 
				+
			
 
				+-   [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) defines a typed parse
			
 
				+    node struct type for each kind of parse node.
			
 
				+
			
 
				+    -   Each one defines a static constant named `Kind` that is set using a call
			
 
				+        to `Define()` on the corresponding enumerant member of `NodeKind` from
			
 
				+        [parse/node_kind.h](/toolchain/parse/node_kind.h) (which is included by
			
 
				+        this file).
			
 
				+    -   The fields of these types specify the children of the parse node using
			
 
				+        the types from [parse/node_ids.h](/toolchain/parse/node_ids.h).
			
 
				+
			
 
				+    -   The struct `NodeForId<T>` that is declared in
			
 
				+        [parse/node_ids.h](/toolchain/parse/node_ids.h) is defined in this file
			
 
				+        such that `NodeForId<FooId>::TypedNode` is the `Foo` typed parse node
			
 
				+        struct type.
			
 
				+
			
 
				+    -   This file will fail to compile unless every parse node kind
			
 
				+        defined in [parse/node_kind.def](/toolchain/parse/node_kind.def) has a
			
 
				+        corresponding struct type in this file.
			
 
				+
			
 
				+-   [parse/node_kind.cpp](/toolchain/parse/node_kind.cpp) includes both
			
 
				+    [parse/node_kind.h](/toolchain/parse/node_kind.h) and
			
 
				+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h)
			
 
				+
			
 
				+    -   Using the macro from [common/enum_base.h](/common/enum_base.h), the
			
 
				+        enumerants of `NodeKind` are _defined_ using the list of parse node
			
 
				+        kinds from [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
			
 
				+        [the .def file idiom](idioms.md#def-files)).
			
 
				+
			
 
				+    -   `NodeKind::definition()` is defined. It has a static table of
			
 
				+        `const NodeKind::Definition*` indexed by the enum value, populated by
			
 
				+        taking the address of the `Kind` member of each typed parse node struct
			
 
				+        type, using the list from
			
 
				+        [parse/node_kind.def](/toolchain/parse/node_kind.def).
			
 
				+
			
 
				+    -   `NodeKind::category()` is defined using `NodeKind::definition()`.
			
 
				+
			
 
				+    -   Tested assumption: the tables built in this file are indexed by the enum
			
 
				+        values. We rely on the fact that we get the parse node kinds in the same
			
 
				+        order by consistently using
			
 
				+        [parse/node_kind.def](/toolchain/parse/node_kind.def).
			
 
				+
			
 
				+-   [parse/tree.h](/toolchain/parse/tree.h) includes
			
 
				+    [parse/node_ids.h](/toolchain/parse/node_ids.h). It does not depend on
			
 
				+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) to reduce compilation
			
 
				+    time in those files that don't use the typed parse node struct types.
			
 
				+
			
 
				+    -   Defines `Tree::Extract`... functions that take a node id and return a
			
 
				+        typed parse node struct type from
			
 
				+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h).
			
 
				+
			
 
				+    -   Uses `HasKindMember<T>` to allow calling `ExtractAs` only on typed
			
 
				+        nodes defined in [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h).
			
 
				+
			
 
				+    -   `Tree::Extract` uses `NodeForId<T>` to get the corresponding typed parse
			
 
				+        node struct type for a `FooId` type defined in
			
 
				+        [parse/node_ids.h](/toolchain/parse/node_ids.h).
			
 
				+
			
 
				+        -   Note that this is done without a dependency on the typed parse node
			
 
				+            struct types by using the forward declaration of `NodeForId<T>` from
			
 
				+            [parse/node_ids.h](/toolchain/parse/node_ids.h).
			
 
				+
			
 
				+    -   The `Tree::Extract`... functions ultimately call
			
 
				+        `Tree::TryExtractNodeFromChildren<T>`, which is a templated function
			
 
				+        only declared in this file. Its definition is in
			
 
				+        [parse/extract.cpp](/toolchain/parse/extract.cpp).
			
 
				+
			
 
				+-   [parse/extract.cpp](/toolchain/parse/extract.cpp) includes
			
 
				+    [parse/tree.h](/toolchain/parse/tree.h) and
			
 
				+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h)
			
 
				+
			
 
				+    -   Defines struct `Extractable<T>` that defines how to extract a field of
			
 
				+        type `T` from a `Tree::SiblingIterator` pointing at the corresponding
			
 
				+        child node.
			
 
				+
			
 
				+    -   `Extractable<T>` is defined for the node id types defined in
			
 
				+        [parse/node_ids.h](/toolchain/parse/node_ids.h).
			
 
				+
			
 
				+    -   In addition, `Extractable<T>` is defined for standard types
			
 
				+        `std::optional<U>` and `llvm::SmallVector<V>`, to support optional and
			
 
				+        repeated children.
			
 
				+
			
 
				+    -   Uses [struct reflection](idioms.md#struct-reflection) to support
			
 
				+        aggregate struct types containing extractable fields. This is used to
			
 
				+        support typed parse node struct types as well as struct fields that they
			
 
				+        contain.
			
 
				+
			
 
				+    -   Uses `HasKindMember<Foo>` to detect accidental uses of a parse node type
			
 
				+        directly as fields of typed parse node struct types -- in those places
			
 
				+        `FooId` should be used instead.
			
 
				+
			
 
				+    -   Defines `Tree::TryExtractNodeFromChildren<T>` and explicitly
			
 
				+        instantiates it for every typed parse node struct type defined in
			
 
				+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) using
			
 
				+        [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
			
 
				+        [the .def file idiom](idioms.md#def-files)). By explicitly instantiating
			
 
				+        this function only in this file, we avoid redundant compilation work,
			
 
				+        which reduces build times, and we can keep all the extraction
			
 
				+        machinery as a private implementation detail of this file.
			
 
				+
			
 
				+-   [parse/typed_nodes_test.cpp](/toolchain/parse/typed_nodes_test.cpp)
			
 
				+    validates that each typed parse node struct type has a static `Kind` member
			
 
				+    that defines the correct corresponding `NodeKind`, and that the `category()`
			
 
				+    function agrees between the `NodeKind` and `NodeKind::Definition`.
			
 
				+
			
 
				+Note: this is broadly similar to
			
 
				+[SemIR typed instruction metadata implementation](#semir-typed-instruction-metadata-implementation).
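To make the `Define()` and `HasKindMember<T>` pieces above more concrete, here is a heavily simplified sketch. It assumes C++17 and uses invented names throughout (`ToyNodeKind`, `ToyFunctionDecl`, `ToyMismatch`, and an `int` category in place of `NodeCategory`); the real classes build on `EnumBase` and differ in detail. The detection trait uses `std::void_t`, which is one common way to implement field detection and is not necessarily how the toolchain does it.

```cpp
#include <type_traits>

// Toy stand-in for an enumeration class (hypothetical, not the real NodeKind).
class ToyNodeKind {
 public:
  enum class RawEnum { FunctionDecl, ReturnStatement };

  // Extends the kind with extra metadata; defined below.
  class Definition;

  constexpr explicit ToyNodeKind(RawEnum value) : value_(value) {}

  // Creates a definition with the same enumerant plus a category value.
  constexpr auto Define(int category) const -> Definition;

  constexpr auto raw() const -> RawEnum { return value_; }

 private:
  RawEnum value_;
};

class ToyNodeKind::Definition : public ToyNodeKind {
 public:
  constexpr Definition(ToyNodeKind kind, int category)
      : ToyNodeKind(kind), category_(category) {}
  constexpr auto category() const -> int { return category_; }

 private:
  int category_;
};

constexpr auto ToyNodeKind::Define(int category) const -> Definition {
  return Definition(*this, category);
}

// A typed node struct: its static `Kind` carries the metadata.
struct ToyFunctionDecl {
  static constexpr auto Kind =
      ToyNodeKind(ToyNodeKind::RawEnum::FunctionDecl).Define(/*category=*/1);
};

// Has a member named `Kind`, but with the wrong type.
struct ToyMismatch {
  static constexpr int Kind = 0;
};

// Field detection: false unless `T::Kind` exists with exactly the
// expected type.
template <typename T, typename = void>
struct HasKindMemberImpl : std::false_type {};

template <typename T>
struct HasKindMemberImpl<T, std::void_t<decltype(T::Kind)>>
    : std::is_same<decltype(T::Kind), const ToyNodeKind::Definition> {};

template <typename T>
inline constexpr bool HasKindMember = HasKindMemberImpl<T>::value;
```

`ToyMismatch` is rejected even though it has a member named `Kind`, which mirrors the note above that both the type and the name must match exactly.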
			
 
				+
			
 
				+## Check
			
 
				+
			
 
				+Each parse node kind requires adding a `Handle<kind>` function in a
			
 
				+`check/handle_*.cpp` file.
			
 
				+
			
 
				+If the resulting SemIR needs a new instruction:
			
 
				+
			
 
				+-   Add a new kind to [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
			
 
				+    -   Add a `CARBON_SEM_IR_INST_KIND(NewInstKindName)` line in alphabetical
			
 
				+        order
			
 
				+-   Add a new struct definition to
			
 
				+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h), such as:
			
 
				+
			
 
				+    ```cpp
			
 
				+    struct NewInstKindName {
			
 
				+        static constexpr auto Kind = InstKind::NewInstKindName.Define(
			
 
				+            // the name used in textual IR
			
 
				+            "new_inst_kind_name"
			
 
				+            // Optional: , TerminatorKind::KindOfTerminator
			
 
				+            );
			
 
				+
			
 
				+        // Optional: omit if not associated with a parse node.
			
 
				+        Parse::Node parse_node;
			
 
				+
			
 
				+        // Optional: omit if this sem_ir instruction does not produce a value.
			
 
				+        TypeId type_id;
			
 
				+
			
 
				+        // 0-2 id fields, with types from sem_ir/ids.h or sem_ir/builtin_kind.h
			
 
				+        // For example, fields would look like:
			
 
				+        StringId name_id;
			
 
				+        InstId value_id;
			
 
				+    };
			
 
				+    ```
			
 
				+
			
 
				+Adding an instruction will also require a handler in the Lower step.
			
 
				+
			
 
				+Most new instructions will automatically be formatted reasonably by the SemIR
			
 
				+formatter.
			
 
				+
			
 
				+If the resulting SemIR needs a new built-in, add it to
			
 
				+[builtin_inst_kind.def](/toolchain/sem_ir/builtin_inst_kind.def).
			
 
				+
			
 
				+### SemIR typed instruction metadata implementation
			
 
				+
			
 
				+How does this work? As of
			
 
				+[#3310](https://github.com/carbon-language/carbon-lang/pull/3310):
			
 
				+
			
 
				+![check](check.svg)
			
 
				+
			
 
				+> TODO: Convert this chart to Mermaid.
			
 
				+
			
 
				+-   [common/enum_base.h](/common/enum_base.h) defines the `EnumBase`
			
 
				+    [CRTP](idioms.md#crtp-or-curiously-recurring-template-pattern) class
			
 
				+    extending `Printable` from [common/ostream.h](/common/ostream.h), along with
			
 
				+    `CARBON_ENUM` macros for making enumerations
			
 
				+
			
 
				+-   [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) includes
			
 
				+    [common/enum_base.h](/common/enum_base.h) and defines an enumeration
			
 
				+    `InstKind`, along with `InstValueKind` and `TerminatorKind`.
			
 
				+
			
 
				+    -   The `InstKind` enumeration is populated with the list of all instruction
			
 
				+        kinds using [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
			
 
				+        (using [the .def file idiom](idioms.md#def-files)) _declared_ in this
			
 
				+        file using a macro from [common/enum_base.h](/common/enum_base.h)
			
 
				+
			
 
				+    -   `InstKind` has a member type `InstKind::Definition` that extends
			
 
				+        `InstKind` and adds the `ir_name` string field, and a `TerminatorKind`
			
 
				+        field.
			
 
				+
			
 
				+    -   `InstKind` has a method `Define` for creating an `InstKind::Definition`
			
 
				+        with the same enumerant value, plus values for the other fields.
			
 
				+
			
 
				+-   Note that additional information is needed to define the `ir_name()`,
			
 
				+    `value_kind()`, and `terminator_kind()` methods of `InstKind`. This
			
 
				+    information comes from the typed instruction definitions in
			
 
				+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h).
			
 
				+
			
 
				+-   [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) defines a typed
			
 
				+    instruction struct type for each kind of SemIR instruction, as described
			
 
				+    above.
			
 
				+
			
 
				+    -   Each one defines a static constant named `Kind` that is set using a call
			
 
				+        to `Define()` on the corresponding enumerant member of `InstKind` from
			
 
				+        [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) (which is included
			
 
				+        by this file).
			
 
				+
			
 
				+-   `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>` at the
			
 
				+    bottom of [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) use
			
 
				+    [field detection](idioms.md#field-detection) to determine if `TypedInst` has
			
 
				+    a `Parse::Node parse_node` or a `TypeId type_id` field respectively.
			
 
				+
			
 
				+    -   Note: both the type and name of these fields must match exactly.
			
 
				+
			
 
				+-   [sem_ir/inst_kind.cpp](/toolchain/sem_ir/inst_kind.cpp) includes both
			
 
				+    [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) and
			
 
				+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h)
			
 
				+
			
 
				+    -   Using the macro from [common/enum_base.h](/common/enum_base.h), the
			
 
				+        enumerants of `InstKind` are _defined_ using the list of instruction
			
 
				+        kinds from [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
			
 
				+        (using [the .def file idiom](idioms.md#def-files))
			
 
				+
			
 
				+    -   `InstKind::value_kind()` is defined. It has a static table of
			
 
				+        `InstValueKind` values indexed by the enum value, populated by applying
			
 
				+        `HasTypeIdMember` from
			
 
				+        [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) to every
			
 
				+        instruction kind by using the list from
			
 
				+        [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
			
 
				+    -   `InstKind::definition()` is defined. It has a static table of
			
 
				+        `const InstKind::Definition*` indexed by the enum value, populated by
			
 
				+        taking the address of the `Kind` member of each `TypedInst`, using the
			
 
				+        list from [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
			
 
				+
			
 
				+    -   `InstKind::ir_name()` and `InstKind::terminator_kind()` are defined
			
 
				+        using `InstKind::definition()`.
			
 
				+    -   Tested assumption: the tables built in this file are indexed by the enum
			
 
				+        values. We rely on the fact that we get the instruction kinds in the
			
 
				+        same order by consistently using
			
 
				+        [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
			
 
				+
			
 
				+    -   This file will fail to compile unless every kind of SemIR instruction
			
 
				+        defined in [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def) has a
			
 
				+        corresponding struct type in
			
 
				+        [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h).
			
 
				+
			
 
				+-   `TypedInstArgsInfo<TypedInst>` defined in
			
 
				+    [sem_ir/inst.h](/toolchain/sem_ir/inst.h) uses
			
 
				+    [struct reflection](idioms.md#struct-reflection) to determine the other
			
 
				+    fields from `TypedInst`. It skips the `parse_node` and `type_id` fields
			
 
				+    using `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>`.
			
 
				+
			
 
				+    -   Tested assumption: the `parse_node` and `type_id` are the first fields
			
 
				+        in `TypedInst`, and there are at most two more fields.
			
 
				+
			
 
				+-   [sem_ir/inst.h](/toolchain/sem_ir/inst.h) defines templated conversions
			
 
				+    between `Inst` and each of the typed instruction structs:
			
 
				+
			
 
				+    -   Uses `TypedInstArgsInfo<TypedInst>`, `HasParseNodeMember<TypedInst>`,
			
 
				+        and `HasTypeIdMember<TypedInst>`, and
			
 
				+        [local lambda](idioms.md#local-lambdas-to-reduce-duplicate-code).
			
 
				+
			
 
				+    -   Defines a templated `ToRaw` function that converts the various id field
			
 
				+        types to an `int32_t`.
			
 
				+    -   Defines a templated `FromRaw<T>` function that converts an `int32_t` to
			
 
				+        `T` to perform the opposite conversion.
			
 
				+    -   Tested assumption: The `parse_node` field is first, when present, and
			
 
				+        the `type_id` is next, when present, in each `TypedInst` struct type.
			
 
				+
			
 
				+-   The "tested assumptions" above are all tested by
			
 
				+    [sem_ir/typed_insts_test.cpp](/toolchain/sem_ir/typed_insts_test.cpp)
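The `ToRaw` and `FromRaw<T>` conversions described above can be pictured with the following sketch. It is an assumption-laden simplification: the id types, `ToyInst`, and `ToyBindName` are invented for the example, and the per-field conversion is written out by hand, whereas the text above describes the real code as using struct reflection to do this generically.

```cpp
#include <cstdint>

// Hypothetical id types; each wraps an `int32_t` index.
struct ToyStringId {
  int32_t index;
};
struct ToyInstId {
  int32_t index;
};

// Converts any id type with an `index` field to its raw storage form.
template <typename IdT>
constexpr auto ToRaw(IdT id) -> int32_t {
  return id.index;
}

// Performs the opposite conversion; the caller names the target id type.
template <typename IdT>
constexpr auto FromRaw(int32_t raw) -> IdT {
  return IdT{raw};
}

// Hypothetical type-erased instruction with two raw argument slots.
struct ToyInst {
  int32_t arg0;
  int32_t arg1;
};

// Hypothetical typed instruction with two id fields.
struct ToyBindName {
  ToyStringId name_id;
  ToyInstId value_id;
};

// Typed to type-erased: each id field is flattened to an `int32_t`.
constexpr auto ToInst(ToyBindName typed) -> ToyInst {
  return {ToRaw(typed.name_id), ToRaw(typed.value_id)};
}

// Type-erased to typed: each slot is rebuilt as the field's id type.
constexpr auto ToTyped(ToyInst inst) -> ToyBindName {
  return {FromRaw<ToyStringId>(inst.arg0), FromRaw<ToyInstId>(inst.arg1)};
}
```

Keeping distinct id types on the typed side means a `ToyStringId` cannot be passed where a `ToyInstId` is expected, even though both are stored as plain integers.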
			
 
				+
			
 
				+## Lower
			
 
				+
			
 
				+Each SemIR instruction requires adding a `Handle<kind>` function in a
			
 
				+`lower/handle_*.cpp` file.
			
 
				+
			
 
				+## Tests and debugging
			
 
				+
			
 
				+### Running tests
			
 
				+
			
 
				+Tests are run in bulk with `bazel test //toolchain/...`. Many tests use the
			
 
				+file_test infrastructure; see
			
 
				+[testing/file_test/README.md](/testing/file_test/README.md) for information.
			
 
				+
			
 
				+There are several supported ways to run Carbon on a given test file. For
			
 
				+example, with `toolchain/parse/testdata/basics/empty.carbon`:
			
 
				+
			
 
				+-   `bazel test //toolchain/testing:file_test --test_arg=--file_tests=toolchain/parse/testdata/basics/empty.carbon`
			
 
				+    -   Executes an individual test.
			
 
				+-   `bazel run //toolchain/parse:testdata/basics/empty.carbon.run`
			
 
				+    -   Runs `carbon` on the file with standard arguments, printing output to
			
 
				+        console.
			
 
				+    -   This form will often be most useful when iterating on a specific test.
			
 
				+-   `bazel run //toolchain/parse:testdata/basics/empty.carbon.verbose`
			
 
				+    -   Similar to the previous command, but with the `-v` flag implied.
			
 
				+-   `bazel run //toolchain/driver:carbon -- compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
			
 
				+    -   Explicitly runs `carbon` with the provided arguments.
			
 
				+-   `bazel-bin/toolchain/driver/carbon compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
			
 
				+    -   Similar to the previous command, but without using `bazel`.
			
 
				+
			
 
				+### Updating tests
			
 
				+
			
 
				+The `toolchain/autoupdate_testdata.py` script can be used to update output. It
			
 
				+invokes the `file_test` autoupdate support. See
			
 
				+[testing/file_test/README.md](/testing/file_test/README.md) for file syntax.
			
 
				+
			
 
				+#### Reviewing test deltas
			
 
				+
			
 
				+Using `autoupdate_testdata.py` can be useful to produce deltas during the
			
 
				+development process because it allows `git status` and `git diff` to be used to
			
 
				+examine what changed.
			
 
				+
			
 
				+### Verbose output
			
 
				+
			
 
				+The `-v` flag can be passed to trace state, and should be specified before the
			
 
				+subcommand name: `carbon -v compile ...`. `CARBON_VLOG` is used to print output
			
 
				+in this mode. There is currently no control over the degree of verbosity.
			
 
				+
			
 
				+### Stack traces
			
 
				+
			
 
				+While the iterative processing pattern means function stack traces will have
			
 
				+minimal context for how the current function is reached, we use LLVM's
			
 
				+`PrettyStackTrace` to include details about the state stack. The state stack
			
 
				+will be above the function stack in crash output.
			
--- a/toolchain/docs/check.md
+++ b/toolchain/docs/check.md
@@ -0,0 +1,616 @@
 
				+# Check
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Overview](#overview)
			
 
				+-   [Postorder processing](#postorder-processing)
			
 
				+-   [Key IR concepts](#key-ir-concepts)
			
 
				+    -   [Parameters and arguments](#parameters-and-arguments)
			
 
				+-   [SemIR textual format](#semir-textual-format)
			
 
				+    -   [Raw form](#raw-form)
			
 
				+    -   [Formatted IR](#formatted-ir)
			
 
				+        -   [Instructions](#instructions)
			
 
				+        -   [Top-level entities](#top-level-entities)
			
 
				+-   [Core loop](#core-loop)
			
 
				+    -   [Node stack](#node-stack)
			
 
				+    -   [Delayed evaluation (not yet implemented)](#delayed-evaluation-not-yet-implemented)
			
 
				+    -   [Templates (not yet implemented)](#templates-not-yet-implemented)
			
 
				+    -   [Rewrites](#rewrites)
			
 
				+-   [Types](#types)
			
 
				+    -   [Type printing (not yet implemented)](#type-printing-not-yet-implemented)
			
 
				+-   [Expression categories](#expression-categories)
			
 
				+    -   [ExprCategory::NotExpression](#exprcategorynotexpression)
			
 
				+    -   [ExprCategory::Value](#exprcategoryvalue)
			
 
				+    -   [ExprCategory::DurableReference and ExprCategory::EphemeralReference](#exprcategorydurablereference-and-exprcategoryephemeralreference)
			
 
				+    -   [ExprCategory::Initializing](#exprcategoryinitializing)
			
 
				+    -   [ExprCategory::Mixed](#exprcategorymixed)
			
 
				+    -   [Value bindings](#value-bindings)
			
 
				+-   [Handling Parse::Tree errors (not yet implemented)](#handling-parsetree-errors-not-yet-implemented)
			
 
				+-   [Alternatives considered](#alternatives-considered)
			
 
				+    -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+Check takes the parse tree and generates a semantic intermediate representation,
			
 
				+or SemIR. This will look closer to a series of instructions, in preparation for
			
 
				+transformation to LLVM IR. Semantic analysis and type checking occur during the
			
 
				+production of SemIR. It also does any validation that requires context.
			
 
				+
			
 
				+## Postorder processing
			
 
				+
			
 
				+The checking step is oriented around postorder processing of the `Parse::Tree` to
			
 
				+iterate through the `Parse::NodeImpl` vectorized storage once, in order, as much
			
 
				+as possible. This is primarily for performance, but also relies on the
			
 
				+[information accumulation principle](/docs/project/principles/information_accumulation.md):
			
 
				+that is, when that principle applies, we should be able to generate IR
			
 
				+immediately because we can rely on the principle that when a line is processed,
			
 
				+the information necessary to semantically check that line is already available.
			
 
				+
			
 
				+Indirectly, what this really means is that we should be able to go from a
			
 
				+Parse::Tree (which cannot be used for name lookups) to a SemIR with name lookups
			
 
				+completed in a single pass. The SemIR should not need to be re-processed to add
			
 
				+more information outside of templates. By doing this, we avoid an additional
			
 
				+processing pass with associated storage needs.
			
 
				+
			
 
				+This single-pass approach also means that the checking step does not make use of
			
 
				+the tree structure of the `Parse::Tree`. In cases where the actions performed
			
 
				+for a parse tree node depend on the context in which that node appears, a node
			
 
				+that is visited earlier in the postorder traversal, such as a bracketing node,
			
 
				+needs to establish the necessary context. In this respect, the sequence of
			
 
				+`Parse::Node`s can be thought of as a byte code input that the check step
			
 
				+interprets to build the `SemIR`.
			
 
				+
			
 
				+## Key IR concepts
			
 
				+
			
 
				+A `SemIR::Inst` is the basic building block that represents a simple
			
 
				+instruction, such as an operator or declaring a literal. For each kind of
			
 
				+instruction, a typedef for that specific kind of instruction is provided in the
			
 
				+`SemIR` namespace. For example, `SemIR::Assign` represents an assignment
			
 
				+instruction, and `SemIR::PointerType` represents a pointer type instruction.
			
 
				+
			
 
				+Each instruction class has up to four public data members describing the
			
 
				+instruction, as described in
			
 
				+[sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) (also see
			
 
				+[adding features for Check](adding_features.md#check)):
			
 
				+
			
 
				+-   A `Parse::Node parse_node;` member that tracks its location is present on
			
 
				+    almost all instructions, except instructions like `SemIR::Builtin` that
			
 
				+    don't have an associated location.
			
 
				+
			
 
				+-   A `SemIR::TypeId type_id;` member that describes the type of the instruction
			
 
				+    is present on all instructions that produce a value. This includes namespace
			
 
				+    instructions, which are modeled as producing a value of "namespace" type,
			
 
				+    even though they can't be used as a first-class value in Carbon expressions.
			
 
				+
			
 
				+-   Up to two additional, kind-specific members. For example `SemIR::Assign` has
			
 
				+    members `InstId lhs_id` and `InstId rhs_id`.
			
 
				+
			
 
				+Instructions are stored as type-erased `SemIR::Inst` objects, which store the
			
 
				+instruction kind and the (up to) four fields described above. This balances the
			
 
				+size of `SemIR::Inst` against the overhead of indirection.
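As a rough illustration of the typed and type-erased forms described above, the
following sketch stores an instruction as a kind plus four fixed-size slots and
converts to and from a typed view. All names and field layouts here are
simplified assumptions for illustration (plain integers stand in for
`Parse::Node`, `TypeId`, and `InstId`); the real definitions live in
`toolchain/sem_ir` and differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; not the toolchain's actual declarations.
enum class InstKind : uint8_t { Assign, PointerType };

// Type-erased storage: the kind plus slots for the (up to) four fields.
struct Inst {
  InstKind kind;
  int32_t parse_node;
  int32_t type_id;  // -1 here means "produces no value".
  int32_t arg0;
  int32_t arg1;
};

// Typed view of an assignment. It has no type_id because an assignment does
// not produce a value.
struct Assign {
  int32_t parse_node;
  int32_t lhs_id;
  int32_t rhs_id;
};

// Erases a typed instruction into the uniform representation.
inline auto ToInst(Assign assign) -> Inst {
  return {InstKind::Assign, assign.parse_node, -1, assign.lhs_id,
          assign.rhs_id};
}

// Recovers the typed view; the kind must match.
inline auto AsAssign(const Inst& inst) -> Assign {
  assert(inst.kind == InstKind::Assign);
  return {inst.parse_node, inst.arg0, inst.arg1};
}
```

Keeping every instruction the same size lets them sit in one contiguous vector,
which is the trade-off against indirection mentioned above.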
			
 
				+
			
 
				+A `SemIR::InstBlock` can represent a code block. However, it can also be created
			
 
				+when a series of instructions needs to be closely associated, such as a
			
 
				+parameter list.
			
 
				+
			
 
				+A `SemIR::Builtin` represents a language built-in, such as the unconstrained
			
 
				+facet type `type`. We will also have built-in functions which would need to form
			
 
				+the implementation of some library types, such as `i32`. Built-ins are at a
			
 
				+stable index across `SemIR` instances.
			
 
				+
			
 
				+### Parameters and arguments
			
 
				+
			
 
				+Parameters and arguments will be stored as two `SemIR::InstBlock`s each. The
			
 
				+first will contain the full IR, while the second will contain references to the
			
 
				+last instruction for each parameter or argument. The references block will have
			
 
				+a size equal to the number of parameters or arguments, allowing for quick size
			
 
				+comparisons and indexed access.
			
 
				+
			
 
				+## SemIR textual format
			
 
				+
			
 
				+There are two textual ways to view `SemIR`.
			
 
				+
			
 
				+### Raw form
			
 
				+
			
 
				+The raw form of SemIR shows the details of the representation, such as numeric
			
 
				+instruction and block IDs. The representation is intended to very closely match
			
 
				+the `SemIR::File` and `SemIR::Inst` representations. This can be useful when
			
 
				+debugging low-level issues with the `SemIR` representation.
			
 
				+
			
 
				+The driver will print this when passed `--dump-raw-sem-ir`.
			
 
				+
			
 
				+### Formatted IR
			
 
				+
			
 
				+In addition to the raw form, there is a higher-level formatted IR that aims to
			
 
				+be human readable. This is used in most `check` tests to validate the output,
			
 
				+and is also expected to be used regularly by toolchain developers to inspect the
			
 
				+result of checking the parse tree.
			
 
				+
			
 
				+The driver will print this when passed `--dump-sem-ir`.
			
 
				+
			
 
				+Unlike the raw form, certain representational choices in the `SemIR` data may
			
 
				+not be visible in this form. However, it is intended to be possible to parse the
			
 
				+`SemIR` output and form an equivalent – but not necessarily identical – `SemIR`
			
 
				+representation, although no such parser currently exists.
			
 
				+
			
 
				+As an example, given the program:
			
 
				+
			
 
				+```carbon
			
 
				+fn Cond() -> bool;
			
 
				+fn Run() -> i32 { return if Cond() then 1 else 2; }
			
 
				+```
			
 
				+
			
 
				+The formatted IR is currently:
			
 
				+
			
 
				+```
			
 
				+constants {
			
 
				+  %.1: i32 = int_literal 1 [template]
			
 
				+  %.2: i32 = int_literal 2 [template]
			
 
				+}
			
 
				+
			
 
				+file {
			
 
				+  package: <namespace> = namespace [template] {
			
 
				+    .Cond = %Cond
			
 
				+    .Run = %Run
			
 
				+  }
			
 
				+  %Cond: <function> = fn_decl @Cond [template] {
			
 
				+    %return.var.loc1: ref bool = var <return slot>
			
 
				+  }
			
 
				+  %Run: <function> = fn_decl @Run [template] {
			
 
				+    %return.var.loc2: ref i32 = var <return slot>
			
 
				+  }
			
 
				+}
			
 
				+
			
 
				+fn @Cond() -> bool;
			
 
				+
			
 
				+fn @Run() -> i32 {
			
 
				+!entry:
			
 
				+  %Cond.ref: <function> = name_ref Cond, file.%Cond [template = file.%Cond]
			
 
				+  %.loc2_33.1: init bool = call %Cond.ref()
			
 
				+  %.loc2_26.1: bool = value_of_initializer %.loc2_33.1
			
 
				+  %.loc2_33.2: bool = converted %.loc2_33.1, %.loc2_26.1
			
 
				+  if %.loc2_33.2 br !if.expr.then else br !if.expr.else
			
 
				+
			
 
				+!if.expr.then:
			
 
				+  %.loc2_41: i32 = int_literal 1 [template = constants.%.1]
			
 
				+  br !if.expr.result(%.loc2_41)
			
 
				+
			
 
				+!if.expr.else:
			
 
				+  %.loc2_48: i32 = int_literal 2 [template = constants.%.2]
			
 
				+  br !if.expr.result(%.loc2_48)
			
 
				+
			
 
				+!if.expr.result:
			
 
				+  %.loc2_26.2: i32 = block_arg !if.expr.result
			
 
				+  return %.loc2_26.2
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+There are three kinds of names in formatted IR, which are distinguished by their
			
 
				+leading sigils:
			
 
				+
			
 
				+-   `%name` denotes a value produced by an instruction. These names are
			
 
				+    introduced by a line of the form `%name: <category> <type> = <instruction>`,
			
 
				+    and are scoped to the enclosing top-level entity. `<category>` describes the
			
 
				+    [expression category](#expression-categories), which is `init` for an
			
 
				+    initializing expression, `ref` for a reference expression, or omitted for a
			
 
				+    value expression. Typically, values can only be referenced by instructions
			
 
				+    that their introduction
			
 
				+    [dominates](<https://en.wikipedia.org/wiki/Dominator_(graph_theory)>), but
			
 
				+    some kinds of instruction might have other rules. Names in the `file` block
			
 
				+    can be referenced as `file.%<name>`.
			
 
				+
			
 
				+-   `!name` denotes a label, and `!name:` appears as a prefix of each
			
 
				+    `InstBlock` in a `Function`. These names are scoped to their enclosing
			
 
				+    function, and can be referenced anywhere in that function, but not outside.
			
 
				+
			
 
				+-   `@name` denotes a top-level entity, such as a function, class, or interface.
			
 
				+    The SemIR view of these entities is flattened, so member functions are
			
 
				+    treated as top-level entities.
			
 
				+
			
 
				+Names in formatted IR are all invented by the formatter, and generally are of
			
 
				+the form `<base_name>[.loc<line>[_<col>[.<counter>]]]` where `<line>` and
			
 
				+`<col>` describe the location of the instruction, and `<counter>` is used as a
			
 
				+disambiguator if multiple instructions appear at the same location. Trailing
			
 
				+name components are only included if they are necessary to disambiguate the
			
 
				+name. `<base_name>` is a best-guess name for the instruction, often derived
			
 
				+from source-level identifiers, and is empty if no guess was made.
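The naming scheme above can be sketched as a small helper. This is only an
illustration of the `<base_name>[.loc<line>[_<col>[.<counter>]]]` pattern;
`FormatInstName` and its `components` parameter are hypothetical and are not
the formatter's real interface, which decides how many trailing components are
needed by checking for collisions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper. `components` is how many trailing name components
// (0 to 3) are required to make the name unique.
inline auto FormatInstName(const std::string& base_name, int line, int col,
                           int counter, int components) -> std::string {
  std::string name = "%" + base_name;
  if (components >= 1) {
    name += ".loc" + std::to_string(line);
  }
  if (components >= 2) {
    name += "_" + std::to_string(col);
  }
  if (components >= 3) {
    name += "." + std::to_string(counter);
  }
  return name;
}
```

With these assumptions, an empty base name at line 2, column 33, counter 1 with
all three components yields `%.loc2_33.1`, matching the shape of the names in
the example output earlier in this section.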
			
 
				+
			
 
				+#### Instructions
			
 
				+
			
 
				+There is usually one line in an `InstBlock` for each `Inst`. You can find the
			
 
				+documentation for the different kinds of instructions in
			
 
				+[toolchain/sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h). For example,
			
 
				+given a formatted SemIR line like:
			
 
				+
			
 
				+```
			
 
				+%N: i32 = assoc_const_decl N [template]
			
 
				+```
			
 
				+
			
 
				+you would look for a `struct` definition that uses `"assoc_const_decl"` as its
			
 
				+`ir_name`. In this case, this is the `AssociatedConstantDecl` instruction:
			
 
				+
			
 
				+```cpp
			
 
				+// An associated constant declaration in an interface, such as `let T:! type;`.
			
 
				+struct AssociatedConstantDecl {
			
 
				+  static constexpr auto Kind =
			
 
				+      InstKind::AssociatedConstantDecl.Define<Parse::NodeId>(
			
 
				+          {.ir_name = "assoc_const_decl", .is_lowered = false});
			
 
				+
			
 
				+  TypeId type_id;
			
 
				+  NameId name_id;
			
 
				+};
			
 
				+```
			
 
				+
			
 
				+Since this instruction produces a value, it has a `TypeId type_id` field, which
			
 
				+corresponds to the type written between the `:` and the `=`. In the example
			
 
				+above, that type is `i32`. The other arguments to the instruction are written
			
 
				+after the `ir_name` -- in this example the `name_id` is `N`. From this we find
			
 
				+that the instruction corresponds to an associated constant declaration in an
			
 
				+interface like `let N:! i32;`.
			
 
				+
			
 
				+Instructions producing a constant value, like `assoc_const_decl` above, are
			
 
				+followed by their phase, either `[symbolic]` or `[template]`, and then `=` the
			
 
				+value if it is the value of a different instruction.
			
 
				+
			
 
				+Instructions that do not produce a value, such as the `br` and `return`
			
 
				+instructions above, omit the leading `%name: ... =` prefix, as they cannot be
			
 
				+named by other instructions. These instructions do not have a `TypeId type_id`
			
 
				+field, like the `AdaptDecl` instruction:
			
 
				+
			
 
				+```cpp
			
 
				+// An adapted type declaration in a class, of the form `adapt T;`.
			
 
				+struct AdaptDecl {
			
 
				+  static constexpr auto Kind = InstKind::AdaptDecl.Define<Parse::AdaptDeclId>(
			
 
				+      {.ir_name = "adapt_decl", .is_lowered = false});
			
 
				+
			
 
				+  // No type_id; this is not a value.
			
 
				+  TypeId adapted_type_id;
			
 
				+};
			
 
				+```
			
 
				+
			
 
				+An `adapt SomeClass;` declaration would have the corresponding SemIR formatted
			
 
				+as:
			
 
				+
			
 
				+```
			
 
				+adapt_decl %SomeClass
			
 
				+```
			
 
				+
			
 
				+Some instructions have special argument handling. For example, some invalid
			
 
				+arguments will be omitted. Or an `InstBlockId` argument will be rendered inline,
			
 
				+commonly enclosed in braces `{`...`}` or parens `(`...`)`. In other cases, the
			
 
				+formatter will combine instructions together to make the IR more readable:
			
 
				+
			
 
				+-   A terminator sequence in a block, comprising a sequence of `BranchIf`
			
 
				+    instructions followed by a `Branch` or `BranchWithArg` instruction, is
			
 
				+    collapsed into a single
			
 
				+    `if %cond br !label1 else if ... else br !labelN(%arg)` line.
			
 
				+-   A struct type, formed by a sequence of `StructTypeField` instructions
			
 
				+    followed by a `StructType` instruction, is collapsed into a single
			
 
				+    `struct_type{.field1: %value1, ..., .fieldN: %valueN}` line.
			
 
				+
			
 
				+These exceptions may be found in
			
 
				+[toolchain/sem_ir/formatter.cpp](/toolchain/sem_ir/formatter.cpp).
			
 
				+
			
 
				+#### Top-level entities
			
 
				+
			
 
				+**Question:** Are these too in flux to document at this time?
			
 
				+
			
 
				+-   `constants`: TODO
			
 
				+-   `imports`: TODO
			
 
				+-   `file`: TODO
			
 
				+-   entities
			
 
				+    -   TODO: may be preceded by `extern`.
			
 
				+    -   TODO: may be preceded by `generic`.
			
 
				+        -   These may have an optional `!definition:` section containing the
			
 
				+            generic's `definition_block_id`.
			
 
				+    -   `fn`: TODO; followed by `= "`...`"` for builtins
			
 
				+    -   `class`: TODO
			
 
				+    -   `interface`: TODO
			
 
				+    -   `impl`: TODO
			
 
				+-   `specific`: TODO
			
 
				+    -   body in braces `{`...`}` has a bunch of
			
 
				+        `<generic parameter> => <specific value>` assignment lines
			
 
				+    -   The first lines of the body describe the declaration
			
 
				+    -   If there is a valid definition, there are additional definition
			
 
				+        assignments after a `!definition:` line.
			
 
				+
			
 
				+## Core loop
			
 
				+
			
 
				+The core loop is `Check::CheckParseTree`. This loops through the `Parse::Tree`
			
 
				+and calls a `Handle`... function corresponding to the `NodeKind` of each node.
			
 
				+Communication between these functions for different nodes working together is
			
 
				+through the `Context` object defined in
			
 
				+[check/context.h](/toolchain/check/context.h), which stores things in a
			
 
				+collection of stacks. The common pattern is that the children of a node are
			
 
				+processed first. They produce information that is then consumed when processing
			
 
				+the parent node.
			
 
				+
			
 
				+One example of this pattern is expressions. Each subexpression outputs SemIR
			
 
				+instructions to compute the value of that subexpression to the current
			
 
				+instruction block, added to the top of the `InstBlockStack` stored in the
			
 
				+`Context` object. It leaves an instruction id on the top of the
			
 
				+[node stack](#node-stack) pointing to the instruction that produces the value of
			
 
				+that subexpression. Those are consumed by parent operations, like an
			
 
				+[RPN](https://en.wikipedia.org/wiki/Reverse_Polish_notation) calculator. For
			
 
				+example, the expression `1 * 2 + 3` corresponds to this parse tree:
			
 
				+
			
 
				+```yaml
			
 
				+    {kind: 'IntegerLiteral', text: '1'},
			
 
				+    {kind: 'IntegerLiteral', text: '2'},
			
 
				+  {kind: 'InfixOperator', text: '*', subtree_size: 3},
			
 
				+  {kind: 'IntegerLiteral', text: '3'},
			
 
				+{kind: 'InfixOperator', text: '+', subtree_size: 5},
			
 
				+```
			
 
				+
			
 
				+This parse tree is processed by one call to a `Handle` function per node:
			
 
				+
			
 
				+-   The first node is an integer literal, so the core loop calls
			
 
				+    `HandleIntegerLiteral`.
			
 
				+
			
 
				+    -   It calls `context::AddInstAndPush` to output a `SemIR::IntegerLiteral`
			
 
				+        instruction to the current instruction block, and pushes the parse node
			
 
				+        along with the instruction id to the [node stack](#node-stack).
			
 
				+
			
 
				+-   The second node is also an integer literal, which outputs a second
			
 
				+    instruction and pushes another entry onto the node stack.
			
 
				+
			
 
				+-   `HandleInfixOperator` pops the two entries off of the node stack, outputs
			
 
				+    any conversion instructions that are needed, and uses
			
 
				+    `context::AddInstAndPush` to create and push the instruction id representing
			
 
				+    the output of a multiplication instruction. That multiplication instruction
			
 
				+    takes the instruction ids it popped off the stack at the beginning as
			
 
				+    arguments.
			
 
				+
			
 
				+-   Another integer literal instruction is created for `3` and pushed onto the
			
 
				+    stack.
			
 
				+
			
 
				+-   `HandleInfixOperator` is called again. It pops the two instruction ids off
			
 
				+    the stack to use as the arguments to the addition instruction it
			
 
				+    creates and pushes.
			
 
				+
			
 
				+In this way, the handle functions coordinate producing their output using the
			
 
				+instruction block stack and node stack from the context.
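The walk-through of `1 * 2 + 3` can be sketched as a small postorder
interpreter. This is a simplified model under stated assumptions: the
`ParseNode`, `Inst`, and `Context` types below are hypothetical stand-ins, the
node stack holds only instruction ids, and conversions are omitted. It is not
the actual `Check::CheckParseTree` implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real toolchain types.
enum class NodeKind { IntegerLiteral, InfixOperator };

struct ParseNode {
  NodeKind kind;
  std::string text;
};

// A simplified instruction: an opcode plus operand instruction ids.
struct Inst {
  std::string op;
  int lhs = -1;
  int rhs = -1;
};

struct Context {
  std::vector<Inst> inst_block;  // Stand-in for the current instruction block.
  std::vector<int> node_stack;   // Instruction ids awaiting a parent node.

  // Appends an instruction and pushes its id onto the node stack.
  auto AddInstAndPush(Inst inst) -> void {
    inst_block.push_back(inst);
    node_stack.push_back(static_cast<int>(inst_block.size()) - 1);
  }
};

// Visits each node once, in postorder, like an RPN calculator.
inline auto CheckPostorder(const std::vector<ParseNode>& nodes) -> Context {
  Context context;
  for (const ParseNode& node : nodes) {
    switch (node.kind) {
      case NodeKind::IntegerLiteral:
        context.AddInstAndPush(Inst{"int_literal " + node.text});
        break;
      case NodeKind::InfixOperator: {
        // Operands were pushed left to right, so the right one is on top.
        int rhs = context.node_stack.back();
        context.node_stack.pop_back();
        int lhs = context.node_stack.back();
        context.node_stack.pop_back();
        context.AddInstAndPush(
            Inst{node.text == "*" ? "mul" : "add", lhs, rhs});
        break;
      }
    }
  }
  return context;
}
```

Feeding the five nodes of the parse tree above through `CheckPostorder` leaves
a single id on the node stack, referring to the instruction for the outer `+`.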
			
 
				+
			
 
				+A similar pattern uses bracketing nodes to support parent nodes that can have a
			
 
				+variable number of children. For example, a `return` statement can produce parse
			
 
				+trees following a few different patterns:
			
 
				+
			
 
				+-   `return;`
			
 
				+
			
 
				+    ```yaml
			
 
				+      {kind: 'ReturnStatementStart', text: 'return'},
			
 
				+    {kind: 'ReturnStatement', text: ';', subtree_size: 2},
			
 
				+    ```
			
 
				+
			
 
				+-   `return x;`
			
 
				+
			
 
				+    ```yaml
			
 
				+      {kind: 'ReturnStatementStart', text: 'return'},
			
 
				+      {kind: 'NameExpr', text: 'x'},
			
 
				+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
			
 
				+    ```
			
 
				+
			
 
				+-   `return var;`
			
 
				+
			
 
				+    ```yaml
			
 
				+      {kind: 'ReturnStatementStart', text: 'return'},
			
 
				+      {kind: 'ReturnVarModifier', text: 'var'},
			
 
				+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
			
 
				+    ```
			
 
				+
			
 
				+In all three cases, the introducer node `ReturnStatementStart` pushes an entry
			
 
				+on the [node stack](#node-stack) with just the parse node and no id, called a
			
 
				+_solo parse node_. The handler for the parent `ReturnStatement` node can pop and
			
 
				+process entries from the node stack until it finds that solo parse node from
			
 
				+`ReturnStatementStart` that indicates it is done.
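A minimal sketch of the bracketing pattern, assuming a node stack whose entries
pair a node kind with an optional id. The `Entry` type and
`PopReturnChildren` function are hypothetical and only model the "pop until
the solo parse node" behavior described above; whether each child kind really
pushes an entry is also an assumption here.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for illustration only.
enum class NodeKind { ReturnStatementStart, NameExpr, ReturnVarModifier };

// A node stack entry. An entry without an id models a solo parse node.
struct Entry {
  NodeKind kind;
  std::optional<int> id;
};

// Pops entries until the ReturnStatementStart introducer is reached, and
// returns the popped children in source order. Entries below the introducer
// belong to ancestors and are left untouched.
inline auto PopReturnChildren(std::vector<Entry>& node_stack)
    -> std::vector<Entry> {
  std::vector<Entry> children;
  while (!node_stack.empty()) {
    Entry entry = node_stack.back();
    node_stack.pop_back();
    if (entry.kind == NodeKind::ReturnStatementStart) {
      break;
    }
    children.insert(children.begin(), entry);
  }
  return children;
}
```

For `return;` the result is empty, and for `return x;` it holds the single
entry for `x`, so one handler covers every form without knowing the child
count in advance.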
			
 
				+
			
 
				+Another pattern that arises is that state is set up by an introducer node, updated by
			
 
				+its siblings, and then consumed by the bracketing parent node. FIXME: example
			
 
				+
			
 
				+### Node stack
			
 
				+
			
 
				+The node stack, defined in [check/node_stack.h](/toolchain/check/node_stack.h),
			
 
				+stores pairs of a `Parse::Node` and an id. The type of the id is determined by
			
 
				+the `NodeKind` of the parse node. It is the default, general-purpose stack used
			
 
				+by `Handle`... functions in the check stage. Using a single stack is beneficial
			
 
				+since it improves locality of reference and reduces allocations. However,
			
 
				+additional stacks are used to ensure we never need to search through the stack
			
 
				+to find data -- we always want to be operating on the top of the stack (or a
			
 
				+fixed offset).
			
 
				+
			
 
				+The node stack contains any state pushed by siblings of the current
			
 
				+`Parse::Node` at the top, and state pushed by siblings of ancestors below. The
			
 
				+boundaries between what is a sibling of the current `Parse::Node` versus what is
			
 
				+a sibling of an ancestor are not explicitly determined. Instead, the handler for
			
 
				+the parent node knows how many nodes it must pop from the stack based either on
			
 
				+knowing the fixed number of children for that node kind or popping nodes until
			
 
				+it reaches a bracketing node. The arity or bracketing node kind for each parent
			
 
				+node is documented in [parse/node_kind.def](/toolchain/parse/node_kind.def).
			
 
				+
			
 
				+When each `Parse::Node` is evaluated, the SemIR for it is typically immediately
			
 
				+generated as `SemIR::Inst`s. To help generate the IR to an appropriate context,
			
 
				+scopes have separate `SemIR::InstBlock`s.
			
 
				+
			
 
				+### Delayed evaluation (not yet implemented)
			
 
				+
			
 
				+Sometimes, nodes will need to have delayed evaluation; for example, an inline
			
 
				+definition of a class member function needs to be evaluated after the class is
			
 
				+fully declared. The `SemIR::Inst`s cannot be immediately generated because they
			
 
				+may include name references to the class. We're likely to store a reference to
			
 
				+the relevant `Parse::Node` for each definition for re-evaluation after the class
			
 
				+scope completes. This means that nodes in a definition would be traversed twice,
			
 
				+once while determining that they're inline and without full checking or IR
			
 
				+generation, then again with full checking and IR generation.
			
 
				+
			
 
				+### Templates (not yet implemented)
			
 
				+
			
 
				+Templates need to have partial semantic checking when declared, but can't be
			
 
				+fully implemented before they're instantiated against a specific type.
			
 
				+
			
 
				+We are likely to generate a partial IR for templates, allowing for checking with
			
 
				+the incomplete information in the IR. Instantiation will likely use that IR and
			
 
				+fill in the missing information, but it could also reevaluate the original
			
 
				+`Parse::Node`s with the known template state.
			
 
				+
			
 
				+### Rewrites
			
 
				+
			
 
				+Carbon relies on rewrites of code, such as rewriting the destination of an
			
 
				+initializer to a specific target object once that object is known.
			
 
				+
			
 
				+We have two ways to achieve this. One is to track the IR location of a
			
 
				+placeholder instruction and, if it needs updating, replace it with a "rewrite"
			
 
				+`SemIR::Inst` that points to a new `SemIR::InstBlock` containing the required IR
			
 
				+and specifying which value is the result of that rewrite. This is expressed in
			
 
				+SemIR as a `splice_block` instruction. Another is to track the list of
			
 
				+instructions to be created separately from the node block stack, and merge those
			
 
				+instructions into the current block once we have decided on their contents.
			
 
				+
			
 
				+## Types
			
 
				+
			
 
				+Type expressions are treated like any other expression, and are modeled as
			
 
				+`SemIR::Inst`s. The types computed by type expressions are deduplicated,
			
 
				+resulting in a canonical `SemIR::TypeId` for each distinct type.
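Deduplication of this kind is commonly done by interning. The sketch below
assumes each type can be reduced to a structural key (a string here, purely
for illustration) and hands out one id per distinct key. `TypeStore` and
`GetCanonicalId` are hypothetical names, and the toolchain's real mechanism is
not shown in this document.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical interner: maps a structural description of a type to a single
// canonical id, so equal types always compare equal by id.
class TypeStore {
 public:
  auto GetCanonicalId(const std::string& description) -> int {
    auto [it, inserted] =
        ids_.try_emplace(description, static_cast<int>(types_.size()));
    if (inserted) {
      types_.push_back(description);
    }
    return it->second;
  }

  auto size() const -> int { return static_cast<int>(types_.size()); }

 private:
  std::map<std::string, int> ids_;
  std::vector<std::string> types_;
};
```

With canonical ids, checking whether two expressions have the same type is an
integer comparison rather than a structural walk.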
			
 
				+
			
 
				+### Type printing (not yet implemented)
			
 
				+
			
 
				+The `TypeId` preserves only the identity of the type, not its spelling, and so
			
 
				+printing it will produce a fully-resolved type name, which isn't a great user
			
 
				+experience as it doesn't reflect how the type was written in the source code.
			
 
				+
			
 
				+Instead, when printing a type name for use in a diagnostic, we will start with
			
 
				+one of two `InstId`s:
			
 
				+
			
 
				+-   An `InstId` for a type expression that describes the way the type was
			
 
				+    computed.
			
 
				+-   An `InstId` for an expression that has the given type.
			
 
				+
			
 
				+In the former case, the type is pretty-printed by walking the type expression
			
 
				+and printing it. In the latter case, the type of the expression is reconstructed
			
 
				+based on the form of the expression: for example, to print the type of `&x`, we
			
 
				+print the type of `x` and append a `*`, being careful to take potential
			
 
				+precedence issues into account.
			
 
				+
			
 
				+TODO: This requires being able to print the type of, for example,
			
 
				+`x.foo[0].bar`, by printing only the desired portion of the type of `x`, and
			
 
				+similarly may require handling the case where the type of an expression involves
			
 
				+generic parameters whose arguments are specified by that expression. In effect,
			
 
				+the type computation performed when checking an operation is duplicated into the
			
 
				+type printing logic, but is simpler because errors don't need to be detected.
			
 
				+
			
 
				+This approach means we don't need to preserve a fully-sugared type for each
			
 
				+expression instruction. Instead, we compute that type when we need to print it.
			
 
				+
			
 
				+## Expression categories
			
 
				+
			
 
				+Each `SemIR::Inst` that has an associated type also has an expression category,
			
 
				+which describes how it produces a value of that type. These
			
 
				+`SemIR::ExprCategory` values correspond to the Carbon expression categories
			
 
				+defined in proposal
			
 
				+[#2006](https://github.com/carbon-language/carbon-lang/pull/2006):
			
 
				+
			
 
				+### ExprCategory::NotExpression
			
 
				+
			
 
				+This instruction is not an expression instruction, and doesn't have an
			
 
				+expression category. This is used for namespaces, control flow instructions, and
			
 
				+other constructs that represent some non-expression-level semantics.
			
 
				+
			
 
				+### ExprCategory::Value
			
 
				+
			
 
				+This instruction produces a value using the type's value representation.
			
 
				+Lowering the instruction will produce an LLVM value using that value
			
 
				+representation.
			
 
				+
			
 
				+### ExprCategory::DurableReference and ExprCategory::EphemeralReference
			
 
				+
			
 
				+This instruction produces a reference to an object. Lowering will produce a
			
 
				+pointer to an object representation.
			
 
				+
			
 
				+### ExprCategory::Initializing
			
 
				+
			
 
				+This instruction represents the initialization of an object. Depending on the
			
 
				+initializing representation for the type, the initializing expression
			
 
				+instruction will do one of the following:
			
 
				+
			
 
				+-   For an in-place initializing representation, the instruction will store a
			
 
				+    value to the target of the initialization.
			
 
				+
			
 
				+-   For a by-copy initializing representation, the instruction will produce an
			
 
				+    object representation by value that can be stored into the target. This is
			
 
				+    currently only used in cases where the object representation and the value
			
 
				+    representation are the same.
			
 
				+
			
 
				+-   For a type with no initializing representation, such as an empty struct or
			
 
				+    tuple, it does neither of the above things.
			
 
				+
			
 
				+Regardless of the initializing representation, an initializing expression should
			
 
				+be consumed by another instruction that finishes the initialization. For a
			
 
				+by-copy initialization, this final instruction represents the store into the
			
 
				+target, whereas in the other cases it is only used to track in SemIR how the
			
 
				+initialization was used. When an in-place initializer uses a by-copy initializer
			
 
				+as a subexpression, an `initialize_from` instruction is inserted to perform this
			
 
				+final store.
			
 
				+
			
 
				+### ExprCategory::Mixed
			
 
				+
			
 
				+This instruction represents a language construct that doesn't have a single
			
 
				+expression category. This is used for struct and tuple literals, where the
			
 
				+elements of the literal can have different expression categories. Instructions
			
 
				+with a mixed expression category are treated as a special case in conversion,
			
 
				+which recurses into the elements of those instructions before performing
			
 
				+conversions.
			
 
				+
			
 
				+### Value bindings
			
 
				+
			
 
				+A value binding represents a conversion from a reference expression to the value
			
 
				+stored in that expression. There are three important cases here:
			
 
				+
			
 
				+-   For types with a by-copy value representation, such as `i32`, a value
			
 
				+    binding represents a load from the address indicated by the reference
			
 
				+    expression.
			
 
				+
			
 
				+-   For types with a by-pointer value representation, such as arrays and large
			
 
				+    structs and tuples, a value binding implicitly takes the address of the
			
 
				+    reference expression.
			
 
				+
			
 
				+-   For structs and tuples, the value representation is a struct or tuple of the
			
 
				+    elements' value representations, which is not necessarily the same as a
			
 
				+    struct or tuple of the elements' object representations. In the case where
			
 
				+    the value representation is not a copy of, or pointer to, the object
			
 
				+    representation, `value_binding` instructions are not used, and a
			
 
				+    `tuple_value` or `struct_value` instruction is used to construct a value
			
 
				+    representation instead. `value_binding` should still be used in the case
			
 
				+    where the value and object representation are the same, but this is not yet
			
 
				+    implemented.
			
 
				+
			
 
				+## Handling Parse::Tree errors (not yet implemented)
			
 
				+
			
 
				+`Parse::Tree` errors will typically indicate that checking would error for a
			
 
				+given context. We'll want to be careful about how this is handled, but we'll
			
 
				+likely want to generate diagnostics for valid child nodes, then reduce
			
 
				+diagnostics once invalid nodes are encountered. We should be able to reasonably
			
 
				+abandon generated IR of the valid children when we encounter an invalid parent,
			
 
				+without severe effects on surrounding checks.
			
 
				+
			
 
				+For example, an invalid line of code in a function might generate some
			
 
				+incomplete IR in the function's `SemIR::InstBlock`, but that IR won't negatively
			
 
				+interfere with checking later valid lines in the same function.
			
 
				+
			
 
				+## Alternatives considered
			
 
				+
			
 
				+### Using a traditional AST representation
			
 
				+
			
 
				+Clang creates an AST as part of compilation. In Carbon, it's something we could
			
 
				+do as a step between parsing and checking, possibly replacing the SemIR. It's
			
 
				+likely that doing so would be simpler, amongst other possible trade-offs.
			
 
				+However, we think the SemIR approach is going to yield higher performance,
			
 
				+enough so that it's the chosen approach.
			
--- a/toolchain/docs/check.svg
+++ b/toolchain/docs/check.svg
--- a/toolchain/docs/diagnostics.md
+++ b/toolchain/docs/diagnostics.md
@@ -0,0 +1,230 @@
 
				+# Diagnostics
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Overview](#overview)
			
 
				+-   [DiagnosticEmitter](#diagnosticemitter)
			
 
				+-   [DiagnosticConsumers](#diagnosticconsumers)
			
 
				+-   [Producing diagnostics](#producing-diagnostics)
			
 
				+-   [Diagnostic registry](#diagnostic-registry)
			
 
				+-   [CARBON_DIAGNOSTIC placement](#carbon_diagnostic-placement)
			
 
				+-   [Diagnostic context](#diagnostic-context)
			
 
				+-   [Diagnostic parameter types](#diagnostic-parameter-types)
			
 
				+-   [Diagnostic message style guide](#diagnostic-message-style-guide)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+The diagnostic code is used by the toolchain to produce diagnostic output, such as errors and warnings.
			
 
				+
			
 
				+## DiagnosticEmitter
			
 
				+
			
 
				+[DiagnosticEmitters](/toolchain/diagnostics/diagnostic_emitter.h) handle the
			
 
				+main formatting of a message. Each emitter is parameterized on a location type, for which a
			
 
				+DiagnosticLocationTranslator must be provided that can translate the location
			
 
				+type into a standardized DiagnosticLocation of file, line, and column.
			
 
				+
			
 
				+When emitting, the resulting formatted message is passed to a
			
 
				+DiagnosticConsumer.
			
 
				+
			
 
				+## DiagnosticConsumers
			
 
				+
			
 
				+DiagnosticConsumers handle output of diagnostic messages after they've been
			
 
				+formatted by an Emitter. Important consumers are:
			
 
				+
			
 
				+-   [ConsoleDiagnosticConsumer](/toolchain/diagnostics/diagnostic_emitter.h):
			
 
				+    prints diagnostics to console.
			
 
				+
			
 
				+-   [ErrorTrackingDiagnosticConsumer](/toolchain/diagnostics/diagnostic_emitter.h):
			
 
				+    counts the number of errors produced, particularly so that it can be
			
 
				+    determined whether any errors were encountered.
			
 
				+
			
 
				+-   [SortingDiagnosticConsumer](/toolchain/diagnostics/sorting_diagnostic_consumer.h):
			
 
				+    sorts diagnostics by line so that diagnostics are seen in the terminal based on
			
 
				+    their order in the file rather than the order they were produced.
			
 
				+
			
 
				+-   [NullDiagnosticConsumer](/toolchain/diagnostics/null_diagnostics.h):
			
 
				+    suppresses diagnostics, particularly for tests.
			
 
				+
			
 
				+Note that `SortingDiagnosticConsumer` is used by default by `carbon compile`. In
			
 
				+cases where one error leads to another error at an earlier location, for example
			
 
				+if an error in a function call argument leads to an error in the function call,
			
 
				+this can result in confusing diagnostic output where a consequence of the error
			
 
				+is reported before the cause. Usually this should be handled by tracking that an
			
 
				+error occurred and suppressing the follow-on diagnostic. During toolchain
			
 
				+development, it can be useful to disable the sorting so that the diagnostic
			
 
				+order matches the order in which the file was processed. This can be done using
			
 
				+`carbon compile --stream-errors`.
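The reordering described above can be illustrated with a minimal sketch. This is not the toolchain's `SortingDiagnosticConsumer`: the `DiagnosticSketch` struct, the `SortingConsumerSketch` class, and its method names are hypothetical and only assume that diagnostics are buffered and then sorted by line before being output.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified diagnostic: only a line number and a message.
struct DiagnosticSketch {
  int line;
  std::string message;
};

// Hypothetical consumer that buffers diagnostics and emits them in line order.
class SortingConsumerSketch {
 public:
  // Buffers a diagnostic in the order it was produced.
  auto HandleDiagnostic(DiagnosticSketch diagnostic) -> void {
    buffered_.push_back(std::move(diagnostic));
  }

  // Sorts the buffered diagnostics by line and returns their messages.
  // A stable sort keeps production order for diagnostics on the same line.
  auto Flush() -> std::vector<std::string> {
    std::stable_sort(buffered_.begin(), buffered_.end(),
                     [](const DiagnosticSketch& lhs,
                        const DiagnosticSketch& rhs) {
                       return lhs.line < rhs.line;
                     });
    std::vector<std::string> messages;
    for (auto& diagnostic : buffered_) {
      messages.push_back(std::move(diagnostic.message));
    }
    buffered_.clear();
    return messages;
  }

 private:
  std::vector<DiagnosticSketch> buffered_;
};
```

If an argument error on line 4 is produced first and the resulting call error on line 3 is produced second, `Flush` returns the call error first. That is the consequence-before-cause ordering the paragraph above warns about.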
			
 
				+
			
 
				+## Producing diagnostics
			
 
				+
			
 
				+Diagnostics are used to surface issues from compilation. A simple diagnostic
			
 
				+looks like:
			
 
				+
			
 
				+```cpp
			
 
				+CARBON_DIAGNOSTIC(InvalidCode, Error, "Code is invalid");
			
 
				+emitter.Emit(location, InvalidCode);
			
 
				+```
			
 
				+
			
 
				+Here, `CARBON_DIAGNOSTIC` defines a static instance of a diagnostic named
			
 
				+`InvalidCode` with the associated severity (`Error` or `Warning`).
			
 
				+
			
 
				+The `Emit` call produces a single instance of the diagnostic. When emitted,
			
 
				+`"Code is invalid"` will be the message used. The type of `location` depends on
			
 
				+the `DiagnosticEmitter`.
			
 
				+
			
 
				+A diagnostic with an argument looks like:
			
 
				+
			
 
				+```cpp
			
 
				+CARBON_DIAGNOSTIC(InvalidCharacter, Error, "Invalid character {0}.", char);
			
 
				+emitter.Emit(location, InvalidCharacter, invalid_char);
			
 
				+```
			
 
				+
			
 
				+Here, the additional `char` argument to `CARBON_DIAGNOSTIC` specifies the type
			
 
				+of an argument to expect for message formatting. The `invalid_char` argument to
			
 
				+`Emit` provides the matching value. It's then passed along with the diagnostic
			
 
				+message format to `llvm::formatv` to produce the final diagnostic message.
			
 
				+
			
 
				+## Diagnostic registry
			
 
				+
			
 
				+There is a [registry](/toolchain/diagnostics/diagnostic_kind.def) which all
			
 
				+diagnostics must be added to. Each diagnostic has a line like:
			
 
				+
			
 
				+```cpp
			
 
				+CARBON_DIAGNOSTIC_KIND(InvalidCode)
			
 
				+```
			
 
				+
			
 
				+This produces a central enumeration of all diagnostics. The eventual intent is
			
 
				+to require tests for every diagnostic that can be produced, but that isn't
			
 
				+currently implemented.
			
 
				+
			
 
				+## CARBON_DIAGNOSTIC placement
			
 
				+
			
 
				+Idiomatically, `CARBON_DIAGNOSTIC` will be adjacent to the `Emit` call. However,
			
 
				+this is only because many diagnostics can only be produced in one code location.
			
 
				+If they can be produced in multiple locations, they will be at a higher scope so
			
 
				+that multiple `Emit` calls can reference them. When in a function,
			
 
				+`CARBON_DIAGNOSTIC` should be placed as close as possible to the usage so that
			
 
				+it's easier to see the associated output.
			
 
				+
			
 
				+## Diagnostic context
			
 
				+
			
 
				+Diagnostics can provide additional context for errors by attaching notes, which
			
 
				+have their own location information. A diagnostic with a note looks like:
			
 
				+
			
 
				+```cpp
			
 
				+CARBON_DIAGNOSTIC(CallArgCountMismatch, Error,
			
 
				+                  "{0} argument(s) passed to function expecting "
			
 
				+                  "{1} argument(s).",
			
 
				+                  int, int);
			
 
				+CARBON_DIAGNOSTIC(InCallToFunction, Note,
			
 
				+                  "Calling function declared here.");
			
 
				+context.emitter()
			
 
				+    .Build(call_parse_node, CallArgCountMismatch, arg_refs.size(),
			
 
				+           param_refs.size())
			
 
				+    .Note(param_parse_node, InCallToFunction)
			
 
				+    .Emit();
			
 
				+```
			
 
				+
			
 
				+The error and the note are registered as two separate diagnostics, but a single
			
 
				+overall diagnostic object is built and emitted, so that the error and the note
			
 
				+can be treated as a single unit.
			
 
				+
			
 
				+Diagnostic context information can also be registered in a scope, so that all
			
 
				+diagnostics produced in that scope attach a specific note. For example:
			
 
				+
			
 
				+```cpp
			
 
				+DiagnosticAnnotationScope annotate_diagnostics(
			
 
				+    &context.emitter(), [&](auto& builder) {
			
 
				+      CARBON_DIAGNOSTIC(
			
 
				+          InCallToFunctionParam, Note,
			
 
				+          "Initializing parameter {0} of function declared here.", int);
			
 
				+      builder.Note(param_parse_node, InCallToFunctionParam,
			
 
				+                   diag_param_index + 1);
			
 
				+    });
			
 
				+```
			
 
				+
			
 
				+This is useful when delegating to another part of Check that may produce many
			
 
				+different kinds of diagnostic.
			
 
				+
			
 
				+## Diagnostic parameter types
			
 
				+
			
 
				+Here are some types you might consider for the parameters to a diagnostic:
			
 
				+
			
 
				+-   `llvm::StringLiteral`. Note that we don't use `llvm::StringRef` to avoid
			
 
				+    lifetime issues.
			
 
				+-   `std::string`
			
 
				+-   Carbon types `T` that implement `llvm::format_provider<T>` like:
			
 
				+    -   `Lex::TokenKind`
			
 
				+    -   `Lex::NumericLiteral::Radix`
			
 
				+    -   `Parse::RelativeLocation`
			
 
				+-   integer types: `int`, `uint64_t`, `int64_t`, `size_t`
			
 
				+-   `char`
			
 
				+-   Other
			
 
				+    [types supported by llvm::formatv](https://llvm.org/doxygen/FormatVariadic_8h_source.html)
			
 
				+
			
 
				+## Diagnostic message style guide
			
 
				+
			
 
				+In order to provide a consistent experience, Carbon diagnostics should be
			
 
				+written in the following style:
			
 
				+
			
 
				+-   Start diagnostics with a capital letter or quoted code, and end them with a
			
 
				+    period.
			
 
				+
			
 
				+-   Quoted code should be enclosed in backticks, for example:
			
 
				+    ``"`{0}` is bad."``
			
 
				+
			
 
				+-   Phrase diagnostics as bullet points rather than full sentences. Leave out
			
 
				+    articles unless they're necessary for clarity.
			
 
				+
			
 
				+-   Diagnostics should describe the situation the toolchain observed and the
			
 
				+    language rule that was violated, although either can be omitted if it's
			
 
				+    clear from the other. For example:
			
 
				+
			
 
				+    -   `"Redeclaration of X."` describes the situation and implies that
			
 
				+        redeclarations are not permitted.
			
 
				+
			
 
				+    -   ``"`self` can only be declared in an implicit parameter list."``
			
 
				+        describes the language rule and implies that you declared `self`
			
 
				+        somewhere else.
			
 
				+
			
 
				+    -   It's OK for a diagnostic to guess at the developer's intent and provide
			
 
				+        a hint after explaining the situation and the rule, but not as a
			
 
				+        substitute for that. For example,
			
 
				+        ``"Add an `as String` cast to format this integer as a string."`` is not
			
 
				+        sufficient as an error message, but
			
 
				+        ``"Cannot add i32 to String. Add an `as String` cast to format this integer as a string."``
			
 
				+        could be acceptable.
			
 
				+
			
 
				+-   TODO: Should diagnostics be atemporal and non-sequential ("multiple
			
 
				+    declarations of X", "additional declaration here"), present tense but
			
 
				+    sequential ("redeclaration of X", "previous declaration is here"), or
			
 
				+    temporal ("redeclaration of X", "previous declaration was here")? We could
			
 
				+    try to sidestep the difference between the latter two by avoiding verbs with
			
 
				+    tense ("previously declared here", "Y declared here", with no is/was).
			
 
				+
			
 
				+-   TODO: Word choices:
			
 
				+
			
 
				+    -   For disallowed constructs, do we say they're not permitted / not allowed
			
 
				+        / not valid / not legal / illegal / ill-formed / disallowed? Do we say
			
 
				+        "X cannot be Y" or "X may not be Y" or "X must not be Y" or "X shall not
			
 
				+        be Y"?
			
 
				+
			
 
				+-   TODO: Is structuring diagnostics such that inputs can be parsed without
			
 
				+    string parsing important? That is, when is passing strings in as part of the
			
 
				+    message templating okay?
			
 
				+
			
 
				+-   TODO: When do we put identifiers or expressions in diagnostics, versus
			
 
				+    requiring notes pointing at relevant code? Is it only avoided for values, or
			
 
				+    only allowed for types?
			
 
				+
			
 
				+-   TODO: Lots more things to decide, give examples.
			
--- a/toolchain/docs/driver.md
+++ b/toolchain/docs/driver.md
@@ -0,0 +1,22 @@
 
				+# Driver
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Overview](#overview)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+The driver provides commands and ties together the toolchain's flow. Running a
			
 
				+command such as `carbon compile --phase=lower <file>` will run through the flow
			
 
				+and print output. Several dump flags, such as `--dump-parse-tree`, print output
			
 
				+in YAML format for easier parsing.
			
--- a/toolchain/docs/idioms.md
+++ b/toolchain/docs/idioms.md
@@ -0,0 +1,424 @@
 
				+# Idioms
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Overview](#overview)
			
 
				+-   [C++ dialect](#c-dialect)
			
 
				+-   [Abbreviations used in the code (AKA Carbon abbreviation decoder ring)](#abbreviations-used-in-the-code-aka-carbon-abbreviation-decoder-ring)
			
 
				+-   [`.def` files](#def-files)
			
 
				+    -   [EnumBase types](#enumbase-types)
			
 
				+-   [Index types](#index-types)
			
 
				+-   [ValueStore](#valuestore)
			
 
				+-   [Template metaprogramming](#template-metaprogramming)
			
 
				+    -   [Struct reflection](#struct-reflection)
			
 
				+    -   [Field detection](#field-detection)
			
 
				+-   [Local lambdas to reduce duplicate code](#local-lambdas-to-reduce-duplicate-code)
			
 
				+-   [Immediately invoked function expressions (IIFE)](#immediately-invoked-function-expressions-iife)
			
 
				+-   [Declarations in conditions](#declarations-in-conditions)
			
 
				+-   [CRTP or "Curiously recurring template pattern"](#crtp-or-curiously-recurring-template-pattern)
			
 
				+-   [Multiple inheritance](#multiple-inheritance)
			
 
				+-   [Defining constants usable in constexpr contexts](#defining-constants-usable-in-constexpr-contexts)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+The toolchain implementation uses some implementation techniques that may not be
			
 
				+commonly found in typical C++ code.
			
 
				+
			
 
				+## C++ dialect
			
 
				+
			
 
				+The toolchain implementation does not use some C++ features, following
			
 
				+[Google's C++ style guide](https://google.github.io/styleguide/cppguide.html):
			
 
				+
			
 
				+-   [Exceptions](https://google.github.io/styleguide/cppguide.html#Exceptions)
			
 
				+-   [Virtual base classes](https://google.github.io/styleguide/cppguide.html#Inheritance)
			
 
				+-   [RTTI](https://google.github.io/styleguide/cppguide.html#Run-Time_Type_Information__RTTI_)
			
 
				+
			
 
				+## Abbreviations used in the code (AKA Carbon abbreviation decoder ring)
			
 
				+
			
 
				+Note that abbreviations are typically only used in code, not comments (except
			
 
				+when referring to an entity from the code).
			
 
				+
			
 
				+-   **Addr**: "address"
			
 
				+-   **Arg**: "argument"
			
 
				+-   **Decl**: "declaration"
			
 
				+-   **Expr**: "expression"
			
 
				+    -   **SubExpr**: "subexpression"
			
 
				+-   **Float**: "floating point"
			
 
				+-   **Init**: "initialization"
			
 
				+-   **Inst**: "instruction"
			
 
				+-   **Int**: "integer"
			
 
				+-   **Loc**: "location"
			
 
				+-   **Param**: "parameter"
			
 
				+-   **Paren**: "parenthesis"
			
 
				+-   **Ref**: "reference"
			
 
				+    -   **Deref**: "dereference"
			
 
				+-   **Subst**: "substitute"
			
 
				+
			
 
				+Phrase abbreviations (abbreviations for a whole phrase, where we wouldn't
			
 
				+abbreviate each of the words individually):
			
 
				+
			
 
				+-   **InitRepr**: "initializing representation"
			
 
				+-   **ObjectRepr**: "object representation"
			
 
				+-   **SemIR**: "semantics intermediate representation"
			
 
				+-   **ValueRepr**: "value representation"
			
 
				+
			
 
				+## `.def` files
			
 
				+
			
 
				+The Carbon toolchain uses a technique related to
			
 
				+[X-macros](https://en.wikipedia.org/wiki/X_macro) to generate code that operates
			
 
				+over a collection of types, enumerators, or another similar list of names. This
			
 
				+works as follows:
			
 
				+
			
 
				+-   A `.def` file is provided, that is intended to be repeatedly included by way
			
 
				+    of `#include`.
			
 
				+-   The user of the `.def` defines a macro, with a name and a form specified by
			
 
				+    the `.def` file, for example
			
 
				+    `#define CARBON_EACH_WIDGET(Name) Scope::Name,`.
			
 
				+-   A `#include` of the `.def` file expands to `CARBON_EACH_WIDGET(Name1)`,
			
 
				+    `CARBON_EACH_WIDGET(Name2)`, ... for each widget name, and then `#undef`s
			
 
				+    the `CARBON_EACH_WIDGET` macro.
			
 
				+
			
 
				+For example:
			
 
				+
			
 
				+```cpp
			
 
				+enum Widgets {
			
 
				+#define CARBON_EACH_WIDGET(Name) Name,
			
 
				+#include "widgets.def"
			
 
				+};
			
 
				+```
			
 
				+
			
 
				+... would expand to an enumeration definition with one enumerator per widget
			
 
				+name.
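The expansion steps above can be sketched in a single file. This is a minimal illustration rather than toolchain code: `CARBON_WIDGET_LIST` is a hypothetical list macro standing in for the contents of `widgets.def` so that the example is self-contained, and the widget names and `WidgetName` function are invented for illustration.

```cpp
#include <string_view>

// Stand-in for a hypothetical `widgets.def`. A real `.def` file would be
// pulled in with `#include` rather than expanded through a list macro.
#define CARBON_WIDGET_LIST(X) \
  X(Button)                   \
  X(Slider)                   \
  X(Label)

// First expansion: one enumerator per widget name.
enum class Widget {
#define CARBON_EACH_WIDGET(Name) Name,
  CARBON_WIDGET_LIST(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
};

// Second expansion of the same list: map each enumerator to its name.
constexpr auto WidgetName(Widget widget) -> std::string_view {
  switch (widget) {
#define CARBON_EACH_WIDGET(Name) \
  case Widget::Name:             \
    return #Name;
    CARBON_WIDGET_LIST(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
  }
  return "<invalid>";
}

static_assert(WidgetName(Widget::Slider) == "Slider");
```

The point of the technique is that adding a name to the list updates every expansion at once, so the enumeration and the name lookup cannot drift apart.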
			
 
				+
			
 
				+### EnumBase types
			
 
				+
			
 
				+Most `.def` files will have a corresponding [EnumBase](/common/enum_base.h)
			
 
				+child class (if `widgets.def` has X-macros, `widgets.h` and `widgets.cpp` have
			
 
				+the `EnumBase` child class). These work similarly to an `enum class`, with the
			
 
				+addition of a `name()` function and `<<` stream operator support. Many also have
			
 
				+further utility functions for information related to the enum value.
			
 
				+
			
 
				+In code, these types and values can be used directly in a `switch`. They will
			
 
				+convert to an internal _actual_ `enum class` for the `switch`, and receive
			
 
				+corresponding compiler safety checks that all enum values are handled.
			
 
				+
			
 
				+## Index types
			
 
				+
			
 
				+Carbon makes frequent use of
			
 
				+[IndexBase and IdBase](/toolchain/base/index_base.h). The `IndexBase` and
			
 
				+`IdBase` types are small wrappers around `int32_t` to provide a measure of
			
 
				+type-checking when passing around indices to vector-like storage types. The only
			
 
				+difference is that `IndexBase` supports all comparison operators, whereas
			
 
				+`IdBase` only supports equality comparison.
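As a rough illustration of the type-checking benefit, here is a minimal sketch of an equality-only id wrapper. It is not the contents of `index_base.h`: every name ending in `Sketch` is hypothetical, and the real types may be structured differently.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an `IdBase`-like wrapper around `int32_t`.
struct IdBaseSketch {
  constexpr explicit IdBaseSketch(int32_t index) : index(index) {}
  int32_t index;
};

// Equality is only defined between two ids of the same type, so comparing an
// `InstIdSketch` with a `FunctionIdSketch` fails to compile. No ordering
// operators are provided, mirroring the equality-only behavior described
// above.
template <typename IdT, typename = std::enable_if_t<
                            std::is_base_of_v<IdBaseSketch, IdT>>>
constexpr auto operator==(IdT lhs, IdT rhs) -> bool {
  return lhs.index == rhs.index;
}
template <typename IdT, typename = std::enable_if_t<
                            std::is_base_of_v<IdBaseSketch, IdT>>>
constexpr auto operator!=(IdT lhs, IdT rhs) -> bool {
  return lhs.index != rhs.index;
}

// Distinct id types for distinct kinds of storage.
struct InstIdSketch : IdBaseSketch {
  using IdBaseSketch::IdBaseSketch;
};
struct FunctionIdSketch : IdBaseSketch {
  using IdBaseSketch::IdBaseSketch;
};

static_assert(InstIdSketch(3) == InstIdSketch(3));
static_assert(InstIdSketch(3) != InstIdSketch(4));
```

Because the constructor is `explicit`, a plain integer or an id of another type cannot be passed where an `InstIdSketch` is expected without a visible conversion.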
			
 
				+
			
 
				+Variable naming will often have `_id` at the end to indicate that it corresponds
			
 
				+to an `IdBase`. This may include the full type, as in `operand_inst_id` being an
			
 
				+`InstId` for an operand.
			
 
				+
			
 
				+A block is an array of ids. These will be indicated with either a `_block`
			
 
				+suffix or pluralization (for example, `param_refs` pluralizing `refs`).
			
 
				+
			
 
				+The `ref` concept in a name means that there is an underlying instruction block,
			
 
				+but only a subset of instructions are present in the `refs` block. For example,
			
 
				+function parameters have a sequence, and also have a `refs` block with one entry
			
 
				+per parameter. The `refs` block allows parameters to be counted and accessed
			
 
				+directly, rather than through vector iteration.
			
 
				+
			
 
				+## ValueStore
			
 
				+
			
 
				+Many of Carbon's data types are stored in a
			
 
				+[ValueStore](/toolchain/base/value_store.h) or related type with similar
			
 
				+semantics (`sem_ir` has [several such classes](/toolchain/base/value_store.h)).
			
 
				+`ValueStore` links an indexing type to a value type with vector-like storage.
			
 
				+The indices typically use `IdBase`.
			
 
				+
			
 
				+`ValueStore`'s APIs follow the shape of simple array access and mutation:
			
 
				+
			
 
				+-   `Add` takes a value and returns the index.
			
 
				+-   `Set` takes a value and the index to modify.
			
 
				+-   `Get` takes an index and returns a reference to the value (possibly a
			
 
				+    constant reference).
			
 
				+-   Other vector-like functionality, including `size` and `Reserve`.
			
 
				+
			
 
				+ValueStores should be named after the type they contain. The index type used on
			
 
				+the value store should have a `using ValueType...` which indicates the stored
			
 
				+type. When taking a return of one of these functions, it's common to use `auto`
			
 
				+and rely on the name of the storage type to imply the returned type.
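The API shape and the `using ValueType...` convention can be sketched as follows. This is a simplified illustration, not the code in `value_store.h`: `ValueStoreSketch` and `NameIdSketch` are hypothetical names, and the real class has more functionality.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical id type. The `ValueType` alias tells the store what it holds.
struct NameIdSketch {
  using ValueType = std::string;
  int32_t index;
};

// Hypothetical, simplified store linking an id type to vector-like storage.
template <typename IdT>
class ValueStoreSketch {
 public:
  using ValueType = typename IdT::ValueType;

  // Stores a value and returns the id that refers to it.
  auto Add(ValueType value) -> IdT {
    IdT id{static_cast<int32_t>(values_.size())};
    values_.push_back(std::move(value));
    return id;
  }

  // Replaces the value stored at an existing id.
  auto Set(IdT id, ValueType value) -> void {
    values_[id.index] = std::move(value);
  }

  // Returns a reference to the value stored at an id.
  auto Get(IdT id) -> ValueType& { return values_[id.index]; }
  auto Get(IdT id) const -> const ValueType& { return values_[id.index]; }

  auto size() const -> std::size_t { return values_.size(); }

 private:
  std::vector<ValueType> values_;
};
```

With a store named `names`, a call such as `auto name_id = names.Add("x");` relies on the store's name to signal that the result is a name id, which is the `auto` convention described above.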
			
 
				+
			
 
				+Some name mirroring examples are:
			
 
				+
			
 
				+-   `ints` is a `ValueStore<IntId>`, which has an index type of `IntId` and a
			
 
				+    value type of `llvm::APInt`.
			
 
				+
			
 
				+-   `functions` is a `ValueStore<SemIR::FunctionId>`, which has an index type of
			
 
				+    `SemIR::FunctionId` and a value type of `SemIR::Function`.
			
 
				+
			
 
				+-   `strings` is a `ValueStore<StringId>`, which has an index type of
			
 
				+    `StringId`, but for copy-related reasons, uses `llvm::StringRef` for values.
			
 
				+
			
 
				+A fairly complete list of `ValueStore` uses should be available on
			
 
				+[checking's Context class](https://github.com/search?q=repository%3Acarbon-language%2Fcarbon-lang%20path%3Acheck%2Fcontext.h%20symbol%3Aidentifiers&type=code).
			
 
				+
			
 
				+## Template metaprogramming
			
 
				+
			
 
				+FIXME: show example patterns
			
 
				+
			
 
				+-   TypedInstArgsInfo from toolchain/sem_ir/inst.h
			
 
				+-   templated using
			
 
				+-   std::declval
			
 
				+-   decltype
			
 
				+-   static_assert
			
 
				+-   if constexpr
			
 
				+-   template specialization, for example `Inst::FromRaw<T>` (maybe also type
			
 
				+    traits?)
			
 
				+
			
 
				+### Struct reflection
			
 
				+
			
 
				+The toolchain uses a primitive form of struct reflection to operate generically
			
 
				+over the fields in a typed `SemIR` instruction. This is implemented in
			
 
				+`common/struct_reflection.h`, and the interface to the functionality is
			
 
				+`StructReflection::AsTuple(your_struct)`, which converts the given struct into a
			
 
				+`std::tuple` containing the same fields in the same order.
			
 
				+
			
 
				+### Field detection
			
 
				+
			
 
				+The presence of specific fields in a struct with a specified type is detected
			
 
				+using the following idiom:
			
 
				+
			
 
				+```cpp
			
 
				+template <typename T, typename = FieldType T::*>
			
 
				+constexpr bool HasField = false;
			
 
				+template <typename T>
			
 
				+constexpr bool HasField<T, decltype(&T::field)> = true;
			
 
				+```
			
 
				+
			
 
				+This is intended to check the same property as the following concept, which we
			
 
				+can't use because we currently need to compile in C++17 mode:
			
 
				+
			
 
				+```cpp
			
 
				+template <typename T> concept HasField = requires (T x) {
			
 
				+  { x.field } -> std::same_as<FieldType>;
			
 
				+};
			
 
				+```
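As a usage sketch of the C++17 form of the idiom, the following applies it to three small structs. The structs and the choice of `int32_t` for `FieldType` are assumptions made for illustration. Only the two `HasField` declarations come from the idiom itself.

```cpp
#include <cstdint>

// Assumed field type for this example.
using FieldType = int32_t;

// Hypothetical structs used to exercise the detection.
struct WithField {
  int32_t field;
};
struct WrongType {
  double field;
};
struct NoField {
  int32_t other;
};

// Primary template: the default argument names the member-pointer type that
// a matching `field` would have.
template <typename T, typename = FieldType T::*>
constexpr bool HasField = false;
// Partial specialization: only selected when `&T::field` is well-formed and
// its type equals the default argument above.
template <typename T>
constexpr bool HasField<T, decltype(&T::field)> = true;

static_assert(HasField<WithField>);
static_assert(!HasField<WrongType>);  // `field` exists but has another type.
static_assert(!HasField<NoField>);    // No member named `field`.
```

For `NoField`, forming `decltype(&T::field)` is a substitution failure, so the partial specialization is discarded and the primary template's `false` is used.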
			
 
				+
			
 
				+To detect a field with a specific name with a type derived from a specified base
			
 
				+type, use this idiom:
			
 
				+
			
 
				+```cpp
			
 
				+// HasField<T> is true if T has a `U field` field,
			
 
				+// where `U` extends `BaseClass`.
			
 
				+template <typename T, bool Enabled = true>
			
 
				+inline constexpr bool HasField = false;
			
 
				+template <typename T>
			
 
				+inline constexpr bool HasField<
			
 
				+    T, bool(std::is_base_of_v<BaseClass, decltype(T::field)>)> = true;
			
 
				+```
			
 
				+
			
 
				+The equivalent concept is:
			
 
				+
			
 
				+```cpp
			
 
				+template <typename T> concept HasField = requires (T x) {
			
 
				+  { x.field } -> std::derived_from<BaseClass>;
			
 
				+};
			
 
				+```
			
 
				+
			
 
				+## Local lambdas to reduce duplicate code
			
 
				+
			
 
				+Sometimes code that would be repeated in a function is factored into a local
			
 
				+variable containing a
			
 
				+[lambda](https://en.cppreference.com/w/cpp/language/lambda):
			
 
				+
			
 
				+```cpp
			
 
				+auto common_code = [&](AType param1, AnotherType param2) {
			
 
				+  // code that would otherwise be repeated
			
 
				+  ...
			
 
				+};
			
 
				+if (something) {
			
 
				+  common_code(...);
			
 
				+}
			
 
				+if (something_else) {
			
 
				+  common_code(...);
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+Compared to defining a new function, this has the advantage of being able to be
			
 
				+declared in context and access the local variables of the enclosing function.
			
 
				+
			
 
				+## Immediately invoked function expressions (IIFE)
			
 
				+
			
 
				+Instead of creating a separate function with its own name that will be called
			
 
				+once to produce the initial value for a variable, the function can be declared
			
 
				+inline and then immediately called.
			
 
				+
			
 
				+This can be used for complex initialization, as in:
			
 
				+
			
 
				+```cpp
			
 
				+// variable declaration
			
 
				+static const llvm::ArrayRef<std::byte> entropy_bytes =
			
 
				+// initializer starts with a lambda
			
 
				+    []() -> llvm::ArrayRef<std::byte> {
			
 
				+  static llvm::SmallVector<std::byte> bytes;
			
 
				+
			
 
				+  // a bunch of code
			
 
				+
			
 
				+  // return the value to initialize the variable with
			
 
				+  return bytes;
			
 
				+
			
 
				+// finish defining the lambda, and then immediately invoke it
			
 
				+}();
			
 
				+```
			
 
				+
			
 
				+It can also be used inside a `CARBON_DCHECK` to avoid computation that is only
			
 
				+needed in debug builds:
			
 
				+
			
 
				+```cpp
			
 
				+CARBON_DCHECK([&] {
			
 
				+  // a bunch of code
			
 
				+
			
 
				+  // condition that will be tested by CARBON_DCHECK
			
 
				+  return complicated && multiple_parts;
			
 
				+
			
 
				+// finish defining the lambda, and then immediately invoke it
			
 
				+}()) << "Complicated things went wrong";
			
 
				+```
			
 
				+
			
 
				+See a description of this technique on
			
 
				+[Wikipedia](https://en.wikipedia.org/wiki/Immediately_invoked_function_expression).
			
 
				+
			
 
				+## Declarations in conditions
			
 
				+
			
 
				+The condition part of an `if` statement may contain a declaration with an
			
 
				+initializer followed by a semicolon (`;`) and then the proper boolean condition
			
 
				+expression, as in:
			
 
				+
			
 
				+```cpp
			
 
				+if (auto verify = tree.Verify(); !verify.ok()) {
			
 
				+```
			
 
				+
			
 
				+The condition can be replaced by a declaration entirely, as in:
			
 
				+
			
 
				+```cpp
			
 
				+if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal)) {
			
 
				+// Equivalent to:
			
 
				+if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal); equals) {
			
 
				+```
			
 
				+
			
 
				+or
			
 
				+
			
 
				+```cpp
			
 
				+if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>()) {
			
 
				+// Equivalent to:
			
 
				+if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>(); literal) {
			
 
				+```
			
 
				+
			
 
				+This is a common way of handling a function that returns an optional value.
			
 
				+
			
 
				+See
			
 
				+[https://en.cppreference.com/w/cpp/language/if](https://en.cppreference.com/w/cpp/language/if)
			
 
				+
			
 
				+## CRTP or "Curiously recurring template pattern"
			
 
				+
			
 
				+[Curiously Recurring Template Pattern - cppreference.com](https://en.cppreference.com/w/cpp/language/crtp)
			
 
				+
			
 
				+[Curiously recurring template pattern - Wikipedia](https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern)
			
 
				+
			
 
				+[Google search](https://www.google.com/search?q=crtp+c%2B%2B)
			
 
				+
			
 
				+Examples:
			
 
				+
			
 
				+-   `template <typename DerivedT, ...>` in [enum_base.h](/common/enum_base.h)
			
 
				+-   `template <typename DerivedT>` in [ostream.h](/common/ostream.h)
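For readers new to the pattern, here is a minimal sketch of CRTP. It does not reproduce `enum_base.h` or `ostream.h`: `PrintableSketch`, `IndexSketch`, and `WidgetIdSketch` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical CRTP base: it is parameterized on the derived type, so it can
// call `DerivedT::Print` without virtual functions.
template <typename DerivedT>
class PrintableSketch {
 public:
  auto ToString() const -> std::string {
    std::ostringstream out;
    static_cast<const DerivedT*>(this)->Print(out);
    return out.str();
  }
};

// Hypothetical plain base holding the data.
struct IndexSketch {
  int32_t index;
};

// The derived type passes itself as the template argument of the CRTP base,
// alongside an ordinary base class.
struct WidgetIdSketch : public IndexSketch,
                        public PrintableSketch<WidgetIdSketch> {
  explicit WidgetIdSketch(int32_t index) : IndexSketch{index} {}
  auto Print(std::ostream& out) const -> void { out << "widget" << index; }
};
```

Here `WidgetIdSketch(7).ToString()` dispatches to `WidgetIdSketch::Print` at compile time through the `static_cast` in the base.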
			
 
				+
			
 
				+## Multiple inheritance
			
 
				+
			
 
				+We use multiple inheritance to support uses of
			
 
				+[CRTP](#crtp-or-curiously-recurring-template-pattern).
			
 
				+
			
 
				+Example:
			
 
				+
			
 
				+```cpp
			
 
				+struct NameScopeId : public IndexBase, public Printable<NameScopeId> {
			
 
				+```
			
 
				+
			
 
				+## Defining constants usable in constexpr contexts
			
 
				+
			
 
				+To declare a constant usable at compile time in `constexpr` contexts as a static
			
 
				+class member, we use this pattern:
			
 
				+
			
 
				+Declaration:
			
 
				+
			
 
				+```cpp
			
 
				+class Foo {
			
 
				+  // ...
			
 
				+  static const std::array<ElementType, ElementCount> MyTable;
			
 
				+  static constexpr auto ComputeMyTable()
			
 
				+      -> std::array<ElementType, ElementCount> { ... }
			
 
				+};
			
 
				+```
			
 
				+
			
 
				+Definition:
			
 
				+
			
 
				+```cpp
			
 
				+constexpr std::array<ElementType, ElementCount>
			
 
				+    Foo::MyTable = Foo::ComputeMyTable();
			
 
				+```
			
 
				+
			
 
				+Note that the `const` on the declaration does not match the `constexpr` on the
			
 
				+definition, and that the definition is outside of the class body. This allows
			
 
				+the initializer to depend on the definition of the class.
			
 
				+
			
 
				+Further note that this only works with static members of classes, not static
			
 
				+variables in functions.
			
 
				+
			
 
				+Due to [a Clang bug](https://github.com/llvm/llvm-project/issues/85461), this
			
 
				+technique does not work in a class template. The following pattern can be used
			
 
				+instead:
			
 
				+
			
 
				+```cpp
			
 
				+template <typename T>
			
 
				+class Foo {
			
 
				+  // ...
			
 
				+  template <typename Self = Foo>
			
 
				+  static constexpr auto MyValueImpl = Self();
			
 
				+  static constexpr const Foo& MyValue = MyValueImpl<>;
			
 
				+  // ...
			
 
				+};
			
 
				+```
			
 
				+
			
 
				+The parameters of the variable template can be chosen to allow reuse of the same
			
 
				+variable template for multiple static data members.
			
 
				+
			
 
				+Examples:
			
 
				+
			
 
				+-   `NodeStack::IdKindTable` in
			
 
				+    [check/node_stack.h](/toolchain/check/node_stack.h)
			
 
				+-   `BuiltinKind::ValidCount` in
			
 
				+    [sem_ir/builtin_inst_kind.h](/toolchain/sem_ir/builtin_inst_kind.h)
			
 
				+
			
 
				+A global constant may use a single definition without a separate declaration:
			
 
				+
			
 
				+```cpp
			
 
				+static constexpr std::array<bool, 256> IsIdStartByteTable = [] {
			
 
				+  std::array<bool, 256> table = {};
			
 
				+  // ...
			
 
				+  return table;
			
 
				+}();
			
 
				+```
			
 
				+
			
 
				+Note this example is using an
			
 
				+[immediately invoked function expression](#immediately-invoked-function-expressions-iife)
			
 
				+to compute the initial value, which is common.
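A complete version of this pattern might look like the following sketch. The rule used to fill the table (ASCII letters and underscore) is an assumption for illustration and is not taken from `lex.cpp`.

```cpp
#include <array>

// Table computed at compile time by an immediately invoked lambda.
static constexpr std::array<bool, 256> IsIdStartByteTable = [] {
  std::array<bool, 256> table = {};
  // Assumed rule: ASCII letters and underscore may start an identifier.
  for (int c = 'a'; c <= 'z'; ++c) {
    table[c] = true;
  }
  for (int c = 'A'; c <= 'Z'; ++c) {
    table[c] = true;
  }
  table['_'] = true;
  return table;
}();

// The table is usable in constant expressions.
static_assert(IsIdStartByteTable['x']);
static_assert(!IsIdStartByteTable['1']);
```

This relies on C++17, where lambdas can be evaluated in constant expressions and `std::array::operator[]` is `constexpr` for non-const arrays.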
			
 
				+
			
 
				+Examples:
			
 
				+
			
 
				+-   [lex/lex.cpp](/toolchain/lex/lex.cpp)
			
--- a/toolchain/docs/lex.md
+++ b/toolchain/docs/lex.md
@@ -0,0 +1,44 @@
 
				+# Lex
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Overview](#overview)
			
 
				+-   [Bracket matching](#bracket-matching)
			
 
				+-   [Alternatives considered](#alternatives-considered)
			
 
				+    -   [Bracket matching in parser](#bracket-matching-in-parser)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+Lexing converts input source code into tokenized output. Literals, such as
			
 
				+string literals, have their value parsed and form a single token at this stage.
			
 
				+
			
 
				+## Bracket matching
			
 
				+
			
 
				+The lexer handles matching for `()`, `[]`, and `{}`. When a bracket lacks a
			
 
				+match, it will insert a "recovery" token to produce a match. As a consequence,
			
 
				+the lexer's output should always have matched brackets, even with invalid code.
			
 
				+
			
 
				+While bracket matching could use hints such as contextual clues from
			
 
				+indentation, that is not yet implemented.
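
The recovery behavior described in this section can be illustrated with a small, self-contained sketch. It is not the lexer's actual algorithm: `MatchBrackets`, `MatchedBrackets`, and the recovery policy (close intervening openers, or insert an opener just before a stray closer) are assumptions chosen only to show how inserted recovery tokens keep the output balanced.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical result type: the balanced bracket sequence, plus how many
// recovery brackets had to be inserted to balance it.
struct MatchedBrackets {
  std::string text;
  int recovery_count;
};

inline auto CloseFor(char open) -> char {
  switch (open) {
    case '(': return ')';
    case '[': return ']';
    default: return '}';
  }
}

inline auto OpenFor(char close) -> char {
  switch (close) {
    case ')': return '(';
    case ']': return '[';
    default: return '{';
  }
}

// Toy policy (an assumption, not the toolchain's): non-bracket characters are
// dropped, and every unmatched bracket gets a recovery partner.
inline auto MatchBrackets(const std::string& input) -> MatchedBrackets {
  MatchedBrackets result = {"", 0};
  std::vector<char> open_stack;
  for (char c : input) {
    if (c == '(' || c == '[' || c == '{') {
      open_stack.push_back(c);
      result.text += c;
      continue;
    }
    if (c != ')' && c != ']' && c != '}') {
      continue;
    }
    char wanted = OpenFor(c);
    bool has_open = std::find(open_stack.begin(), open_stack.end(), wanted) !=
                    open_stack.end();
    if (!has_open) {
      // No opener anywhere: insert a recovery opener right before the closer.
      result.text += wanted;
      result.text += c;
      ++result.recovery_count;
      continue;
    }
    // Close any intervening openers with recovery closers.
    while (open_stack.back() != wanted) {
      result.text += CloseFor(open_stack.back());
      open_stack.pop_back();
      ++result.recovery_count;
    }
    open_stack.pop_back();
    result.text += c;
  }
  // Openers left at end of input each get a recovery closer.
  while (!open_stack.empty()) {
    result.text += CloseFor(open_stack.back());
    open_stack.pop_back();
    ++result.recovery_count;
  }
  return result;
}
```

With this policy, `MatchBrackets("{(}")` yields `{()}` with one recovery bracket, so later stages can rely on matched brackets even for invalid input.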
			
 
				+
			
 
				+## Alternatives considered
			
 
				+
			
 
				+### Bracket matching in parser
			
 
				+
			
 
				+Bracket matching could have also been implemented in the parser, with some
			
 
				+awareness of parse state. However, that would shift some of the complexity of
			
 
				+recovery in other error situations, such as where the parser searches for the
			
 
				+next comma in a list. That needs to skip over bracketed ranges. We don't think
			
 
				+the trade-offs would yield a net benefit, so any change in this direction would
			
 
				+need to show concrete improvement, for example better diagnostics for common
			
 
				+issues.
			
--- a/toolchain/docs/lower.md
+++ b/toolchain/docs/lower.md
@@ -0,0 +1,25 @@
 
				+# Lower
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Overview](#overview)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+Lowering takes the SemIR and produces LLVM IR. At present, this is done in a
			
 
				+single pass, although it's possible we may need to do a second pass so that we
			
 
				+can first generate type information for function arguments.
			
 
				+
			
 
				+Lowering is done per `SemIR::InstBlock`. This minimizes changes to the
			
 
				+`IRBuilder` insertion point, something that is both expensive and potentially
			
 
				+fragile.
			
--- a/toolchain/docs/parse.md
+++ b/toolchain/docs/parse.md
@@ -0,0 +1,802 @@
 
				+# Parse
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Overview](#overview)
			
 
				+-   [Parse stack](#parse-stack)
			
 
				+-   [Postorder tree](#postorder-tree)
			
 
				+-   [Bracketing inside the tree](#bracketing-inside-the-tree)
			
 
				+-   [Visual example](#visual-example)
			
 
				+-   [Handling invalid parses](#handling-invalid-parses)
			
 
				+-   [How is this accomplished?](#how-is-this-accomplished)
			
 
				+    -   [Introducer](#introducer)
			
 
				+    -   [Optional modifiers before an introducer](#optional-modifiers-before-an-introducer)
			
 
				+    -   [Something required in context](#something-required-in-context)
			
 
				+    -   [Optional clauses](#optional-clauses)
			
 
				+        -   [Case 1: introducer to optional clause is used as parent node](#case-1-introducer-to-optional-clause-is-used-as-parent-node)
			
 
				+        -   [Case 2: parent node is required token after optional clause, with different parent node kinds for different options](#case-2-parent-node-is-required-token-after-optional-clause-with-different-parent-node-kinds-for-different-options)
			
 
				+        -   [Case 3: optional sibling](#case-3-optional-sibling)
			
 
				+    -   [Operators](#operators)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Overview
			
 
				+
			
 
				+Parsing uses tokens to produce a parse tree that faithfully represents the tree
			
 
				+structure of the source program, interpreted according to the Carbon grammar. No
			
 
				+semantics are associated with the tree structure at this level, and no name
			
 
				+lookup is performed.
			
 
				+
			
 
				+The parse tree's structure corresponds to the grammar of the Carbon language. On
			
 
				+valid input, there will be a 1:1 correspondence between parse tree nodes and
			
 
				+tokens.
			
 
				+
			
 
				+A parse tree is considered _structurally valid_ if all nodes have the number of
			
 
				+children that their node kind requires. On invalid input, nodes may be added
			
 
				+that don't correspond to a token to maintain a structurally valid parse tree.
			
 
				+When a parse tree node is marked as having an error, it will still be
			
 
				+structurally valid, but its children may not match a valid grammar. Code trying
			
 
				+to handle children of erroneous nodes must be prepared to handle atypical
			
 
				+structures, but it may still be helpful for tools such as syntax highlighters or
			
 
				+refactoring tools.
			
 
				+
			
 
				+In general, we favor doing the checking for whether something is allowed _in a
			
 
				+particular context_ in [the check stage](check.md) instead of the parse stage,
			
 
				+unless the context is very local. This is for a few reasons:
			
 
				+
			
 
				+-   We anticipate that the parse stage will be used to operate on invalid code
			
 
				+    while still preserving as much of the intent of the author as possible, for
			
 
				+    example in an IDE or a code formatter.
			
 
				+-   To keep as much code out of the parse stage as possible, so it is simple and
			
 
				+    fast.
			
 
				+-   We are building all the infrastructure to keep track of context in the check
			
 
				+    stage.
			
 
				+
			
 
				+These reasons explain what local context is okay: where we already have the
			
 
				+contextual information at hand so there is no performance cost, and we can
			
 
				+output a parse tree that still captures faithfully what the user wrote.
			
 
				+Examples:
			
 
				+
			
 
				+-   All declaration modifiers are allowed in any order on any declaration in the
			
 
				+    parse stage. Diagnosing duplicated modifiers, modifiers that conflict with
			
 
				+    other modifiers, or modifiers that can't be used on a particular declaration
			
 
				+    is postponed until the check stage.
			
 
				+-   Rejecting a keyword after `fn` where a name is expected is done at the parse
			
 
				+    stage.
			
 
				+
			
 
				+## Parse stack
			
 
				+
			
 
				+The core parser loop is `Parse::Tree::Parse`. In the loop, it pops the next
			
 
				+state off the stack, and dispatches to the appropriate `Handle` function.
			
 
				+
			
 
				+A typical handler function pops the state first, leaving the stack ready for the
			
 
				+next state. It may add nodes to the parse tree, based on the current code. If it
			
 
				+needs to trigger other states, it will push them onto the stack; because it's a
			
 
				+stack, the _next_ state is always pushed _last_.
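
As a rough illustration of the push order, the following sketch runs a toy state stack. The `State` enumerators and `RunToyParseLoop` are hypothetical names, not the toolchain's real states or API; the only point is that the state to run next is pushed last.

```cpp
#include <string>
#include <vector>

// Hypothetical parser states; these are not the toolchain's real state names.
enum class State { Statement, Expr, StatementFinish };

// Runs a toy parse loop and records the order in which states are handled.
inline auto RunToyParseLoop() -> std::vector<std::string> {
  std::vector<State> stack = {State::Statement};
  std::vector<std::string> trace;
  while (!stack.empty()) {
    // Pop first, leaving the stack ready for whatever the handler pushes.
    State state = stack.back();
    stack.pop_back();
    switch (state) {
      case State::Statement:
        trace.push_back("Statement");
        // The state that should run next (Expr) is pushed last.
        stack.push_back(State::StatementFinish);
        stack.push_back(State::Expr);
        break;
      case State::Expr:
        trace.push_back("Expr");
        break;
      case State::StatementFinish:
        trace.push_back("StatementFinish");
        break;
    }
  }
  return trace;
}
```

Here `StatementFinish` is pushed before `Expr`, so it runs only after the expression has been handled.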
			
 
				+
			
 
				+Operator expressions store information about current operator precedence in the
			
 
				+stack as well. While this isn't necessary for most parser states, and could be
			
 
				+stored separately, it's currently kept together because it has no impact on the size
			
 
				+of a stack entry and is thus more efficient to store in one place.
			
 
				+
			
 
				+## Postorder tree
			
 
				+
			
 
				+The parse tree's storage layout is in postorder. For example, given the code:
			
 
				+
			
 
				+```carbon
			
 
				+fn foo() -> f64 {
			
 
				+  return 42;
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+The node order is (with indentation to indicate nesting):
			
 
				+
			
 
				+<!-- Prevent prettier from changing indents. -->
			
 
				+<!-- prettier-ignore-start -->
			
 
				+
			
 
				+```yaml
			
 
				+[
			
 
				+  {kind: 'FileStart', text: ''},
			
 
				+      {kind: 'FunctionIntroducer', text: 'fn'},
			
 
				+      {kind: 'Name', text: 'foo'},
			
 
				+        {kind: 'ParamListStart', text: '('},
			
 
				+      {kind: 'ParamList', text: ')', subtree_size: 2},
			
 
				+        {kind: 'Literal', text: 'f64'},
			
 
				+      {kind: 'ReturnType', text: '->', subtree_size: 2},
			
 
				+    {kind: 'FunctionDefinitionStart', text: '{', subtree_size: 7},
			
 
				+      {kind: 'ReturnStatementStart', text: 'return'},
			
 
				+      {kind: 'Literal', text: '42'},
			
 
				+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
			
 
				+  {kind: 'FunctionDefinition', text: '}', subtree_size: 11},
			
 
				+  {kind: 'FileEnd', text: ''},
			
 
				+]
			
 
				+```
			
 
				+
			
 
				+<!-- prettier-ignore-end -->
			
 
				+
			
 
				+In this example, `FileStart`, `FunctionDefinition`, and `FileEnd` are "root"
			
 
				+nodes for the tree. Function components are children of `FunctionDefinition`.
			
 
				+
			
 
				+It's produced in this way because it's an efficient layout to produce with
			
 
				+vectorized storage, requiring little context to be maintained during parsing.
			
 
				+Because it's stored in postorder, it's also most efficient to process the parsed
			
 
				+output in postorder; this affects checking.
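
To make the flat layout concrete, here is a minimal sketch that recovers a node's direct children from postorder storage using `subtree_size`. `ToyNode`, `ChildrenOf`, and `ExampleTree` are hypothetical names for illustration; the node list mirrors the example above, with leaf nodes assumed to have a `subtree_size` of 1.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical flat node record; leaves have a subtree_size of 1.
struct ToyNode {
  std::string kind;
  int subtree_size;
};

// Returns the indices of `parent`'s direct children, in source order.
inline auto ChildrenOf(const std::vector<ToyNode>& nodes, int parent)
    -> std::vector<int> {
  std::vector<int> children;
  // Index of the first node that belongs to the parent's subtree.
  int subtree_begin = parent - nodes[parent].subtree_size + 1;
  // The last child sits immediately before the parent; each earlier child
  // sits immediately before the subtree of the child after it.
  int child = parent - 1;
  while (child >= subtree_begin) {
    children.push_back(child);
    child -= nodes[child].subtree_size;
  }
  std::reverse(children.begin(), children.end());
  return children;
}

// The postorder node list from the `fn foo() -> f64` example.
inline auto ExampleTree() -> std::vector<ToyNode> {
  return {
      {"FileStart", 1},
      {"FunctionIntroducer", 1},
      {"Name", 1},
      {"ParamListStart", 1},
      {"ParamList", 2},
      {"Literal", 1},
      {"ReturnType", 2},
      {"FunctionDefinitionStart", 7},
      {"ReturnStatementStart", 1},
      {"Literal", 1},
      {"ReturnStatement", 3},
      {"FunctionDefinition", 11},
      {"FileEnd", 1},
  };
}
```

In this sketch, the children of `FunctionDefinition` (index 11) are `FunctionDefinitionStart` (index 7) and `ReturnStatement` (index 10).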
			
 
				+
			
 
				+The parse tree is printed in postorder by default because it matches how the
			
 
				+parse tree is expected to be processed within the toolchain, and so can make it
			
 
				+easier to reason about. However, the `--preorder` flag may be used in contexts
			
 
				+where a preorder representation would be easier to handle.
			
 
				+
			
 
				+## Bracketing inside the tree
			
 
				+
			
 
				+The parse tree is designed to be walked in postorder by checking, allowing
			
 
				+checking to be more efficient. To support this, checking sometimes requires
			
 
				+context on the meaning of a node when it is encountered.
			
 
				+
			
 
				+Each `ParseNodeKind` has either a bracketing node, or a specific child count.
			
 
				+This helps document and enforce the expected tree structure.
			
 
				+
			
 
				+When a bracketing node is indicated, it is the opening bracket: it will always
			
 
				+be the first child of the parent, and that will be the only time it occurs in
			
 
				+the parent's children (it may still occur in children of children). When
			
 
				+checking encounters the opening bracket, this means it can make contextual
			
 
				+decisions for the later children of the node.
			
 
				+
			
 
				+Nodes can also have a specific child count, for example, infix operators always
			
 
				+have two children: the lhs and rhs expressions. Many nodes have a child count of
			
 
				+0; this just means they're leaf nodes, and will never have children.
			
 
				+
			
 
				+Because the tree structure is always valid, these are treated as contracts. Some
			
 
				+nodes exist only to be used to construct valid tree structures for invalid
			
 
				+input, such as `StructFieldUnknown`.
			
 
				+
			
 
				+Although each subtree's size is also tracked as part of the node, we're
			
 
				+currently trying to avoid relying on it and may eliminate it if it turns out to
			
 
				+be unnecessary and a meaningful cost for the compiler.
			
 
				+
			
 
				+## Visual example
			
 
				+
			
 
				+To illustrate the transition from code to parse tree, consider the
			
 
				+statement:
			
 
				+
			
 
				+```carbon
			
 
				+var x: i32 = y + 1;
			
 
				+```
			
 
				+
			
 
				+Lexing creates distinct tokens for each syntactic element, which will form the
			
 
				+basis of the parse tree:
			
 
				+
			
 
				+<pre>
			
 
				+<b>Tokens:</b>
			
 
				+
			
 
				++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
			
 
				++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+</pre>
			
 
				+
			
 
				+First, the `var` keyword is used as a "bracketing" node (`VariableIntroducer`).
			
 
				+When this is seen in a postorder traversal, it tells us to expect the basics of
			
 
				+a variable declaration structure.
			
 
				+
			
 
				+<pre>
			
 
				+<b>Tokens:</b>
			
 
				+
			
 
				+        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+        |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
			
 
				+        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+
			
 
				+<b>Parse tree:</b>
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				++-----+
			
 
				+| var |
			
 
				++-----+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+</pre>
			
 
				+
			
 
				+Next, we can consider the pattern binding. Here, `x` is the identifier and `i32`
			
 
				+is the type expression. The `:` provides a parent node that must always contain
			
 
				+two children, the name and type expression. Because it always has two direct
			
 
				+children, it doesn't need to be bracketed.
			
 
				+
			
 
				+<pre>
			
 
				+<b>Tokens:</b>
			
 
				+
			
 
				+                                +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+                                |  =  | |  y  | |  +  | |  1  | |  ;  |
			
 
				+                                +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+
			
 
				+<b>Parse tree:</b>
			
 
				+
			
 
				+        +-----+ +-----+
			
 
				+        |  x  | | i32 |
			
 
				+        +-----+ +-----+
			
 
				+           |       |
			
 
				+           +-------+-------+
			
 
				+                           |
			
 
				++-----+                 +-----+
			
 
				+| var |                 |  :  |
			
 
				++-----+                 +-----+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+</pre>
			
 
				+
			
 
				+We use the `=` as a separator (instead of a node with children like `:`) to help
			
 
				+indicate the transition from binding to assignment expression, which is
			
 
				+important for expression parsing during checking.
			
 
				+
			
 
				+<pre>
			
 
				+<b>Tokens:</b>
			
 
				+
			
 
				+                                        +-----+ +-----+ +-----+ +-----+
			
 
				+                                        |  y  | |  +  | |  1  | |  ;  |
			
 
				+                                        +-----+ +-----+ +-----+ +-----+
			
 
				+
			
 
				+<b>Parse tree:</b>
			
 
				+
			
 
				+        +-----+ +-----+
			
 
				+        |  x  | | i32 |
			
 
				+        +-----+ +-----+
			
 
				+           |       |
			
 
				+           +-------+-------+
			
 
				+                           |
			
 
				++-----+                 +-----+ +-----+
			
 
				+| var |                 |  :  | |  =  |
			
 
				++-----+                 +-----+ +-----+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+</pre>
			
 
				+
			
 
				+The expression is a subtree with `+` as the parent, and the two operands as
			
 
				+child nodes.
			
 
				+
			
 
				+<pre>
			
 
				+<b>Tokens:</b>
			
 
				+
			
 
				+                                                                +-----+
			
 
				+                                                                |  ;  |
			
 
				+                                                                +-----+
			
 
				+
			
 
				+<b>Parse tree:</b>
			
 
				+
			
 
				+        +-----+ +-----+                 +-----+ +-----+
			
 
				+        |  x  | | i32 |                 |  y  | |  1  |
			
 
				+        +-----+ +-----+                 +-----+ +-----+
			
 
				+           |       |                       |       |
			
 
				+           +-------+-------+               +-------+-------+
			
 
				+                           |                               |
			
 
				++-----+                 +-----+ +-----+                 +-----+
			
 
				+| var |                 |  :  | |  =  |                 |  +  |
			
 
				++-----+                 +-----+ +-----+                 +-----+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+</pre>
			
 
				+
			
 
				+Finally, the `;` is used as the "root" of the variable declaration. It's
			
 
				+explicitly tracked as the `;` for a variable declaration so that it's
			
 
				+unambiguously bracketed by `var`.
			
 
				+
			
 
				+<pre>
			
 
				+<b>Tokens:</b>
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+<b>Parse tree:</b>
			
 
				+
			
 
				+        +-----+ +-----+                 +-----+ +-----+
			
 
				+        |  x  | | i32 |                 |  y  | |  1  |
			
 
				+        +-----+ +-----+                 +-----+ +-----+
			
 
				+           |       |                       |       |
			
 
				+           +-------+-------+               +-------+-------+
			
 
				+                           |                               |
			
 
				++-----+                 +-----+ +-----+                 +-----+
			
 
				+| var |                 |  :  | |  =  |                 |  +  |
			
 
				++-----+                 +-----+ +-----+                 +-----+
			
 
				+   |                       |       |                       |
			
 
				+   +-----------------------+-------+-----------------------+-------+
			
 
				+                                                                   |
			
 
				+                                                                +-----+
			
 
				+                                                                |  ;  |
			
 
				+                                                                +-----+
			
 
				+</pre>
			
 
				+
			
 
				+This is the completed parse tree.
			
 
				+
			
 
				+In storage, this tree will be flat and in postorder. Because the order hasn't
			
 
				+changed much from the original code, we can do the reordering for postorder with
			
 
				+a minimal number of nodes being delayed for later output: it will be linear with
			
 
				+respect to the depth of the parse tree.
			
 
				+
			
 
				+<pre>
			
 
				+<b>Tokens:</b>
			
 
				+
			
 
				++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
			
 
				++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+
			
 
				+<b>Parse tree:</b>
			
 
				+
			
 
				+        +-----+ +-----+                 +-----+ +-----+
			
 
				+        |  x  | | i32 |                 |  y  | |  1  |
			
 
				+        +-----+ +-----+                 +-----+ +-----+
			
 
				+           |       |                       |       |
			
 
				+           +-------+-------+               +-------+-------+
			
 
				+                           |                               |
			
 
				++-----+                 +-----+ +-----+                 +-----+
			
 
				+| var |                 |  :  | |  =  |                 |  +  |
			
 
				++-----+                 +-----+ +-----+                 +-----+
			
 
				+   |                       |       |                       |
			
 
				+   +-----------------------+-------+-----------------------+-------+
			
 
				+                                                                   |
			
 
				+                                                                +-----+
			
 
				+                                                                |  ;  |
			
 
				+                                                                +-----+
			
 
				+
			
 
				+<b>Flattened for storage:</b>
			
 
				+
			
 
				++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+| var | |  x  | | i32 | |  :  | |  =  | |  y  | |  1  | |  +  | |  ;  |
			
 
				++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
			
 
				+</pre>
			
 
				+
			
 
				+The structural concepts of bracketing nodes (`var` and `;`) and parent nodes
			
 
				+with a known child count (`:` and `+` with 2 children, but also `=` with 0
			
 
				+children) will allow checking to reconstruct the tree as it encounters nodes
			
 
				+during the postorder traversal.
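
A minimal sketch of that reconstruction follows. It is not how checking is implemented: `ToyKind`, `RebuildFromPostorder`, and `VarExample` are hypothetical names, token text stands in for real node kinds, and the input is assumed to be structurally valid. Each node either pops a fixed number of children, or pops back to its opening bracket.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a node: either a fixed child count, or the name
// of the bracketing node that marks its first child.
struct ToyKind {
  std::string name;
  int child_count;      // Used when `bracket` is empty.
  std::string bracket;  // Name of the opening bracket node, or empty.
};

// Rebuilds nesting from a postorder sequence and renders it as a string such
// as `+(y 1)`. Assumes the input is structurally valid.
inline auto RebuildFromPostorder(const std::vector<ToyKind>& postorder)
    -> std::string {
  struct Entry {
    std::string kind;
    std::string rendered;
  };
  std::vector<Entry> stack;
  for (const ToyKind& node : postorder) {
    std::vector<Entry> children;
    if (node.bracket.empty()) {
      // Fixed child count: pop exactly that many completed subtrees.
      for (int i = 0; i < node.child_count; ++i) {
        children.insert(children.begin(), stack.back());
        stack.pop_back();
      }
    } else {
      // Bracketed: pop completed subtrees back to the opening bracket.
      while (true) {
        Entry child = stack.back();
        stack.pop_back();
        children.insert(children.begin(), child);
        if (child.kind == node.bracket) {
          break;
        }
      }
    }
    std::string rendered = node.name;
    if (!children.empty()) {
      rendered += "(";
      for (std::size_t i = 0; i < children.size(); ++i) {
        if (i > 0) {
          rendered += " ";
        }
        rendered += children[i].rendered;
      }
      rendered += ")";
    }
    stack.push_back({node.name, rendered});
  }
  // Join whatever roots remain.
  std::string result;
  for (std::size_t i = 0; i < stack.size(); ++i) {
    if (i > 0) {
      result += " ";
    }
    result += stack[i].rendered;
  }
  return result;
}

// The flattened `var x: i32 = y + 1;` example from above.
inline auto VarExample() -> std::vector<ToyKind> {
  return {
      {"var", 0, ""}, {"x", 0, ""}, {"i32", 0, ""}, {":", 2, ""}, {"=", 0, ""},
      {"y", 0, ""},   {"1", 0, ""}, {"+", 2, ""},   {";", 0, "var"},
  };
}
```

For the flattened example, this renders `;(var :(x i32) = +(y 1))`, matching the tree drawn above.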
			
 
				+
			
 
				+There are other structures that could have been used here, such as `=` being
			
 
				+parent of the `var` and pattern nodes, and `;` being the parent of the `=` and
			
 
				+assignment expression nodes. In that example alternative, the storage order
			
 
				+would be the same; it would only change the tree representation. The current
			
 
				+structure is influenced by choices in checking.
			
 
				+
			
 
				+## Handling invalid parses
			
 
				+
			
 
				+On an invalid parse, the output tree should still try to mirror the intended
			
 
				+tree structure when possible. There's a balance here, and it's not expected to
			
 
				+try too hard to make things correct, but outputting nodes is preferred. There
			
 
				+are `InvalidParse` nodes which may be used to provide a node when the planned
			
 
				+node kind is too difficult to get correct child counts (bracketed subtrees may
			
 
				+not need an `InvalidParse` node).
			
 
				+
			
 
				+When marking a child node with `has_error=true`, parent nodes may also be marked
			
 
				+with `has_error=true`, but try to be conservative about this. As a rule of
			
 
				+thumb, if checking could continue on a parent node without needing the child
			
 
				+node to be fully checked (possibly with incomplete information), then the parent
			
 
				+node should not be marked as `has_error=true`. The goal remains providing
			
 
				+something similar to a well-formed parse tree.
			
 
				+
			
 
				+In general, a parent node must have the immediate children described in
			
 
				+[parse/typed_nodes.h](/toolchain/parse/typed_nodes.h), unless it is marked
			
 
				+`has_error=true`. If this is violated for a particular parse tree, an error will
			
 
				+be raised in `Tree::Verify`. Note that an `InvalidParse` node is allowed as a
			
 
				+declaration or expression, and an `InvalidParseSubtree` is allowed as a
			
 
				+declaration. These invalid nodes can be added to more node categories as needed.
			
 
				+
			
 
				+Child states may indicate an error to their parent using `ReturnErrorOnState`.
			
 
				+This is particularly intended for when a child state emits a diagnostic, to
			
 
				+prevent the parent state from emitting redundant diagnostics; for example, an
			
 
				+invalid expression might have more invalid tokens following it, and the parent
			
 
				+might skip those without emitting diagnostics.
			
 
				+
			
 
				+## How is this accomplished?
			
 
				+
			
 
				+The specific approach to producing the desired tree depends on the kind of
			
 
				+grammar rule being implemented, as well as the desired output tree structure.
			
 
				+
			
 
				+### Introducer
			
 
				+
			
 
				+**Example:** `if (c) { ... }`
			
 
				+
			
 
				+Here `if` is the introducer. Many other possible introducers could occur in that
			
 
				+position, such as `while` or `var`, and we want to dispatch based on which token
			
 
				+is present. See
			
 
				+[parse/handle_statement.cpp](/toolchain/parse/handle_statement.cpp).
			
 
				+
			
 
				+The first step is to identify the introducer token, typically using a `switch`
			
 
				+or `if` on the `Lex::TokenKind` at the current position:
			
 
				+
			
 
				+```cpp
			
 
				+switch (context.PositionKind()) {
			
 
				+  case Lex::TokenKind::___: {
			
 
				+    ...
			
 
				+    break;
			
 
				+  }
			
 
				+  ...
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+There should be a `default:` (or `else`) case so every kind of token is handled.
			
 
				+This may be an error, in which case:
			
 
				+
			
 
				+-   A [diagnostic](diagnostics.md) should be emitted.
			
 
				+
			
 
				+-   An invalid parse node should be added, using something like:
			
 
				+
			
 
				+    ```cpp
			
 
				+    context.AddLeafNode(NodeKind::InvalidParse, context.Consume(),
			
 
				+                        /*has_error=*/true);
			
 
				+    ```
			
 
				+
			
 
				+-   At least one token should be consumed, particularly if it will continue with
			
 
				+    this state at this position, to avoid an infinite loop.
			
 
				+
			
 
				+The default case may also be delegated to another state. For example, in the
			
 
				+state where a statement is expected, if no keyword introducer is recognized, it
			
 
				+switches to the expression-statement state.
			
 
				+
			
 
				+Depending on the introducer, different actions can be taken. The most common
			
 
				+case is to:
			
 
				+
			
 
				+-   Call `context.PushState(State::___);` to mark the beginning of the statement
			
 
				+    or declaration and indicate the state that will handle the tokens after the
			
 
				+    introducer.
			
 
				+
			
 
				+-   Call `context.AddLeafNode(NodeKind::___, context.Consume());` to output a
			
 
				+    bracketing node for this introducer.
			
 
				+
			
 
				+The next state can then add sibling nodes until it gets to the end of the
			
 
				+declaration or statement. The last token, often a semicolon `;`, is used as a
			
 
				+parent node to match the bracketing node of the introducer.
			
 
				+
			
 
				+If the introducer token won't be used as a bracketing node, it can be
			
 
				+temporarily skipped after `context.PushState` by calling
			
 
				+`context.ConsumeAndDiscard()` instead of `context.AddLeafNode`. It must be added
			
 
				+to the output tree as a node by some later state, unless an error occurs. For
			
 
				+example, a `for` statement uses the `for` token as the root of the tree -- it
			
 
				+doesn't need a bracketing node since it has a fixed child count. Note that the
			
 
				+token was saved when the state was pushed, and can be retrieved when adding a
			
 
				+node as in this example:
			
 
				+
			
 
				+```cpp
			
 
				+auto state = context.PopState();
			
 
				+context.AddNode(NodeKind::ForStatement, state.token, state.subtree_start,
			
 
				+                state.has_error);
			
 
				+```
			
 
				+
			
 
				+If this state is for an element of a scope like the statements in a code block,
			
 
				+most introducer tokens indicate that the current state should be repeated, to
			
 
				+handle the next statement, but some other token, like a close curly brace (`}`)
			
 
				+means that the state should be exited.
			
 
				+
			
 
				+### Optional modifiers before an introducer
			
 
				+
			
 
				+**Example:** `virtual fn Foo();`
			
 
				+
			
 
				+Here `fn` is the introducer, and `virtual` is an optional modifier that appears
			
 
				+before. See
			
 
				+[parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp).
			
 
				+
			
 
				+Use this pattern when the goal is to produce a subtree that starts with the
			
 
				+introducer as a bracketing node, as in the previous case, followed by nodes for
			
 
				+any modifiers. Note that bracketing is needed here, since the optional modifier
			
 
				+nodes mean that there is not a fixed child count for the parent node. This means
			
 
				+shuffling the introducer node before an unknown number of modifier nodes. This
			
 
				+is accomplished by emitting a placeholder node for the introducer, processing
			
 
				+all the modifiers until reaching the introducer, filling in the placeholder with
			
 
				+the information about the introducer, and then finishing the rest of the
			
 
				+declaration or statement.
			
 
				+
			
 
				+-   **Step 1**: Save the current value of `context.tree().size()`. This could be
			
 
				+    accomplished by calling `context.PushState()`, which saves that value in the
			
 
				+    `subtree_start` field of `Context::StateStackEntry`; or by constructing a
			
 
				+    `Context::StateStackEntry` value directly, as is done in
			
 
				+    [parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp).
			
 
				+    This marks the position of the placeholder node we are going to replace, as
			
 
				+    well as the beginning of the subtree we are eventually going to emit for
			
 
				+    this declaration or statement.
			
 
				+
			
 
				+-   **Step 2**: Emit the placeholder node using
			
 
				+    `context.AddLeafNode(NodeKind::Placeholder, *context.position());`. The
			
 
				+    `NodeKind` and `Lex::TokenIndex` values will be overwritten later.
			
 
				+
			
 
				+-   **Step 3**: Process tokens until we hit the introducer. All of the nodes we
			
 
				+    emit at this point will appear as siblings after the introducer token in the
			
 
				+    output tree.
			
 
				+
			
 
				+-   **Step 4 - success**: If an introducer token is found, replace the
			
 
				+    placeholder node using something like:
			
 
				+
			
 
				+    ```cpp
			
 
				+    context.ReplacePlaceholderNode(state.subtree_start, introducer_kind,
			
 
				+                                   context.Consume());
			
 
				+    ```
			
 
				+
			
 
				+    -   `state.subtree_start` is the value of `context.tree().size()` saved in
			
 
				+        step 1, which marks the position of the placeholder node in the output
			
 
				+        parse tree.
			
 
				+
			
 
				+    -   `introducer_kind` is the `NodeKind` for the introducer of this
			
 
				+        declaration or statement, a leaf node that will act as a bracketing node
			
 
				+        at the beginning of the subtree for this declaration or statement.
			
 
				+
			
 
				+-   **Step 4 - error**: If we run into something other than a modifier or
			
 
				+    introducer before finding an introducer, we need to do error handling:
			
 
				+
			
 
				+    ```cpp
			
 
				+    context.ReplacePlaceholderNode(subtree_start, NodeKind::InvalidParseStart,
			
 
				+                                   *context.position(), /*has_error=*/true);
			
 
				+    ```
			
 
				+
			
 
				+    -   Emit a [diagnostic](diagnostics.md).
			
 
				+
			
 
				+    -   Replace the placeholder node (as in the success case of step 4) with an
			
 
				+        `InvalidParseStart` node. It will be associated with the unexpected
			
 
				+        token that triggered this error.
			
 
				+
			
 
				+    -   Consume input tokens up to the likely end of the current
			
 
				+        statement or declaration. For example, we might consume up to a `;` or a
			
 
				+        token at a lesser indent level using `context.SkipPastLikelyEnd(...)`.
			
 
				+        It is important that we consume at least one token in the error case,
			
 
				+        otherwise we could have an infinite loop of generating the same error on
			
 
				+        the same token.
			
 
				+
			
 
				+    -   Emit an `InvalidParseSubtree` node. This will be the parent of any
			
 
				+        emitted modifier nodes, and will be bracketed by the `InvalidParseStart`
			
 
				+        node emitted above. It should be associated with the last token
			
 
				+        consumed.
			
 
				+
			
 
				+        ```cpp
			
 
				+        // Set `iter` to the last token consumed, one before the current position.
			
 
				+        auto iter = context.position();
			
 
				+        --iter;
			
 
				+        context.AddNode(NodeKind::InvalidParseSubtree, *iter, subtree_start,
			
 
				+                        /*has_error=*/true);
			
 
				+        ```
			
 
				+
			
 
				+-   **Step 5**: (If success at step 4) Push whatever states are to be used to
			
 
				+    parse the rest of the declaration. The first state pushed (the last state to
			
 
				+    be processed) will handle the end of this declaration. That pushed state
			
 
				+    should have a `subtree_start` field set to the value of
			
 
				+    `context.tree().size()` saved in step 1.
			
 
				+
			
 
				+-   **Step 6**: When handling the state for the end of the declaration, emit the
			
 
				+    root node of the subtree:
			
 
				+
			
 
				+    ```cpp
			
 
				+    auto state = context.PopState();
			
 
				+    context.AddNode(NodeKind::___, context.Consume(),
			
 
				+                    state.subtree_start, state.has_error);
			
 
				+    ```
			
 
				+
			
 
				+    -   This `state.subtree_start` will mark everything since the bracketing
			
 
				+        introducer node as the children of this node.
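
The steps above can be sketched with a toy model. This is not the toolchain's code: `ToyTree`, `ParseOne`, `ParseAll`, and the string-based tokens are hypothetical stand-ins, and the sketch assumes the token list ends with `;`. It only illustrates how the placeholder, `subtree_start`, and the bracketing nodes interact, and why the error path must consume at least one token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the toolchain's node kinds.
enum class NodeKind {
  Placeholder, Modifier, Introducer, InvalidParseStart, InvalidParseSubtree, Decl
};

struct Node {
  NodeKind kind;
  std::string token;
  int subtree_size;
  bool has_error;
};

struct ToyTree {
  std::vector<Node> nodes;

  auto size() const -> int { return static_cast<int>(nodes.size()); }

  auto AddLeafNode(NodeKind kind, std::string token) -> void {
    nodes.push_back({kind, std::move(token), 1, false});
  }

  // Adds a parent whose subtree covers every node since `subtree_start`.
  auto AddNode(NodeKind kind, std::string token, int subtree_start,
               bool has_error) -> void {
    nodes.push_back(
        {kind, std::move(token), size() - subtree_start + 1, has_error});
  }

  auto ReplacePlaceholderNode(int position, NodeKind kind, std::string token,
                              bool has_error) -> void {
    assert(nodes[position].kind == NodeKind::Placeholder);
    nodes[position] = {kind, std::move(token), 1, has_error};
  }
};

auto IsModifier(const std::string& t) -> bool {
  return t == "private" || t == "abstract";
}
auto IsIntroducer(const std::string& t) -> bool {
  return t == "fn" || t == "class";
}

// Parses one declaration starting at `pos`. Assumes `tokens` ends with ";".
auto ParseOne(ToyTree& tree, const std::vector<std::string>& tokens,
              std::size_t& pos) -> void {
  int subtree_start = tree.size();               // Step 1: save the position.
  tree.AddLeafNode(NodeKind::Placeholder, "");   // Step 2: add a placeholder.
  while (IsModifier(tokens[pos])) {              // Step 3: consume modifiers.
    tree.AddLeafNode(NodeKind::Modifier, tokens[pos++]);
  }
  if (IsIntroducer(tokens[pos])) {
    // Step 4, success: the introducer takes the placeholder's slot.
    tree.ReplacePlaceholderNode(subtree_start, NodeKind::Introducer,
                                tokens[pos++], /*has_error=*/false);
    // Steps 5-6, collapsed: the toy skips the body and emits the root at ";".
    while (tokens[pos] != ";") {
      ++pos;
    }
    tree.AddNode(NodeKind::Decl, tokens[pos++], subtree_start,
                 /*has_error=*/false);
    return;
  }
  // Step 4, error: bracket the bad region with error nodes.
  tree.ReplacePlaceholderNode(subtree_start, NodeKind::InvalidParseStart,
                              tokens[pos], /*has_error=*/true);
  // Consume at least one token, up to and including the next ";", so the
  // caller's loop always makes progress.
  do {
    ++pos;
  } while (pos < tokens.size() && tokens[pos - 1] != ";");
  tree.AddNode(NodeKind::InvalidParseSubtree, tokens[pos - 1], subtree_start,
               /*has_error=*/true);
}

auto ParseAll(const std::vector<std::string>& tokens) -> ToyTree {
  assert(!tokens.empty() && tokens.back() == ";");
  ToyTree tree;
  std::size_t pos = 0;
  while (pos < tokens.size()) {
    ParseOne(tree, tokens, pos);
  }
  return tree;
}
```

Calling `ParseAll({"private", "fn", "f", ";", "private", "42", "x", ";"})` yields one well-formed subtree followed by one error subtree, each three nodes long and each starting with the node that replaced its placeholder.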
			
 
				+
			
 
				+### Something required in context
			
 
				+
			
 
				+FIXME
			
 
				+
			
 
				+Example: name after introducer
			
 
				+[parse/handle_decl_name_and_params.cpp](/toolchain/parse/handle_decl_name_and_params.cpp)
			
 
				+
			
 
				+Example: "`[` _implicit parameter list_ `]`" after `impl forall`
			
 
				+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp)
			
 
				+
			
 
				+### Optional clauses
			
 
				+
			
 
				+#### Case 1: introducer to optional clause is used as parent node
			
 
				+
			
 
				+**Example:** The optional `-> <return type expression>` in a function signature
			
 
				+uses this pattern, so `fn foo() -> u32;` is transformed to:
			
 
				+
			
 
				+```yaml
			
 
				+  {kind: 'FunctionIntroducer', text: 'fn'},
			
 
				+  {kind: 'IdentifierName', text: 'foo'},
			
 
				+    {kind: 'TuplePatternStart', text: '('},
			
 
				+  {kind: 'TuplePattern', text: ')', subtree_size: 2},
			
 
				+    {kind: 'UnsignedIntTypeLiteral', text: 'u32'},
			
 
				+  {kind: 'ReturnType', text: '->', subtree_size: 2},
			
 
				+{kind: 'FunctionDecl', text: ';', subtree_size: 7},
			
 
				+```
			
 
				+
			
 
				+Note how the `->` token becomes a `ReturnType` node in the output tree, and is
			
 
				+moved after the `u32` type expression that becomes its child. Compare with the
			
 
				+parse tree output for `fn foo();` which has no `ReturnType` node:
			
 
				+
			
 
				+```yaml
			
 
				+  {kind: 'FunctionIntroducer', text: 'fn'},
			
 
				+  {kind: 'IdentifierName', text: 'foo'},
			
 
				+    {kind: 'TuplePatternStart', text: '('},
			
 
				+  {kind: 'TuplePattern', text: ')', subtree_size: 2},
			
 
				+{kind: 'FunctionDecl', text: ';', subtree_size: 5},
			
 
				+```
			
 
				+
			
 
				+Here is the code from
			
 
				+[parse/handle_function.cpp](/toolchain/parse/handle_function.cpp) that does
			
 
				+this:
			
 
				+
			
 
				+```cpp
			
 
				+auto HandleFunctionAfterParams(Context& context) -> void {
			
 
				+  ...
			
 
				+  // If there is a return type, parse the expression before adding the return
			
 
				+  // type node.
			
 
				+  if (context.PositionIs(Lex::TokenKind::MinusGreater)) {
			
 
				+    context.PushState(State::FunctionReturnTypeFinish);
			
 
				+    context.ConsumeAndDiscard();
			
 
				+    context.PushStateForExpr(PrecedenceGroup::ForType());
			
 
				+  }
			
 
				+}
			
 
				+
			
 
				+auto HandleFunctionReturnTypeFinish(Context& context) -> void {
			
 
				+  auto state = context.PopState();
			
 
				+
			
 
				+  context.AddNode(NodeKind::ReturnType, state.token, state.subtree_start,
			
 
				+                  state.has_error);
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+The `->` token is saved by `context.PushState(`...`)`, so it is available as
			
 
				+`state.token` when calling
			
 
				+`context.AddNode(NodeKind::ReturnType, state.token,`...`)` later in
			
 
				+`HandleFunctionReturnTypeFinish`.
			
 
				+
			
 
				+Also see how the optional initializer is handled on `var`, treating the `=` as
			
 
				+its introducer in `HandleVarAfterPattern` and `HandleVarInitializer` in
			
 
				+[parse/handle_var.cpp](/toolchain/parse/handle_var.cpp).
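
To see why the `ReturnType` node placed after `u32` is its parent, it may help to read the postorder output using `subtree_size`. The following is a minimal sketch, not the toolchain's tree API: the `Node` struct and the `Children` and `FnFooReturnsU32` helpers are hypothetical, and the node list is copied from the `fn foo() -> u32;` output above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flattened form of the parse tree output shown above.
struct Node {
  std::string kind;
  std::string text;
  int subtree_size;
};

// Returns the indices of the direct children of `nodes[parent]`, in source
// order. Children are found by walking backward from the parent and skipping
// over each child's whole subtree.
auto Children(const std::vector<Node>& nodes, int parent) -> std::vector<int> {
  assert(parent >= 0 && parent < static_cast<int>(nodes.size()));
  std::vector<int> result;
  int first = parent - nodes[parent].subtree_size + 1;
  int child = parent - 1;
  while (child >= first) {
    result.insert(result.begin(), child);
    child -= nodes[child].subtree_size;
  }
  return result;
}

// The nodes for `fn foo() -> u32;`, in output order.
auto FnFooReturnsU32() -> std::vector<Node> {
  return {
      {"FunctionIntroducer", "fn", 1},
      {"IdentifierName", "foo", 1},
      {"TuplePatternStart", "(", 1},
      {"TuplePattern", ")", 2},
      {"UnsignedIntTypeLiteral", "u32", 1},
      {"ReturnType", "->", 2},
      {"FunctionDecl", ";", 7},
  };
}
```

With this layout, `Children(nodes, 6)` for the `FunctionDecl` root gives the introducer, the name, the `TuplePattern`, and the `ReturnType`, while `Children(nodes, 5)` for `ReturnType` gives only the `u32` literal.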
			
 
				+
			
 
				+#### Case 2: parent node is required token after optional clause, with different parent node kinds for different options
			
 
				+
			
 
				+**Example:** The optional type expression before `as` in `impl as` is
			
 
				+represented by producing two different output parse nodes for `as`. It outputs a
			
 
				+`DefaultSelfImplAs` node with no children when the type expression is absent,
			
 
				+and otherwise a `TypeImplAs` parse node with the type expression as its child.
			
 
				+
			
 
				+So `impl bool as Interface;` is transformed to:
			
 
				+
			
 
				+```yaml
			
 
				+  {kind: 'ImplIntroducer', text: 'impl'},
			
 
				+    {kind: 'BoolTypeLiteral', text: 'bool'},
			
 
				+  {kind: 'TypeImplAs', text: 'as', subtree_size: 2},
			
 
				+  {kind: 'IdentifierNameExpr', text: 'Interface'},
			
 
				+{kind: 'ImplDecl', text: ';', subtree_size: 5},
			
 
				+```
			
 
				+
			
 
				+while `impl as Interface;` is transformed to:
			
 
				+
			
 
				+```yaml
			
 
				+  {kind: 'ImplIntroducer', text: 'impl'},
			
 
				+  {kind: 'DefaultSelfImplAs', text: 'as'},
			
 
				+  {kind: 'IdentifierNameExpr', text: 'Interface'},
			
 
				+{kind: 'ImplDecl', text: ';', subtree_size: 4},
			
 
				+```
			
 
				+
			
 
				+This is handled by the `ExpectAsOrTypeExpression` code from
			
 
				+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp):
			
 
				+
			
 
				+```cpp
			
 
				+if (context.PositionIs(Lex::TokenKind::As)) {
			
 
				+  // as <expression> ...
			
 
				+  context.AddLeafNode(NodeKind::DefaultSelfImplAs, context.Consume());
			
 
				+  context.PushState(State::Expr);
			
 
				+} else {
			
 
				+  // <expression> as <expression>...
			
 
				+  context.PushState(State::ImplBeforeAs);
			
 
				+  context.PushStateForExpr(PrecedenceGroup::ForImplAs());
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+and then `HandleImplBeforeAs` creates the parent node in the second case:
			
 
				+
			
 
				+```cpp
			
 
				+auto state = context.PopState();
			
 
				+if (auto as = context.ConsumeIf(Lex::TokenKind::As)) {
			
 
				+  context.AddNode(NodeKind::TypeImplAs, *as, state.subtree_start,
			
 
				+                  state.has_error);
			
 
				+  context.PushState(State::Expr);
			
 
				+} else {
			
 
				+  if (!state.has_error) {
			
 
				+    CARBON_DIAGNOSTIC(ImplExpectedAs, Error,
			
 
				+                      "Expected `as` in `impl` declaration.");
			
 
				+    context.emitter().Emit(*context.position(), ImplExpectedAs);
			
 
				+  }
			
 
				+  context.ReturnErrorOnState();
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+Note (1) that the `state.subtree_start` value comes from the
			
 
				+`context.PushState(State::ImplBeforeAs);` before parsing the type expression,
			
 
				+and that is how that type expression ends up as the child of the created
			
 
				+`TypeImplAs` node. Unlike
			
 
				+[the previous case 1](#case-1-introducer-to-optional-clause-is-used-as-parent-node),
			
 
				+though, the parent node uses the token after the optional expression, rather
			
 
				+than an introducer token for the optional clause.
			
 
				+
			
 
				+Note (2) how `HandleImplBeforeAs` handles three cases of errors:
			
 
				+
			
 
				+-   `as` present but an error in the child type expression -> error on the
			
 
				+    output `TypeImplAs` node, but not propagated to the parent.
			
 
				+-   No `as` present but the type expression was okay -> create a new
			
 
				+    error.
			
 
				+-   An error in the child type expression and no `as` present -> no new
			
 
				+    diagnostic; we suppress errors once one is emitted until we can recover.
			
 
				+
			
 
				+If there is no `as` token, we don't output either a `TypeImplAs` or a
			
 
				+`DefaultSelfImplAs` node, as required by the parent node, so in those cases we
			
 
				+mark the parent as having an error.
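
The error cases can be summarized as a small decision table. This is only a sketch of the behavior described above, not code from the toolchain: the `Outcome` struct and `ImplBeforeAsOutcome` function are hypothetical names used for illustration.

```cpp
#include <cassert>

// Hypothetical summary of what `HandleImplBeforeAs` does in each case.
struct Outcome {
  bool creates_type_impl_as;  // A `TypeImplAs` node is added to the tree.
  bool node_has_error;        // Error flag on that `TypeImplAs` node.
  bool emits_diagnostic;      // A new "expected `as`" diagnostic is emitted.
  bool parent_has_error;      // The enclosing state is marked as erroneous.
};

auto ImplBeforeAsOutcome(bool as_present, bool child_has_error) -> Outcome {
  Outcome out = {false, false, false, false};
  if (as_present) {
    // The node carries the child's error, but the parent is not affected.
    out.creates_type_impl_as = true;
    out.node_has_error = child_has_error;
  } else {
    // Only diagnose if nothing has been reported for this subtree yet.
    out.emits_diagnostic = !child_has_error;
    // The parent is missing a required `as` node either way.
    out.parent_has_error = true;
  }
  return out;
}
```

For example, `ImplBeforeAsOutcome(false, true)` reports no new diagnostic but still marks the parent as having an error, matching the third bullet above.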
			
 
				+
			
 
				+#### Case 3: optional sibling
			
 
				+
			
 
				+> TODO: This was changed by
			
 
				+> [#3678](https://github.com/carbon-language/carbon-lang/pull/3678) and needs to
			
 
				+> be updated.
			
 
				+
			
 
				+**Example:** The optional type expression before `as` in `impl as` is output as
			
 
				+an optional sibling subtree between the `ImplIntroducer` node for the `impl`
			
 
				+introducer and the `ImplAs` node for the required `as` keyword.
			
 
				+
			
 
				+`impl bool as Interface;` is transformed to:
			
 
				+
			
 
				+```yaml
			
 
				+  {kind: 'ImplIntroducer', text: 'impl'},
			
 
				+  {kind: 'BoolTypeLiteral', text: 'bool'},
			
 
				+  {kind: 'ImplAs', text: 'as'},
			
 
				+  {kind: 'IdentifierNameExpr', text: 'Interface'},
			
 
				+{kind: 'ImplDecl', text: ';', subtree_size: 5},
			
 
				+```
			
 
				+
			
 
				+while `impl as Interface;` is transformed to:
			
 
				+
			
 
				+```yaml
			
 
				+  {kind: 'ImplIntroducer', text: 'impl'},
			
 
				+  {kind: 'ImplAs', text: 'as'},
			
 
				+  {kind: 'IdentifierNameExpr', text: 'Interface'},
			
 
				+{kind: 'ImplDecl', text: ';', subtree_size: 4},
			
 
				+```
			
 
				+
			
 
				+This is handled by the `ExpectAsOrTypeExpression` code from
			
 
				+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp):
			
 
				+
			
 
				+```cpp
			
 
				+if (context.PositionIs(Lex::TokenKind::As)) {
			
 
				+  // as <expression> ...
			
 
				+  context.AddLeafNode(NodeKind::ImplAs, context.Consume());
			
 
				+  context.PushState(State::Expr);
			
 
				+} else {
			
 
				+  // <expression> as <expression>...
			
 
				+  context.PushState(State::ImplBeforeAs);
			
 
				+  context.PushStateForExpr(PrecedenceGroup::ForImplAs());
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+and then `HandleImplBeforeAs` follows
			
 
				+[the "something required in context" pattern](#something-required-in-context) to
			
 
				+deal with the `as` that follows when the type expression is present.
			
 
				+
			
 
				+### Operators
			
 
				+
			
 
				+FIXME
			
 
				+
			
 
				+An independent description of our approach:
			
 
				+["Better operator precedence" on scattered-thoughts.net](https://www.scattered-thoughts.net/writing/better-operator-precedence/)
			
--- a/toolchain/docs/parse.svg
+++ b/toolchain/docs/parse.svg
--- a/website/prebuild.py
+++ b/website/prebuild.py
@@ -189,7 +189,9 @@ def main() -> None:
 
				 
			
 
				     # Reset the order for the implementation children.
			
 
				     nav_order[0] = 0
			
 
				-    label_subdir("toolchain", next(nav_order), parent_title="Implementation")
			
 
				+    label_subdir(
			
 
				+        "toolchain/docs", next(nav_order), parent_title="Implementation"
			
 
				+    )
			
 
				     label_subdir("explorer", next(nav_order), parent_title="Implementation")
			
 
				     label_subdir("testing", next(nav_order), parent_title="Implementation")