
Move toolchain architecture to markdown (#4242)

Note I'm mostly trying to capture [the
docs](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit?resourcekey=0-kHyqOESbOHmzZphUbtLrTw&tab=t.0)
as they exist today, not fixing issues with the docs. I think the doc
itself hasn't changed much lately (i.e., for months). Trying to organize
it a little better though, particularly so that it shows up reasonably
when looking in GitHub or on the website.

---------

Co-authored-by: Geoff Romer <gromer@google.com>
Jon Ross-Perkins, 1 year ago
Commit
a24816a1f4

+ 1 - 3
toolchain/README.md

@@ -6,6 +6,4 @@ Exceptions. See /LICENSE for license information.
 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -->
 
-A design is currently maintained in
-[Google Drive](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit?resourcekey=0-kHyqOESbOHmzZphUbtLrTw).
-It'll be migrated to markdown once we are confident in its stability.
+See [docs](docs/).

+ 94 - 0
toolchain/docs/README.md

@@ -0,0 +1,94 @@
+# Toolchain architecture
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Goals](#goals)
+-   [High-level architecture](#high-level-architecture)
+    -   [Design patterns](#design-patterns)
+-   [Adding features](#adding-features)
+
+<!-- tocstop -->
+
+## Goals
+
+The toolchain represents the production portion of Carbon. At a high level, the
+toolchain's top priorities are:
+
+-   Correctness.
+-   Quality of generated code, including performance.
+-   Compilation performance.
+-   Quality of diagnostics for incorrect or questionable code.
+
+TODO: Add an expanded document that details the goals and priorities and link to
+it here.
+
+## High-level architecture
+
+The main components are:
+
+-   [Driver](driver.md): Provides commands and ties together the compilation flow.
+-   [Diagnostics](diagnostics.md): Produces diagnostic output.
+-   Compilation flow:
+
+    1. Source: Load the file into a
+       [SourceBuffer](/toolchain/source/source_buffer.h).
+    2. [Lex](lex.md): Transform a SourceBuffer into a
+       [Lex::TokenizedBuffer](/toolchain/lex/tokenized_buffer.h).
+    3. [Parse](parse.md): Transform a TokenizedBuffer into a
+       [Parse::Tree](/toolchain/parse/tree.h).
+    4. [Check](check.md): Transform a Parse::Tree into a
+       [SemIR::File](/toolchain/sem_ir/file.h).
+    5. [Lower](lower.md): Transform the SemIR::File into an
+       [LLVM Module](https://llvm.org/doxygen/classllvm_1_1Module.html).
+    6. CodeGen: Transform the LLVM Module into an object file.
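
The flow above is a chain of distinct steps, each consuming the previous step's complete output. A minimal C++ sketch of that shape, assuming heavily simplified stand-ins: `SourceBuffer`, `TokenizedBuffer`, `ParseTree`, `Lex`, `Parse`, and `RunPipeline` here are hypothetical (a whitespace-splitting lexer and a one-node-per-token parser), not the toolchain's real classes or signatures.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the real toolchain types.
struct SourceBuffer {
  std::string text;
};

struct TokenizedBuffer {
  std::vector<std::string> tokens;
};

struct ParseTree {
  int node_count = 0;
};

// Step: source -> tokens. A toy lexer that splits on whitespace.
auto Lex(const SourceBuffer& source) -> TokenizedBuffer {
  TokenizedBuffer result;
  std::string current;
  for (char c : source.text) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      if (!current.empty()) {
        result.tokens.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (!current.empty()) {
    result.tokens.push_back(current);
  }
  return result;
}

// Step: tokens -> tree. A toy parser that creates one node per token.
auto Parse(const TokenizedBuffer& tokens) -> ParseTree {
  ParseTree tree;
  tree.node_count = static_cast<int>(tokens.tokens.size());
  return tree;
}

// Each step returns a finished structure that the next step consumes, rather
// than handing data across through callbacks.
auto RunPipeline(const SourceBuffer& source) -> ParseTree {
  TokenizedBuffer tokens = Lex(source);
  return Parse(tokens);
}
```

Because each step has a single input and output structure, a step can be tested or dumped on its own, which is what the `--phase` and `--dump-*` driver flags expose.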
+
+### Design patterns
+
+A few common design patterns are:
+
+-   Distinct steps: Each step of processing produces an output structure,
+    rather than using callbacks to pass data between steps.
+
+    -   For example, the parser takes a `Lex::TokenizedBuffer` as input and
+        produces a `Parse::Tree` as output.
+
+    -   Performance: It should yield better locality than a callback approach.
+
+    -   Understandability: Each step has a clear input and output, whereas
+        callbacks obscure the flow of data.
+
+-   Vectorized storage: Data is stored in vectors, and flyweight indices into
+    those vectors are passed around instead of pointers to individually
+    heap-allocated objects.
+
+    -   For example, the parse tree is stored as a
+        `llvm::SmallVector<Parse::Tree::NodeImpl>` indexed by `Parse::Node`
+        which wraps an `int32_t`.
+
+    -   Performance: Vector storage both minimizes memory allocation overhead
+        and enables better read caching, because adjacent entries are cached
+        together.
+
+-   Iterative processing: We rely on state stacks and iterative loops for
+    parsing, avoiding recursive function calls.
+
+    -   For example, the parser has a `Parse::State` enum tracked in
+        `state_stack_`, and loops in `Parse::Tree::Parse`.
+
+    -   Scalability: Complex code must not cause recursion issues. We have
+        seen Clang hit stack recursion limits in unexpected ways, and
+        non-recursive approaches largely avoid that risk.
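
The vectorized storage pattern can be sketched as a tree whose nodes live in one `std::vector` and are referred to by a 32-bit index. This is a minimal sketch under stated assumptions: `Tree`, `NodeImpl`, `NodeId`, `AddNode`, and the `subtree_size` field are hypothetical and only loosely modeled on `Parse::Tree::NodeImpl` and `Parse::Node`; the real fields differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical node kinds for the sketch.
enum class NodeKind : uint8_t { IntLiteral, InfixOperatorPlus };

// Flyweight handle: only an index into the tree's node vector.
struct NodeId {
  int32_t index;
};

// Per-node data, stored by value in one vector.
struct NodeImpl {
  NodeKind kind;
  int32_t subtree_size;  // This node plus all of its descendants.
};

class Tree {
 public:
  auto AddNode(NodeKind new_kind, int32_t new_subtree_size) -> NodeId {
    nodes_.push_back({new_kind, new_subtree_size});
    return NodeId{static_cast<int32_t>(nodes_.size()) - 1};
  }

  auto kind(NodeId id) const -> NodeKind { return nodes_[id.index].kind; }

  auto subtree_size(NodeId id) const -> int32_t {
    return nodes_[id.index].subtree_size;
  }

 private:
  // All nodes share one contiguous buffer instead of one heap allocation per
  // node, so nodes visited in sequence are adjacent in memory.
  std::vector<NodeImpl> nodes_;
};
```

For `1 + 2` stored in postorder, the two literals occupy indices 0 and 1 and the operator occupies index 2; passing a `NodeId` around costs the same as passing an `int32_t`.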
+
+See also [Idioms](idioms.md) for abbreviations and more implementation
+techniques.
+
+## Adding features
+
+We have a [walkthrough for adding features](adding_features.md).

+ 433 - 0
toolchain/docs/adding_features.md

@@ -0,0 +1,433 @@
+# Adding features
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Lex](#lex)
+-   [Parse](#parse)
+    -   [Typed parse node metadata implementation](#typed-parse-node-metadata-implementation)
+-   [Check](#check)
+    -   [SemIR typed instruction metadata implementation](#semir-typed-instruction-metadata-implementation)
+-   [Lower](#lower)
+-   [Tests and debugging](#tests-and-debugging)
+    -   [Running tests](#running-tests)
+    -   [Updating tests](#updating-tests)
+        -   [Reviewing test deltas](#reviewing-test-deltas)
+    -   [Verbose output](#verbose-output)
+    -   [Stack traces](#stack-traces)
+
+<!-- tocstop -->
+
+## Lex
+
+New lexed tokens must be added to
+[token_kind.def](/toolchain/lex/token_kind.def). `CARBON_SYMBOL_TOKEN` and
+`CARBON_KEYWORD_TOKEN` both provide some built-in lexing logic, while
+`CARBON_TOKEN` requires custom lexing support.
+
+[TokenizedBuffer::Lex](/toolchain/lex/tokenized_buffer.h) is the main dispatch
+for lexing, and tokens that need custom lexing are dispatched from there.
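
Token kinds follow the .def file (X-macro) idiom: one list expands into several pieces of code. A minimal sketch, assuming a simplified inline list: `SKETCH_TOKEN_KINDS`, `LookupKeyword`, and the two-argument macro shape are hypothetical and do not reproduce the exact `CARBON_SYMBOL_TOKEN` and `CARBON_KEYWORD_TOKEN` signatures in token_kind.def.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Stand-in for token_kind.def. The real list is a separate file that is
// #included after the caller defines its macros; this inline version is a
// hypothetical simplification.
#define SKETCH_TOKEN_KINDS(SYMBOL_TOKEN, KEYWORD_TOKEN) \
  SYMBOL_TOKEN(Plus, "+")                               \
  SYMBOL_TOKEN(Semi, ";")                               \
  KEYWORD_TOKEN(Fn, "fn")                               \
  KEYWORD_TOKEN(Return, "return")

// Expansion 1: the enumeration, in list order.
#define SKETCH_ENUMERATOR(Name, Spelling) Name,
enum class TokenKind {
  SKETCH_TOKEN_KINDS(SKETCH_ENUMERATOR, SKETCH_ENUMERATOR)
};
#undef SKETCH_ENUMERATOR

// Expansion 2: "built-in" keyword lexing, generated from the same list.
auto LookupKeyword(std::string_view word) -> std::optional<TokenKind> {
#define SKETCH_SKIP(Name, Spelling)
#define SKETCH_MATCH(Name, Spelling) \
  if (word == Spelling) {            \
    return TokenKind::Name;          \
  }
  SKETCH_TOKEN_KINDS(SKETCH_SKIP, SKETCH_MATCH)
#undef SKETCH_SKIP
#undef SKETCH_MATCH
  return std::nullopt;
}
```

Adding one line to the list updates both the enumeration and the keyword lookup, which is why a new keyword or symbol typically needs only a .def entry, while `CARBON_TOKEN` kinds need hand-written lexing.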
+
+## Parse
+
+A parser feature will have state transitions that produce new parse nodes.
+
+The resulting parse nodes are defined in
+[parse/node_kind.def](/toolchain/parse/node_kind.def) and
+[typed_nodes.h](/toolchain/parse/typed_nodes.h). When choosing a node
+structure, consider how the `Check` step will process it in postorder; this
+will rule out some designs. Adding a parse node kind will also require a
+handler in the `Check` step.
+
+The state transitions are in [parse/state.def](/toolchain/parse/state.def). Each
+`CARBON_PARSER_STATE` defines a distinct state and has comments for state
+transitions. If several states should share handling, name them
+`FeatureAsVariant`.
+
+Adding a state requires adding a `Handle<name>` function in an appropriate
+`parse/handle_*.cpp` file, possibly a new file. The macros are used to generate
+declarations in the header, so only extra helper functions should be added
+there. Every state handler pops the state from the stack before any other
+processing.
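
The state-stack approach can be sketched with a toy grammar in which a file is a sequence of `;`-terminated declarations. This is a minimal sketch: `State`, `Parser`, and the `Handle*` functions are hypothetical and far simpler than `Parse::State` and the real handlers, but each handler pops its own state first, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical states; the real list comes from parse/state.def.
enum class State { FileContents, Declaration, DeclarationFinish };

class Parser {
 public:
  explicit Parser(std::vector<std::string> tokens)
      : tokens_(std::move(tokens)) {}

  // Iterative driver: the state stack replaces recursion, so deeply nested
  // input grows a heap-allocated vector instead of the native call stack.
  auto Parse() -> std::vector<std::string> {
    state_stack_.push_back(State::FileContents);
    while (!state_stack_.empty()) {
      switch (state_stack_.back()) {
        case State::FileContents:
          HandleFileContents();
          break;
        case State::Declaration:
          HandleDeclaration();
          break;
        case State::DeclarationFinish:
          HandleDeclarationFinish();
          break;
      }
    }
    return nodes_;
  }

 private:
  auto HandleFileContents() -> void {
    state_stack_.pop_back();
    if (pos_ < tokens_.size()) {
      // Push the continuation first so it runs after the declaration.
      state_stack_.push_back(State::FileContents);
      state_stack_.push_back(State::Declaration);
    }
  }

  auto HandleDeclaration() -> void {
    state_stack_.pop_back();
    state_stack_.push_back(State::DeclarationFinish);
    // Toy grammar: a declaration is every token up to the next ";".
    while (pos_ < tokens_.size() && tokens_[pos_] != ";") {
      ++pos_;
    }
  }

  auto HandleDeclarationFinish() -> void {
    state_stack_.pop_back();
    if (pos_ < tokens_.size()) {
      ++pos_;  // Consume the ";".
    }
    nodes_.push_back("Declaration");
  }

  std::vector<std::string> tokens_;
  std::size_t pos_ = 0;
  std::vector<State> state_stack_;
  std::vector<std::string> nodes_;
};
```

Splitting `Declaration` and `DeclarationFinish` lets the parent emit its node after its children, which keeps output in postorder without recursive calls.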
+
+### Typed parse node metadata implementation
+
+As of [#3534](https://github.com/carbon-language/carbon-lang/pull/3534):
+
+![parse](parse.svg)
+
+> TODO: Convert this chart to Mermaid.
+
+-   [common/enum_base.h](/common/enum_base.h) defines the `EnumBase`
+    [CRTP](idioms.md#crtp-or-curiously-recurring-template-pattern) class
+    extending `Printable` from [common/ostream.h](/common/ostream.h), along with
+    `CARBON_ENUM` macros for making enumerations
+
+-   [parse/node_kind.h](/toolchain/parse/node_kind.h) includes
+    [common/enum_base.h](/common/enum_base.h) and defines an enumeration
+    `NodeKind`, along with bitmask enum `NodeCategory`.
+
+    -   The `NodeKind` enumerants are _declared_ in this file using a macro
+        from [common/enum_base.h](/common/enum_base.h), populated with the list
+        of all parse node kinds from
+        [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
+        [the .def file idiom](idioms.md#def-files)).
+
+    -   `NodeKind` has a member type `NodeKind::Definition` that extends
+        `NodeKind` and adds a `NodeCategory` field (and others in the future).
+
+    -   `NodeKind` has a method `Define` for creating a `NodeKind::Definition`
+        with the same enumerant value, plus values for the other fields.
+
+    -   `HasKindMember<T>` at the bottom of
+        [parse/node_kind.h](/toolchain/parse/node_kind.h) uses
+        [field detection](idioms.md#field-detection) to determine if the type
+        `T` has a `NodeKind::Definition Kind` static constant member.
+
+        -   Note: both the type and name of these fields must match exactly.
+
+    -   Note that additional information is needed to define the `category()`
+        method (and other methods in the future) of `NodeKind`. This information
+        comes from the typed parse node definitions in
+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) (described below).
+
+-   [parse/node_ids.h](/toolchain/parse/node_ids.h) defines a number of types
+    that store a _node id_ that identifies a node in the parse tree
+
+    -   `NodeId` stores a node id with no restrictions
+
+    -   `NodeIdForKind<Kind>` inherits from `NodeId` and stores the id of a node
+        that must have the specified `NodeKind` "`Kind`". Note that this is not
+        used directly; instead, aliases `FooId` for
+        `NodeIdForKind<NodeKind::Foo>` are defined for every node kind using
+        [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
+        [the .def file idiom](idioms.md#def-files)).
+
+    -   `NodeIdInCategory<Category>` inherits from `NodeId` and stores the id of
+        a node that must overlap the specified `NodeCategory` "`Category`". Note
+        that this is not typically used directly; instead, this file defines
+        aliases `AnyDeclId`, `AnyExprId`, ..., `AnyStatementId`.
+
+    -   Similarly, `NodeIdOneOf<T, U>` and `NodeIdNot<V>` inherit from `NodeId`
+        and store the id of a node that must match either `T::Kind` or
+        `U::Kind`, or must not match `V::Kind`, respectively.
+
+    -   In addition to the node id type definitions above, the struct
+        `NodeForId<T>` is declared but not defined.
+
+-   [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) defines a typed parse
+    node struct type for each kind of parse node.
+
+    -   Each one defines a static constant named `Kind` that is set using a call
+        to `Define()` on the corresponding enumerant member of `NodeKind` from
+        [parse/node_kind.h](/toolchain/parse/node_kind.h) (which is included by
+        this file).
+    -   The fields of these types specify the children of the parse node using
+        the types from [parse/node_ids.h](/toolchain/parse/node_ids.h).
+
+    -   The struct `NodeForId<T>` that is declared in
+        [parse/node_ids.h](/toolchain/parse/node_ids.h) is defined in this file
+        such that `NodeForId<FooId>::TypedNode` is the `Foo` typed parse node
+        struct type.
+
+    -   This file will fail to compile unless every parse node kind defined in
+        [parse/node_kind.def](/toolchain/parse/node_kind.def) has a
+        corresponding struct type in this file.
+
+-   [parse/node_kind.cpp](/toolchain/parse/node_kind.cpp) includes both
+    [parse/node_kind.h](/toolchain/parse/node_kind.h) and
+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h)
+
+    -   Using the macro from [common/enum_base.h](/common/enum_base.h), the
+        enumerants of `NodeKind` are _defined_ using the list of parse node
+        kinds from [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
+        [the .def file idiom](idioms.md#def-files)).
+
+    -   `NodeKind::definition()` is defined. It has a static table of
+        `const NodeKind::Definition*` indexed by the enum value, populated by
+        taking the address of the `Kind` member of each typed parse node struct
+        type, using the list from
+        [parse/node_kind.def](/toolchain/parse/node_kind.def).
+
+    -   `NodeKind::category()` is defined using `NodeKind::definition()`.
+
+    -   Tested assumption: the tables built in this file are indexed by the enum
+        values. We rely on the fact that we get the parse node kinds in the same
+        order by consistently using
+        [parse/node_kind.def](/toolchain/parse/node_kind.def).
+
+-   [parse/tree.h](/toolchain/parse/tree.h) includes
+    [parse/node_ids.h](/toolchain/parse/node_ids.h). It does not depend on
+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) to reduce compilation
+    time in those files that don't use the typed parse node struct types.
+
+    -   Defines `Tree::Extract`... functions that take a node id and return a
+        typed parse node struct type from
+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h).
+
+    -   Uses `HasKindMember<T>` to restrict `ExtractAs` to the typed nodes
+        defined in [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h).
+
+    -   `Tree::Extract` uses `NodeForId<T>` to get the corresponding typed parse
+        node struct type for a `FooId` type defined in
+        [parse/node_ids.h](/toolchain/parse/node_ids.h).
+
+        -   Note that this is done without a dependency on the typed parse node
+            struct types by using the forward declaration of `NodeForId<T>` from
+            [parse/node_ids.h](/toolchain/parse/node_ids.h).
+
+    -   The `Tree::Extract`... functions ultimately call
+        `Tree::TryExtractNodeFromChildren<T>`, which is a templated function
+        only declared in this file. Its definition is in
+        [parse/extract.cpp](/toolchain/parse/extract.cpp).
+
+-   [parse/extract.cpp](/toolchain/parse/extract.cpp) includes
+    [parse/tree.h](/toolchain/parse/tree.h) and
+    [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h)
+
+    -   Defines struct `Extractable<T>` that defines how to extract a field of
+        type `T` from a `Tree::SiblingIterator` pointing at the corresponding
+        child node.
+
+    -   `Extractable<T>` is defined for the node id types defined in
+        [parse/node_ids.h](/toolchain/parse/node_ids.h).
+
+    -   In addition, `Extractable<T>` is defined for standard types
+        `std::optional<U>` and `llvm::SmallVector<V>`, to support optional and
+        repeated children.
+
+    -   Uses [struct reflection](idioms.md#struct-reflection) to support
+        aggregate struct types containing extractable fields. This is used to
+        support typed parse node struct types as well as struct fields that they
+        contain.
+
+    -   Uses `HasKindMember<Foo>` to detect accidental uses of a parse node type
+        directly as fields of typed parse node struct types; in those places,
+        `FooId` should be used instead.
+
+    -   Defines `Tree::TryExtractNodeFromChildren<T>` and explicitly
+        instantiates it for every typed parse node struct type defined in
+        [parse/typed_nodes.h](/toolchain/parse/typed_nodes.h) using
+        [parse/node_kind.def](/toolchain/parse/node_kind.def) (using
+        [the .def file idiom](idioms.md#def-files)). By explicitly instantiating
+        this function only in this file, we avoid redundant compilation work,
+        which reduces build times, and we can keep all the extraction machinery
+        as a private implementation detail of this file.
+
+-   [parse/typed_nodes_test.cpp](/toolchain/parse/typed_nodes_test.cpp)
+    validates that each typed parse node struct type has a static `Kind` member
+    that defines the correct corresponding `NodeKind`, and that the `category()`
+    function agrees between the `NodeKind` and `NodeKind::Definition`.
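
The field detection used by `HasKindMember<T>`, and a per-kind id type like `NodeIdForKind<Kind>`, can be sketched in standard C++17 with `std::void_t`. This is a minimal sketch: `NodeKindDefinition` stands in for `NodeKind::Definition`, and `NotANode` is a hypothetical counterexample; the toolchain's actual detection code may use a different technique.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical, simplified stand-ins for NodeKind and NodeKind::Definition.
enum class NodeKind : uint8_t { Identifier, ReturnStatement };

struct NodeKindDefinition {
  NodeKind kind;
};

struct NodeId {
  int32_t index;
};

// A NodeId whose required node kind is part of its type.
template <NodeKind K>
struct NodeIdForKind : NodeId {};

using IdentifierId = NodeIdForKind<NodeKind::Identifier>;

// Field detection: true only when `&T::Kind` is a plain
// `const NodeKindDefinition*`, that is, `Kind` is a static member of exactly
// that type. Both the name and the type must match.
template <typename T, typename = void>
struct HasKindMember : std::false_type {};

template <typename T>
struct HasKindMember<T, std::void_t<decltype(&T::Kind)>>
    : std::is_same<decltype(&T::Kind), const NodeKindDefinition*> {};

// Typed node structs: a static `Kind` plus children named by typed ids.
struct Identifier {
  static constexpr NodeKindDefinition Kind = {NodeKind::Identifier};
};

struct ReturnStatement {
  static constexpr NodeKindDefinition Kind = {NodeKind::ReturnStatement};
  IdentifierId expr;
};

// Not a typed node: `Kind` is a non-static member of the wrong type.
struct NotANode {
  int Kind;
};

static_assert(HasKindMember<Identifier>::value);
static_assert(HasKindMember<ReturnStatement>::value);
static_assert(!HasKindMember<NotANode>::value);
static_assert(!HasKindMember<int>::value);
```

A non-static `Kind` yields a pointer-to-member type rather than `const NodeKindDefinition*`, so it is rejected; this is why the note above stresses that both the type and the name must match.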
+
+Note: this is broadly similar to
+[SemIR typed instruction metadata implementation](#semir-typed-instruction-metadata-implementation).
+
+## Check
+
+Each parse node kind requires adding a `Handle<kind>` function in a
+`check/handle_*.cpp` file.
+
+If the resulting SemIR needs a new instruction:
+
+-   Add a new kind to [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
+    -   Add a `CARBON_SEM_IR_INST_KIND(NewInstKindName)` line in alphabetical
+        order.
+-   Add a new struct definition to
+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h), such as:
+
+    ```cpp
+    struct NewInstKindName {
+        static constexpr auto Kind = InstKind::NewInstKindName.Define(
+            // the name used in textual IR
+            "new_inst_kind_name"
+            // Optional: , TerminatorKind::KindOfTerminator
+            );
+
+        // Optional: omit if not associated with a parse node.
+        Parse::Node parse_node;
+
+        // Optional: omit if this sem_ir instruction does not produce a value.
+        TypeId type_id;
+
+        // 0-2 id fields, with types from sem_ir/ids.h or sem_ir/builtin_kind.h
+        // For example, fields would look like:
+        StringId name_id;
+        InstId value_id;
+    };
+    ```
+
+Adding an instruction will also require a handler in the Lower step.
+
+Most new instructions will automatically be formatted reasonably by the SemIR
+formatter.
+
+If the resulting SemIR needs a new built-in, add it to
+[builtin_inst_kind.def](/toolchain/sem_ir/builtin_inst_kind.def).
+
+### SemIR typed instruction metadata implementation
+
+How does this work? As of
+[#3310](https://github.com/carbon-language/carbon-lang/pull/3310):
+
+![check](check.svg)
+
+> TODO: Convert this chart to Mermaid.
+
+-   [common/enum_base.h](/common/enum_base.h) defines the `EnumBase`
+    [CRTP](idioms.md#crtp-or-curiously-recurring-template-pattern) class
+    extending `Printable` from [common/ostream.h](/common/ostream.h), along with
+    `CARBON_ENUM` macros for making enumerations
+
+-   [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) includes
+    [common/enum_base.h](/common/enum_base.h) and defines an enumeration
+    `InstKind`, along with `InstValueKind` and `TerminatorKind`.
+
+    -   The `InstKind` enumerants are _declared_ in this file using a macro
+        from [common/enum_base.h](/common/enum_base.h), populated with the list
+        of all instruction kinds from
+        [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def) (using
+        [the .def file idiom](idioms.md#def-files)).
+
+    -   `InstKind` has a member type `InstKind::Definition` that extends
+        `InstKind` and adds an `ir_name` string field and a `TerminatorKind`
+        field.
+
+    -   `InstKind` has a method `Define` for creating an `InstKind::Definition`
+        with the same enumerant value, plus values for the other fields.
+
+-   Note that additional information is needed to define the `ir_name()`,
+    `value_kind()`, and `terminator_kind()` methods of `InstKind`. This
+    information comes from the typed instruction definitions in
+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h).
+
+-   [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) defines a typed
+    instruction struct type for each kind of SemIR instruction, as described
+    above.
+
+    -   Each one defines a static constant named `Kind` that is set using a call
+        to `Define()` on the corresponding enumerant member of `InstKind` from
+        [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) (which is included
+        by this file).
+
+-   `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>` at the
+    bottom of [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) use
+    [field detection](idioms.md#field-detection) to determine if `TypedInst` has
+    a `Parse::Node parse_node` or a `TypeId type_id` field respectively.
+
+    -   Note: both the type and name of these fields must match exactly.
+
+-   [sem_ir/inst_kind.cpp](/toolchain/sem_ir/inst_kind.cpp) includes both
+    [sem_ir/inst_kind.h](/toolchain/sem_ir/inst_kind.h) and
+    [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h)
+
+    -   Using the macro from [common/enum_base.h](/common/enum_base.h), the
+        enumerants of `InstKind` are _defined_ using the list of instruction
+        kinds from [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
+        (using [the .def file idiom](idioms.md#def-files)).
+
+    -   `InstKind::value_kind()` is defined. It has a static table of
+        `InstValueKind` values indexed by the enum value, populated by applying
+        `HasTypeIdMember` from
+        [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) to every
+        instruction kind by using the list from
+        [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
+    -   `InstKind::definition()` is defined. It has a static table of
+        `const InstKind::Definition*` indexed by the enum value, populated by
+        taking the address of the `Kind` member of each `TypedInst`, using the
+        list from [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
+
+    -   `InstKind::ir_name()` and `InstKind::terminator_kind()` are defined
+        using `InstKind::definition()`.
+    -   Tested assumption: the tables built in this file are indexed by the enum
+        values. We rely on the fact that we get the instruction kinds in the
+        same order by consistently using
+        [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
+
+    -   This file will fail to compile unless every kind of SemIR instruction
+        defined in [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def) has a
+        corresponding struct type in
+        [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h).
+
+-   `TypedInstArgsInfo<TypedInst>` defined in
+    [sem_ir/inst.h](/toolchain/sem_ir/inst.h) uses
+    [struct reflection](idioms.md#struct-reflection) to determine the other
+    fields from `TypedInst`. It skips the `parse_node` and `type_id` fields
+    using `HasParseNodeMember<TypedInst>` and `HasTypeIdMember<TypedInst>`.
+
+    -   Tested assumption: the `parse_node` and `type_id` are the first fields
+        in `TypedInst`, and there are at most two more fields.
+
+-   [sem_ir/inst.h](/toolchain/sem_ir/inst.h) defines templated conversions
+    between `Inst` and each of the typed instruction structs:
+
+    -   Uses `TypedInstArgsInfo<TypedInst>`, `HasParseNodeMember<TypedInst>`,
+        `HasTypeIdMember<TypedInst>`, and a
+        [local lambda](idioms.md#local-lambdas-to-reduce-duplicate-code).
+
+    -   Defines a templated `ToRaw` function that converts the various id field
+        types to an `int32_t`.
+    -   Defines a templated `FromRaw<T>` function that converts an `int32_t` to
+        `T` to perform the opposite conversion.
+    -   Tested assumption: The `parse_node` field is first, when present, and
+        the `type_id` is next, when present, in each `TypedInst` struct type.
+
+-   The "tested assumptions" above are all tested by
+    [sem_ir/typed_insts_test.cpp](/toolchain/sem_ir/typed_insts_test.cpp)
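
The table behind `InstKind::value_kind()` combines the .def list with field detection. A minimal sketch, assuming two hypothetical typed instructions: `SKETCH_INST_KINDS`, `ValueKind`, `NumInstKinds`, and the `InstValueKind::None`/`Typed` enumerants are illustrative and may not match the real names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct TypeId {
  int32_t index;
};
struct InstId {
  int32_t index;
};

// Hypothetical typed instructions; only the presence of `type_id` matters.
struct Assign {
  InstId lhs_id;
  InstId rhs_id;
};
struct IntLiteral {
  TypeId type_id;
  int32_t value;
};

// Stand-in for sem_ir/inst_kind.def.
#define SKETCH_INST_KINDS(X) X(Assign) X(IntLiteral)

enum class InstKind : uint8_t {
#define SKETCH_ENUMERATOR(Name) Name,
  SKETCH_INST_KINDS(SKETCH_ENUMERATOR)
#undef SKETCH_ENUMERATOR
};

#define SKETCH_COUNT(Name) +1
constexpr int NumInstKinds = 0 SKETCH_INST_KINDS(SKETCH_COUNT);
#undef SKETCH_COUNT

enum class InstValueKind : uint8_t { None, Typed };

// Field detection: does T have a non-static `TypeId type_id` member?
template <typename T, typename = void>
struct HasTypeIdMember : std::false_type {};
template <typename T>
struct HasTypeIdMember<T, std::void_t<decltype(&T::type_id)>>
    : std::is_same<decltype(&T::type_id), TypeId T::*> {};

auto ValueKind(InstKind kind) -> InstValueKind {
  // Built from the same list as the enum, so entry i matches enumerator i.
  static constexpr std::array<InstValueKind, NumInstKinds> Table = {
#define SKETCH_VALUE_KIND(Name) \
  HasTypeIdMember<Name>::value ? InstValueKind::Typed : InstValueKind::None,
      SKETCH_INST_KINDS(SKETCH_VALUE_KIND)
#undef SKETCH_VALUE_KIND
  };
  return Table[static_cast<std::size_t>(kind)];
}
```

The "tested assumption" about table ordering corresponds to the comment in `ValueKind`: correctness depends on both expansions walking the same list in the same order.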
+
+## Lower
+
+Each SemIR instruction requires adding a `Handle<kind>` function in a
+`lower/handle_*.cpp` file.
+
+## Tests and debugging
+
+### Running tests
+
+Tests are run in bulk as `bazel test //toolchain/...`. Many tests use the
+file_test infrastructure; see
+[testing/file_test/README.md](/testing/file_test/README.md) for information.
+
+There are several supported ways to run Carbon on a given test file. For
+example, with `toolchain/parse/testdata/basics/empty.carbon`:
+
+-   `bazel test //toolchain/testing:file_test --test_arg=--file_tests=toolchain/parse/testdata/basics/empty.carbon`
+    -   Executes an individual test.
+-   `bazel run //toolchain/parse:testdata/basics/empty.carbon.run`
+    -   Runs `carbon` on the file with standard arguments, printing output to
+        console.
+    -   This form will often be most useful when iterating over a specific test.
+-   `bazel run //toolchain/parse:testdata/basics/empty.carbon.verbose`
+    -   Similar to the previous command, but with the `-v` flag implied.
+-   `bazel run //toolchain/driver:carbon -- compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
+    -   Explicitly runs `carbon` with the provided arguments.
+-   `bazel-bin/toolchain/driver/carbon compile --phase=parse --dump-parse-tree toolchain/parse/testdata/basics/empty.carbon`
+    -   Similar to the previous command, but without using `bazel`.
+
+### Updating tests
+
+The `toolchain/autoupdate_testdata.py` script can be used to update output. It
+invokes the `file_test` autoupdate support. See
+[testing/file_test/README.md](/testing/file_test/README.md) for file syntax.
+
+#### Reviewing test deltas
+
+Using `autoupdate_testdata.py` can be useful to produce deltas during the
+development process because it allows `git status` and `git diff` to be used to
+examine what changed.
+
+### Verbose output
+
+The `-v` flag can be passed to trace state, and should be specified before the
+subcommand name: `carbon -v compile ...`. `CARBON_VLOG` is used to print output
+in this mode. There is currently no control over the degree of verbosity.
+
+### Stack traces
+
+While the iterative processing pattern means function stack traces will have
+minimal context for how the current function is reached, we use LLVM's
+`PrettyStackTrace` to include details about the state stack. The state stack
+will be above the function stack in crash output.

+ 616 - 0
toolchain/docs/check.md

@@ -0,0 +1,616 @@
+# Check
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Overview](#overview)
+-   [Postorder processing](#postorder-processing)
+-   [Key IR concepts](#key-ir-concepts)
+    -   [Parameters and arguments](#parameters-and-arguments)
+-   [SemIR textual format](#semir-textual-format)
+    -   [Raw form](#raw-form)
+    -   [Formatted IR](#formatted-ir)
+        -   [Instructions](#instructions)
+        -   [Top-level entities](#top-level-entities)
+-   [Core loop](#core-loop)
+    -   [Node stack](#node-stack)
+    -   [Delayed evaluation (not yet implemented)](#delayed-evaluation-not-yet-implemented)
+    -   [Templates (not yet implemented)](#templates-not-yet-implemented)
+    -   [Rewrites](#rewrites)
+-   [Types](#types)
+    -   [Type printing (not yet implemented)](#type-printing-not-yet-implemented)
+-   [Expression categories](#expression-categories)
+    -   [ExprCategory::NotExpression](#exprcategorynotexpression)
+    -   [ExprCategory::Value](#exprcategoryvalue)
+    -   [ExprCategory::DurableReference and ExprCategory::EphemeralReference](#exprcategorydurablereference-and-exprcategoryephemeralreference)
+    -   [ExprCategory::Initializing](#exprcategoryinitializing)
+    -   [ExprCategory::Mixed](#exprcategorymixed)
+    -   [Value bindings](#value-bindings)
+-   [Handling Parse::Tree errors (not yet implemented)](#handling-parsetree-errors-not-yet-implemented)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
+
+<!-- tocstop -->
+
+## Overview
+
+Check takes the parse tree and generates a semantic intermediate representation,
+or SemIR. SemIR is closer to a series of instructions, in preparation for
+transformation to LLVM IR. Semantic analysis and type checking occur during the
+production of SemIR, along with any validation that requires context.
+
+## Postorder processing
+
+The checking step is oriented around postorder processing of the `Parse::Tree`,
+iterating through the `Parse::NodeImpl` vectorized storage once, in order, as
+much as possible. This is primarily for performance, but it also relies on the
+[information accumulation principle](/docs/project/principles/information_accumulation.md):
+when that principle applies, the information necessary to semantically check a
+line is already available when that line is processed, so we can generate IR
+immediately.
+
+In practice, this means that we should be able to go from a `Parse::Tree` (which
+cannot be used for name lookups) to a SemIR with name lookups completed, in a
+single pass. Outside of templates, the SemIR should not need to be re-processed
+to add more information. This avoids an additional processing pass and its
+associated storage needs.
+
+This single-pass approach also means that the checking step does not make use of
+the tree structure of the `Parse::Tree`. In cases where the actions performed
+for a parse tree node depend on the context in which that node appears, a node
+that is visited earlier in the postorder traversal, such as a bracketing node,
+needs to establish the necessary context. In this respect, the sequence of
+`Parse::Node`s can be thought of as a byte code input that the check step
+interprets to build the `SemIR`.
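
Treating the postorder node sequence as byte code can be sketched for `1 + 2`, whose postorder is literal, literal, operator. This is a minimal sketch under stated assumptions: `ParseNode`, `Inst`, `Checker`, and its `node_stack` are hypothetical and much simpler than the real node stack and SemIR instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical parse node kinds, delivered in postorder.
enum class NodeKind { IntLiteral, InfixOperatorPlus };

struct ParseNode {
  NodeKind kind;
  int32_t literal_value;  // Only meaningful for IntLiteral.
};

// A toy instruction; operands refer to earlier instructions by index.
struct Inst {
  enum class Kind { IntValue, Add } kind;
  int32_t arg0;
  int32_t arg1;
};

struct Checker {
  std::vector<Inst> insts;
  // Results of already-checked nodes that a later parent will consume.
  std::vector<int32_t> node_stack;

  auto AddInst(Inst inst) -> int32_t {
    insts.push_back(inst);
    return static_cast<int32_t>(insts.size()) - 1;
  }

  auto Pop() -> int32_t {
    int32_t id = node_stack.back();
    node_stack.pop_back();
    return id;
  }

  // One linear pass: children precede parents in postorder, so their results
  // are already on the stack when the parent is reached.
  auto Check(const std::vector<ParseNode>& postorder) -> void {
    for (const ParseNode& node : postorder) {
      switch (node.kind) {
        case NodeKind::IntLiteral:
          node_stack.push_back(
              AddInst({Inst::Kind::IntValue, node.literal_value, 0}));
          break;
        case NodeKind::InfixOperatorPlus: {
          int32_t rhs_id = Pop();  // Pushed last, so popped first.
          int32_t lhs_id = Pop();
          node_stack.push_back(AddInst({Inst::Kind::Add, lhs_id, rhs_id}));
          break;
        }
      }
    }
  }
};
```

The checker never walks the tree structure; it only reacts to each node in sequence, relying on earlier nodes to have left the needed context on the stack.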
+
+## Key IR concepts
+
+A `SemIR::Inst` is the basic building block that represents a simple
+instruction, such as an operator or a literal declaration. For each kind of
+instruction, a type specific to that kind of instruction is provided in the
+`SemIR` namespace. For example, `SemIR::Assign` represents an assignment
+instruction, and `SemIR::PointerType` represents a pointer type instruction.
+
+Each instruction class has up to four public data members describing the
+instruction, as described in
+[sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h) (also see
+[adding features for Check](adding_features.md#check)):
+
+-   A `Parse::Node parse_node;` member that tracks its location is present on
+    almost all instructions, except instructions like `SemIR::Builtin` that
+    don't have an associated location.
+
+-   A `SemIR::TypeId type_id;` member that describes the type of the instruction
+    is present on all instructions that produce a value. This includes namespace
+    instructions, which are modeled as producing a value of "namespace" type,
+    even though they can't be used as a first-class value in Carbon expressions.
+
+-   Up to two additional, kind-specific members. For example `SemIR::Assign` has
+    members `InstId lhs_id` and `InstId rhs_id`.
+
+Instructions are stored as type-erased `SemIR::Inst` objects, which store the
+instruction kind and the (up to) four fields described above. This balances the
+size of `SemIR::Inst` against the overhead of indirection.
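
The type-erased storage and its conversion to and from a typed view can be sketched with one instruction kind. This is a minimal sketch: `Erase`, `AsAssign`, and the `-1` encoding for "no `type_id`" are hypothetical, the `parse_node` field is omitted for brevity, and the real `ToRaw`/`FromRaw` conversions in sem_ir/inst.h are templated over every typed instruction.

```cpp
#include <cassert>
#include <cstdint>

struct InstId {
  int32_t index;
};

enum class InstKind : uint8_t { Assign, Return };

// Typed view of one instruction kind (hypothetical, simplified fields).
struct Assign {
  static constexpr InstKind Kind = InstKind::Assign;
  InstId lhs_id;
  InstId rhs_id;
};

// Type-erased storage: a kind tag plus fixed raw 32-bit slots, so every
// instruction has the same size regardless of its kind.
struct Inst {
  InstKind kind;
  int32_t type_id;  // -1 here when the instruction produces no value.
  int32_t arg0;
  int32_t arg1;
};

auto ToRaw(InstId id) -> int32_t { return id.index; }

template <typename IdT>
auto FromRaw(int32_t raw) -> IdT {
  return IdT{raw};
}

auto Erase(const Assign& typed) -> Inst {
  return Inst{Assign::Kind, /*type_id=*/-1, ToRaw(typed.lhs_id),
              ToRaw(typed.rhs_id)};
}

auto AsAssign(const Inst& inst) -> Assign {
  assert(inst.kind == Assign::Kind);
  return Assign{FromRaw<InstId>(inst.arg0), FromRaw<InstId>(inst.arg1)};
}
```

Fixed-size slots keep all instructions in one vector without indirection, at the cost of wasting slots for kinds with fewer fields.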
+
+A `SemIR::InstBlock` can represent a code block. However, it can also be created
+when a series of instructions needs to be closely associated, such as a
+parameter list.
+
+A `SemIR::Builtin` represents a language built-in, such as the unconstrained
+facet type `type`. We will also have built-in functions, which will be needed to
+implement some library types, such as `i32`. Built-ins have stable indices
+across `SemIR` instances.
+
+### Parameters and arguments
+
+Parameters and arguments will be stored as two `SemIR::InstBlock`s each. The
+first will contain the full IR, while the second will contain references to the
+last instruction for each parameter or argument. The references block will have
+a size equal to the number of parameters or arguments, allowing for quick size
+comparisons and indexed access.
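
The two-block layout for parameters and arguments can be sketched with plain vectors. This is a minimal sketch, assuming `InstId` is a bare `int32_t` and that each parameter needs at least one instruction; `ParamOrArgList`, `BuildList`, `SameArity`, and `NthRef` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using InstId = int32_t;  // Simplified id type.
using InstBlock = std::vector<InstId>;

struct ParamOrArgList {
  InstBlock full_ir;  // Every instruction emitted for the list, in order.
  InstBlock refs;     // One entry per parameter: its last instruction.
};

// Hypothetical builder: `inst_counts[i]` is how many instructions entry i
// needed (each entry needs at least one).
auto BuildList(const std::vector<int>& inst_counts) -> ParamOrArgList {
  ParamOrArgList list;
  InstId next_id = 0;
  for (int count : inst_counts) {
    assert(count > 0);
    for (int i = 0; i < count; ++i) {
      list.full_ir.push_back(next_id++);
    }
    list.refs.push_back(next_id - 1);
  }
  return list;
}

// Arity checks and indexed access only need the refs blocks.
auto SameArity(const ParamOrArgList& params, const ParamOrArgList& args)
    -> bool {
  return params.refs.size() == args.refs.size();
}

auto NthRef(const ParamOrArgList& list, std::size_t n) -> InstId {
  return list.refs[n];
}
```

For parameters needing 2, 1, and 3 instructions, the full block holds 6 ids while the refs block holds 3, so comparing arities is a size comparison rather than a scan.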
+
+## SemIR textual format
+
+There are two textual ways to view `SemIR`.
+
+### Raw form
+
+The raw form of SemIR shows the details of the representation, such as numeric
+instruction and block IDs. The representation is intended to very closely match
+the `SemIR::File` and `SemIR::Inst` representations. This can be useful when
+debugging low-level issues with the `SemIR` representation.
+
+The driver will print this when passed `--dump-raw-sem-ir`.
+
+### Formatted IR
+
+In addition to the raw form, there is a higher-level formatted IR that aims to
+be human readable. This is used in most `check` tests to validate the output,
+and is also expected to be used regularly by toolchain developers to inspect the
+result of checking the parse tree.
+
+The driver will print this when passed `--dump-sem-ir`.
+
+Unlike the raw form, certain representational choices in the `SemIR` data may
+not be visible in this form. However, it is intended to be possible to parse the
+`SemIR` output and form an equivalent – but not necessarily identical – `SemIR`
+representation, although no such parser currently exists.
+
+As an example, given the program:
+
+```carbon
+fn Cond() -> bool;
+fn Run() -> i32 { return if Cond() then 1 else 2; }
+```
+
+The formatted IR is currently:
+
+```
+constants {
+  %.1: i32 = int_literal 1 [template]
+  %.2: i32 = int_literal 2 [template]
+}
+
+file {
+  package: <namespace> = namespace [template] {
+    .Cond = %Cond
+    .Run = %Run
+  }
+  %Cond: <function> = fn_decl @Cond [template] {
+    %return.var.loc1: ref bool = var <return slot>
+  }
+  %Run: <function> = fn_decl @Run [template] {
+    %return.var.loc2: ref i32 = var <return slot>
+  }
+}
+
+fn @Cond() -> bool;
+
+fn @Run() -> i32 {
+!entry:
+  %Cond.ref: <function> = name_ref Cond, file.%Cond [template = file.%Cond]
+  %.loc2_33.1: init bool = call %Cond.ref()
+  %.loc2_26.1: bool = value_of_initializer %.loc2_33.1
+  %.loc2_33.2: bool = converted %.loc2_33.1, %.loc2_26.1
+  if %.loc2_33.2 br !if.expr.then else br !if.expr.else
+
+!if.expr.then:
+  %.loc2_41: i32 = int_literal 1 [template = constants.%.1]
+  br !if.expr.result(%.loc2_41)
+
+!if.expr.else:
+  %.loc2_48: i32 = int_literal 2 [template = constants.%.2]
+  br !if.expr.result(%.loc2_48)
+
+!if.expr.result:
+  %.loc2_26.2: i32 = block_arg !if.expr.result
+  return %.loc2_26.2
+}
+```
+
+There are three kinds of names in formatted IR, which are distinguished by their
+leading sigils:
+
+-   `%name` denotes a value produced by an instruction. These names are
+    introduced by a line of the form `%name: <category> <type> = <instruction>`,
+    and are scoped to the enclosing top-level entity. `<category>` describes the
+    [expression category](#expression-categories), which is `init` for an
+    initializing expression, `ref` for a reference expression, or omitted for a
+    value expression. Typically, values can only be referenced by instructions
+    that their introduction
+    [dominates](<https://en.wikipedia.org/wiki/Dominator_(graph_theory)>), but
+    some kinds of instruction might have other rules. Names in the `file` block
+    can be referenced as `file.%<name>`.
+
+-   `!name` denotes a label, and `!name:` appears as a prefix of each
+    `InstBlock` in a `Function`. These names are scoped to their enclosing
+    function, and can be referenced anywhere in that function, but not outside.
+
+-   `@name` denotes a top-level entity, such as a function, class, or interface.
+    The SemIR view of these entities is flattened, so member functions are
+    treated as top-level entities.
+
+Names in formatted IR are all invented by the formatter, and generally are of
+the form `<base_name>[.loc<line>[_<col>[.<counter>]]]` where `<line>` and
+`<col>` describe the location of the instruction, and `<counter>` is used as a
+disambiguator if multiple instructions appear at the same location. Trailing
+name components are only included if they are necessary to disambiguate the
+name. `<base_name>` is a guessed good name for the instruction, often derived
+from source-level identifiers, and is empty if no guess was made.
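
The naming scheme can be sketched as a small helper. `FormatInstName` and `Disambiguation` are hypothetical names. The sketch assumes the formatter has already determined how many trailing components a name needs; in practice that requires knowing which other names would collide. The test values reproduce names from the formatted IR example above.

```cpp
#include <cassert>
#include <string>

// How many trailing location components a name needs to be unique.
enum class Disambiguation { None, Line, LineCol, LineColCounter };

// Builds `<base_name>[.loc<line>[_<col>[.<counter>]]]`, keeping only the
// components that `level` asks for.
std::string FormatInstName(const std::string& base_name, int line, int col,
                           int counter, Disambiguation level) {
  assert(line >= 1 && col >= 1 && counter >= 1);
  std::string name = base_name;
  if (level == Disambiguation::None) {
    return name;
  }
  name += ".loc" + std::to_string(line);
  if (level == Disambiguation::Line) {
    return name;
  }
  name += "_" + std::to_string(col);
  if (level == Disambiguation::LineCol) {
    return name;
  }
  name += "." + std::to_string(counter);
  return name;
}
```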
+
+#### Instructions
+
+There is usually one line in an `InstBlock` for each `Inst`. You can find the
+documentation for the different kinds of instructions in
+[toolchain/sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h). For example,
+given a formatted SemIR line like:
+
+```
+%N: i32 = assoc_const_decl N [template]
+```
+
+you would look for a `struct` definition that uses `"assoc_const_decl"` as its
+`ir_name`. In this case, this is the `AssociatedConstantDecl` instruction:
+
+```cpp
+// An associated constant declaration in an interface, such as `let T:! type;`.
+struct AssociatedConstantDecl {
+  static constexpr auto Kind =
+      InstKind::AssociatedConstantDecl.Define<Parse::NodeId>(
+          {.ir_name = "assoc_const_decl", .is_lowered = false});
+
+  TypeId type_id;
+  NameId name_id;
+};
+```
+
+Since this instruction produces a value, it has a `TypeId type_id` field, which
+corresponds to the type written between the `:` and the `=`. In the example
+above, that type is `i32`. The other arguments to the instruction are written
+after the `ir_name`; in this example, the `name_id` is `N`. From this we find
+that the instruction corresponds to an associated constant declaration in an
+interface like `let N:! i32;`.
+
+Instructions producing a constant value, like `assoc_const_decl` above, are
+followed by their phase, either `[symbolic]` or `[template]`. If the constant
+value is the value of a different instruction, the phase is followed by `=` and
+that instruction, as in `[template = constants.%.1]`.
+
+Instructions that do not produce a value, such as the `br` and `return`
+instructions above, omit the leading `%name: ... =` prefix, as they cannot be
+named by other instructions. These instructions do not have a `TypeId type_id`
+field, like the `AdaptDecl` instruction:
+
+```cpp
+// An adapted type declaration in a class, of the form `adapt T;`.
+struct AdaptDecl {
+  static constexpr auto Kind = InstKind::AdaptDecl.Define<Parse::AdaptDeclId>(
+      {.ir_name = "adapt_decl", .is_lowered = false});
+
+  // No type_id; this is not a value.
+  TypeId adapted_type_id;
+};
+```
+
+An `adapt SomeClass;` declaration would have the corresponding SemIR formatted
+as:
+
+```
+adapt_decl %SomeClass
+```
+
+Some instructions have special argument handling. For example, some invalid
+arguments are omitted, and an `InstBlockId` argument is rendered inline,
+commonly enclosed in braces `{`...`}` or parens `(`...`)`. In other cases, the
+formatter combines instructions to make the IR more readable:
+
+-   A terminator sequence in a block, comprising a sequence of `BranchIf`
+    instructions followed by a `Branch` or `BranchWithArg` instruction, is
+    collapsed into a single
+    `if %cond br !label1 else if ... else br !labelN(%arg)` line.
+-   A struct type, formed by a sequence of `StructTypeField` instructions
+    followed by a `StructType` instruction, is collapsed into a single
+    `struct_type{.field1: %value1, ..., .fieldN: %valueN}` line.
+
+These exceptions may be found in
+[toolchain/sem_ir/formatter.cpp](/toolchain/sem_ir/formatter.cpp).
+
+#### Top-level entities
+
+**Question:** Are these too in flux to document at this time?
+
+-   `constants`: TODO
+-   `imports`: TODO
+-   `file`: TODO
+-   entities
+    -   TODO: may be preceded by `extern`.
+    -   TODO: may be preceded by `generic`.
+        -   These may have an optional `!definition:` section containing the
+            generic's `definition_block_id`.
+    -   `fn`: TODO; followed by `= "`...`"` for builtins
+    -   `class`: TODO
+    -   `interface`: TODO
+    -   `impl`: TODO
+-   `specific`: TODO
+    -   The body, in braces `{`...`}`, contains a series of
+        `<generic parameter> => <specific value>` assignment lines.
+    -   The first lines of the body describe the declaration
+    -   If there is a valid definition, there are additional definition
+        assignments after a `!definition:` line.
+
+## Core loop
+
+The core loop is `Check::CheckParseTree`. This loops through the `Parse::Tree`
+and calls a `Handle`... function corresponding to the `NodeKind` of each node.
+Communication between these functions for different nodes working together is
+through the `Context` object defined in
+[check/context.h](/toolchain/check/context.h), which stores things in a
+collection of stacks. The common pattern is that the children of a node are
+processed first. They produce information that is then consumed when processing
+the parent node.
+
+One example of this pattern is expressions. Each subexpression outputs SemIR
+instructions that compute its value into the current instruction block, which
+is the block at the top of the `InstBlockStack` stored in the `Context` object.
+It leaves an instruction id on the top of the [node stack](#node-stack) pointing
+to the instruction that produces the value of that subexpression. Those ids are
+consumed by parent operations, like an
+[RPN](https://en.wikipedia.org/wiki/Reverse_Polish_notation) calculator. For
+example, the expression `1 * 2 + 3` corresponds to this parse tree:
+
+```yaml
+    {kind: 'IntegerLiteral', text: '1'},
+    {kind: 'IntegerLiteral', text: '2'},
+  {kind: 'InfixOperator', text: '*', subtree_size: 3},
+  {kind: 'IntegerLiteral', text: '3'},
+{kind: 'InfixOperator', text: '+', subtree_size: 5},
+```
+
+This parse tree is processed by one call to a `Handle` function per node:
+
+-   The first node is an integer literal, so the core loop calls
+    `HandleIntegerLiteral`.
+
+    -   It calls `context::AddInstAndPush` to output a `SemIR::IntegerLiteral`
+        instruction to the current instruction block, and pushes the parse node
+        along with the instruction id to the [node stack](#node-stack).
+
+-   The second node is also an integer literal, which outputs a second
+    instruction and pushes another entry onto the node stack.
+
+-   `HandleInfixOperator` pops the two entries off of the node stack, outputs
+    any conversion instructions that are needed, and uses
+    `context::AddInstAndPush` to create and push the instruction id representing
+    the output of a multiplication instruction. That multiplication instruction
+    takes the instruction ids it popped off the stack at the beginning as
+    arguments.
+
+-   Another integer literal instruction is created for `3` and pushed onto the
+    stack.
+
+-   `HandleInfixOperator` is called again. It pops the two instruction ids off
+    the stack to use as the arguments to the addition instruction it creates
+    and pushes.
+
+In this way, the handle functions coordinate producing their output using the
+instruction block stack and node stack from the context.
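
The walk-through above can be modeled as a small, self-contained C++ sketch. Everything here is hypothetical and heavily simplified:

-   String kinds stand in for `NodeKind` and instruction kinds.
-   `Context` holds just one instruction block and a node stack of (parse node index, instruction id) pairs.
-   No types or conversions are modeled.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse node, delivered in postorder as in the YAML dump.
struct ParseNode {
  std::string kind;  // "IntegerLiteral" or "InfixOperator".
  std::string text;  // Literal digits or operator spelling.
};

struct Inst {
  std::string kind;  // "int_literal", "mul", or "add".
  int value = 0;     // Literal value, for int_literal.
  int lhs = -1;      // Operand instruction ids, for mul and add.
  int rhs = -1;
};

struct Context {
  std::vector<Inst> block;                      // Current instruction block.
  std::vector<std::pair<int, int>> node_stack;  // (parse node, inst id).

  // Appends `inst` to the current block and pushes its id for the parent.
  void AddInstAndPush(int node, Inst inst) {
    block.push_back(inst);
    node_stack.push_back({node, static_cast<int>(block.size()) - 1});
  }

  int PopInstId() {
    assert(!node_stack.empty());
    int id = node_stack.back().second;
    node_stack.pop_back();
    return id;
  }
};

void HandleIntegerLiteral(Context& context, int node, const ParseNode& parse) {
  context.AddInstAndPush(node, {"int_literal", std::stoi(parse.text)});
}

void HandleInfixOperator(Context& context, int node, const ParseNode& parse) {
  // Children were handled first, so the right operand is on top.
  int rhs = context.PopInstId();
  int lhs = context.PopInstId();
  Inst inst{parse.text == "*" ? "mul" : "add"};
  inst.lhs = lhs;
  inst.rhs = rhs;
  context.AddInstAndPush(node, inst);
}

// The core loop: one Handle call per node, dispatched on the node kind.
void CheckParseTree(Context& context, const std::vector<ParseNode>& tree) {
  for (int node = 0; node < static_cast<int>(tree.size()); ++node) {
    if (tree[node].kind == "IntegerLiteral") {
      HandleIntegerLiteral(context, node, tree[node]);
    } else {
      HandleInfixOperator(context, node, tree[node]);
    }
  }
}
```

Running this sketch on the postorder nodes for `1 * 2 + 3` leaves a single entry on the node stack. That entry is the id of the addition instruction, whose operands are the multiplication and the literal `3`.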
+
+A similar pattern uses bracketing nodes to support parent nodes that can have a
+variable number of children. For example, a `return` statement can produce parse
+trees following a few different patterns:
+
+-   `return;`
+
+    ```yaml
+      {kind: 'ReturnStatementStart', text: 'return'},
+    {kind: 'ReturnStatement', text: ';', subtree_size: 2},
+    ```
+
+-   `return x;`
+
+    ```yaml
+      {kind: 'ReturnStatementStart', text: 'return'},
+      {kind: 'NameExpr', text: 'x'},
+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
+    ```
+
+-   `return var;`
+
+    ```yaml
+      {kind: 'ReturnStatementStart', text: 'return'},
+      {kind: 'ReturnVarModifier', text: 'var'},
+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
+    ```
+
+In all three cases, the introducer node `ReturnStatementStart` pushes an entry
+on the [node stack](#node-stack) with just the parse node and no id, called a
+_solo parse node_. The handler for the parent `ReturnStatement` node can pop and
+process entries from the node stack until it finds that solo parse node from
+`ReturnStatementStart` that indicates it is done.
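
The following is a minimal sketch of the bracketing pattern. It uses a hypothetical `Entry` type whose optional `inst_id` is empty for solo parse nodes. The handler pops entries until it reaches the `ReturnStatementStart` bracket. This lets the same code handle all three shapes, and it leaves entries below the bracket untouched.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical node stack entry: a parse node kind plus an optional
// instruction id. Entries without an id are solo parse nodes.
struct Entry {
  std::string node_kind;
  std::optional<int> inst_id;
};

// What a hypothetical `ReturnStatement` handler found between its brackets.
struct ReturnInfo {
  bool returns_var = false;
  std::optional<int> expr_id;
};

// Pops entries until the `ReturnStatementStart` bracket is reached, so one
// handler covers `return;`, `return x;`, and `return var;`.
ReturnInfo HandleReturnStatement(std::vector<Entry>& node_stack) {
  ReturnInfo info;
  while (true) {
    assert(!node_stack.empty() && "missing ReturnStatementStart bracket");
    Entry entry = node_stack.back();
    node_stack.pop_back();
    if (entry.node_kind == "ReturnStatementStart") {
      // Everything pushed by this statement's children has been consumed.
      return info;
    }
    if (entry.node_kind == "ReturnVarModifier") {
      info.returns_var = true;
    } else {
      info.expr_id = entry.inst_id;
    }
  }
}
```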
+
+Another pattern that arises is state that is set up by an introducer node,
+updated by its siblings, and then consumed by the bracketing parent node. FIXME:
+example
+
+### Node stack
+
+The node stack, defined in [check/node_stack.h](/toolchain/check/node_stack.h),
+stores pairs of a `Parse::Node` and an id. The type of the id is determined by
+the `NodeKind` of the parse node. It is the default, general-purpose stack used
+by `Handle`... functions in the check stage. Using a single stack is beneficial
+since it improves locality of reference and reduces allocations. However,
+additional stacks are used to ensure we never need to search through the stack
+to find data; we always want to be operating on the top of the stack (or at a
+fixed offset from it).
+
+The node stack contains any state pushed by siblings of the current
+`Parse::Node` at the top, and state pushed by siblings of ancestors below. The
+boundaries between what is a sibling of the current `Parse::Node` versus what is
+a sibling of an ancestor are not explicitly determined. Instead, the handler for
+the parent node knows how many nodes it must pop from the stack based either on
+knowing the fixed number of children for that node kind or popping nodes until
+it reaches a bracketing node. The arity or bracketing node kind for each parent
+node is documented in [parse/node_kind.def](/toolchain/parse/node_kind.def).
+
+When each `Parse::Node` is evaluated, the SemIR for it is typically generated
+immediately as `SemIR::Inst`s. To help emit the IR into the appropriate context,
+scopes have separate `SemIR::InstBlock`s.
+
+### Delayed evaluation (not yet implemented)
+
+Sometimes, nodes will need to have delayed evaluation; for example, an inline
+definition of a class member function needs to be evaluated after the class is
+fully declared. The `SemIR::Inst`s cannot be immediately generated because they
+may include name references to the class. We're likely to store a reference to
+the relevant `Parse::Node` for each definition for re-evaluation after the class
+scope completes. This means that nodes in a definition would be traversed twice,
+once while determining that they're inline and without full checking or IR
+generation, then again with full checking and IR generation.
+
+### Templates (not yet implemented)
+
+Templates need to have partial semantic checking when declared, but can't be
+fully implemented before they're instantiated against a specific type.
+
+We are likely to generate a partial IR for templates, allowing for checking with
+the incomplete information in the IR. Instantiation will likely use that IR and
+fill in the missing information, but it could also reevaluate the original
+`Parse::Node`s with the known template state.
+
+### Rewrites
+
+Carbon relies on rewrites of code, such as rewriting the destination of an
+initializer to a specific target object once that object is known.
+
+We have two ways to achieve this. One is to track the IR location of a
+placeholder instruction and, if it needs updating, replace it with a "rewrite"
+`SemIR::Inst` that points to a new `SemIR::InstBlock` containing the required IR
+and specifying which value is the result of that rewrite. This is expressed in
+SemIR as a `splice_block` instruction. Another is to track the list of
+instructions to be created separately from the instruction block stack, and
+merge those instructions into the current block once we have decided on their
+contents.
+
+## Types
+
+Type expressions are treated like any other expression, and are modeled as
+`SemIR::Inst`s. The types computed by type expressions are deduplicated,
+resulting in a canonical `SemIR::TypeId` for each distinct type.
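
Deduplication can be sketched as interning a structural key. This illustration rests on several assumptions and is not the toolchain's mechanism. `TypeKind`, `TypeKey`, and `TypeStore` are hypothetical, and types are keyed on a kind plus operand type ids rather than computed from type expressions.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical structural description of a type: a kind plus operand type
// ids, for example {Pointer, {i32_id}} for `i32*`.
enum class TypeKind { Int32, Bool, Pointer, Tuple };

struct TypeKey {
  TypeKind kind;
  std::vector<int> operands;

  bool operator<(const TypeKey& other) const {
    if (kind != other.kind) {
      return kind < other.kind;
    }
    return operands < other.operands;
  }
};

// Interns structurally identical types so each gets one canonical id. Type
// identity then reduces to comparing integers.
class TypeStore {
 public:
  int GetTypeId(const TypeKey& key) {
    auto [it, inserted] =
        ids_.try_emplace(key, static_cast<int>(types_.size()));
    if (inserted) {
      types_.push_back(key);
    }
    return it->second;
  }

  const TypeKey& Get(int type_id) const {
    assert(type_id >= 0 && type_id < static_cast<int>(types_.size()));
    return types_[type_id];
  }

 private:
  std::map<TypeKey, int> ids_;
  std::vector<TypeKey> types_;
};
```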
+
+### Type printing (not yet implemented)
+
+The `TypeId` preserves only the identity of the type, not its spelling, and so
+printing it will produce a fully-resolved type name, which isn't a great user
+experience as it doesn't reflect how the type was written in the source code.
+
+Instead, when printing a type name for use in a diagnostic, we will start with
+one of two `InstId`s:
+
+-   An `InstId` for a type expression that describes the way the type was
+    computed.
+-   An `InstId` for an expression that has the given type.
+
+In the former case, the type is pretty-printed by walking the type expression
+and printing it. In the latter case, the type of the expression is reconstructed
+based on the form of the expression: for example, to print the type of `&x`, we
+print the type of `x` and append a `*`, being careful to take potential
+precedence issues into account.
+
+TODO: This requires being able to print the type of, for example,
+`x.foo[0].bar`, by printing only the desired portion of the type of `x`, and
+similarly may require handling the case where the type of an expression involves
+generic parameters whose arguments are specified by that expression. In effect,
+the type computation performed when checking an operation is duplicated into the
+type printing logic, but is simpler because errors don't need to be detected.
+
+This approach means we don't need to preserve a fully-sugared type for each
+expression instruction. Instead, we compute that type when we need to print it.
+
+## Expression categories
+
+Each `SemIR::Inst` that has an associated type also has an expression category,
+which describes how it produces a value of that type. These
+`SemIR::ExprCategory` values correspond to the Carbon expression categories
+defined in proposal
+[#2006](https://github.com/carbon-language/carbon-lang/pull/2006):
+
+### ExprCategory::NotExpression
+
+This instruction is not an expression instruction, and doesn't have an
+expression category. This is used for namespaces, control flow instructions, and
+other constructs that represent some non-expression-level semantics.
+
+### ExprCategory::Value
+
+This instruction produces a value using the type's value representation.
+Lowering the instruction will produce an LLVM value using that value
+representation.
+
+### ExprCategory::DurableReference and ExprCategory::EphemeralReference
+
+This instruction produces a reference to an object. Lowering will produce a
+pointer to an object representation.
+
+### ExprCategory::Initializing
+
+This instruction represents the initialization of an object. Depending on the
+initializing representation for the type, the initializing expression
+instruction will do one of the following:
+
+-   For an in-place initializing representation, the instruction will store a
+    value to the target of the initialization.
+
+-   For a by-copy initializing representation, the instruction will produce an
+    object representation by value that can be stored into the target. This is
+    currently only used in cases where the object representation and the value
+    representation are the same.
+
+-   For a type with no initializing representation, such as an empty struct or
+    tuple, it does neither of the above things.
+
+Regardless of the initializing representation, an initializing expression should
+be consumed by another instruction that finishes the initialization. For a
+by-copy initialization, this final instruction represents the store into the
+target, whereas in the other cases it is only used to track in SemIR how the
+initialization was used. When an in-place initializer uses a by-copy initializer
+as a subexpression, an `initialize_from` instruction is inserted to perform this
+final store.
+
+### ExprCategory::Mixed
+
+This instruction represents a language construct that doesn't have a single
+expression category. This is used for struct and tuple literals, where the
+elements of the literal can have different expression categories. Instructions
+with a mixed expression category are treated as a special case in conversion,
+which recurses into the elements of those instructions before performing
+conversions.
+
+### Value bindings
+
+A value binding represents a conversion from a reference expression to the value
+stored in that expression. There are three important cases here:
+
+-   For types with a by-copy value representation, such as `i32`, a value
+    binding represents a load from the address indicated by the reference
+    expression.
+
+-   For types with a by-pointer value representation, such as arrays and large
+    structs and tuples, a value binding implicitly takes the address of the
+    reference expression.
+
+-   For structs and tuples, the value representation is a struct or tuple of the
+    elements' value representations, which is not necessarily the same as a
+    struct or tuple of the elements' object representations. In the case where
+    the value representation is not a copy of, or pointer to, the object
+    representation, `value_binding` instructions are not used, and a
+    `tuple_value` or `struct_value` instruction is used to construct a value
+    representation instead. `value_binding` should still be used in the case
+    where the value and object representation are the same, but this is not yet
+    implemented.
+
+## Handling Parse::Tree errors (not yet implemented)
+
+`Parse::Tree` errors will typically indicate that checking would error for a
+given context. We'll want to be careful about how this is handled, but we'll
+likely want to generate diagnostics for valid child nodes, then reduce
+diagnostics once invalid nodes are encountered. We should be able to reasonably
+abandon generated IR of the valid children when we encounter an invalid parent,
+without severe effects on surrounding checks.
+
+For example, an invalid line of code in a function might generate some
+incomplete IR in the function's `SemIR::InstBlock`, but that IR won't negatively
+interfere with checking later valid lines in the same function.
+
+## Alternatives considered
+
+### Using a traditional AST representation
+
+Clang creates an AST as part of compilation. In Carbon, it's something we could
+do as a step between parsing and checking, possibly replacing the SemIR. It's
+likely that doing so would be simpler, amongst other possible trade-offs.
+However, we think the SemIR approach is going to yield higher performance,
+enough so that it's the chosen approach.

+ 0 - 0
toolchain/docs/check.svg


+ 230 - 0
toolchain/docs/diagnostics.md

@@ -0,0 +1,230 @@
+# Diagnostics
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Overview](#overview)
+-   [DiagnosticEmitter](#diagnosticemitter)
+-   [DiagnosticConsumers](#diagnosticconsumers)
+-   [Producing diagnostics](#producing-diagnostics)
+-   [Diagnostic registry](#diagnostic-registry)
+-   [CARBON_DIAGNOSTIC placement](#carbon_diagnostic-placement)
+-   [Diagnostic context](#diagnostic-context)
+-   [Diagnostic parameter types](#diagnostic-parameter-types)
+-   [Diagnostic message style guide](#diagnostic-message-style-guide)
+
+<!-- tocstop -->
+
+## Overview
+
+The diagnostics code is used by the toolchain to report errors, warnings, and
+notes about the code being compiled.
+
+## DiagnosticEmitter
+
+[DiagnosticEmitters](/toolchain/diagnostics/diagnostic_emitter.h) handle the
+main formatting of a message. Each emitter is parameterized on a location type,
+for which a `DiagnosticLocationTranslator` must be provided that can translate
+the location type into a standardized `DiagnosticLocation` of file, line, and
+column.
+
+When emitting, the resulting formatted message is passed to a
+`DiagnosticConsumer`.
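
The relationship between the emitter, the translator, and the consumer can be sketched as follows. These are simplified, hypothetical versions that only share names with the toolchain classes. The signatures are assumptions, messages are preformatted strings, and the example location type is a byte offset into a single source buffer.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// The standardized location that every emitter reduces to.
struct DiagnosticLocation {
  std::string filename;
  int line;
  int column;
};

struct Diagnostic {
  DiagnosticLocation location;
  std::string message;
};

// Receives fully formatted diagnostics.
class DiagnosticConsumer {
 public:
  virtual ~DiagnosticConsumer() = default;
  virtual void HandleDiagnostic(Diagnostic diagnostic) = 0;
};

// Maps a phase-specific location type to a DiagnosticLocation.
template <typename LocT>
class DiagnosticLocationTranslator {
 public:
  virtual ~DiagnosticLocationTranslator() = default;
  virtual DiagnosticLocation GetLocation(LocT loc) = 0;
};

// Parameterized on the location type; translation happens at emit time.
template <typename LocT>
class DiagnosticEmitter {
 public:
  DiagnosticEmitter(DiagnosticLocationTranslator<LocT>& translator,
                    DiagnosticConsumer& consumer)
      : translator_(&translator), consumer_(&consumer) {}

  void Emit(LocT loc, std::string message) {
    consumer_->HandleDiagnostic(
        {translator_->GetLocation(loc), std::move(message)});
  }

 private:
  DiagnosticLocationTranslator<LocT>* translator_;
  DiagnosticConsumer* consumer_;
};

// Example translator for a hypothetical location type: a byte offset.
class ByteOffsetTranslator : public DiagnosticLocationTranslator<int> {
 public:
  ByteOffsetTranslator(std::string filename, std::string source)
      : filename_(std::move(filename)), source_(std::move(source)) {}

  DiagnosticLocation GetLocation(int offset) override {
    assert(offset >= 0 && offset <= static_cast<int>(source_.size()));
    int line = 1;
    int column = 1;
    for (int i = 0; i < offset; ++i) {
      if (source_[i] == '\n') {
        ++line;
        column = 1;
      } else {
        ++column;
      }
    }
    return {filename_, line, column};
  }

 private:
  std::string filename_;
  std::string source_;
};

// Example consumer that just collects diagnostics.
class CollectingConsumer : public DiagnosticConsumer {
 public:
  void HandleDiagnostic(Diagnostic diagnostic) override {
    diagnostics.push_back(std::move(diagnostic));
  }

  std::vector<Diagnostic> diagnostics;
};
```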
+
+## DiagnosticConsumers
+
+DiagnosticConsumers handle output of diagnostic messages after they've been
+formatted by an Emitter. Important consumers are:
+
+-   [ConsoleDiagnosticConsumer](/toolchain/diagnostics/diagnostic_emitter.h):
+    prints diagnostics to console.
+
+-   [ErrorTrackingDiagnosticConsumer](/toolchain/diagnostics/diagnostic_emitter.h):
+    counts the number of errors produced, particularly so that it can be
+    determined whether any errors were encountered.
+
+-   [SortingDiagnosticConsumer](/toolchain/diagnostics/sorting_diagnostic_consumer.h):
+    sorts diagnostics by location so that they appear in the terminal in the
+    order of the file, rather than the order in which they were produced.
+
+-   [NullDiagnosticConsumer](/toolchain/diagnostics/null_diagnostics.h):
+    suppresses diagnostics, particularly for tests.
+
+Note that `SortingDiagnosticConsumer` is used by default by `carbon compile`. In
+cases where one error leads to another error at an earlier location, for example
+if an error in a function call argument leads to an error in the function call,
+this can result in confusing diagnostic output where a consequence of the error
+is reported before the cause. Usually this should be handled by tracking that an
+error occurred and suppressing the follow-on diagnostic. During toolchain
+development, it can be useful to disable the sorting so that the diagnostic
+order matches the order in which the file was processed. This can be done using
+`carbon compile --stream-errors`.
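
The following is a minimal sketch of a sorting consumer. It assumes that diagnostics carry a line and column, and it models the wrapped consumer as a vector. `SortingConsumer` is hypothetical. `std::stable_sort` keeps diagnostics at the same location in emission order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Diagnostic {
  int line;
  int column;
  std::string message;
};

// Buffers diagnostics and forwards them in source order when flushed.
class SortingConsumer {
 public:
  // `next` stands in for the wrapped consumer (for example, console output).
  explicit SortingConsumer(std::vector<Diagnostic>& next) : next_(&next) {}

  ~SortingConsumer() { assert(buffer_.empty() && "Flush() was not called"); }

  void HandleDiagnostic(Diagnostic diagnostic) {
    buffer_.push_back(std::move(diagnostic));
  }

  void Flush() {
    // stable_sort keeps diagnostics at the same location in emission order.
    std::stable_sort(buffer_.begin(), buffer_.end(),
                     [](const Diagnostic& a, const Diagnostic& b) {
                       if (a.line != b.line) {
                         return a.line < b.line;
                       }
                       return a.column < b.column;
                     });
    for (Diagnostic& diagnostic : buffer_) {
      next_->push_back(std::move(diagnostic));
    }
    buffer_.clear();
  }

 private:
  std::vector<Diagnostic>* next_;
  std::vector<Diagnostic> buffer_;
};
```

The test input mirrors the situation described above. The argument error is the cause, but it has the later column, so after sorting the call error is printed first. Forwarding each diagnostic immediately, instead of buffering it, would preserve processing order.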
+
+## Producing diagnostics
+
+Diagnostics are used to surface issues from compilation. A simple diagnostic
+looks like:
+
+```cpp
+CARBON_DIAGNOSTIC(InvalidCode, Error, "Code is invalid");
+emitter.Emit(location, InvalidCode);
+```
+
+Here, `CARBON_DIAGNOSTIC` defines a static instance of a diagnostic named
+`InvalidCode` with the associated severity (`Error` or `Warning`).
+
+The `Emit` call produces a single instance of the diagnostic. When emitted,
+`"Code is invalid"` will be the message used. The type of `location` depends on
+the `DiagnosticEmitter`.
+
+A diagnostic with an argument looks like:
+
+```cpp
+CARBON_DIAGNOSTIC(InvalidCharacter, Error, "Invalid character {0}.", char);
+emitter.Emit(location, InvalidCharacter, invalid_char);
+```
+
+Here, the additional `char` argument to `CARBON_DIAGNOSTIC` specifies the type
+of an argument to expect for message formatting. The `invalid_char` argument to
+`Emit` provides the matching value. It's then passed along with the diagnostic
+message format to `llvm::formatv` to produce the final diagnostic message.
+
+## Diagnostic registry
+
+There is a [registry](/toolchain/diagnostics/diagnostic_kind.def) which all
+diagnostics must be added to. Each diagnostic has a line like:
+
+```cpp
+CARBON_DIAGNOSTIC_KIND(InvalidCode)
+```
+
+This produces a central enumeration of all diagnostics. The eventual intent is
+to require tests for every diagnostic that can be produced, but that isn't
+currently implemented.
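
A single central registry like this is commonly implemented as an X-macro. The sketch below inlines the registry as a hypothetical `HYPOTHETICAL_DIAGNOSTIC_KINDS` list macro so that it fits in one file. A `.def` file is typically `#include`d instead, with a macro such as `CARBON_DIAGNOSTIC_KIND` defined differently before each include. The enum and the name table are generated from the same list, so they cannot drift apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Stand-in for the registry file: a list macro that applies X to each
// registered diagnostic name.
#define HYPOTHETICAL_DIAGNOSTIC_KINDS(X) \
  X(InvalidCode)                         \
  X(InvalidCharacter)                    \
  X(CallArgCountMismatch)

// Expansion 1: the central enumeration.
enum class DiagnosticKind {
#define CARBON_DIAGNOSTIC_KIND(Name) Name,
  HYPOTHETICAL_DIAGNOSTIC_KINDS(CARBON_DIAGNOSTIC_KIND)
#undef CARBON_DIAGNOSTIC_KIND
};

// Expansion 2: a parallel table of names, built from the same list.
constexpr std::string_view DiagnosticKindNames[] = {
#define CARBON_DIAGNOSTIC_KIND(Name) #Name,
    HYPOTHETICAL_DIAGNOSTIC_KINDS(CARBON_DIAGNOSTIC_KIND)
#undef CARBON_DIAGNOSTIC_KIND
};

constexpr std::size_t NumDiagnosticKinds =
    sizeof(DiagnosticKindNames) / sizeof(DiagnosticKindNames[0]);

std::string_view GetName(DiagnosticKind kind) {
  auto index = static_cast<std::size_t>(kind);
  assert(index < NumDiagnosticKinds);
  return DiagnosticKindNames[index];
}
```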
+
+## CARBON_DIAGNOSTIC placement
+
+Idiomatically, `CARBON_DIAGNOSTIC` will be adjacent to the `Emit` call. However,
+this is only because many diagnostics can only be produced in one code location.
+If they can be produced in multiple locations, they will be at a higher scope so
+that multiple `Emit` calls can reference them. When in a function,
+`CARBON_DIAGNOSTIC` should be placed as close as possible to the usage so that
+it's easier to see the associated output.
+
+## Diagnostic context
+
+Diagnostics can provide additional context for errors by attaching notes, which
+have their own location information. A diagnostic with a note looks like:
+
+```cpp
+CARBON_DIAGNOSTIC(CallArgCountMismatch, Error,
+                  "{0} argument(s) passed to function expecting "
+                  "{1} argument(s).",
+                  int, int);
+CARBON_DIAGNOSTIC(InCallToFunction, Note,
+                  "Calling function declared here.");
+context.emitter()
+    .Build(call_parse_node, CallArgCountMismatch, arg_refs.size(),
+           param_refs.size())
+    .Note(param_parse_node, InCallToFunction)
+    .Emit();
+```
+
+The error and the note are registered as two separate diagnostics, but a single
+overall diagnostic object is built and emitted, so that the error and the note
+can be treated as a single unit.
+
+Diagnostic context information can also be registered in a scope, so that all
+diagnostics produced in that scope attach a specific note. For example:
+
+```cpp
+DiagnosticAnnotationScope annotate_diagnostics(
+    &context.emitter(), [&](auto& builder) {
+      CARBON_DIAGNOSTIC(
+          InCallToFunctionParam, Note,
+          "Initializing parameter {0} of function declared here.", int);
+      builder.Note(param_parse_node, InCallToFunctionParam,
+                   diag_param_index + 1);
+    });
+```
+
+This is useful when delegating to another part of Check that may produce many
+different kinds of diagnostics.
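
The scoped-annotation mechanism can be sketched with RAII. Constructing the scope pushes a callback onto the emitter, and destroying the scope pops the callback. `Emitter`, `DiagnosticBuilder`, and `AnnotationScope` are hypothetical simplifications, not the toolchain's `DiagnosticAnnotationScope` API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A diagnostic under construction: a primary message plus attached notes.
struct DiagnosticBuilder {
  std::string message;
  std::vector<std::string> notes;

  void Note(std::string note) { notes.push_back(std::move(note)); }
};

class Emitter {
 public:
  using Annotator = std::function<void(DiagnosticBuilder&)>;

  // Lets every active annotator attach notes, then records the diagnostic.
  void Emit(std::string message) {
    DiagnosticBuilder builder{std::move(message), {}};
    for (const Annotator& annotate : annotators_) {
      annotate(builder);
    }
    emitted.push_back(std::move(builder));
  }

  void PushAnnotator(Annotator annotator) {
    annotators_.push_back(std::move(annotator));
  }

  void PopAnnotator() {
    assert(!annotators_.empty());
    annotators_.pop_back();
  }

  // Stand-in for a consumer.
  std::vector<DiagnosticBuilder> emitted;

 private:
  std::vector<Annotator> annotators_;
};

// RAII: the annotator applies to every diagnostic emitted while this object
// is alive, including ones emitted by code that knows nothing about it.
class AnnotationScope {
 public:
  AnnotationScope(Emitter* emitter, Emitter::Annotator annotator)
      : emitter_(emitter) {
    emitter_->PushAnnotator(std::move(annotator));
  }
  ~AnnotationScope() { emitter_->PopAnnotator(); }

  AnnotationScope(const AnnotationScope&) = delete;
  AnnotationScope& operator=(const AnnotationScope&) = delete;

 private:
  Emitter* emitter_;
};
```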
+
+## Diagnostic parameter types
+
+Here are some types you might consider for the parameters to a diagnostic:
+
+-   `llvm::StringLiteral`. Note that we don't use `llvm::StringRef` to avoid
+    lifetime issues.
+-   `std::string`
+-   Carbon types `T` that implement `llvm::format_provider<T>` like:
+    -   `Lex::TokenKind`
+    -   `Lex::NumericLiteral::Radix`
+    -   `Parse::RelativeLocation`
+-   integer types: `int`, `uint64_t`, `int64_t`, `size_t`
+-   `char`
+-   Other
+    [types supported by llvm::formatv](https://llvm.org/doxygen/FormatVariadic_8h_source.html)
+
+## Diagnostic message style guide
+
+In order to provide a consistent experience, Carbon diagnostics should be
+written in the following style:
+
+-   Start diagnostics with a capital letter or quoted code, and end them with a
+    period.
+
+-   Quoted code should be enclosed in backticks, for example:
+    ``"`{0}` is bad."``
+
+-   Phrase diagnostics as bullet points rather than full sentences. Leave out
+    articles unless they're necessary for clarity.
+
+-   Diagnostics should describe the situation the toolchain observed and the
+    language rule that was violated, although either can be omitted if it's
+    clear from the other. For example:
+
+    -   `"Redeclaration of X."` describes the situation and implies that
+        redeclarations are not permitted.
+
+    -   ``"`self` can only be declared in an implicit parameter list."``
+        describes the language rule and implies that you declared `self`
+        somewhere else.
+
+    -   It's OK for a diagnostic to guess at the developer's intent and provide
+        a hint after explaining the situation and the rule, but not as a
+        substitute for that. For example,
+        ``"Add an `as String` cast to format this integer as a string."`` is not
+        sufficient as an error message, but
+        ``"Cannot add i32 to String. Add an `as String` cast to format this integer as a string."``
+        could be acceptable.
+
+-   TODO: Should diagnostics be atemporal and non-sequential ("multiple
+    declarations of X", "additional declaration here"), present tense but
+    sequential ("redeclaration of X", "previous declaration is here"), or
+    temporal ("redeclaration of X", "previous declaration was here")? We could
+    try to sidestep the difference between the latter two by avoiding verbs with
+    tense ("previously declared here", "Y declared here", with no is/was).
+
+-   TODO: Word choices:
+
+    -   For disallowed constructs, do we say they're not permitted / not allowed
+        / not valid / not legal / illegal / ill-formed / disallowed? Do we say
+        "X cannot be Y" or "X may not be Y" or "X must not be Y" or "X shall not
+        be Y"?
+
+-   TODO: Is it important to structure diagnostics such that inputs can be
+    parsed without string parsing? That is, when is passing strings in as part
+    of the message templating okay?
+
+-   TODO: When do we put identifiers or expressions in diagnostics, versus
+    requiring notes pointing at relevant code? Is it only avoided for values, or
+    only allowed for types?
+
+-   TODO: Lots more things to decide, give examples.

+ 22 - 0
toolchain/docs/driver.md

@@ -0,0 +1,22 @@
+# Driver
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Overview](#overview)
+
+<!-- tocstop -->
+
+## Overview
+
+The driver provides commands and ties together the toolchain's flow. Running a
+command such as `carbon compile --phase=lower <file>` will run through the flow
+and print output. Several dump flags, such as `--dump-parse-tree`, print output
+in YAML format for easier parsing.

+ 424 - 0
toolchain/docs/idioms.md

@@ -0,0 +1,424 @@
+# Idioms
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Overview](#overview)
+-   [C++ dialect](#c-dialect)
+-   [Abbreviations used in the code (AKA Carbon abbreviation decoder ring)](#abbreviations-used-in-the-code-aka-carbon-abbreviation-decoder-ring)
+-   [`.def` files](#def-files)
+    -   [EnumBase types](#enumbase-types)
+-   [Index types](#index-types)
+-   [ValueStore](#valuestore)
+-   [Template metaprogramming](#template-metaprogramming)
+    -   [Struct reflection](#struct-reflection)
+    -   [Field detection](#field-detection)
+-   [Local lambdas to reduce duplicate code](#local-lambdas-to-reduce-duplicate-code)
+-   [Immediately invoked function expressions (IIFE)](#immediately-invoked-function-expressions-iife)
+-   [Declarations in conditions](#declarations-in-conditions)
+-   [CRTP or "Curiously recurring template pattern"](#crtp-or-curiously-recurring-template-pattern)
+-   [Multiple inheritance](#multiple-inheritance)
+-   [Defining constants usable in constexpr contexts](#defining-constants-usable-in-constexpr-contexts)
+
+<!-- tocstop -->
+
+## Overview
+
+The toolchain implementation uses some techniques that are uncommon in typical
+C++ code.
+
+## C++ dialect
+
+The toolchain implementation does not use some C++ features, following
+[Google's C++ style guide](https://google.github.io/styleguide/cppguide.html):
+
+-   [Exceptions](https://google.github.io/styleguide/cppguide.html#Exceptions)
+-   [Virtual base classes](https://google.github.io/styleguide/cppguide.html#Inheritance)
+-   [RTTI](https://google.github.io/styleguide/cppguide.html#Run-Time_Type_Information__RTTI_)
+
+## Abbreviations used in the code (AKA Carbon abbreviation decoder ring)
+
+Note that abbreviations are typically only used in code, not comments (except
+when referring to an entity from the code).
+
+-   **Addr**: "address"
+-   **Arg**: "argument"
+-   **Decl**: "declaration"
+-   **Expr**: "expression"
+    -   **SubExpr**: "subexpression"
+-   **Float**: "floating point"
+-   **Init**: "initialization"
+-   **Inst**: "instruction"
+-   **Int**: "integer"
+-   **Loc**: "location"
+-   **Param**: "parameter"
+-   **Paren**: "parenthesis"
+-   **Ref**: "reference"
+    -   **Deref**: "dereference"
+-   **Subst**: "substitute"
+
+Phrase abbreviations (abbreviations for a whole phrase, used even where we
+wouldn't abbreviate each word of the phrase individually):
+
+-   **InitRepr**: "initializing representation"
+-   **ObjectRepr**: "object representation"
+-   **SemIR**: "semantics intermediate representation"
+-   **ValueRepr**: "value representation"
+
+## `.def` files
+
+The Carbon toolchain uses a technique related to
+[X-macros](https://en.wikipedia.org/wiki/X_macro) to generate code that operates
+over a collection of types, enumerators, or another similar list of names. This
+works as follows:
+
+-   A `.def` file is provided that is intended to be included repeatedly by way
+    of `#include`.
+-   The user of the `.def` defines a macro, with a name and a form specified by
+    the `.def` file, for example
+    `#define CARBON_EACH_WIDGET(Name) Scope::Name,`.
+-   A `#include` of the `.def` file expands to `CARBON_EACH_WIDGET(Name1)`,
+    `CARBON_EACH_WIDGET(Name2)`, ... for each widget name, and then `#undef`s
+    the `CARBON_EACH_WIDGET` macro.
+
+For example:
+
+```cpp
+enum Widgets {
+#define CARBON_EACH_WIDGET(Name) Name,
+#include "widgets.def"
+};
+```
+
+... would expand to an enumeration definition with one enumerator per widget
+name.
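
The same idea can be sketched in a single self-contained file. Because a fenced
example can't `#include` a separate `widgets.def`, this sketch assumes a
hypothetical `WIDGET_LIST` macro that takes the per-entry macro as a parameter
(the classic X-macro form) in place of the `.def` file. Expanding the list twice
keeps the enumerators and a table of their names in sync:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for `widgets.def`: the list of names, written once.
#define WIDGET_LIST(X) \
  X(Button)            \
  X(Slider)            \
  X(Checkbox)

// First expansion: one enumerator per widget name.
enum Widgets {
#define CARBON_EACH_WIDGET(Name) Name,
  WIDGET_LIST(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
};

// Second expansion: the widget names as strings, in the same order.
constexpr std::string_view WidgetNames[] = {
#define CARBON_EACH_WIDGET(Name) #Name,
    WIDGET_LIST(CARBON_EACH_WIDGET)
#undef CARBON_EACH_WIDGET
};

constexpr std::size_t WidgetCount =
    sizeof(WidgetNames) / sizeof(WidgetNames[0]);
```

Adding a name to `WIDGET_LIST` updates both the enum and the name table, which
is the main benefit of writing the list only once.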
+
+### EnumBase types
+
+Most `.def` files will have a corresponding [EnumBase](/common/enum_base.h)
+child class (if `widgets.def` has X-macros, `widgets.h` and `widgets.cpp` have
+the `EnumBase` child class). These work similarly to an `enum class`, with the
+addition of a `name()` function and `<<` stream operator support. Many also have
+further utility functions for information related to the enum value.
+
+In code, these types and values can be used directly in a `switch`. They will
+convert to an internal _actual_ `enum class` for the `switch`, and receive
+corresponding compiler safety checks that all enum values are handled.
+
+## Index types
+
+Carbon makes frequent use of
+[IndexBase and IdBase](/toolchain/base/index_base.h). The `IndexBase` and
+`IdBase` types are small wrappers around `int32_t` to provide a measure of
+type-checking when passing around indices to vector-like storage types. The only
+difference is that `IndexBase` supports all comparison operators, whereas
+`IdBase` only supports equality comparison.
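
A minimal sketch of how such a wrapper provides type-checking, assuming a
hypothetical, simplified `IdBase` that takes the derived id type as a template
parameter so that `==` only compares ids of the same type (the real definitions
in [index_base.h](/toolchain/base/index_base.h) may differ). `InstId` and
`TypeId` both hold an `int32_t`, but passing a `TypeId` to the hypothetical
`NextInst` function is a compile-time error:

```cpp
#include <cstdint>

// Hypothetical simplified IdBase: a typed wrapper around an int32_t index.
// `DerivedT` is the concrete id type, so `==` only accepts matching ids.
template <typename DerivedT>
struct IdBase {
  constexpr explicit IdBase(std::int32_t index) : index(index) {}

  friend constexpr auto operator==(DerivedT lhs, DerivedT rhs) -> bool {
    return lhs.index == rhs.index;
  }
  friend constexpr auto operator!=(DerivedT lhs, DerivedT rhs) -> bool {
    return lhs.index != rhs.index;
  }

  std::int32_t index;
};

struct InstId : IdBase<InstId> {
  using IdBase::IdBase;
};
struct TypeId : IdBase<TypeId> {
  using IdBase::IdBase;
};

// Accepts only an InstId; `NextInst(TypeId(0))` does not compile.
constexpr auto NextInst(InstId id) -> InstId { return InstId(id.index + 1); }
```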
+
+Variable naming will often have `_id` at the end to indicate that it corresponds
+to an `IdBase`. This may include the full type, as in `operand_inst_id` being an
+`InstId` for an operand.
+
+A block is an array of ids. These will be indicated with either a `_block`
+suffix or pluralization (for example, `param_refs`, where `refs` is the plural
+of `ref`).
+
+The `ref` concept in a name means that there is an underlying instruction block,
+but only a subset of instructions are present in the `refs` block. For example,
+function parameters have a sequence, and also have a `refs` block with one entry
+per parameter. The `refs` block allows parameters to be counted and accessed
+directly, rather than through vector iteration.
+
+## ValueStore
+
+Many of Carbon's data types are stored in a
+[ValueStore](/toolchain/base/value_store.h) or related type with similar
+semantics (`sem_ir` has [several such classes](/toolchain/base/value_store.h)).
+`ValueStore` links an indexing type to a value type with vector-like storage.
+The indices typically use `IdBase`.
+
+`ValueStore`'s API follows the shape of simple array access and mutation:
+
+-   `Add`, which takes a value and returns the index.
+-   `Set`, which takes a value and an index to modify.
+-   `Get`, which takes an index and returns a reference to the value (possibly a
+    constant reference).
+-   Other vector-like functionality, including `size` or `Reserve`.
+
+ValueStores should be named after the type they contain. The index type used on
+the value store should have a `using ValueType...` which indicates the stored
+type. When storing the result of one of these functions, it's common to use
+`auto` and rely on the name of the storage type to imply the returned type.
+
+Some name mirroring examples are:
+
+-   `ints` is a `ValueStore<IntId>`, which has an index type of `IntId` and a
+    value type of `llvm::APInt`.
+
+-   `functions` is a `ValueStore<SemIR::FunctionId>`, which has an index type of
+    `SemIR::FunctionId` and a value type of `SemIR::Function`.
+
+-   `strings` is a `ValueStore<StringId>`, which has an index type of
+    `StringId`, but for copy-related reasons, uses `llvm::StringRef` for values.
+
+A fairly complete list of `ValueStore` uses should be available on
+[checking's Context class](https://github.com/search?q=repository%3Acarbon-language%2Fcarbon-lang%20path%3Acheck%2Fcontext.h%20symbol%3Aidentifiers&type=code).
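
A minimal sketch of the API shape described above, assuming a hypothetical
`NameId` index type with `using ValueType = std::string;`. The real
[ValueStore](/toolchain/base/value_store.h) provides more than this, and its
internals may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index type. `ValueType` names the stored type.
struct NameId {
  using ValueType = std::string;
  std::int32_t index;
};

template <typename IdT>
class ValueStore {
 public:
  using ValueType = typename IdT::ValueType;

  // Stores `value` and returns the id that refers to it.
  auto Add(ValueType value) -> IdT {
    IdT id{static_cast<std::int32_t>(values_.size())};
    values_.push_back(std::move(value));
    return id;
  }

  // Replaces the value for an existing id.
  auto Set(IdT id, ValueType value) -> void {
    values_[Index(id)] = std::move(value);
  }

  auto Get(IdT id) -> ValueType& { return values_[Index(id)]; }
  auto Get(IdT id) const -> const ValueType& { return values_[Index(id)]; }

  auto Reserve(std::size_t size) -> void { values_.reserve(size); }
  auto size() const -> std::size_t { return values_.size(); }

 private:
  // Converts an id to a vector index; the bounds check is an `assert`, so it
  // is disabled when NDEBUG is defined.
  auto Index(IdT id) const -> std::size_t {
    assert(id.index >= 0 &&
           static_cast<std::size_t>(id.index) < values_.size());
    return static_cast<std::size_t>(id.index);
  }

  std::vector<ValueType> values_;
};
```

Following the naming convention above, a `ValueStore<NameId>` would be named
`names`, so a call like `auto name = names.Get(name_id);` stays readable without
spelling out the type.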
+
+## Template metaprogramming
+
+FIXME: show example patterns
+
+-   TypedInstArgsInfo from toolchain/sem_ir/inst.h
+-   templated using
+-   std::declval
+-   decltype
+-   static_assert
+-   if constexpr
+-   template specialization, for example `Inst::FromRaw<T>` (maybe also type
+    traits?)
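
As a generic illustration (not taken from `toolchain/sem_ir/inst.h`), the
following sketch combines several of the listed techniques: a templated `using`
built from `decltype` and `std::declval`, `static_assert`, template
specialization, and `if constexpr`. The `IntBox`, `StringBox`, `IsBox`, and
`Describe` names are hypothetical:

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>

struct IntBox {
  auto Get() const -> std::int32_t { return value; }
  std::int32_t value;
};
struct StringBox {
  auto Get() const -> std::string { return value; }
  std::string value;
};

// Templated `using`: the type returned by `T::Get()`. `std::declval` produces
// a `const T&` inside `decltype` without constructing a `T`.
template <typename T>
using GetResultT = decltype(std::declval<const T&>().Get());

static_assert(std::is_same_v<GetResultT<IntBox>, std::int32_t>);
static_assert(std::is_same_v<GetResultT<StringBox>, std::string>);

// Template specialization: a trait that is false unless specialized.
template <typename T>
struct IsBox : std::false_type {};
template <>
struct IsBox<IntBox> : std::true_type {};
template <>
struct IsBox<StringBox> : std::true_type {};

template <typename T>
auto Describe(const T& box) -> std::string {
  static_assert(IsBox<T>::value, "Describe requires a box type");
  // `if constexpr` discards the branch that doesn't apply to `T`; without it,
  // `std::to_string(box.Get())` would fail to compile for `StringBox`.
  if constexpr (std::is_integral_v<GetResultT<T>>) {
    return "int " + std::to_string(box.Get());
  } else {
    return "string " + box.Get();
  }
}
```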
+
+### Struct reflection
+
+The toolchain uses a primitive form of struct reflection to operate generically
+over the fields in a typed `SemIR` instruction. This is implemented in
+`common/struct_reflection.h`, and the interface to the functionality is
+`StructReflection::AsTuple(your_struct)`, which converts the given struct into a
+`std::tuple` containing the same fields in the same order.
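
One common way to implement this kind of reflection in C++17 is sketched below.
It illustrates the general technique and is not a copy of
`common/struct_reflection.h`; the `AnyField` helper and the three-field limit
are assumptions of the sketch. The field count is found by testing how many
values of a type convertible to anything can aggregate-initialize the struct,
and structured bindings then copy the fields into a `std::tuple`:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Converts to any type; only used in unevaluated contexts.
struct AnyField {
  template <typename FieldT>
  operator FieldT() const;
};

// True if `T{AnyField(), ...}` with sizeof...(I) initializers is valid.
template <typename T, typename Seq, typename = void>
struct CanInitWith : std::false_type {};
template <typename T, std::size_t... I>
struct CanInitWith<T, std::index_sequence<I...>,
                   std::void_t<decltype(T{(void(I), AnyField())...})>>
    : std::true_type {};

template <typename T, std::size_t N>
constexpr bool HasFields = CanInitWith<T, std::make_index_sequence<N>>::value;

// Returns a tuple holding copies of `value`'s fields, in declaration order.
// Handles aggregates with at most three fields.
template <typename T>
auto AsTuple(const T& value) {
  if constexpr (HasFields<T, 3>) {
    const auto& [a, b, c] = value;
    return std::tuple(a, b, c);
  } else if constexpr (HasFields<T, 2>) {
    const auto& [a, b] = value;
    return std::tuple(a, b);
  } else if constexpr (HasFields<T, 1>) {
    const auto& [a] = value;
    return std::tuple(a);
  } else {
    return std::tuple<>();
  }
}
```

A struct with more than three fields would fail to compile at the structured
binding, so a fuller version would need more branches.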
+
+### Field detection
+
+The presence of specific fields in a struct with a specified type is detected
+using the following idiom:
+
+```cpp
+template <typename T, typename = FieldType T::*>
+constexpr bool HasField = false;
+template <typename T>
+constexpr bool HasField<T, decltype(&T::field)> = true;
+```
+
+This is intended to check the same property as the following concept, which we
+can't use because we currently need to compile in C++17 mode:
+
+```cpp
+template <typename T>
+concept HasField = std::same_as<decltype(T::field), FieldType>;
+```
+
+To detect a field with a specific name with a type derived from a specified base
+type, use this idiom:
+
+```cpp
+// HasField<T> is true if T has a `U field` field,
+// where `U` extends `BaseClass`.
+template <typename T, bool Enabled = true>
+inline constexpr bool HasField = false;
+template <typename T>
+inline constexpr bool HasField<
+    T, bool(std::is_base_of_v<BaseClass, decltype(T::field)>)> = true;
+```
+
+The equivalent concept is:
+
+```cpp
+template <typename T>
+concept HasField = std::derived_from<decltype(T::field), BaseClass>;
+```
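
To make the first idiom concrete, the sketch below fills in hypothetical names:
`HasIntIdField<T>` is true only when `T` has a non-static data member `id` whose
type is exactly `int`:

```cpp
// HasIntIdField<T> is true if T has an `int id` non-static data member.
// The primary template's default argument is `int T::*`; the partial
// specialization matches only when `&T::id` has exactly that type.
template <typename T, typename = int T::*>
inline constexpr bool HasIntIdField = false;
template <typename T>
inline constexpr bool HasIntIdField<T, decltype(&T::id)> = true;

struct WithIntId {
  int id;
};
struct WithLongId {
  long id;
};
struct WithoutId {
  int other;
};
```

When `T` has no `id` member, substituting `decltype(&T::id)` fails, so the
partial specialization is ignored instead of causing a compile error.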
+
+## Local lambdas to reduce duplicate code
+
+Sometimes code that would be repeated in a function is factored into a local
+variable containing a
+[lambda](https://en.cppreference.com/w/cpp/language/lambda):
+
+```cpp
+auto common_code = [&](AType param1, AnotherType param2) {
+  // code that would otherwise be repeated
+  ...
+};
+if (something) {
+  common_code(...);
+}
+if (something_else) {
+  common_code(...);
+}
+```
+
+Compared to defining a new function, this has the advantage of being able to be
+declared in context and access the local variables of the enclosing function.
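
A runnable example of the pattern, using a hypothetical `FindNegativeOperands`
function. The lambda captures `line` and `messages` by reference, so each call
passes only what differs between the two checks:

```cpp
#include <string>
#include <vector>

auto FindNegativeOperands(int line, int lhs, int rhs)
    -> std::vector<std::string> {
  std::vector<std::string> messages;
  // Shared reporting code; reads `line` and appends to `messages`.
  auto report = [&](const std::string& operand, int value) {
    messages.push_back("line " + std::to_string(line) + ": " + operand +
                       " is negative (" + std::to_string(value) + ")");
  };
  if (lhs < 0) {
    report("lhs", lhs);
  }
  if (rhs < 0) {
    report("rhs", rhs);
  }
  return messages;
}
```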
+
+## Immediately invoked function expressions (IIFE)
+
+Instead of creating a separate function with its own name that will be called
+once to produce the initial value for a variable, the function can be declared
+inline and then immediately called.
+
+This can be used for complex initialization, as in:
+
+```cpp
+// variable declaration
+static const llvm::ArrayRef<std::byte> entropy_bytes =
+// initializer starts with a lambda
+    []() -> llvm::ArrayRef<std::byte> {
+  static llvm::SmallVector<std::byte> bytes;
+
+  // a bunch of code
+
+  // return the value to initialize the variable with
+  return bytes;
+
+// finish defining the lambda, and then immediately invoke it
+}();
+```
+
+It can also be used inside a `CARBON_DCHECK` to avoid computation that is only
+needed in debug builds:
+
+```cpp
+CARBON_DCHECK([&] {
+  // a bunch of code
+
+  // condition that will be tested by CARBON_DCHECK
+  return complicated && multiple_parts;
+
+// finish defining the lambda, and then immediately invoke it
+}()) << "Complicated things went wrong";
+```
+
+See a description of this technique on
+[wikipedia](https://en.wikipedia.org/wiki/Immediately_invoked_function_expression).
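
A smaller runnable example, using a hypothetical `DescribeNumber` function. The
IIFE lets `label` be `const` even though computing it takes several statements,
and keeps the scratch variable `result` out of the enclosing scope:

```cpp
#include <string>

auto DescribeNumber(int value) -> std::string {
  const std::string label = [&] {
    std::string result = value < 0 ? "negative" : "non-negative";
    result += value % 2 == 0 ? ", even" : ", odd";
    return result;
    // The trailing `()` invokes the lambda immediately.
  }();
  return "value is " + label;
}
```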
+
+## Declarations in conditions
+
+The condition part of an `if` statement may contain a declaration with an
+initializer followed by a semicolon (`;`) and then the proper boolean condition
+expression, as in:
+
+```cpp
+if (auto verify = tree.Verify(); !verify.ok()) {
+```
+
+The condition can be replaced by a declaration entirely, as in:
+
+```cpp
+if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal)) {
+// Equivalent to:
+if (auto equals = context.ConsumeIf(Lex::TokenKind::Equal); equals) {
+```
+
+or
+
+```cpp
+if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>()) {
+// Equivalent to:
+if (auto literal = bound_inst.TryAs<SemIR::IntegerLiteral>(); literal) {
+```
+
+This is a common way of handling a function that returns an optional value.
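
Both forms can be sketched with `std::optional`; `ParseDigit` is a hypothetical
helper standing in for functions like `ConsumeIf` or `TryAs`:

```cpp
#include <optional>
#include <string>

// Hypothetical helper that may fail to produce a value.
auto ParseDigit(char c) -> std::optional<int> {
  if (c >= '0' && c <= '9') {
    return c - '0';
  }
  return std::nullopt;
}

auto DescribeChar(char c) -> std::string {
  // Declaration as the whole condition: tests whether `digit` holds a value.
  if (auto digit = ParseDigit(c)) {
    return "digit " + std::to_string(*digit);
  }
  return "not a digit";
}

auto IsEvenDigit(char c) -> bool {
  // Declaration, then a separate boolean condition after the `;`.
  if (auto digit = ParseDigit(c); digit && *digit % 2 == 0) {
    return true;
  }
  return false;
}
```

In both functions, `digit` is scoped to the `if` statement, so it can't be used
by mistake after the check.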
+
+See
+[the cppreference page on `if` statements](https://en.cppreference.com/w/cpp/language/if).
+
+## CRTP or "Curiously recurring template pattern"
+
+[Curiously Recurring Template Pattern - cppreference.com](https://en.cppreference.com/w/cpp/language/crtp)
+
+[Curiously recurring template pattern - Wikipedia](https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern)
+
+[Google search](https://www.google.com/search?q=crtp+c%2B%2B)
+
+Examples:
+
+-   `template <typename DerivedT, ...>` in [enum_base.h](/common/enum_base.h)
+-   `template <typename DerivedT>` in [ostream.h](/common/ostream.h)
+
+## Multiple inheritance
+
+We use multiple inheritance to support uses of
+[CRTP](#crtp-or-curiously-recurring-template-pattern).
+
+Example:
+
+```cpp
+struct NameScopeId : public IndexBase, public Printable<NameScopeId> {
+```
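
The sketch below combines CRTP and multiple inheritance in the spirit of the
`NameScopeId` example. `Printable`, `IndexBase`, and `ScopeId` here are
hypothetical, simplified stand-ins, not the definitions in
[ostream.h](/common/ostream.h) or [index_base.h](/toolchain/base/index_base.h).
`Printable<DerivedT>` supplies `operator<<` and `ToString()` in terms of the
derived class's `Print` method; because the cast to `DerivedT` is resolved at
compile time, no virtual functions are involved:

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical CRTP base: DerivedT provides `Print(std::ostream&) const`.
template <typename DerivedT>
class Printable {
 public:
  auto ToString() const -> std::string {
    std::ostringstream out;
    // Static downcast to the derived type; no virtual dispatch.
    static_cast<const DerivedT&>(*this).Print(out);
    return out.str();
  }

  // Found by argument-dependent lookup for any DerivedT.
  friend auto operator<<(std::ostream& out, const DerivedT& value)
      -> std::ostream& {
    value.Print(out);
    return out;
  }
};

// Hypothetical stand-in for an index wrapper.
struct IndexBase {
  constexpr explicit IndexBase(std::int32_t index) : index(index) {}
  std::int32_t index;
};

// Data comes from IndexBase; printing support comes from Printable.
struct ScopeId : public IndexBase, public Printable<ScopeId> {
  using IndexBase::IndexBase;

  auto Print(std::ostream& out) const -> void { out << "scope" << index; }
};
```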
+
+## Defining constants usable in constexpr contexts
+
+To declare a constant usable at compile time in `constexpr` contexts as a static
+class member, we use this pattern:
+
+Declaration:
+
+```cpp
+class Foo {
+  // ...
+  static const std::array<ElementType, ElementCount> MyTable;
+  static constexpr auto ComputeMyTable()
+      -> std::array<ElementType, ElementCount> { ... }
+};
+```
+
+Definition:
+
+```cpp
+constexpr std::array<ElementType, ElementCount>
+    Foo::MyTable = Foo::ComputeMyTable();
+```
+
+Note that the `const` on the declaration does not match the `constexpr` on the
+definition, and that the definition is outside of the class body. This allows
+the initializer to depend on the definition of the class.
+
+Further note that this only works with static members of classes, not static
+variables in functions.
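
A concrete instance of the pattern, using a hypothetical table of squares.
`ComputeTable()` can't be called in a constant expression until `Squares` is
complete, which is why `Table` is declared `const` inside the class and defined
`constexpr` after it:

```cpp
#include <array>
#include <cstddef>

class Squares {
 public:
  static constexpr std::size_t Count = 8;

  // Declaration: `const`, with no initializer.
  static const std::array<int, Count> Table;

 private:
  static constexpr auto ComputeTable() -> std::array<int, Count> {
    std::array<int, Count> table = {};
    for (std::size_t i = 0; i < Count; ++i) {
      table[i] = static_cast<int>(i * i);
    }
    return table;
  }
};

// Definition: `constexpr`, outside the class body, once `Squares` is complete.
constexpr std::array<int, Squares::Count> Squares::Table =
    Squares::ComputeTable();
```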
+
+Due to [a Clang bug](https://github.com/llvm/llvm-project/issues/85461), this
+technique does not work in a class template. The following pattern can be used
+instead:
+
+```cpp
+template <typename T>
+class Foo {
+  // ...
+  template <typename Self = Foo>
+  static constexpr auto MyValueImpl = Self();
+  static constexpr const Foo& MyValue = MyValueImpl<>;
+  // ...
+};
+```
+
+The parameters of the variable template can be chosen to allow reuse of the same
+variable template for multiple static data members.
+
+Examples:
+
+-   `NodeStack::IdKindTable` in
+    [check/node_stack.h](/toolchain/check/node_stack.h)
+-   `BuiltinKind::ValidCount` in
+    [sem_ir/builtin_inst_kind.h](/toolchain/sem_ir/builtin_inst_kind.h)
+
+A global constant may use a single definition without a separate declaration:
+
+```cpp
+static constexpr std::array<bool, 256> IsIdStartByteTable = [] {
+  std::array<bool, 256> table = {};
+  // ...
+  return table;
+}();
+```
+
+Note this example is using an
+[immediately invoked function expression](#immediately-invoked-function-expressions-iife)
+to compute the initial value, which is common.
+
+Examples:
+
+-   [lex/lex.cpp](/toolchain/lex/lex.cpp)

+ 44 - 0
toolchain/docs/lex.md

@@ -0,0 +1,44 @@
+# Lex
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Overview](#overview)
+-   [Bracket matching](#bracket-matching)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Bracket matching in parser](#bracket-matching-in-parser)
+
+<!-- tocstop -->
+
+## Overview
+
+Lexing converts input source code into tokenized output. Literals, such as
+string literals, have their value parsed and form a single token at this stage.
+
+## Bracket matching
+
+The lexer handles matching for `()`, `[]`, and `{}`. When a bracket lacks a
+match, it will insert a "recovery" token to produce a match. As a consequence,
+the lexer's output should always have matched brackets, even with invalid code.
+
+While bracket matching could use hints such as contextual clues from
+indentation, that is not yet implemented.
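
The sketch below shows one plausible recovery strategy on a simplified model
where each character is a token; it is an illustration, not the lexer's actual
algorithm, and `Token`, `MatchBrackets`, and `Render` are hypothetical. An
unmatched closing bracket gets a recovery opening bracket inserted before it;
when a closing bracket matches an opener further down the stack, the openers
above it are closed with recovery tokens; anything still open at the end of
input is closed the same way:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical token: a single character, flagged if inserted for recovery.
struct Token {
  char text;
  bool recovery = false;
};

auto ClosingFor(char open) -> char {
  return open == '(' ? ')' : open == '[' ? ']' : '}';
}
auto OpeningFor(char close) -> char {
  return close == ')' ? '(' : close == ']' ? '[' : '{';
}

auto MatchBrackets(const std::string& source) -> std::vector<Token> {
  std::vector<Token> tokens;
  std::vector<char> open_stack;
  for (char c : source) {
    if (c == '(' || c == '[' || c == '{') {
      open_stack.push_back(c);
      tokens.push_back({c});
    } else if (c == ')' || c == ']' || c == '}') {
      char open = OpeningFor(c);
      if (std::find(open_stack.begin(), open_stack.end(), open) ==
          open_stack.end()) {
        // No opener to match: insert a recovery opener before this closer.
        tokens.push_back({open, true});
      } else {
        // Close any unmatched openers nested inside the matching one.
        while (open_stack.back() != open) {
          tokens.push_back({ClosingFor(open_stack.back()), true});
          open_stack.pop_back();
        }
        open_stack.pop_back();
      }
      tokens.push_back({c});
    } else {
      tokens.push_back({c});
    }
  }
  // Close anything still open at the end of the input.
  while (!open_stack.empty()) {
    tokens.push_back({ClosingFor(open_stack.back()), true});
    open_stack.pop_back();
  }
  return tokens;
}

// Renders tokens, marking each recovery token with a trailing '*'.
auto Render(const std::vector<Token>& tokens) -> std::string {
  std::string result;
  for (const Token& token : tokens) {
    result += token.text;
    if (token.recovery) {
      result += '*';
    }
  }
  return result;
}
```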
+
+## Alternatives considered
+
+### Bracket matching in parser
+
+Bracket matching could have also been implemented in the parser, with some
+awareness of parse state. However, that would add complexity to recovery in
+other error situations, such as when the parser searches for the next comma in
+a list, which requires skipping over bracketed ranges. We don't think the
+trade-offs would yield a net benefit, so any change in this direction would
+need to show concrete improvement, for example better diagnostics for common
+issues.

+ 25 - 0
toolchain/docs/lower.md

@@ -0,0 +1,25 @@
+# Lower
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Overview](#overview)
+
+<!-- tocstop -->
+
+## Overview
+
+Lowering takes the SemIR and produces LLVM IR. At present, this is done in a
+single pass, although we may need a second pass so that type information for
+function arguments can be generated first.
+
+Lowering is done per `SemIR::InstBlock`. This minimizes changes to the
+`IRBuilder` insertion point, something that is both expensive and potentially
+fragile.

+ 802 - 0
toolchain/docs/parse.md

@@ -0,0 +1,802 @@
+# Parse
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Overview](#overview)
+-   [Parse stack](#parse-stack)
+-   [Postorder tree](#postorder-tree)
+-   [Bracketing inside the tree](#bracketing-inside-the-tree)
+-   [Visual example](#visual-example)
+-   [Handling invalid parses](#handling-invalid-parses)
+-   [How is this accomplished?](#how-is-this-accomplished)
+    -   [Introducer](#introducer)
+    -   [Optional modifiers before an introducer](#optional-modifiers-before-an-introducer)
+    -   [Something required in context](#something-required-in-context)
+    -   [Optional clauses](#optional-clauses)
+        -   [Case 1: introducer to optional clause is used as parent node](#case-1-introducer-to-optional-clause-is-used-as-parent-node)
+        -   [Case 2: parent node is required token after optional clause, with different parent node kinds for different options](#case-2-parent-node-is-required-token-after-optional-clause-with-different-parent-node-kinds-for-different-options)
+        -   [Case 3: optional sibling](#case-3-optional-sibling)
+    -   [Operators](#operators)
+
+<!-- tocstop -->
+
+## Overview
+
+Parsing uses tokens to produce a parse tree that faithfully represents the tree
+structure of the source program, interpreted according to the Carbon grammar. No
+semantics are associated with the tree structure at this level, and no name
+lookup is performed.
+
+The parse tree's structure corresponds to the grammar of the Carbon language. On
+valid input, there will be a 1:1 correspondence between parse tree nodes and
+tokens.
+
+A parse tree is considered _structurally valid_ if all nodes have the number of
+children that their node kind requires. On invalid input, nodes may be added
+that don't correspond to a token to maintain a structurally valid parse tree.
+When a parse tree node is marked as having an error, it will still be
+structurally valid, but its children may not match a valid grammar. Code trying
+to handle children of erroneous nodes must be prepared to handle atypical
+structures, but such subtrees may still be useful to tools such as syntax
+highlighters or refactoring tools.
+
+In general, we favor doing the checking for whether something is allowed _in a
+particular context_ in [the check stage](check.md) instead of the parse stage,
+unless the context is very local. This is for a few reasons:
+
+-   We anticipate that the parse stage will be used to operate on invalid code
+    while still preserving as much of the intent of the author as possible, for
+    example in an IDE or a code formatter.
+-   We want to keep as much code out of the parse stage as possible, so that it
+    is simple and fast.
+-   We are building all the infrastructure to keep track of context in the check
+    stage.
+
+These reasons explain which local context is acceptable: context where we
+already have the information at hand, so there is no performance cost, and
+where we can still output a parse tree that faithfully captures what the user
+wrote. Examples:
+
+-   All declaration modifiers are allowed in any order on any declaration in the
+    parse stage. Diagnosing duplicated modifiers, modifiers that conflict with
+    other modifiers, or modifiers that can't be used on a particular declaration
+    is postponed until the check stage.
+-   Rejecting a keyword after `fn` where a name is expected is done at the parse
+    stage.
+
+## Parse stack
+
+The core parser loop is `Parse::Tree::Parse`. In the loop, it examines the state
+at the top of the stack and dispatches to the appropriate `Handle` function.
+
+A typical handler function pops the state first, leaving the stack ready for the
+next state. It may add nodes to the parse tree, based on the current code. If it
+needs to trigger other states, it will push them onto the stack; because it's a
+stack, the _next_ state is always pushed _last_.
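
A toy illustration of this loop shape follows, using a hypothetical grammar of
parenthesized lists (`list := '(' item* ')'`, `item := letter | list`) and
assuming well-formed input. The state names and node kinds are invented for the
sketch. Each handler pops its own state first, and `ListItems` re-pushes itself
before pushing `List`, so that the nested list, pushed last, is handled next:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical states for the toy grammar.
enum class State { List, ListItems };

struct StateEntry {
  State state;
  // For ListItems: index of the list's opening node in `tree`.
  int subtree_start;
};

struct Node {
  std::string kind;
  char text;
  int subtree_size;  // This node plus its descendants.
};

// Assumes well-formed input such as "(a(b)c)".
auto ParseList(const std::string& source) -> std::vector<Node> {
  std::vector<Node> tree;
  std::vector<StateEntry> stack = {{State::List, 0}};
  std::size_t pos = 0;
  while (!stack.empty()) {
    // Dispatch on the state at the top of the stack.
    switch (stack.back().state) {
      case State::List: {
        stack.pop_back();
        // '(' becomes the opening bracket node of this list.
        int start = static_cast<int>(tree.size());
        tree.push_back({"ListStart", source[pos++], 1});
        stack.push_back({State::ListItems, start});
        break;
      }
      case State::ListItems: {
        StateEntry entry = stack.back();
        stack.pop_back();
        if (source[pos] == ')') {
          // The subtree spans from the opening node through this node.
          int size = static_cast<int>(tree.size()) - entry.subtree_start + 1;
          tree.push_back({"List", source[pos++], size});
        } else if (source[pos] == '(') {
          // Resume these items after the nested list. The nested List state
          // is pushed last, so it is handled next.
          stack.push_back(entry);
          stack.push_back({State::List, 0});
        } else {
          tree.push_back({"Letter", source[pos++], 1});
          stack.push_back(entry);
        }
        break;
      }
    }
  }
  return tree;
}
```

The output is already in postorder: for `(a(b)c)`, the inner `List` node comes
right after its children and before the outer list's remaining items.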
+
+Operator expressions store information about current operator precedence in the
+stack as well. While this isn't necessary for most parser states, and could be
+stored separately, it's currently together because it has no impact on the size
+of a stack entry and is thus more efficient to store in one place.
+
+## Postorder tree
+
+The parse tree's storage layout is in postorder. For example, given the code:
+
+```carbon
+fn foo() -> f64 {
+  return 42;
+}
+```
+
+The node order is (with indentation to indicate nesting):
+
+<!-- Prevent prettier from changing indents. -->
+<!-- prettier-ignore-start -->
+
+```yaml
+[
+  {kind: 'FileStart', text: ''},
+      {kind: 'FunctionIntroducer', text: 'fn'},
+      {kind: 'Name', text: 'foo'},
+        {kind: 'ParamListStart', text: '('},
+      {kind: 'ParamList', text: ')', subtree_size: 2},
+        {kind: 'Literal', text: 'f64'},
+      {kind: 'ReturnType', text: '->', subtree_size: 2},
+    {kind: 'FunctionDefinitionStart', text: '{', subtree_size: 7},
+      {kind: 'ReturnStatementStart', text: 'return'},
+      {kind: 'Literal', text: '42'},
+    {kind: 'ReturnStatement', text: ';', subtree_size: 3},
+  {kind: 'FunctionDefinition', text: '}', subtree_size: 11},
+  {kind: 'FileEnd', text: ''},
+]
+```
+
+<!-- prettier-ignore-end -->
+
+In this example, `FileStart`, `FunctionDefinition`, and `FileEnd` are "root"
+nodes for the tree. Function components are children of `FunctionDefinition`.
+
+It's produced in this way because it's an efficient layout to produce with
+vectorized storage, requiring little context to be maintained during parsing.
+Because it's stored in postorder, it's also most efficient to process the parsed
+output in postorder; this affects checking.
+
+The parse tree is printed in postorder by default because it matches how the
+parse tree is expected to be processed within the toolchain, and so can make it
+easier to reason about. However, the `--preorder` flag may be used in contexts
+where a preorder representation would be easier to handle.
+
+## Bracketing inside the tree
+
+The parse tree is designed to be walked in postorder by checking, allowing
+checking to be more efficient. To support this, checking sometimes requires
+context on the meaning of a node when it is encountered.
+
+Each `ParseNodeKind` has either a bracketing node, or a specific child count.
+This helps document and enforce the expected tree structure.
+
+When a bracketing node is indicated, it is the opening bracket: it will always
+be the first child of the parent, and that will be the only time it occurs in
+the parent's children (it may still occur in children of children). When
+checking encounters the opening bracket, this means it can make contextual
+decisions for the later children of the node.
+
+Nodes can also have a specific child count, for example, infix operators always
+have two children: the lhs and rhs expressions. Many nodes have a child count of
+0; this just means they're leaf nodes, and will never have children.
+
+Because the tree structure is always valid, these are treated as contracts. Some
+nodes exist only to be used to construct valid tree structures for invalid
+input, such as `StructFieldUnknown`.
+
+Although each subtree's size is also tracked as part of the node, we're
+currently trying to avoid relying on it and may eliminate it if it turns out to
+be unnecessary and a meaningful cost for the compiler.
+
+## Visual example
+
+To illustrate the transition from code to a parse tree, consider the statement:
+
+```carbon
+var x: i32 = y + 1;
+```
+
+Lexing creates distinct tokens for each syntactic element, which will form the
+basis of the parse tree:
+
+<pre>
+<b>Tokens:</b>
+
++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+</pre>
+
+First the `var` keyword is used as a "bracketing" node (VariableIntroducer).
+When this is seen in a postorder traversal, it tells us to expect the basics of
+a variable declaration structure.
+
+<pre>
+<b>Tokens:</b>
+
+        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+        |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
+        +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+
+<b>Parse tree:</b>
+
+
+
+
+
+
+
++-----+
+| var |
++-----+
+
+
+
+
+
+
+</pre>
+
+Next, we can consider the pattern binding. Here, `x` is the identifier and `i32`
+is the type expression. The `:` provides a parent node that must always contain
+two children, the name and type expression. Because it always has two direct
+children, it doesn't need to be bracketed.
+
+<pre>
+<b>Tokens:</b>
+
+                                +-----+ +-----+ +-----+ +-----+ +-----+
+                                |  =  | |  y  | |  +  | |  1  | |  ;  |
+                                +-----+ +-----+ +-----+ +-----+ +-----+
+
+<b>Parse tree:</b>
+
+        +-----+ +-----+
+        |  x  | | i32 |
+        +-----+ +-----+
+           |       |
+           +-------+-------+
+                           |
++-----+                 +-----+
+| var |                 |  :  |
++-----+                 +-----+
+
+
+
+
+
+
+</pre>
+
+We use the `=` as a separator (instead of a node with children like `:`) to help
+indicate the transition from binding to assignment expression, which is
+important for expression parsing during checking.
+
+<pre>
+<b>Tokens:</b>
+
+                                        +-----+ +-----+ +-----+ +-----+
+                                        |  y  | |  +  | |  1  | |  ;  |
+                                        +-----+ +-----+ +-----+ +-----+
+
+<b>Parse tree:</b>
+
+        +-----+ +-----+
+        |  x  | | i32 |
+        +-----+ +-----+
+           |       |
+           +-------+-------+
+                           |
++-----+                 +-----+ +-----+
+| var |                 |  :  | |  =  |
++-----+                 +-----+ +-----+
+
+
+
+
+
+
+</pre>
+
+The expression is a subtree with `+` as the parent, and the two operands as
+child nodes.
+
+<pre>
+<b>Tokens:</b>
+
+                                                                +-----+
+                                                                |  ;  |
+                                                                +-----+
+
+<b>Parse tree:</b>
+
+        +-----+ +-----+                 +-----+ +-----+
+        |  x  | | i32 |                 |  y  | |  1  |
+        +-----+ +-----+                 +-----+ +-----+
+           |       |                       |       |
+           +-------+-------+               +-------+-------+
+                           |                               |
++-----+                 +-----+ +-----+                 +-----+
+| var |                 |  :  | |  =  |                 |  +  |
++-----+                 +-----+ +-----+                 +-----+
+
+
+
+
+
+
+</pre>
+
+Finally, the `;` is used as the "root" of the variable declaration. It's
+explicitly tracked as the `;` for a variable declaration so that it's
+unambiguously bracketed by `var`.
+
+<pre>
+<b>Tokens:</b>
+
+
+
+
+
+<b>Parse tree:</b>
+
+        +-----+ +-----+                 +-----+ +-----+
+        |  x  | | i32 |                 |  y  | |  1  |
+        +-----+ +-----+                 +-----+ +-----+
+           |       |                       |       |
+           +-------+-------+               +-------+-------+
+                           |                               |
++-----+                 +-----+ +-----+                 +-----+
+| var |                 |  :  | |  =  |                 |  +  |
++-----+                 +-----+ +-----+                 +-----+
+   |                       |       |                       |
+   +-----------------------+-------+-----------------------+-------+
+                                                                   |
+                                                                +-----+
+                                                                |  ;  |
+                                                                +-----+
+</pre>
+
+This is the completed parse tree.
+
+In storage, this tree will be flat and in postorder. Because the order hasn't
+changed much from the original code, the reordering for postorder only needs to
+delay a minimal number of nodes for later output: the number of delayed nodes
+is linear in the depth of the parse tree.
+
+<pre>
+<b>Tokens:</b>
+
++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+| var | |  x  | |  :  | | i32 | |  =  | |  y  | |  +  | |  1  | |  ;  |
++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+
+<b>Parse tree:</b>
+
+        +-----+ +-----+                 +-----+ +-----+
+        |  x  | | i32 |                 |  y  | |  1  |
+        +-----+ +-----+                 +-----+ +-----+
+           |       |                       |       |
+           +-------+-------+               +-------+-------+
+                           |                               |
++-----+                 +-----+ +-----+                 +-----+
+| var |                 |  :  | |  =  |                 |  +  |
++-----+                 +-----+ +-----+                 +-----+
+   |                       |       |                       |
+   +-----------------------+-------+-----------------------+-------+
+                                                                   |
+                                                                +-----+
+                                                                |  ;  |
+                                                                +-----+
+
+<b>Flattened for storage:</b>
+
++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+| var | |  x  | | i32 | |  :  | |  =  | |  y  | |  1  | |  +  | |  ;  |
++-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ +-----+
+</pre>
+
+The structural concepts of bracketing nodes (`var` and `;`) and parent nodes
+with a known child count (`:` and `+` with 2 children, but also `=` with 0
+children) will allow checking to reconstruct the tree as it encounters nodes
+during the postorder.
+
+There are other structures that could have been used here, such as `=` being
+the parent of the `var` and pattern nodes, and `;` being the parent of the `=`
+and assignment expression nodes. In that example alternative, the storage order
+would be the same; it would only change the tree representation. The current
+structure is influenced by choices in checking.
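
The following hypothetical sketch shows how these two structural rules are
enough to rebuild the example tree from its flattened postorder without
consulting subtree sizes. Each entry carries either a fixed child count or the
kind of the bracketing node that opens it. The `NodeInfo` and `TreeNode` types
and the `Rebuild` function are invented for illustration and are much simpler
than what checking actually does:

```cpp
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-node metadata, listed in postorder.
struct NodeInfo {
  std::string kind;
  int child_count;      // Used when `bracket` is empty.
  std::string bracket;  // Kind of the opening bracket node, if bracketed.
};

struct TreeNode {
  std::string kind;
  std::vector<TreeNode> children;
};

// Completed subtrees wait on a stack until their parent arrives. Assumes a
// structurally valid tree, which the parser guarantees.
auto Rebuild(const std::vector<NodeInfo>& postorder) -> std::vector<TreeNode> {
  std::vector<TreeNode> stack;
  for (const NodeInfo& info : postorder) {
    auto first = stack.end();
    if (info.bracket.empty()) {
      // Fixed child count: take that many subtrees off the stack.
      first -= info.child_count;
    } else {
      // Bracketed: take everything back to and including the opening node.
      do {
        --first;
      } while (first->kind != info.bracket);
    }
    TreeNode node;
    node.kind = info.kind;
    node.children.assign(std::make_move_iterator(first),
                         std::make_move_iterator(stack.end()));
    stack.erase(first, stack.end());
    stack.push_back(std::move(node));
  }
  return stack;
}
```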
+
+## Handling invalid parses
+
+On an invalid parse, the output tree should still try to mirror the intended
+tree structure when possible. There's a balance to strike here: the parser isn't
+expected to try too hard to make things correct, but outputting nodes is
+preferred. `InvalidParse` nodes may be used to provide a node when it would be
+too difficult to give the planned node kind the correct child count (bracketed
+subtrees may not need an `InvalidParse` node).
+
+When marking a child node with `has_error=true`, parent nodes may also be marked
+with `has_error=true`, but try to be conservative about this. As a rule of
+thumb, if checking could continue on a parent node without needing the child
+node to be fully checked (possibly with incomplete information), then the parent
+node should not be marked as `has_error=true`. The goal remains providing
+something similar to a well-formed parse tree.
+
+In general, a parent node must have the immediate children described in
+[parse/typed_nodes.h](/toolchain/parse/typed_nodes.h), unless it is marked
+`has_error=true`. If this is violated for a particular parse tree, an error will
+be raised in `Tree::Verify`. Note that an `InvalidParse` node is allowed as a
+declaration or expression, and an `InvalidParseSubtree` is allowed as a
+declaration. These invalid nodes can be added to more node categories as needed.
+
+Child states may indicate an error to their parent using `ReturnErrorOnState`.
+This is particularly intended for when a child state emits a diagnostic, to
+prevent the parent state from emitting redundant diagnostics; for example, an
+invalid expression might have more invalid tokens following it, and the parent
+might skip those without emitting diagnostics.
+
+## How is this accomplished?
+
+The specific approach to producing the desired tree depends on the kind of
+grammar rule being implemented, as well as the desired output tree structure.
+
+### Introducer
+
+**Example:** `if (c) { ... }`
+
+Here `if` is the introducer. Many other possible introducers could occur in that
+position, such as `while` or `var`, and we want to dispatch based on which token
+is present. See
+[parse/handle_statement.cpp](/toolchain/parse/handle_statement.cpp).
+
+The first step is to identify the introducer token, typically using a `switch`
+or `if` on the `Lex::TokenKind` at the current position:
+
+```cpp
+switch (context.PositionKind()) {
+  case Lex::TokenKind::___: {
+    ...
+    break;
+  }
+  ...
+}
+```
+
+There should be a `default:` (or `else`) case so every kind of token is handled.
+This may be an error, in which case:
+
+-   A [diagnostic](diagnostics.md) should be emitted.
+
+-   An invalid parse node should be added, using something like:
+
+    ```cpp
+    context.AddLeafNode(NodeKind::InvalidParse, context.Consume(),
+                        /*has_error=*/true);
+    ```
+
+-   At least one token should be consumed, particularly if parsing will continue
+    in this state at this position, to avoid an infinite loop.
+
+The default case may also be delegated to another state. For example, in the
+state where a statement is expected, if no keyword introducer is recognized, it
+switches to the expression-statement state.
+
+Depending on the introducer, different actions can be taken. The most common
+case is to:
+
+-   Call `context.PushState(State::___);` to mark the beginning of the statement
+    or declaration and indicate the state that will handle the tokens after the
+    introducer.
+
+-   Call `context.AddLeafNode(NodeKind::___, context.Consume());` to output a
+    bracketing node for this introducer.
+
+The next state can then add sibling nodes until it gets to the end of the
+declaration or statement. The last token, often a semicolon `;`, is used as a
+parent node to match the bracketing node of the introducer.
+
+If the introducer token won't be used as a bracketing node, it can be
+temporarily skipped after `context.PushState` by calling
+`context.ConsumeAndDiscard()` instead of `context.AddLeafNode`. It must then be
+added to the output tree as a node by some later state, unless an error occurs.
+For example, a `for` statement uses the `for` token as the root of its subtree;
+it doesn't need a bracketing node since it has a fixed child count. Note that
+the token was saved when the state was pushed, and can be retrieved when adding
+a node, as in this example:
+
+```cpp
+auto state = context.PopState();
+context.AddNode(NodeKind::ForStatement, state.token, state.subtree_start,
+                state.has_error);
+```
+
+If this state is for an element of a scope, like the statements in a code block,
+most introducer tokens indicate that the current state should be repeated to
+handle the next statement, but some other tokens, like a close curly brace
+(`}`), mean that the state should be exited.
+
+### Optional modifiers before an introducer
+
+**Example:** `virtual fn Foo();`
+
+Here `fn` is the introducer, and `virtual` is an optional modifier that appears
+before it. See
+[parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp).
+
+Use this pattern when the goal is to produce a subtree that starts with the
+introducer as a bracketing node, as in the previous case, followed by nodes for
+any modifiers. Bracketing is needed here because the optional modifier nodes
+mean that there is not a fixed child count for the parent node. Since the
+modifier tokens come before the introducer token in the source, but their nodes
+must come after the introducer node in the output, the introducer node has to
+be moved ahead of an unknown number of modifier nodes. This is accomplished by
+emitting a placeholder node for the introducer, processing all the modifiers
+until reaching the introducer, filling in the placeholder with the information
+about the introducer, and then finishing the rest of the declaration or
+statement.
+
+-   **Step 1**: Save the current value of `context.tree().size()`. This could be
+    accomplished by calling `context.PushState()`, which saves that value in the
+    `subtree_start` field of `Context::StateStackEntry`; or by constructing a
+    `Context::StateStackEntry` value directly, as is done in
+    [parse/handle_decl_scope_loop.cpp](/toolchain/parse/handle_decl_scope_loop.cpp).
+    This marks the position of the placeholder node we are going to replace, as
+    well as the beginning of the subtree we are eventually going to emit for
+    this declaration or statement.
+
+-   **Step 2**: Emit the placeholder node using
+    `context.AddLeafNode(NodeKind::Placeholder, *context.position());`. The
+    `NodeKind` and `Lex::TokenIndex` values will be overwritten later.
+
+-   **Step 3**: Process tokens until we hit the introducer. All of the nodes we
+    emit at this point will appear as siblings after the introducer node in the
+    output tree.
+
+-   **Step 4 - success**: If an introducer token is found, replace the
+    placeholder node using something like:
+
+    ```cpp
+    context.ReplacePlaceholderNode(state.subtree_start, introducer_kind,
+                                   context.Consume());
+    ```
+
+    -   `state.subtree_start` is the value of `context.tree().size()` saved in
+        step 1, which marks the position of the placeholder node in the output
+        parse tree.
+
+    -   `introducer_kind` is the `NodeKind` for the introducer of this
+        declaration or statement: a leaf node that will act as a bracketing
+        node at the beginning of the subtree for this declaration or statement.
+
+-   **Step 4 - error**: If we run into something other than a modifier or
+    introducer before finding an introducer, we need to do error handling:
+
+    ```cpp
+    context.ReplacePlaceholderNode(state.subtree_start,
+                                   NodeKind::InvalidParseStart,
+                                   *context.position(), /*has_error=*/true);
+    ```
+
+    -   Emit a [diagnostic](diagnostics.md).
+
+    -   Replace the placeholder node (as in step 4 - success) with an
+        `InvalidParseStart` node. It will be associated with the unexpected
+        token that triggered this error.
+
+    -   Consume input tokens up to the likely end of the current statement or
+        declaration. For example, we might consume up to a `;` or a token at a
+        lesser indent level using `context.SkipPastLikelyEnd(...)`. It is
+        important that we consume at least one token in the error case;
+        otherwise we could have an infinite loop of generating the same error on
+        the same token.
+
+    -   Emit an `InvalidParseSubtree` node. This will be the parent of any
+        emitted modifier nodes, and will be bracketed by the `InvalidParseStart`
+        node emitted above. It should be associated with the last token
+        consumed.
+
+        ```cpp
+        // Set `iter` to the last token consumed, one before the current position.
+        auto iter = context.position();
+        --iter;
+        context.AddNode(NodeKind::InvalidParseSubtree, *iter,
+                        state.subtree_start, /*has_error=*/true);
+        ```
+
+-   **Step 5**: (If success at step 4) Push whatever states are to be used to
+    parse the rest of the declaration. The first state pushed (the last state to
+    be processed) will handle the end of this declaration. That pushed state
+    should have a `subtree_start` field set to the value of
+    `context.tree().size()` saved in step 1.
+
+-   **Step 6**: When handling the state for the end of the declaration, emit the
+    root node of the subtree:
+
+    ```cpp
+    auto state = context.PopState();
+    context.AddNode(NodeKind::___, context.Consume(),
+                    state.subtree_start, state.has_error);
+    ```
+
+    -   This `state.subtree_start` will mark everything since the bracketing
+        introducer node as the children of this node.
+
+### Something required in context
+
+FIXME
+
+Example: name after introducer
+[parse/handle_decl_name_and_params.cpp](/toolchain/parse/handle_decl_name_and_params.cpp)
+
+Example: "`[` _implicit parameter list_ `]`" after `impl forall`
+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp)
+
+### Optional clauses
+
+#### Case 1: introducer to optional clause is used as parent node
+
+**Example:** The optional `-> <return type expression>` in a function signature
+uses this pattern, so `fn foo() -> u32;` is transformed to:
+
+```yaml
+  {kind: 'FunctionIntroducer', text: 'fn'},
+  {kind: 'IdentifierName', text: 'foo'},
+    {kind: 'TuplePatternStart', text: '('},
+  {kind: 'TuplePattern', text: ')', subtree_size: 2},
+    {kind: 'UnsignedIntTypeLiteral', text: 'u32'},
+  {kind: 'ReturnType', text: '->', subtree_size: 2},
+{kind: 'FunctionDecl', text: ';', subtree_size: 7},
+```
+
+Note how the `->` token becomes a `ReturnType` node in the output tree, and is
+moved after the `u32` type expression that becomes its child. Compare with the
+parse tree output for `fn foo();` which has no `ReturnType` node:
+
+```yaml
+  {kind: 'FunctionIntroducer', text: 'fn'},
+  {kind: 'IdentifierName', text: 'foo'},
+    {kind: 'TuplePatternStart', text: '('},
+  {kind: 'TuplePattern', text: ')', subtree_size: 2},
+{kind: 'FunctionDecl', text: ';', subtree_size: 5},
+```
+
+Here is the code from
+[parse/handle_function.cpp](/toolchain/parse/handle_function.cpp) that does
+this:
+
+```cpp
+auto HandleFunctionAfterParams(Context& context) -> void {
+  ...
+  // If there is a return type, parse the expression before adding the return
+  // type node.
+  if (context.PositionIs(Lex::TokenKind::MinusGreater)) {
+    context.PushState(State::FunctionReturnTypeFinish);
+    context.ConsumeAndDiscard();
+    context.PushStateForExpr(PrecedenceGroup::ForType());
+  }
+}
+
+auto HandleFunctionReturnTypeFinish(Context& context) -> void {
+  auto state = context.PopState();
+
+  context.AddNode(NodeKind::ReturnType, state.token, state.subtree_start,
+                  state.has_error);
+}
+```
+
+The `->` token is saved by `context.PushState(...)`, so it is available as
+`state.token` when `HandleFunctionReturnTypeFinish` later calls
+`context.AddNode(NodeKind::ReturnType, state.token, ...)`.
+
+Also see how the optional initializer is handled on `var`, treating the `=` as
+its introducer in `HandleVarAfterPattern` and `HandleVarInitializer` in
+[parse/handle_var.cpp](/toolchain/parse/handle_var.cpp).
+
+#### Case 2: parent node is required token after optional clause, with different parent node kinds for different options
+
+**Example:** The optional type expression before `as` in `impl as` is
+represented by producing two different output parse nodes for `as`. It outputs a
+`DefaultSelfImplAs` node with no children when the type expression is absent,
+and otherwise a `TypeImplAs` parse node with the type expression as its child.
+
+So `impl bool as Interface;` is transformed to:
+
+```yaml
+  {kind: 'ImplIntroducer', text: 'impl'},
+    {kind: 'BoolTypeLiteral', text: 'bool'},
+  {kind: 'TypeImplAs', text: 'as', subtree_size: 2},
+  {kind: 'IdentifierNameExpr', text: 'Interface'},
+{kind: 'ImplDecl', text: ';', subtree_size: 5},
+```
+
+while `impl as Interface;` is transformed to:
+
+```yaml
+  {kind: 'ImplIntroducer', text: 'impl'},
+  {kind: 'DefaultSelfImplAs', text: 'as'},
+  {kind: 'IdentifierNameExpr', text: 'Interface'},
+{kind: 'ImplDecl', text: ';', subtree_size: 4},
+```
+
+This is handled by the `ExpectAsOrTypeExpression` code from
+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp):
+
+```cpp
+if (context.PositionIs(Lex::TokenKind::As)) {
+  // as <expression> ...
+  context.AddLeafNode(NodeKind::DefaultSelfImplAs, context.Consume());
+  context.PushState(State::Expr);
+} else {
+  // <expression> as <expression>...
+  context.PushState(State::ImplBeforeAs);
+  context.PushStateForExpr(PrecedenceGroup::ForImplAs());
+}
+```
+
+and then `HandleImplBeforeAs` creates the parent node in the second case:
+
+```cpp
+auto state = context.PopState();
+if (auto as = context.ConsumeIf(Lex::TokenKind::As)) {
+  context.AddNode(NodeKind::TypeImplAs, *as, state.subtree_start,
+                  state.has_error);
+  context.PushState(State::Expr);
+} else {
+  if (!state.has_error) {
+    CARBON_DIAGNOSTIC(ImplExpectedAs, Error,
+                      "Expected `as` in `impl` declaration.");
+    context.emitter().Emit(*context.position(), ImplExpectedAs);
+  }
+  context.ReturnErrorOnState();
+}
+```
+
+Note (1) that the `state.subtree_start` value comes from the
+`context.PushState(State::ImplBeforeAs);` before parsing the type expression,
+and that is how that type expression ends up as the child of the created
+`TypeImplAs` node. Unlike
+[the previous case 1](#case-1-introducer-to-optional-clause-is-used-as-parent-node),
+though, the parent node uses the token after the optional expression, rather
+than an introducer token for the optional clause.
+
+Note (2) how `HandleImplBeforeAs` handles three error cases:
+
+-   `as` is present but there was an error in the child type expression -> the
+    error is recorded on the output `TypeImplAs` node, but not propagated to the
+    parent.
+-   No `as` is present but the type expression was okay -> emit a new
+    diagnostic.
+-   There was an error in the child type expression and no `as` is present ->
+    no new diagnostic; we suppress errors once one is emitted until we can
+    recover.
+
+If there is no `as` token, we output neither a `TypeImplAs` nor a
+`DefaultSelfImplAs` node, even though the parent node requires one of them, so
+in those cases we mark the parent as having an error with
+`context.ReturnErrorOnState()`.
+
+#### Case 3: optional sibling
+
+> TODO: This was changed by
+> [#3678](https://github.com/carbon-language/carbon-lang/pull/3678) and needs to
+> be updated.
+
+**Example:** The optional type expression before `as` in `impl as` is output as
+an optional sibling subtree between the `ImplIntroducer` node for the `impl`
+introducer and the `ImplAs` node for the required `as` keyword.
+
+`impl bool as Interface;` is transformed to:
+
+```yaml
+  {kind: 'ImplIntroducer', text: 'impl'},
+  {kind: 'BoolTypeLiteral', text: 'bool'},
+  {kind: 'ImplAs', text: 'as'},
+  {kind: 'IdentifierNameExpr', text: 'Interface'},
+{kind: 'ImplDecl', text: ';', subtree_size: 5},
+```
+
+while `impl as Interface;` is transformed to:
+
+```yaml
+  {kind: 'ImplIntroducer', text: 'impl'},
+  {kind: 'ImplAs', text: 'as'},
+  {kind: 'IdentifierNameExpr', text: 'Interface'},
+{kind: 'ImplDecl', text: ';', subtree_size: 4},
+```
+
+This is handled by the `ExpectAsOrTypeExpression` code from
+[parse/handle_impl.cpp](/toolchain/parse/handle_impl.cpp):
+
+```cpp
+if (context.PositionIs(Lex::TokenKind::As)) {
+  // as <expression> ...
+  context.AddLeafNode(NodeKind::ImplAs, context.Consume());
+  context.PushState(State::Expr);
+} else {
+  // <expression> as <expression>...
+  context.PushState(State::ImplBeforeAs);
+  context.PushStateForExpr(PrecedenceGroup::ForImplAs());
+}
+```
+
+and then `HandleImplBeforeAs` follows
+[the "something required in context" pattern](#something-required-in-context) to
+deal with the `as` that follows when the type expression is present.
+
+### Operators
+
+FIXME
+
+An independent description of our approach:
+["Better operator precedence" on scattered-thoughts.net](https://www.scattered-thoughts.net/writing/better-operator-precedence/)

+ 0 - 0
toolchain/docs/parse.svg


+ 3 - 1
website/prebuild.py

@@ -189,7 +189,9 @@ def main() -> None:
 
     # Reset the order for the implementation children.
     nav_order[0] = 0
-    label_subdir("toolchain", next(nav_order), parent_title="Implementation")
+    label_subdir(
+        "toolchain/docs", next(nav_order), parent_title="Implementation"
+    )
     label_subdir("explorer", next(nav_order), parent_title="Implementation")
     label_subdir("testing", next(nav_order), parent_title="Implementation")
 
