Update instructions for adding a SemIR instruction. (#4348)

Co-authored-by: Josh L <josh11b@users.noreply.github.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
josh11b 1 year ago
parent
commit
9e5e33082c
2 changed files with 83 additions and 16 deletions
  1. toolchain/check/context.cpp (+1 -1)
  2. toolchain/docs/adding_features.md (+82 -15)

+ 1 - 1
toolchain/check/context.cpp

@@ -1099,7 +1099,7 @@ class TypeCompleter {
   auto BuildValueRepr(SemIR::TypeId type_id, SemIR::Inst inst) const
       -> SemIR::ValueRepr {
     // Use overload resolution to select the implementation, producing compile
-    // errors when BuildTypeForInst isn't defined for a given instruction.
+    // errors when BuildValueReprForInst isn't defined for a given instruction.
     CARBON_KIND_SWITCH(inst) {
 #define CARBON_SEM_IR_INST_KIND(Name)                  \
   case CARBON_KIND(SemIR::Name typed_inst): {          \
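
Note on the comment corrected in this hunk: the switch expands one `case` per instruction kind and relies on overload resolution, so a kind without a matching `BuildValueReprForInst` overload fails to compile. The following is a minimal, self-contained sketch of that general pattern, not the toolchain's code. Every name in it (`toy`, `TOY_INST_KINDS`, `ToyKind`, `BuildToyReprForInst`, `BuildToyRepr`) is hypothetical, and the real `CARBON_KIND_SWITCH` machinery is more involved.

```cpp
#include <cassert>
#include <string>

namespace toy {

// Hypothetical stand-in for a `.def` file: one entry per instruction kind.
#define TOY_INST_KINDS(X) \
  X(IntLiteral)           \
  X(PointerType)

// Expand the list into an enum of kinds.
enum class ToyKind {
#define TOY_ENUMERATOR(Name) Name,
  TOY_INST_KINDS(TOY_ENUMERATOR)
#undef TOY_ENUMERATOR
};

// Expand the list into one (empty) typed struct per kind.
#define TOY_STRUCT(Name) \
  struct Name {};
TOY_INST_KINDS(TOY_STRUCT)
#undef TOY_STRUCT

// One overload per kind. Adding a kind to TOY_INST_KINDS without adding an
// overload here makes the switch below fail to compile.
inline auto BuildToyReprForInst(IntLiteral /*inst*/) -> std::string {
  return "copy";
}
inline auto BuildToyReprForInst(PointerType /*inst*/) -> std::string {
  return "pointer";
}

// Expand the list into one `case` per kind; overload resolution selects the
// implementation for each typed struct.
inline auto BuildToyRepr(ToyKind kind) -> std::string {
  switch (kind) {
#define TOY_CASE(Name) \
  case ToyKind::Name:  \
    return BuildToyReprForInst(Name{});
    TOY_INST_KINDS(TOY_CASE)
#undef TOY_CASE
  }
  return "";  // Not reached for valid enumerators.
}

}  // namespace toy
```

In this sketch, adding `X(NewKind)` to the list without writing a `BuildToyReprForInst(NewKind)` overload produces a "no matching function" error at the switch, which is the kind of rebuild error the documentation change in this commit refers to.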

+ 82 - 15
toolchain/docs/adding_features.md

@@ -14,6 +14,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -   [Parse](#parse)
     -   [Typed parse node metadata implementation](#typed-parse-node-metadata-implementation)
 -   [Check](#check)
+    -   [Adding a new SemIR instruction](#adding-a-new-semir-instruction)
     -   [SemIR typed instruction metadata implementation](#semir-typed-instruction-metadata-implementation)
 -   [Lower](#lower)
 -   [Tests and debugging](#tests-and-debugging)
@@ -232,42 +233,108 @@ Note: this is broadly similar to
 Each parse node kind requires adding a `Handle<kind>` function in a
 `check/handle_*.cpp` file.
 
+### Adding a new SemIR instruction
+
 If the resulting SemIR needs a new instruction:
 
--   add a new kind to [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def)
+-   Add a new kind to [sem_ir/inst_kind.def](/toolchain/sem_ir/inst_kind.def).
+
     -   Add a `CARBON_SEM_IR_INST_KIND(NewInstKindName)` line in alphabetical
         order
--   a new struct definition to
+
+-   Add a new struct definition to
     [sem_ir/typed_insts.h](/toolchain/sem_ir/typed_insts.h), such as:
 
     ```cpp
     struct NewInstKindName {
-        static constexpr auto Kind = InstKind::NewInstKindName.Define(
-            // the name used in textual IR
-            "new_inst_kind_name"
-            // Optional: , TerminatorKind::KindOfTerminator
+        static constexpr auto Kind =
+            // `Parse::SomeId` should be one of:
+            // - A node ID from `parse/node_ids.h`,
+            //   specifying the kind of parse nodes for this instruction.
+            //   This could be a node kind from `parse/node_kind.def`
+            //   suffixed by `Id`, or one of the `Any`...`Id` alias
+            //   declarations that match multiple kinds of parse nodes.
+            // - `Parse::NodeId` if it can be any kind of parse node.
+            // - `Parse::InvalidNodeId` if no associated parse node.
+            InstKind::NewInstKindName.Define<Parse::SomeId>(
+                // The name used in textual IR:
+                {.ir_name = "new_inst_kind_name"}
+                // Other parameters have defaults.
             );
 
-        // Optional: omit if not associated with a parse node.
-        Parse::Node parse_node;
-
-        // Optional: omit if this sem_ir instruction does not produce a value.
+        // Optional: Include if this instruction produces a value used in
+        // an expression.
         TypeId type_id;
 
-        // 0-2 id fields, with types from sem_ir/ids.h or sem_ir/builtin_kind.h
-        // For example, fields would look like:
+        // 0-2 id fields, with types from sem_ir/ids.h or
+        // sem_ir/builtin_kind.h. For example, fields would look like:
         StringId name_id;
         InstId value_id;
     };
     ```
 
-Adding an instruction will also require a handler in the Lower step.
+    -   [`sem_ir/inst_kind.h`](/toolchain/sem_ir/inst_kind.h) documents the
+        different options when defining a new instruction, as well as their
+        defaults; see `InstKind::DefinitionInfo`.
+    -   If an instruction always produces a type:
+
+        -   Set `.is_type = InstIsType::Always` in its `Kind` definition.
+        -   When constructing instructions of this kind, pass
+            `SemIR::TypeId::TypeType` in as the value of the `type_id` field, as
+            in:
+
+            ```cpp
+            SemIR::InstId inst_id = context.AddInst<SemIR::NewInstKindName>(
+                node_id, {.type_id = SemIR::TypeId::TypeType, ...});
+            ```
+
+    -   Although most instructions have distinct types represented by
+        instructions like `ClassType`, we also have builtin types for cases
+        where types don't need to be distinct per-entity. This is rare, but
+        used, for example, when an expression implicitly uses a value as part of
+        SemIR evaluation or as part of desugaring. We have builtin types for
+        bound methods, namespaces, witnesses, among others. These are defined in
+        [`sem_ir/builtin_inst_kind.def`](/toolchain/sem_ir/builtin_inst_kind.def).
+        To get a type id for one of these builtin types, use something like
+        `context.GetBuiltinType(SemIR::BuiltinInstKind::WitnessType)`, as in:
+
+        ```cpp
+        SemIR::TypeId witness_type_id =
+            context.GetBuiltinType(SemIR::BuiltinInstKind::WitnessType);
+        SemIR::InstId inst_id = context.AddInst<SemIR::NewInstKindName>(
+            node_id, {.type_id = witness_type_id, ...});
+        ```
+
+    -   Instructions without types may still be used as arguments to
+        instructions.
+
+Once those are added, a rebuild will give errors showing what needs to be
+updated. The updates needed can depend on whether the instruction produces a
+type. Look to the comments on those functions for instructions on what is
+needed.
+
+Instructions won't be given a name unless
+[`InstNamer::CollectNamesInBlock`](/toolchain/sem_ir/inst_namer.cpp) is called on
+the `InstBlockId` they are a member of. As of this writing,
+`InstNamer::CollectNamesInBlock` should only be called once per `InstBlockId`.
+To accomplish this, there should be one instruction kind that "owns" the
+instruction block, and will have a case in `InstNamer::CollectNamesInBlock` that
+visits the `InstBlockId`. That instruction kind will typically use
+`FormatTrailingBlock` in `sem_ir/formatter.cpp` to list the instructions in
+curly braces (`{`...`}`). Other instructions that reference that `InstBlockId`
+will use the default rendering that has just the instruction names in parens
+(`(`...`)`).
+
+Adding an instruction will generally also require a handler in the Lower step.
 
 Most new instructions will automatically be formatted reasonably by the SemIR
-formatter.
+formatter. If not, then add a `FormatInst` overload to
+[`sem_ir/formatter.cpp`](/toolchain/sem_ir/formatter.cpp). If only the arguments
+need custom formatting, then a `FormatInstRHS` overload can be implemented
+instead.
 
 If the resulting SemIR needs a new built-in, add it to
-[builtin_inst_kind.def](/toolchain/sem_ir/builtin_inst_kind.def).
+[`sem_ir/builtin_inst_kind.def`](/toolchain/sem_ir/builtin_inst_kind.def).
 
 ### SemIR typed instruction metadata implementation
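
Note on the `Define<Parse::SomeId>({.ir_name = ...})` form introduced in this diff: it passes an options struct by designated initializer, so any field not named keeps its default. Below is a rough, self-contained sketch of that idea under stated assumptions. The `toy` namespace and everything in it (`DefinitionInfo` fields, `KindDefinition`, `Define`, `AnyNodeId`, the `InstIsType` enumerators other than `Always`) are hypothetical stand-ins, not the declarations in `sem_ir/inst_kind.h`. It assumes a compiler that accepts C++20 designated initializers.

```cpp
#include <cassert>
#include <string_view>

namespace toy {

// Hypothetical enum; only `Always` is named in the documentation above.
enum class InstIsType { Never, Maybe, Always };

// Hypothetical options struct: every field except `ir_name` has a default.
struct DefinitionInfo {
  std::string_view ir_name;
  InstIsType is_type = InstIsType::Never;
  bool is_terminator = false;
};

// Hypothetical result of defining a kind; it just stores the options.
struct KindDefinition {
  DefinitionInfo info;
};

// Hypothetical tag type standing in for a parse node ID type.
struct AnyNodeId {};

// The template argument records the associated parse node ID type; the
// function argument carries the options.
template <typename ParseNodeIdT>
constexpr auto Define(DefinitionInfo info) -> KindDefinition {
  return KindDefinition{info};
}

// Only the IR name is given, so the other options keep their defaults.
inline constexpr auto PlainKind = Define<AnyNodeId>({.ir_name = "plain_inst"});

// An instruction that always produces a type overrides one default.
inline constexpr auto TypeKind = Define<AnyNodeId>(
    {.ir_name = "type_inst", .is_type = InstIsType::Always});

static_assert(PlainKind.info.is_type == InstIsType::Never);
static_assert(TypeKind.info.is_type == InstIsType::Always);

}  // namespace toy
```

The point of this shape is that new options can be added to the struct with defaults without touching existing definitions. For the actual option names and defaults, consult `InstKind::DefinitionInfo`.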