
Move toolchain alternatives to proposals (#6716)

As part of using the evolution process with the toolchain, alternatives should
be in proposals. This proposal migrates existing alternatives here.

Assisted-by: Google Antigravity with Gemini 3 Flash
Jon Ross-Perkins, 2 months ago
parent
commit
13d5fe9eed
5 changed files with 189 additions and 120 deletions
  1. proposals/p6716.md (+177, -0)
  2. toolchain/docs/check/README.md (+1, -8)
  3. toolchain/docs/coalesce_generic_lowering.md (+9, -42)
  4. toolchain/docs/lex.md (+1, -10)
  5. toolchain/docs/parse.md (+1, -60)

+ 177 - 0
proposals/p6716.md

@@ -0,0 +1,177 @@
+# Move toolchain alternatives to proposals
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/6716)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Proposal](#proposal)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Lex](#lex)
+        -   [Bracket matching in parser](#bracket-matching-in-parser)
+    -   [Parse](#parse)
+        -   [Restrictive parsing](#restrictive-parsing)
+    -   [Check](#check)
+        -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
+    -   [Coalescing generic functions emitted when lowering to LLVM IR](#coalescing-generic-functions-emitted-when-lowering-to-llvm-ir)
+        -   [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
+        -   [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
+        -   [Compile-time trade-offs](#compile-time-trade-offs)
+        -   [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
+
+<!-- tocstop -->
+
+## Abstract
+
+As part of using the evolution process with the toolchain, alternatives should
+be in proposals. This proposal migrates existing alternatives here.
+
+## Problem
+
+Leads want to use the evolution process for more toolchain changes.
+Historically, alternatives were documented as part of the toolchain design
+rather than going through evolution. Switching leaves those older alternatives
+in the toolchain design, while newer alternatives go in the proposals
+directory.
+
+## Proposal
+
+Move existing alternatives to this proposal. Future proposals will more
+naturally have alternatives in the proposal document itself.
+
+## Rationale
+
+This is in support of the
+[evolution process](/docs/project/goals.md#software-and-language-evolution),
+aligning the toolchain documentation with design documentation.
+
+## Alternatives considered
+
+### Lex
+
+#### Bracket matching in parser
+
+Bracket matching could also have been implemented in the parser, with some
+awareness of parse state. However, that would shift complexity into recovery
+for other error situations, such as when the parser searches for the next
+comma in a list; that search needs to skip over bracketed ranges. We don't
+think the trade-offs would yield a net benefit, so any change in this
+direction would need to show a concrete improvement, for example better
+diagnostics for common issues.
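The comma-search recovery mentioned above can be sketched as follows. This is a hypothetical illustration with an invented name, not toolchain code: where the real parser can jump over a bracketed range directly because the lexer has already matched brackets, a simple depth counter stands in for that here.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: find the index of the next `,` at the current bracket
// depth, starting at index `i`. Commas inside bracketed ranges are skipped,
// so error recovery lands on a comma that actually separates list elements.
static int NextTopLevelComma(const std::string& tokens, int i) {
  int depth = 0;
  for (; i < static_cast<int>(tokens.size()); ++i) {
    char c = tokens[i];
    if (c == '(' || c == '[' || c == '{') {
      ++depth;
    } else if (c == ')' || c == ']' || c == '}') {
      --depth;
    } else if (c == ',' && depth == 0) {
      return i;  // A comma in the list itself, not inside brackets.
    }
  }
  return -1;  // No top-level comma to recover at.
}
```

For `f(a, b), c`, the comma inside the call is skipped and recovery lands on the comma at index 7.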
+
+### Parse
+
+#### Restrictive parsing
+
+The toolchain will often parse code that could theoretically be rejected,
+instead allowing the check phase to reject incorrect structures.
+
+For example, consider the code `abstract var x: i32 = 0;`. When parsing the
+`abstract` modifier, parse could use single-token lookahead to see `var` and
+report an error during parsing (`abstract var` is never valid). Instead, we
+save the modifier and diagnose it during check.
+
+The problem is that code isn't always this simple. In the above example, there
+could be other modifiers, as in `abstract private returned var x: i32 = 0;`,
+so single-token lookahead isn't a general solution. Some modifiers are also
+only contextually valid; for example, `abstract fn` is only valid inside an
+`abstract class` scope. As a consequence, either arbitrary lookahead or
+additional context would be needed in parse to reliably diagnose incorrect
+uses of `abstract`. Check, in contrast, will have that additional context.
+
+Rejecting incorrect code during parsing can also have negative consequences
+for diagnostics. The additional semantic information available to check may
+produce better diagnostics. Alternatively, check will sometimes produce
+diagnostics equivalent to what parse could, but with less work overall.
+
+As a consequence, at times we will defer to the check phase to produce
+diagnostics instead of trying to produce those same diagnostics during parse.
+Some examples of why we might diagnose in check instead of parse are:
+
+-   To issue better diagnostics based on semantic information.
+-   To diagnose similar invalid uses in one place, versus partly in check and
+    partly in parse.
+-   To support syntax highlighting for IDEs in near-correct code that is still
+    being typed.
+
+Some examples of why we might diagnose in parse are:
+
+-   When it's important to distinguish between multiple possible syntaxes.
+-   When permitting the syntax would require more work than rejecting it.
+
+A few examples of parse designs to avoid are:
+
+-   Using arbitrary lookahead.
+    -   Looking ahead one or two tokens is okay. However, we should never have
+        arbitrary lookahead.
+    -   This includes approaches that would require using the mapping of
+        opening brackets to closing brackets produced by `TokenizedBuffer`;
+        that mapping exists to aid error recovery.
+-   Building complex context.
+    -   We want parsing to be faster and lighter weight than check.
+-   Duplicating diagnostics between parse and check.
+    -   When there are closely related invalid variants of syntax, only some of
+        which can be diagnosed during parse, consider diagnosing all variants
+        during check.
+
+This is a balance. We don't want to unnecessarily shift costs from parse onto
+check, and we don't try to allow clearly invalid constructs. Parse still tries
+to produce a reasonable parse tree. However, parse leans more towards a
+permissive parse, and an error-free parse tree does not mean the code is
+grammatically correct.
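The deferral described above can be sketched as a toy model. The types and names here are invented for illustration and are not the toolchain's API: parse records whatever modifiers it sees without validating them, and check applies the rule with full context.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: a declaration as parse might record it, with
// modifiers kept verbatim rather than validated.
struct Decl {
  std::vector<std::string> modifiers;
  std::string kind;  // e.g. "var" or "fn"
};

// Check-phase rule: `abstract` is never valid on a `var` declaration.
static std::vector<std::string> CheckDecl(const Decl& decl) {
  std::vector<std::string> diagnostics;
  for (const std::string& mod : decl.modifiers) {
    if (mod == "abstract" && decl.kind == "var") {
      diagnostics.push_back("`abstract` not allowed on `var`");
    }
  }
  return diagnostics;
}
```

Here `abstract var` is diagnosed once, in check, no matter how many other modifiers separate the two tokens, with no lookahead in parse.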
+
+### Check
+
+#### Using a traditional AST representation
+
+Clang creates an AST as part of compilation. In Carbon, it's something we
+could do as a step between parsing and checking, possibly replacing SemIR.
+Doing so would likely be simpler, among other trade-offs. However, we think
+the SemIR approach will yield higher performance, enough so that it's the
+chosen approach.
+
+### Coalescing generic functions emitted when lowering to LLVM IR
+
+#### Coalescing in the front-end vs back-end?
+
+An alternative considered was not doing any coalescing in the front-end and
+relying on LLVM to perform the analysis and optimization. The current choice
+was made based on the expectation that such an
+[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly
+in terms of compile time. The relative cost has not yet been evaluated.
+
+#### When to do coalescing in the front-end?
+
+The analysis and coalescing could be done prior to lowering, after
+specialization. The advantage of that choice would be avoiding lowering
+duplicate LLVM functions and then removing the duplicates. The disadvantage
+would be duplicating much of the lowering logic, which is currently needed to
+make the equivalence determination.
+
+#### Compile-time trade-offs
+
+Not doing any coalescing is also expected to increase back-end codegen time by
+more than the cost of performing the analysis and deduplication. This can be
+evaluated in practice, and the feature disabled if it proves too costly.
+
+#### Coalescing duplicate non-specific functions
+
+We could coalesce duplicate functions in non-specific cases, similar to lld's
+[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
+[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
+require fingerprinting all instructions in all functions, whereas specific
+coalescing can focus on cases that only Carbon's front-end knows about. Carbon
+would also be restricted to coalescing functions in a single compilation unit,
+which would require replacing function definitions that allow external calls
+with a placeholder that calls the coalesced definition. We don't expect
+sufficient advantages over existing support.
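The fingerprint-based deduplication contrasted above can be sketched as follows. This is a hypothetical illustration, not the implemented algorithm: names are invented, and real equivalence checking needs more than a body hash, since calls to other specifics must also resolve to equivalent functions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch: fingerprint each function body and keep the first
// function seen with each fingerprint as the canonical definition. Returns a
// remapping from every function name to its canonical name.
static std::map<std::string, std::string> Coalesce(
    const std::map<std::string, std::string>& bodies) {
  std::map<std::size_t, std::string> canonical_by_fingerprint;
  std::map<std::string, std::string> remap;
  for (const auto& [name, body] : bodies) {
    std::size_t fingerprint = std::hash<std::string>{}(body);
    auto [it, inserted] = canonical_by_fingerprint.emplace(fingerprint, name);
    remap[name] = it->second;  // First function with this body wins.
  }
  return remap;
}
```

Two specifics that lower to identical bodies remap to one definition; a function with a distinct body keeps its own.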

+ 1 - 8
toolchain/docs/check/README.md

@@ -37,7 +37,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Value bindings](#value-bindings)
 -   [Handling Parse::Tree errors (not yet implemented)](#handling-parsetree-errors-not-yet-implemented)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
 
 <!-- tocstop -->
 
@@ -667,10 +666,4 @@ interfere with checking later valid lines in the same function.
 
 ## Alternatives considered
 
-### Using a traditional AST representation
-
-Clang creates an AST as part of compilation. In Carbon, it's something we could
-do as a step between parsing and checking, possibly replacing the SemIR. It's
-likely that doing so would be simpler, amongst other possible trade-offs.
-However, we think the SemIR approach is going to yield higher performance,
-enough so that it's the chosen approach.
+-   [Using a traditional AST representation](/proposals/p6716.md#using-a-traditional-ast-representation)

+ 9 - 42
toolchain/docs/coalesce_generic_lowering.md

@@ -17,12 +17,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Function fingerprints](#function-fingerprints)
     -   [Canonical specific to use](#canonical-specific-to-use)
 -   [Algorithm details](#algorithm-details)
--   [Alternatives considered](#alternatives-considered)
-    -   [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
-    -   [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
-    -   [Compile-time trade-offs](#compile-time-trade-offs)
-    -   [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
 -   [Opportunities for further improvement](#opportunities-for-further-improvement)
+-   [Alternatives considered](#alternatives-considered)
 
 <!-- tocstop -->
 
@@ -119,7 +115,7 @@ quadratic pass walking all call instructions and comparing if the `specific_id`
 information is equivalent. These optimizations are not currently implemented.
 
 Note that this does not
-[coalesce non-specifics](#coalescing-duplicate-non-specific-functions).
+[coalesce non-specifics](/proposals/p6716.md#coalescing-duplicate-non-specific-functions).
 
 ### Canonical specific to use
 
@@ -233,42 +229,6 @@ CheckIfEquivalent(two specifics, &assumed equivalent specifics) -> bool {
 }
 ```
 
-## Alternatives considered
-
-### Coalescing in the front-end vs back-end?
-
-An alternative considered was not doing any coalescing in the front-end and
-relying on LLVM to make the analysis and optimization. The current choice was
-made based on the expectation that such an
-[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly in
-terms of compile-time. The relative cost has not yet been evaluated.
-
-### When to do coalescing in the front-end?
-
-The analysis and coalescing could be done prior to lowering, after
-specialization. The advantage of that choice would be avoiding to lower
-duplicate LLVM functions and then removing the duplicates. The disadvantage of
-that choice would be duplicating much of the lowering logic, currently necessary
-to make the equivalence determination.
-
-### Compile-time trade-offs
-
-Not doing any coalescing is also expected to increase the back-end codegen time
-more than performing the analysis and deduplication. This can be evaluated in
-practice and the feature disabled if found to be too costly.
-
-### Coalescing duplicate non-specific functions
-
-We could coalesce duplicate functions in non-specific cases, similar to lld's
-[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
-[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
-require fingerprinting all instructions in all functions, whereas specific
-coalescing can focus on cases that only Carbon's front-end knows about. Carbon
-would also be restricted to coalescing functions in a single compilation unit,
-which would require replacing function definitions that allow external calls
-with a placeholder that calls the coalesced definition. We don't expect
-sufficient advantages over existing support.
-
 ## Opportunities for further improvement
 
 The current implemented algorithm can be improved with at least the following:
@@ -290,3 +250,10 @@ manner that is translation-unit independent, so this can be used in the mangled
 name, and the same function name emitted. This does not currently occur, as the
 two fingerprints use internal SemIR identifiers (`function_id` and `specific_id`
 respectively).
+
+## Alternatives considered
+
+-   [Coalescing in the front-end vs back-end?](/proposals/p6716.md#coalescing-in-the-front-end-vs-back-end)
+-   [When to do coalescing in the front-end?](/proposals/p6716.md#when-to-do-coalescing-in-the-front-end)
+-   [Compile-time trade-offs](/proposals/p6716.md#compile-time-trade-offs)
+-   [Coalescing duplicate non-specific functions](/proposals/p6716.md#coalescing-duplicate-non-specific-functions)

+ 1 - 10
toolchain/docs/lex.md

@@ -13,7 +13,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -   [Overview](#overview)
 -   [Bracket matching](#bracket-matching)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Bracket matching in parser](#bracket-matching-in-parser)
 
 <!-- tocstop -->
 
@@ -33,12 +32,4 @@ indentation, that is not yet implemented.
 
 ## Alternatives considered
 
-### Bracket matching in parser
-
-Bracket matching could have also been implemented in the parser, with some
-awareness of parse state. However, that would shift some of the complexity of
-recovery in other error situations, such as where the parser searches for the
-next comma in a list. That needs to skip over bracketed ranges. We don't think
-the trade-offs would yield a net benefit, so any change in this direction would
-need to show concrete improvement, for example better diagnostics for common
-issues.
+-   [Bracket matching in parser](/proposals/p6716.md#bracket-matching-in-parser)

+ 1 - 60
toolchain/docs/parse.md

@@ -26,7 +26,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
         -   [Case 3: optional sibling](#case-3-optional-sibling)
     -   [Operators](#operators)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Restrictive parsing](#restrictive-parsing)
 
 <!-- tocstop -->
 
@@ -805,62 +804,4 @@ An independent description of our approach:
 
 ## Alternatives considered
 
-### Restrictive parsing
-
-The toolchain will often parse code that could theoretically be rejected,
-instead allowing the check phase to reject incorrect structures.
-
-For example, consider the code `abstract var x: i32 = 0;`. When parsing the
-`abstract` modifier, parse could do single-token lookahead to see `var`, and
-error in the parse (`abstract var` is never valid). Instead, we save the
-modifier and diagnose it during check.
-
-The problem is that code isn't always this simple. Considering the above
-example, there could be other modifiers, such as
-`abstract private returned var x: i32 = 0;`, so single-token lookahead isn't a
-general solution. Some modifiers are also contextually valid; for example,
-`abstract fn` is only valid inside an `abstract class` scope. As a consequence,
-a form of either arbitrary lookahead or additional context would be necessary in
-parse in order to reliably diagnose incorrect uses of `abstract`. In contrast
-with parse, check will have that additional context.
-
-Rejecting incorrect code during parsing can also have negative consequences for
-diagnostics. The additional information that check has about semantics may
-produce better diagnostics. Alternately, sometimes check will produce
-diagnostics equivalent to what parse could, but with less work overall.
-
-As a consequence, at times we will defer to the check phase to produce
-diagnostics instead of trying to produce those same diagnostics during parse.
-Some examples of why we might diagnose in check instead of parse are:
-
--   To issue better diagnostics based on semantic information.
--   To diagnose similar invalid uses in one place, versus partly in check and
-    partly in parse.
--   To support syntax highlighting for IDEs in near-correct code, still being
-    typed.
-
-Some examples of why we might diagnose in parse are:
-
--   When it's important to distinguish between multiple possible syntaxes.
--   When permitting the syntax would require more work than rejecting it.
-
-A few examples of parse designs to avoid are:
-
--   Using arbitrary lookahead.
-    -   Looking ahead one or two tokens is okay. However, we should never have
-        arbitrary lookahead.
-    -   This includes approaches which would require using the mapping of
-        opening brackets to closing brackets that is produced by
-        `TokenizedBuffer`. Those are helpful for error recovery.
--   Building complex context.
-    -   We want parsing to be faster and lighter weight than check.
--   Duplicating diagnostics between parse and check.
-    -   When there are closely related invalid variants of syntax, only some of
-        which can be diagnosed during parse, consider diagnosing all variants
-        during check.
-
-This is a balance. We don't want to unnecessarily shift costs from parse onto
-check, and we don't try to allow clearly invalid constructs. Parse still tries
-to produce a reasonable parse tree. However, parse leans more towards a
-permissive parse, and an error-free parse tree does not mean the code is
-grammatically correct.
+-   [Restrictive parsing](/proposals/p6716.md#restrictive-parsing)