
Move toolchain alternatives to proposals (#6716)

As part of using the evolution process with the toolchain, alternatives should
be in proposals. This proposal migrates existing alternatives here.

Assisted-by: Google Antigravity with Gemini 3 Flash
Jon Ross-Perkins 2 months ago
Parent
Commit
13d5fe9eed
5 changed files with 189 additions and 120 deletions
  1. proposals/p6716.md (+177 -0)
  2. toolchain/docs/check/README.md (+1 -8)
  3. toolchain/docs/coalesce_generic_lowering.md (+9 -42)
  4. toolchain/docs/lex.md (+1 -10)
  5. toolchain/docs/parse.md (+1 -60)

+ 177 - 0
proposals/p6716.md

@@ -0,0 +1,177 @@
+# Move toolchain alternatives to proposals
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/6716)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Proposal](#proposal)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Lex](#lex)
+        -   [Bracket matching in parser](#bracket-matching-in-parser)
+    -   [Parse](#parse)
+        -   [Restrictive parsing](#restrictive-parsing)
+    -   [Check](#check)
+        -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
+    -   [Coalescing generic functions emitted when lowering to LLVM IR](#coalescing-generic-functions-emitted-when-lowering-to-llvm-ir)
+        -   [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
+        -   [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
+        -   [Compile-time trade-offs](#compile-time-trade-offs)
+        -   [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
+
+<!-- tocstop -->
+
+## Abstract
+
+As part of using the evolution process with the toolchain, alternatives should
+be in proposals. This proposal migrates existing alternatives here.
+
+## Problem
+
+Leads want to use the evolution process for more toolchain changes.
+Historically, alternatives were documented as part of the toolchain design, not
+going through evolution. Switching leaves those older alternatives in the
+toolchain design, while putting newer alternatives in the proposals directory.
+
+## Proposal
+
+Move existing alternatives to this proposal. Future proposals will more
+naturally have alternatives in the proposal document itself.
+
+## Rationale
+
+This is in support of the
+[evolution process](/docs/project/goals.md#software-and-language-evolution),
+aligning the toolchain documentation with design documentation.
+
+## Alternatives considered
+
+### Lex
+
+#### Bracket matching in parser
+
+Bracket matching could have also been implemented in the parser, with some
+awareness of parse state. However, that would shift some of the complexity
+into recovery in other error situations, such as where the parser searches for
+the next comma in a list: that search needs to skip over bracketed ranges. We
+don't think the trade-offs would yield a net benefit, so any change in this
+direction would need to show concrete improvement, for example better
+diagnostics for common issues.
+
+### Parse
+
+#### Restrictive parsing
+
+The toolchain will often parse code that could theoretically be rejected,
+instead allowing the check phase to reject incorrect structures.
+
+For example, consider the code `abstract var x: i32 = 0;`. When parsing the
+`abstract` modifier, parse could do single-token lookahead to see `var`, and
+error in the parse (`abstract var` is never valid). Instead, we save the
+modifier and diagnose it during check.
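+
The save-then-diagnose approach can be sketched in C++ as follows; `Decl`, `Modifier`, and `CheckModifiers` are hypothetical names for illustration, not the toolchain's real structures. Parse accumulates modifiers without judging them, and check rejects invalid combinations with full context and no lookahead.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Modifier { Abstract, Private, Returned };

// What parse produces: modifiers saved verbatim, no validation yet.
struct Decl {
  std::vector<Modifier> modifiers;
  bool is_var = false;
};

// Check-phase validation: `abstract` is never valid on `var`, no matter
// how many other modifiers sit between them in the source.
bool CheckModifiers(const Decl& decl, std::string& diag) {
  for (Modifier m : decl.modifiers) {
    if (m == Modifier::Abstract && decl.is_var) {
      diag = "`abstract` is not allowed on `var` declarations";
      return false;
    }
  }
  return true;
}

int main() {
  // `abstract private returned var x: i32 = 0;` parses cleanly...
  Decl decl = {{Modifier::Abstract, Modifier::Private, Modifier::Returned},
               /*is_var=*/true};
  std::string diag;
  // ...and check is the phase that diagnoses it.
  assert(!CheckModifiers(decl, diag));
  assert(!diag.empty());
  return 0;
}
```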
+
+The problem is that code isn't always this simple. Considering the above
+example, there could be other modifiers, such as
+`abstract private returned var x: i32 = 0;`, so single-token lookahead isn't a
+general solution. Some modifiers are also contextually valid; for example,
+`abstract fn` is only valid inside an `abstract class` scope. As a consequence,
+a form of either arbitrary lookahead or additional context would be necessary in
+parse in order to reliably diagnose incorrect uses of `abstract`. In contrast
+with parse, check will have that additional context.
+
+Rejecting incorrect code during parsing can also have negative consequences for
+diagnostics. The additional information that check has about semantics may
+produce better diagnostics. Alternatively, sometimes check will produce
+diagnostics equivalent to what parse could, but with less work overall.
+
+As a consequence, at times we will defer to the check phase to produce
+diagnostics instead of trying to produce those same diagnostics during parse.
+Some examples of why we might diagnose in check instead of parse are:
+
+-   To issue better diagnostics based on semantic information.
+-   To diagnose similar invalid uses in one place, versus partly in check and
+    partly in parse.
+-   To support syntax highlighting for IDEs in near-correct code, still being
+    typed.
+
+Some examples of why we might diagnose in parse are:
+
+-   When it's important to distinguish between multiple possible syntaxes.
+-   When permitting the syntax would require more work than rejecting it.
+
+A few examples of parse designs to avoid are:
+
+-   Using arbitrary lookahead.
+    -   Looking ahead one or two tokens is okay. However, we should never have
+        arbitrary lookahead.
+    -   This includes approaches which would require using the mapping of
+        opening brackets to closing brackets that is produced by
+        `TokenizedBuffer`. Those are helpful for error recovery.
+-   Building complex context.
+    -   We want parsing to be faster and lighter weight than check.
+-   Duplicating diagnostics between parse and check.
+    -   When there are closely related invalid variants of syntax, only some of
+        which can be diagnosed during parse, consider diagnosing all variants
+        during check.
+
+This is a balance. We don't want to unnecessarily shift costs from parse onto
+check, and we don't try to allow clearly invalid constructs. Parse still tries
+to produce a reasonable parse tree. However, parse leans more towards a
+permissive parse, and an error-free parse tree does not mean the code is
+grammatically correct.
+
+### Check
+
+#### Using a traditional AST representation
+
+Clang creates an AST as part of compilation. In Carbon, it's something we could
+do as a step between parsing and checking, possibly replacing the SemIR. It's
+likely that doing so would be simpler, amongst other possible trade-offs.
+However, we think the SemIR approach is going to yield higher performance,
+enough so that it's the chosen approach.
+
+### Coalescing generic functions emitted when lowering to LLVM IR
+
+#### Coalescing in the front-end vs back-end?
+
+An alternative considered was not doing any coalescing in the front-end and
+relying on LLVM to perform the analysis and optimization. The current choice
+was made based on the expectation that such an
+[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly in
+terms of compile time. The relative cost has not yet been evaluated.
+
+#### When to do coalescing in the front-end?
+
+The analysis and coalescing could be done prior to lowering, after
+specialization. The advantage of that choice would be avoiding lowering
+duplicate LLVM functions and then removing the duplicates. The disadvantage
+would be duplicating much of the lowering logic, which is currently necessary
+to make the equivalence determination.
+
+#### Compile-time trade-offs
+
+Not doing any coalescing is also expected to increase the back-end codegen time
+more than performing the analysis and deduplication. This can be evaluated in
+practice and the feature disabled if found to be too costly.
+
+#### Coalescing duplicate non-specific functions
+
+We could coalesce duplicate functions in non-specific cases, similar to lld's
+[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
+[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
+require fingerprinting all instructions in all functions, whereas specific
+coalescing can focus on cases that only Carbon's front-end knows about. Carbon
+would also be restricted to coalescing functions in a single compilation unit,
+which would require replacing function definitions that allow external calls
+with a placeholder that calls the coalesced definition. We don't expect
+sufficient advantages over existing support.

+ 1 - 8
toolchain/docs/check/README.md

@@ -37,7 +37,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Value bindings](#value-bindings)
 -   [Handling Parse::Tree errors (not yet implemented)](#handling-parsetree-errors-not-yet-implemented)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Using a traditional AST representation](#using-a-traditional-ast-representation)

 <!-- tocstop -->

@@ -667,10 +666,4 @@ interfere with checking later valid lines in the same function.

 ## Alternatives considered

-### Using a traditional AST representation
-
-Clang creates an AST as part of compilation. In Carbon, it's something we could
-do as a step between parsing and checking, possibly replacing the SemIR. It's
-likely that doing so would be simpler, amongst other possible trade-offs.
-However, we think the SemIR approach is going to yield higher performance,
-enough so that it's the chosen approach.
+-   [Using a traditional AST representation](/proposals/p6716.md#using-a-traditional-ast-representation)

+ 9 - 42
toolchain/docs/coalesce_generic_lowering.md

@@ -17,12 +17,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Function fingerprints](#function-fingerprints)
     -   [Canonical specific to use](#canonical-specific-to-use)
 -   [Algorithm details](#algorithm-details)
--   [Alternatives considered](#alternatives-considered)
-    -   [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
-    -   [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
-    -   [Compile-time trade-offs](#compile-time-trade-offs)
-    -   [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
 -   [Opportunities for further improvement](#opportunities-for-further-improvement)
+-   [Alternatives considered](#alternatives-considered)

 <!-- tocstop -->

@@ -119,7 +115,7 @@ quadratic pass walking all calls instructions and comparing if the `specific_id`
 information is equivalent. These optimizations are not currently implemented.

 Note that this does not
-[coalesce non-specifics](#coalescing-duplicate-non-specific-functions).
+[coalesce non-specifics](/proposals/p6716.md#coalescing-duplicate-non-specific-functions).

 ### Canonical specific to use

@@ -233,42 +229,6 @@ CheckIfEquivalent(two specifics, &assumed equivalent specifics) -> bool {
 }
 ```

-## Alternatives considered
-
-### Coalescing in the front-end vs back-end?
-
-An alternative considered was not doing any coalescing in the front-end and
-relying on LLVM to make the analysis and optimization. The current choice was
-made based on the expectation that such an
-[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly in
-terms of compile-time. The relative cost has not yet been evaluated.
-
-### When to do coalescing in the front-end?
-
-The analysis and coalescing could be done prior to lowering, after
-specialization. The advantage of that choice would be avoiding to lower
-duplicate LLVM functions and then removing the duplicates. The disadvantage of
-that choice would be duplicating much of the lowering logic, currently necessary
-to make the equivalence determination.
-
-### Compile-time trade-offs
-
-Not doing any coalescing is also expected to increase the back-end codegen time
-more than performing the analysis and deduplication. This can be evaluated in
-practice and the feature disabled if found to be too costly.
-
-### Coalescing duplicate non-specific functions
-
-We could coalesce duplicate functions in non-specific cases, similar to lld's
-[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
-[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
-require fingerprinting all instructions in all functions, whereas specific
-coalescing can focus on cases that only Carbon's front-end knows about. Carbon
-would also be restricted to coalescing functions in a single compilation unit,
-which would require replacing function definitions that allow external calls
-with a placeholder that calls the coalesced definition. We don't expect
-sufficient advantages over existing support.
-
 ## Opportunities for further improvement

 The current implemented algorithm can be improved with at least the following:
@@ -290,3 +250,10 @@ manner that is translation-unit independent, so this can be used in the mangled
 name, and the same function name emitted. This does not currently occur, as the
 two fingerprints use internal SemIR identifiers (`function_id` and `specific_id`
 respectively).
+
+## Alternatives considered
+
+-   [Coalescing in the front-end vs back-end?](/proposals/p6716.md#coalescing-in-the-front-end-vs-back-end)
+-   [When to do coalescing in the front-end?](/proposals/p6716.md#when-to-do-coalescing-in-the-front-end)
+-   [Compile-time trade-offs](/proposals/p6716.md#compile-time-trade-offs)
+-   [Coalescing duplicate non-specific functions](/proposals/p6716.md#coalescing-duplicate-non-specific-functions)

+ 1 - 10
toolchain/docs/lex.md

@@ -13,7 +13,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -   [Overview](#overview)
 -   [Bracket matching](#bracket-matching)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Bracket matching in parser](#bracket-matching-in-parser)

 <!-- tocstop -->

@@ -33,12 +32,4 @@ indentation, that is not yet implemented.

 ## Alternatives considered

-### Bracket matching in parser
-
-Bracket matching could have also been implemented in the parser, with some
-awareness of parse state. However, that would shift some of the complexity of
-recovery in other error situations, such as where the parser searches for the
-next comma in a list. That needs to skip over bracketed ranges. We don't think
-the trade-offs would yield a net benefit, so any change in this direction would
-need to show concrete improvement, for example better diagnostics for common
-issues.
+-   [Bracket matching in parser](/proposals/p6716.md#bracket-matching-in-parser)

+ 1 - 60
toolchain/docs/parse.md

@@ -26,7 +26,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
         -   [Case 3: optional sibling](#case-3-optional-sibling)
     -   [Operators](#operators)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Restrictive parsing](#restrictive-parsing)

 <!-- tocstop -->

@@ -805,62 +804,4 @@ An independent description of our approach:

 ## Alternatives considered

-### Restrictive parsing
-
-The toolchain will often parse code that could theoretically be rejected,
-instead allowing the check phase to reject incorrect structures.
-
-For example, consider the code `abstract var x: i32 = 0;`. When parsing the
-`abstract` modifier, parse could do single-token lookahead to see `var`, and
-error in the parse (`abstract var` is never valid). Instead, we save the
-modifier and diagnose it during check.
-
-The problem is that code isn't always this simple. Considering the above
-example, there could be other modifiers, such as
-`abstract private returned var x: i32 = 0;`, so single-token lookahead isn't a
-general solution. Some modifiers are also contextually valid; for example,
-`abstract fn` is only valid inside an `abstract class` scope. As a consequence,
-a form of either arbitrary lookahead or additional context would be necessary in
-parse in order to reliably diagnose incorrect uses of `abstract`. In contrast
-with parse, check will have that additional context.
-
-Rejecting incorrect code during parsing can also have negative consequences for
-diagnostics. The additional information that check has about semantics may
-produce better diagnostics. Alternately, sometimes check will produce
-diagnostics equivalent to what parse could, but with less work overall.
-
-As a consequence, at times we will defer to the check phase to produce
-diagnostics instead of trying to produce those same diagnostics during parse.
-Some examples of why we might diagnose in check instead of parse are:
-
--   To issue better diagnostics based on semantic information.
--   To diagnose similar invalid uses in one place, versus partly in check and
-    partly in parse.
--   To support syntax highlighting for IDEs in near-correct code, still being
-    typed.
-
-Some examples of why we might diagnose in parse are:
-
--   When it's important to distinguish between multiple possible syntaxes.
--   When permitting the syntax would require more work than rejecting it.
-
-A few examples of parse designs to avoid are:
-
--   Using arbitrary lookahead.
-    -   Looking ahead one or two tokens is okay. However, we should never have
-        arbitrary lookahead.
-    -   This includes approaches which would require using the mapping of
-        opening brackets to closing brackets that is produced by
-        `TokenizedBuffer`. Those are helpful for error recovery.
--   Building complex context.
-    -   We want parsing to be faster and lighter weight than check.
--   Duplicating diagnostics between parse and check.
-    -   When there are closely related invalid variants of syntax, only some of
-        which can be diagnosed during parse, consider diagnosing all variants
-        during check.
-
-This is a balance. We don't want to unnecessarily shift costs from parse onto
-check, and we don't try to allow clearly invalid constructs. Parse still tries
-to produce a reasonable parse tree. However, parse leans more towards a
-permissive parse, and an error-free parse tree does not mean the code is
-grammatically correct.
+-   [Restrictive parsing](/proposals/p6716.md#restrictive-parsing)