
Move toolchain alternatives to proposals (#6716)

As part of using the evolution process with the toolchain, alternatives should
be in proposals. This proposal migrates existing alternatives here.

Assisted-by: Google Antigravity with Gemini 3 Flash
Jon Ross-Perkins, 2 months ago
parent
commit
13d5fe9eed
5 changed files with 189 additions and 120 deletions
  1. proposals/p6716.md (+177, -0)
  2. toolchain/docs/check/README.md (+1, -8)
  3. toolchain/docs/coalesce_generic_lowering.md (+9, -42)
  4. toolchain/docs/lex.md (+1, -10)
  5. toolchain/docs/parse.md (+1, -60)

+ 177 - 0
proposals/p6716.md

@@ -0,0 +1,177 @@
+# Move toolchain alternatives to proposals
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/6716)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Proposal](#proposal)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Lex](#lex)
+        -   [Bracket matching in parser](#bracket-matching-in-parser)
+    -   [Parse](#parse)
+        -   [Restrictive parsing](#restrictive-parsing)
+    -   [Check](#check)
+        -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
+    -   [Coalescing generic functions emitted when lowering to LLVM IR](#coalescing-generic-functions-emitted-when-lowering-to-llvm-ir)
+        -   [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
+        -   [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
+        -   [Compile-time trade-offs](#compile-time-trade-offs)
+        -   [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
+
+<!-- tocstop -->
+
+## Abstract
+
+As part of using the evolution process with the toolchain, alternatives should
+be in proposals. This proposal migrates existing alternatives here.
+
+## Problem
+
+Leads want to use the evolution process for more toolchain changes.
+Historically, alternatives were documented as part of the toolchain design
+rather than going through evolution. Switching leaves those older alternatives
+in the toolchain design, while newer alternatives go in the proposals
+directory.
+
+## Proposal
+
+Move existing alternatives to this proposal. Future proposals will more
+naturally have alternatives in the proposal document itself.
+
+## Rationale
+
+This is in support of the
+[evolution process](/docs/project/goals.md#software-and-language-evolution),
+aligning the toolchain documentation with design documentation.
+
+## Alternatives considered
+
+### Lex
+
+#### Bracket matching in parser
+
+Bracket matching could also have been implemented in the parser, with some
+awareness of parse state. However, that would shift complexity into recovery
+for other error situations, such as when the parser searches for the next
+comma in a list; that search needs to skip over bracketed ranges. We don't
+think the trade-offs would yield a net benefit, so any change in this
+direction would need to show a concrete improvement, for example better
+diagnostics for common issues.
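The comma-search recovery mentioned above can be sketched as follows. This is a hypothetical illustration with an invented name, not toolchain code: where the real parser can jump over a bracketed range directly because the lexer has already matched brackets, a simple depth counter stands in for that here.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: find the index of the next `,` at the current bracket
// depth, starting at index `i`. Commas inside bracketed ranges are skipped,
// so error recovery lands on a comma that actually separates list elements.
static int NextTopLevelComma(const std::string& tokens, int i) {
  int depth = 0;
  for (; i < static_cast<int>(tokens.size()); ++i) {
    char c = tokens[i];
    if (c == '(' || c == '[' || c == '{') {
      ++depth;
    } else if (c == ')' || c == ']' || c == '}') {
      --depth;
    } else if (c == ',' && depth == 0) {
      return i;  // A comma in the list itself, not inside brackets.
    }
  }
  return -1;  // No top-level comma to recover at.
}
```

For `f(a, b), c`, the comma inside the call is skipped and recovery lands on the comma at index 7.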
+
+### Parse
+
+#### Restrictive parsing
+
+The toolchain will often parse code that could theoretically be rejected,
+instead allowing the check phase to reject incorrect structures.
+
+For example, consider the code `abstract var x: i32 = 0;`. When parsing the
+`abstract` modifier, parse could use single-token lookahead to see `var` and
+report an error during parsing (`abstract var` is never valid). Instead, we
+save the modifier and diagnose it during check.
+
+The problem is that code isn't always this simple. In the above example, there
+could be other modifiers, as in `abstract private returned var x: i32 = 0;`,
+so single-token lookahead isn't a general solution. Some modifiers are also
+only contextually valid; for example, `abstract fn` is only valid inside an
+`abstract class` scope. As a consequence, either arbitrary lookahead or
+additional context would be needed in parse to reliably diagnose incorrect
+uses of `abstract`. Check, in contrast, will have that additional context.
+
+Rejecting incorrect code during parsing can also have negative consequences
+for diagnostics. The additional semantic information available to check may
+produce better diagnostics. Alternatively, check will sometimes produce
+diagnostics equivalent to what parse could, but with less work overall.
+
+As a consequence, at times we will defer to the check phase to produce
+diagnostics instead of trying to produce those same diagnostics during parse.
+Some examples of why we might diagnose in check instead of parse are:
+
+-   To issue better diagnostics based on semantic information.
+-   To diagnose similar invalid uses in one place, versus partly in check and
+    partly in parse.
+-   To support syntax highlighting for IDEs in near-correct code that is still
+    being typed.
+
+Some examples of why we might diagnose in parse are:
+
+-   When it's important to distinguish between multiple possible syntaxes.
+-   When permitting the syntax would require more work than rejecting it.
+
+A few examples of parse designs to avoid are:
+
+-   Using arbitrary lookahead.
+    -   Looking ahead one or two tokens is okay. However, we should never have
+        arbitrary lookahead.
+    -   This includes approaches that would require using the mapping of
+        opening brackets to closing brackets produced by `TokenizedBuffer`;
+        that mapping exists to aid error recovery.
+-   Building complex context.
+    -   We want parsing to be faster and lighter weight than check.
+-   Duplicating diagnostics between parse and check.
+    -   When there are closely related invalid variants of syntax, only some of
+        which can be diagnosed during parse, consider diagnosing all variants
+        during check.
+
+This is a balance. We don't want to unnecessarily shift costs from parse onto
+check, and we don't try to allow clearly invalid constructs. Parse still tries
+to produce a reasonable parse tree. However, parse leans more towards a
+permissive parse, and an error-free parse tree does not mean the code is
+grammatically correct.
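The deferral described above can be sketched as a toy model. The types and names here are invented for illustration and are not the toolchain's API: parse records whatever modifiers it sees without validating them, and check applies the rule with full context.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: a declaration as parse might record it, with
// modifiers kept verbatim rather than validated.
struct Decl {
  std::vector<std::string> modifiers;
  std::string kind;  // e.g. "var" or "fn"
};

// Check-phase rule: `abstract` is never valid on a `var` declaration.
static std::vector<std::string> CheckDecl(const Decl& decl) {
  std::vector<std::string> diagnostics;
  for (const std::string& mod : decl.modifiers) {
    if (mod == "abstract" && decl.kind == "var") {
      diagnostics.push_back("`abstract` not allowed on `var`");
    }
  }
  return diagnostics;
}
```

Here `abstract var` is diagnosed once, in check, no matter how many other modifiers separate the two tokens, with no lookahead in parse.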
+
+### Check
+
+#### Using a traditional AST representation
+
+Clang creates an AST as part of compilation. In Carbon, it's something we
+could do as a step between parsing and checking, possibly replacing SemIR.
+Doing so would likely be simpler, among other trade-offs. However, we think
+the SemIR approach will yield higher performance, enough so that it's the
+chosen approach.
+
+### Coalescing generic functions emitted when lowering to LLVM IR
+
+#### Coalescing in the front-end vs back-end?
+
+An alternative considered was not doing any coalescing in the front-end and
+relying on LLVM to perform the analysis and optimization. The current choice
+was made based on the expectation that such an
+[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly
+in terms of compile time. The relative cost has not yet been evaluated.
+
+#### When to do coalescing in the front-end?
+
+The analysis and coalescing could be done prior to lowering, after
+specialization. The advantage of that choice would be avoiding lowering
+duplicate LLVM functions and then removing the duplicates. The disadvantage
+would be duplicating much of the lowering logic, which is currently needed to
+make the equivalence determination.
+
+#### Compile-time trade-offs
+
+Not doing any coalescing is also expected to increase back-end codegen time by
+more than the cost of performing the analysis and deduplication. This can be
+evaluated in practice, and the feature disabled if it proves too costly.
+
+#### Coalescing duplicate non-specific functions
+
+We could coalesce duplicate functions in non-specific cases, similar to lld's
+[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
+[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
+require fingerprinting all instructions in all functions, whereas specific
+coalescing can focus on cases that only Carbon's front-end knows about. Carbon
+would also be restricted to coalescing functions in a single compilation unit,
+which would require replacing function definitions that allow external calls
+with a placeholder that calls the coalesced definition. We don't expect
+sufficient advantages over existing support.
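The fingerprint-based deduplication contrasted above can be sketched as follows. This is a hypothetical illustration, not the implemented algorithm: names are invented, and real equivalence checking needs more than a body hash, since calls to other specifics must also resolve to equivalent functions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch: fingerprint each function body and keep the first
// function seen with each fingerprint as the canonical definition. Returns a
// remapping from every function name to its canonical name.
static std::map<std::string, std::string> Coalesce(
    const std::map<std::string, std::string>& bodies) {
  std::map<std::size_t, std::string> canonical_by_fingerprint;
  std::map<std::string, std::string> remap;
  for (const auto& [name, body] : bodies) {
    std::size_t fingerprint = std::hash<std::string>{}(body);
    auto [it, inserted] = canonical_by_fingerprint.emplace(fingerprint, name);
    remap[name] = it->second;  // First function with this body wins.
  }
  return remap;
}
```

Two specifics that lower to identical bodies remap to one definition; a function with a distinct body keeps its own.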

+ 1 - 8
toolchain/docs/check/README.md

@@ -37,7 +37,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Value bindings](#value-bindings)
 -   [Handling Parse::Tree errors (not yet implemented)](#handling-parsetree-errors-not-yet-implemented)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Using a traditional AST representation](#using-a-traditional-ast-representation)
 
 <!-- tocstop -->
 
@@ -667,10 +666,4 @@ interfere with checking later valid lines in the same function.
 
 ## Alternatives considered
 
-### Using a traditional AST representation
-
-Clang creates an AST as part of compilation. In Carbon, it's something we could
-do as a step between parsing and checking, possibly replacing the SemIR. It's
-likely that doing so would be simpler, amongst other possible trade-offs.
-However, we think the SemIR approach is going to yield higher performance,
-enough so that it's the chosen approach.
+-   [Using a traditional AST representation](/proposals/p6716.md#using-a-traditional-ast-representation)

+ 9 - 42
toolchain/docs/coalesce_generic_lowering.md

@@ -17,12 +17,8 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Function fingerprints](#function-fingerprints)
     -   [Canonical specific to use](#canonical-specific-to-use)
 -   [Algorithm details](#algorithm-details)
--   [Alternatives considered](#alternatives-considered)
-    -   [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
-    -   [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
-    -   [Compile-time trade-offs](#compile-time-trade-offs)
-    -   [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
 -   [Opportunities for further improvement](#opportunities-for-further-improvement)
+-   [Alternatives considered](#alternatives-considered)
 
 <!-- tocstop -->
 
@@ -119,7 +115,7 @@ quadratic pass walking all call instructions and comparing if the `specific_id`
 information is equivalent. These optimizations are not currently implemented.
 
 Note that this does not
-[coalesce non-specifics](#coalescing-duplicate-non-specific-functions).
+[coalesce non-specifics](/proposals/p6716.md#coalescing-duplicate-non-specific-functions).
 
 ### Canonical specific to use
 
@@ -233,42 +229,6 @@ CheckIfEquivalent(two specifics, &assumed equivalent specifics) -> bool {
 }
 ```
 
-## Alternatives considered
-
-### Coalescing in the front-end vs back-end?
-
-An alternative considered was not doing any coalescing in the front-end and
-relying on LLVM to make the analysis and optimization. The current choice was
-made based on the expectation that such an
-[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly in
-terms of compile-time. The relative cost has not yet been evaluated.
-
-### When to do coalescing in the front-end?
-
-The analysis and coalescing could be done prior to lowering, after
-specialization. The advantage of that choice would be avoiding to lower
-duplicate LLVM functions and then removing the duplicates. The disadvantage of
-that choice would be duplicating much of the lowering logic, currently necessary
-to make the equivalence determination.
-
-### Compile-time trade-offs
-
-Not doing any coalescing is also expected to increase the back-end codegen time
-more than performing the analysis and deduplication. This can be evaluated in
-practice and the feature disabled if found to be too costly.
-
-### Coalescing duplicate non-specific functions
-
-We could coalesce duplicate functions in non-specific cases, similar to lld's
-[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
-[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
-require fingerprinting all instructions in all functions, whereas specific
-coalescing can focus on cases that only Carbon's front-end knows about. Carbon
-would also be restricted to coalescing functions in a single compilation unit,
-which would require replacing function definitions that allow external calls
-with a placeholder that calls the coalesced definition. We don't expect
-sufficient advantages over existing support.
-
 ## Opportunities for further improvement
 
 The current implemented algorithm can be improved with at least the following:
@@ -290,3 +250,10 @@ manner that is translation-unit independent, so this can be used in the mangled
 name, and the same function name emitted. This does not currently occur, as the
 two fingerprints use internal SemIR identifiers (`function_id` and `specific_id`
 respectively).
+
+## Alternatives considered
+
+-   [Coalescing in the front-end vs back-end?](/proposals/p6716.md#coalescing-in-the-front-end-vs-back-end)
+-   [When to do coalescing in the front-end?](/proposals/p6716.md#when-to-do-coalescing-in-the-front-end)
+-   [Compile-time trade-offs](/proposals/p6716.md#compile-time-trade-offs)
+-   [Coalescing duplicate non-specific functions](/proposals/p6716.md#coalescing-duplicate-non-specific-functions)

+ 1 - 10
toolchain/docs/lex.md

@@ -13,7 +13,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -   [Overview](#overview)
 -   [Bracket matching](#bracket-matching)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Bracket matching in parser](#bracket-matching-in-parser)
 
 <!-- tocstop -->
 
@@ -33,12 +32,4 @@ indentation, that is not yet implemented.
 
 ## Alternatives considered
 
-### Bracket matching in parser
-
-Bracket matching could have also been implemented in the parser, with some
-awareness of parse state. However, that would shift some of the complexity of
-recovery in other error situations, such as where the parser searches for the
-next comma in a list. That needs to skip over bracketed ranges. We don't think
-the trade-offs would yield a net benefit, so any change in this direction would
-need to show concrete improvement, for example better diagnostics for common
-issues.
+-   [Bracket matching in parser](/proposals/p6716.md#bracket-matching-in-parser)

+ 1 - 60
toolchain/docs/parse.md

@@ -26,7 +26,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
         -   [Case 3: optional sibling](#case-3-optional-sibling)
     -   [Operators](#operators)
 -   [Alternatives considered](#alternatives-considered)
-    -   [Restrictive parsing](#restrictive-parsing)
 
 <!-- tocstop -->
 
@@ -805,62 +804,4 @@ An independent description of our approach:
 
 ## Alternatives considered
 
-### Restrictive parsing
-
-The toolchain will often parse code that could theoretically be rejected,
-instead allowing the check phase to reject incorrect structures.
-
-For example, consider the code `abstract var x: i32 = 0;`. When parsing the
-`abstract` modifier, parse could do single-token lookahead to see `var`, and
-error in the parse (`abstract var` is never valid). Instead, we save the
-modifier and diagnose it during check.
-
-The problem is that code isn't always this simple. Considering the above
-example, there could be other modifiers, such as
-`abstract private returned var x: i32 = 0;`, so single-token lookahead isn't a
-general solution. Some modifiers are also contextually valid; for example,
-`abstract fn` is only valid inside an `abstract class` scope. As a consequence,
-a form of either arbitrary lookahead or additional context would be necessary in
-parse in order to reliably diagnose incorrect uses of `abstract`. In contrast
-with parse, check will have that additional context.
-
-Rejecting incorrect code during parsing can also have negative consequences for
-diagnostics. The additional information that check has about semantics may
-produce better diagnostics. Alternately, sometimes check will produce
-diagnostics equivalent to what parse could, but with less work overall.
-
-As a consequence, at times we will defer to the check phase to produce
-diagnostics instead of trying to produce those same diagnostics during parse.
-Some examples of why we might diagnose in check instead of parse are:
-
--   To issue better diagnostics based on semantic information.
--   To diagnose similar invalid uses in one place, versus partly in check and
-    partly in parse.
--   To support syntax highlighting for IDEs in near-correct code, still being
-    typed.
-
-Some examples of why we might diagnose in parse are:
-
--   When it's important to distinguish between multiple possible syntaxes.
--   When permitting the syntax would require more work than rejecting it.
-
-A few examples of parse designs to avoid are:
-
--   Using arbitrary lookahead.
-    -   Looking ahead one or two tokens is okay. However, we should never have
-        arbitrary lookahead.
-    -   This includes approaches which would require using the mapping of
-        opening brackets to closing brackets that is produced by
-        `TokenizedBuffer`. Those are helpful for error recovery.
--   Building complex context.
-    -   We want parsing to be faster and lighter weight than check.
--   Duplicating diagnostics between parse and check.
-    -   When there are closely related invalid variants of syntax, only some of
-        which can be diagnosed during parse, consider diagnosing all variants
-        during check.
-
-This is a balance. We don't want to unnecessarily shift costs from parse onto
-check, and we don't try to allow clearly invalid constructs. Parse still tries
-to produce a reasonable parse tree. However, parse leans more towards a
-permissive parse, and an error-free parse tree does not mean the code is
-grammatically correct.
+-   [Restrictive parsing](/proposals/p6716.md#restrictive-parsing)