2 ماه پیش · 45b3f47349
--- a/proposals/p6699.md
+++ b/proposals/p6699.md
@@ -0,0 +1,126 @@
 
				+# Diagnostic sorting
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+[Pull request](https://github.com/carbon-language/carbon-lang/pull/6699)
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Abstract](#abstract)
			
 
				+-   [Problem](#problem)
			
 
				+-   [Background](#background)
			
 
				+-   [Proposal](#proposal)
			
 
				+-   [Details](#details)
			
 
				+-   [Rationale](#rationale)
			
 
				+-   [Alternatives considered](#alternatives-considered)
			
 
				+    -   [Don't sort diagnostics](#dont-sort-diagnostics)
			
 
				+    -   [Sort by line and column](#sort-by-line-and-column)
			
 
				+    -   [Sort by last processed token](#sort-by-last-processed-token)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Abstract
			
 
				+
			
 
				+Change `SortingConsumer` from sorting by last processed token (per-phase) to
			
 
				+additionally allow diagnostics to request sorting by start position (line and
			
 
				+column) when the last processed token is the same.
			
 
				+
			
 
				+## Problem
			
 
				+
			
 
				+Diagnostics in many toolchains are emitted in the order they are discovered
			
 
				+during code traversal. While this naturally reflects the causal relationship
			
 
				+between errors (for example, an error in a macro expansion causing subsequent
			
 
				+parse errors), it can lead to a confusing experience for developers if the
			
 
				+diagnostics jump back and forth through the file. Conversely, sorting purely by
			
 
				+source location (line and column) can break causality, presenting a consequence
			
 
				+before its cause. We need a sorting strategy that feels natural to humans but
			
 
				+respects the underlying toolchain logic.
			
 
				+
			
 
				+## Background
			
 
				+
			
 
				+Carbon's processing of code in stages (lex, parse, check) causes diagnostics to
			
 
				+be produced in that order. In contrast, Clang interleaves parse and check, and
			
 
				+as a consequence the diagnostics produced are similarly interleaved.
			
 
				+
			
 
				+A more detailed overview of Carbon's diagnostic infrastructure can be found in
			
 
				+[diagnostics.md](/toolchain/docs/diagnostics.md).
			
 
				+
			
 
				+## Proposal
			
 
				+
			
 
				+In addition to sorting by the last processed token (which `SortingConsumer`
			
 
				+already does), add a way to sort based on the start position (line and column)
			
 
				+by request. This is being called "on-scope" because current cases we've
			
 
				+discussed are scope-related.
			
 
				+
			
 
				+See [SortingConsumer](/toolchain/docs/diagnostics.md#sortingconsumer) for more
			
 
				+documentation.
			
 
				+
			
 
				+## Details
			
 
				+
			
 
				+This was already implemented in
			
 
				+[PR #6687](https://github.com/carbon-language/carbon-lang/pull/6687).
			
 
				+
			
 
				+## Rationale
			
 
				+
			
 
				+This proposal advances Carbon's goal of providing
			
 
				+[Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
			
 
				+and creating
			
 
				+[Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
			
 
				+by making the developer experience with error messages more predictable and
			
 
				+logical. It respects the inherent causal order of the toolchain while tailoring
			
 
				+the output to human expectations.
			
 
				+
			
 
				+## Alternatives considered
			
 
				+
			
 
				+### Don't sort diagnostics
			
 
				+
			
 
				+We could just print diagnostics in the order they are produced. This would print
			
 
				+all lex errors, then all parse errors, then all check errors. This would be
			
 
				+simple, but might be confusing when a parse error at the end of a file comes
			
 
				+before check errors, and fixing the check errors would fix the parse error.
			
 
				+
			
 
				+### Sort by line and column
			
 
				+
			
 
				+We could sort diagnostics purely by their line and column. This runs into issues
			
 
				+with cases such as:
			
 
				+
			
 
				+```carbon
			
 
				+fn F(x: i32, y: i32);
			
 
				+
			
 
				+fn G() {
			
 
				+  F(1 2);
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+Here, the diagnostic for an invalid parse of `1 2` would appear after the
			
 
				+diagnostic that `F` expects two arguments, not one. This is confusing because
			
 
				+the missing comma is the root cause of the incorrect argument count.
			
 
				+
			
 
				+### Sort by last processed token
			
 
				+
			
 
				+We could sort diagnostics purely by the last token that was processed when the
			
 
				+diagnostic was emitted. This runs into issues with cases such as:
			
 
				+
			
 
				+```carbon
			
 
				+fn F(x: i32, y: i32) {}
			
 
				+```
			
 
				+
			
 
				+Here, both `x` and `y` would be diagnosed as unused at the `}`. The order would
			
 
				+be non-deterministic, hindering golden tests.
			
 
				+
			
 
				+This could be partially addressed by sorting the diagnostics locally (for
			
 
				+example, sorting each `unused` diagnostic together), but this is an incomplete
			
 
				+solution because we may introduce further scope-related checks, particularly
			
 
				+flow checking (for example, checking if there are provable out-of-bounds
			
 
				+accesses). These would all have the same last processed tokens. It also would
			
 
				+likely lead to sorting regardless of whether sorting was requested by the tool
			
 
				+user, a performance overhead we want to avoid.
			
 
				+
			
 
				+We believe sorting by the last processed token is a partial solution, which
			
 
				+we're building on.
			
--- a/toolchain/docs/diagnostics.md
+++ b/toolchain/docs/diagnostics.md
@@ -11,14 +11,17 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
				 ## Table of contents
			
 
				 
			
 
				 -   [Overview](#overview)
			
 
				--   [DiagnosticEmitter](#diagnosticemitter)
			
 
				--   [DiagnosticConsumers](#diagnosticconsumers)
			
 
				+-   [Emitters](#emitters)
			
 
				+-   [Consumers](#consumers)
			
 
				+    -   [SortingConsumer](#sortingconsumer)
			
 
				 -   [Producing diagnostics](#producing-diagnostics)
			
 
				 -   [Diagnostic registry](#diagnostic-registry)
			
 
				 -   [CARBON_DIAGNOSTIC placement](#carbon_diagnostic-placement)
			
 
				 -   [Diagnostic context](#diagnostic-context)
			
 
				 -   [Diagnostic parameter types](#diagnostic-parameter-types)
			
 
				 -   [Diagnostic message style guide](#diagnostic-message-style-guide)
			
 
				+-   [Alternatives considered](#alternatives-considered)
			
 
				+-   [References](#references)
			
 
				 
			
 
				 <!-- tocstop -->
			
 
				 
			
@@ -26,20 +29,18 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
				 
			
 
				 The diagnostic code is used by the toolchain to produce output.
			
 
				 
			
 
				-## DiagnosticEmitter
			
 
				+## Emitters
			
 
				 
			
 
				-[Emitters](/toolchain/diagnostics/emitter.h) handle the main formatting of a
			
 
				-message. It's parameterized on a location type, for which a
			
 
				-DiagnosticLocationTranslator must be provided that can translate the location
			
 
				-type into a standardized DiagnosticLocation of file, line, and column.
			
 
				+[`Emitter`s](/toolchain/diagnostics/emitter.h) handle the main formatting of a
			
 
				+message. It's parameterized on a location type, which `ConvertLoc` translates
			
 
				+into a standardized DiagnosticLocation of file, line, and column.
			
 
				 
			
 
				-When emitting, the resulting formatted message is passed to a
			
 
				-DiagnosticConsumer.
			
 
				+When emitting, the resulting formatted message is passed to a `Consumer`.
			
 
				 
			
 
				-## DiagnosticConsumers
			
 
				+## Consumers
			
 
				 
			
 
				-DiagnosticConsumers handle output of diagnostic messages after they've been
			
 
				-formatted by an Emitter. Important consumers are:
			
 
				+`Consumer`s handle output of diagnostic messages after they've been formatted by
			
 
				+an Emitter. Important consumers are:
			
 
				 
			
 
				 -   [ConsoleConsumer](/toolchain/diagnostics/consumer.cpp): prints diagnostics
			
 
				     to console.
			
@@ -48,22 +49,45 @@ formatted by an Emitter. Important consumers are:
 
				     number of errors produced, particularly so that it can be determined whether
			
 
				     any errors were encountered.
			
 
				 
			
 
				--   [SortingConsumer](/toolchain/diagnostics/sorting_consumer.h): sorts
			
 
				-    diagnostics by line so that diagnostics are seen in terminal based on their
			
 
				-    order in the file rather than the order they were produced.
			
 
				+-   [SortingConsumer](/toolchain/diagnostics/sorting_consumer.h): buffers and
			
 
				+    sorts diagnostics to provide a more human-understandable order while
			
 
				+    maintaining causal consistency.
			
 
				 
			
 
				--   [NullConsumer](/toolchain/diagnostics/null_diagnostics.h): suppresses
			
 
				-    diagnostics, particularly for tests.
			
 
				+### SortingConsumer
			
 
				 
			
 
				-Note that `SortingConsumer` is used by default by `carbon compile`. In cases
			
 
				-where one error leads to another error at an earlier location, for example if an
			
 
				-error in a function call argument leads to an error in the function call, this
			
 
				-can result in confusing diagnostic output where a consequence of the error is
			
 
				-reported before the cause. Usually this should be handled by tracking that an
			
 
				-error occurred and suppressing the follow-on diagnostic. During toolchain
			
 
				-development, it can be useful to disable the sorting so that the diagnostic
			
 
				-order matches the order in which the file was processed. This can be done using
			
 
				-`carbon compile –stream-errors`.
			
 
				+`SortingConsumer` is used by default by `carbon compile`. To see the actual
			
 
				+emitted order, use `carbon compile --stream-errors`.
			
 
				+
			
 
				+The current `SortingConsumer` implementation sorts diagnostics based on the
			
 
				+`last_byte_offset`, which represents the latest token handled by the phase
			
 
				+emitting the diagnostic. This maintains the causal order of the toolchain's
			
 
				+traversal.
			
 
				+
			
 
				+We expect cases where multiple diagnostics are emitted at the same offset,
			
 
				+particularly when they're emitted at the end of a scope. These can have attached
			
 
				+locations earlier in the scope, such as variable declarations. In these cases,
			
 
				+we put non-on-scope diagnostics first, and then sort on-scope diagnostics by
			
 
				+their start position (line and column).
			
 
				+
			
 
				+Diagnostic sorting is stable, so that diagnostics from earlier phases are
			
 
				+printed first if all else is equal.
			
 
				+
			
 
				+The sorting approach balances several competing needs:
			
 
				+
			
 
				+-   **Causal order**: Developers generally want to fix errors in the order they
			
 
				+    are printed. If fixing error A could also fix error B, A should be printed
			
 
				+    first.
			
 
				+
			
 
				+-   **Human-understandable order**: A human expects diagnostics to follow the
			
 
				+    flow of the file. If all parse errors in a file are printed before any
			
 
				+    semantic check errors, the developer may find it confusing to jump back and
			
 
				+    forth through the file.
			
 
				+
			
 
				+-   **Performance where possible**: On fully correct code with no diagnostics,
			
 
				+    which is our performance priority, this has negligible overhead. When there
			
 
				+    are diagnostics, we try to only sort within the `SortingConsumer`. When
			
 
				+    sorting is not desired (such as tools and IDEs that provide their own
			
 
				+    ordering), it's easy to disable.
			
 
				 
			
 
				 ## Producing diagnostics
			
 
				 
			
@@ -94,6 +118,14 @@ of an argument to expect for message formatting. The `invalid_char` argument to
 
				 `Emit` provides the matching value. It's then passed along with the diagnostic
			
 
				 message format to `llvm::formatv` to produce the final diagnostic message.
			
 
				 
			
 
				+An on-scope diagnostic uses `CARBON_DIAGNOSTIC_ON_SCOPE` which is identical
			
 
				+other than the macro name. For example:
			
 
				+
			
 
				+```cpp
			
 
				+CARBON_DIAGNOSTIC_ON_SCOPE(InvalidInScope, Error, "error inside scope");
			
 
				+emitter.Emit(location, InvalidInScope);
			
 
				+```
			
 
				+
			
 
				 ## Diagnostic registry
			
 
				 
			
 
				 There is a [registry](/toolchain/diagnostics/kind.def) which all diagnostics
			
@@ -285,3 +317,14 @@ Carbon's diagnostic style aims to balance these concerns. Our style is:
 
				     only allowed for types?
			
 
				 
			
 
				 -   TODO: Lots more things to decide, give examples.
			
 
				+
			
 
				+## Alternatives considered
			
 
				+
			
 
				+-   [Don't sort diagnostics](/proposals/p6699.md#dont-sort-diagnostics)
			
 
				+-   [Sort by line and column](/proposals/p6699.md#sort-by-line-and-column)
			
 
				+-   [Sort by last processed token](/proposals/p6699.md#sort-by-last-processed-token)
			
 
				+
			
 
				+## References
			
 
				+
			
 
				+-   Proposal
			
 
				+    [#6699: Sort diagnostics](https://github.com/carbon-language/carbon-lang/pull/6699)