فهرست منبع

Diagnostic sorting (#6699)

Change `SortingConsumer` from sorting by last processed token
(per-phase) to
additionally allow diagnostics to request sorting by start position
(line and
column) when the last processed token is the same.

Assisted-by: Google Antigravity with Gemini 3 Flash

---------

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Jon Ross-Perkins 2 ماه پیش
والد
کامیت
45b3f47349
2فایلهای تغییر یافته به همراه195 افزوده شده و 26 حذف شده
  1. 126 0
      proposals/p6699.md
  2. 69 26
      toolchain/docs/diagnostics.md

+ 126 - 0
proposals/p6699.md

@@ -0,0 +1,126 @@
+# Diagnostic sorting
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/6699)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Don't sort diagnostics](#dont-sort-diagnostics)
+    -   [Sort by line and column](#sort-by-line-and-column)
+    -   [Sort by last processed token](#sort-by-last-processed-token)
+
+<!-- tocstop -->
+
+## Abstract
+
+Change `SortingConsumer` from sorting by last processed token (per-phase) to
+additionally allow diagnostics to request sorting by start position (line and
+column) when the last processed token is the same.
+
+## Problem
+
+Diagnostics in many toolchains are emitted in the order they are discovered
+during code traversal. While this naturally reflects the causal relationship
+between errors (for example, an error in a macro expansion causing subsequent
+parse errors), it can lead to a confusing experience for developers if the
+diagnostics jump back and forth through the file. Conversely, sorting purely by
+source location (line and column) can break causality, presenting a consequence
+before its cause. We need a sorting strategy that feels natural to humans but
+respects the underlying toolchain logic.
+
+## Background
+
+Carbon's processing of code in stages (lex, parse, check) causes diagnostics to
+be produced in that order. In contrast, Clang interleaves parse and check, and
+as a consequence the diagnostics produced are similarly interleaved.
+
+A more detailed overview of Carbon's diagnostic infrastructure can be found in
+[diagnostics.md](/toolchain/docs/diagnostics.md).
+
+## Proposal
+
+In addition to sorting by the last processed token (which `SortingConsumer`
+already does), add a way to sort based on the start position (line and column)
+by request. This is being called "on-scope" because current cases we've
+discussed are scope-related.
+
+See [SortingConsumer](/toolchain/docs/diagnostics.md#sortingconsumer) for more
+documentation.
+
+## Details
+
+This was already implemented in
+[PR #6687](https://github.com/carbon-language/carbon-lang/pull/6687).
+
+## Rationale
+
+This proposal advances Carbon's goal of providing
+[Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
+and creating
+[Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+by making the developer experience with error messages more predictable and
+logical. It respects the inherent causal order of the toolchain while tailoring
+the output to human expectations.
+
+## Alternatives considered
+
+### Don't sort diagnostics
+
+We could just print diagnostics in the order they are produced. This would print
+all lex errors, then all parse errors, then all check errors. This would be
+simple, but might be confusing when a parse error at the end of a file comes
+before check errors, and fixing the check errors would fix the parse error.
+
+### Sort by line and column
+
+We could sort diagnostics purely by their line and column. This runs into issues
+with cases such as:
+
+```carbon
+fn F(x: i32, y: i32);
+
+fn G() {
+  F(1 2);
+}
+```
+
+Here, the diagnostic for an invalid parse of `1 2` would appear after the
+diagnostic that `F` expects two arguments, not one. This is confusing because
+the missing comma is the root cause of the incorrect argument count.
+
+### Sort by last processed token
+
+We could sort diagnostics purely by the last token that was processed when the
+diagnostic was emitted. This runs into issues with cases such as:
+
+```carbon
+fn F(x: i32, y: i32) {}
+```
+
+Here, both `x` and `y` would be diagnosed as unused at the `}`. The order would
+be non-deterministic, hindering golden tests.
+
+This could be partially addressed by sorting the diagnostics locally (for
+example, sorting each `unused` diagnostic together), but this is an incomplete
+solution because we may introduce further scope-related checks, particularly
+flow checking (for example, checking if there are provable out-of-bounds
+accesses). These would all have the same last processed tokens. It also would
+likely lead to sorting regardless of whether sorting was requested by the tool
+user, a performance overhead we want to avoid.
+
+We believe sorting by the last processed token is a partial solution, which
+we're building on.

+ 69 - 26
toolchain/docs/diagnostics.md

@@ -11,14 +11,17 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 ## Table of contents
 
 -   [Overview](#overview)
--   [DiagnosticEmitter](#diagnosticemitter)
--   [DiagnosticConsumers](#diagnosticconsumers)
+-   [Emitters](#emitters)
+-   [Consumers](#consumers)
+    -   [SortingConsumer](#sortingconsumer)
 -   [Producing diagnostics](#producing-diagnostics)
 -   [Diagnostic registry](#diagnostic-registry)
 -   [CARBON_DIAGNOSTIC placement](#carbon_diagnostic-placement)
 -   [Diagnostic context](#diagnostic-context)
 -   [Diagnostic parameter types](#diagnostic-parameter-types)
 -   [Diagnostic message style guide](#diagnostic-message-style-guide)
+-   [Alternatives considered](#alternatives-considered)
+-   [References](#references)
 
 <!-- tocstop -->
 
@@ -26,20 +29,18 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
 The diagnostic code is used by the toolchain to produce output.
 
-## DiagnosticEmitter
+## Emitters
 
-[Emitters](/toolchain/diagnostics/emitter.h) handle the main formatting of a
-message. It's parameterized on a location type, for which a
-DiagnosticLocationTranslator must be provided that can translate the location
-type into a standardized DiagnosticLocation of file, line, and column.
+[`Emitter`s](/toolchain/diagnostics/emitter.h) handle the main formatting of a
+message. It's parameterized on a location type, which `ConvertLoc` translates
+into a standardized DiagnosticLocation of file, line, and column.
 
-When emitting, the resulting formatted message is passed to a
-DiagnosticConsumer.
+When emitting, the resulting formatted message is passed to a `Consumer`.
 
-## DiagnosticConsumers
+## Consumers
 
-DiagnosticConsumers handle output of diagnostic messages after they've been
-formatted by an Emitter. Important consumers are:
+`Consumer`s handle output of diagnostic messages after they've been formatted by
+an Emitter. Important consumers are:
 
 -   [ConsoleConsumer](/toolchain/diagnostics/consumer.cpp): prints diagnostics
     to console.
@@ -48,22 +49,45 @@ formatted by an Emitter. Important consumers are:
     number of errors produced, particularly so that it can be determined whether
     any errors were encountered.
 
--   [SortingConsumer](/toolchain/diagnostics/sorting_consumer.h): sorts
-    diagnostics by line so that diagnostics are seen in terminal based on their
-    order in the file rather than the order they were produced.
+-   [SortingConsumer](/toolchain/diagnostics/sorting_consumer.h): buffers and
+    sorts diagnostics to provide a more human-understandable order while
+    maintaining causal consistency.
 
--   [NullConsumer](/toolchain/diagnostics/null_diagnostics.h): suppresses
-    diagnostics, particularly for tests.
+### SortingConsumer
 
-Note that `SortingConsumer` is used by default by `carbon compile`. In cases
-where one error leads to another error at an earlier location, for example if an
-error in a function call argument leads to an error in the function call, this
-can result in confusing diagnostic output where a consequence of the error is
-reported before the cause. Usually this should be handled by tracking that an
-error occurred and suppressing the follow-on diagnostic. During toolchain
-development, it can be useful to disable the sorting so that the diagnostic
-order matches the order in which the file was processed. This can be done using
-`carbon compile –stream-errors`.
+`SortingConsumer` is used by default by `carbon compile`. To see the actual
+emitted order, use `carbon compile --stream-errors`.
+
+The current `SortingConsumer` implementation sorts diagnostics based on the
+`last_byte_offset`, which represents the latest token handled by the phase
+emitting the diagnostic. This maintains the causal order of the toolchain's
+traversal.
+
+We expect cases where multiple diagnostics are emitted at the same offset,
+particularly when they're emitted at the end of a scope. These can have attached
+locations earlier in the scope, such as variable declarations. In these cases,
+we put non-on-scope diagnostics first, and then sort on-scope diagnostics by
+their start position (line and column).
+
+Diagnostic sorting is stable, so that diagnostics from earlier phases are
+printed first if all else is equal.
+
+The sorting approach balances several competing needs:
+
+-   **Causal order**: Developers generally want to fix errors in the order they
+    are printed. If fixing error A could also fix error B, A should be printed
+    first.
+
+-   **Human-understandable order**: A human expects diagnostics to follow the
+    flow of the file. If all parse errors in a file are printed before any
+    semantic check errors, the developer may find it confusing to jump back and
+    forth through the file.
+
+-   **Performance where possible**: On fully correct code with no diagnostics,
+    which is our performance priority, this has negligible overhead. When there
+    are diagnostics, we try to only sort within the `SortingConsumer`. When
+    sorting is not desired (such as tools and IDEs that provide their own
+    ordering), it's easy to disable.
 
 ## Producing diagnostics
 
@@ -94,6 +118,14 @@ of an argument to expect for message formatting. The `invalid_char` argument to
 `Emit` provides the matching value. It's then passed along with the diagnostic
 message format to `llvm::formatv` to produce the final diagnostic message.
 
+An on-scope diagnostic uses `CARBON_DIAGNOSTIC_ON_SCOPE` which is identical
+other than the macro name. For example:
+
+```cpp
+CARBON_DIAGNOSTIC_ON_SCOPE(InvalidInScope, Error, "error inside scope");
+emitter.Emit(location, InvalidInScope);
+```
+
 ## Diagnostic registry
 
 There is a [registry](/toolchain/diagnostics/kind.def) which all diagnostics
@@ -285,3 +317,14 @@ Carbon's diagnostic style aims to balance these concerns. Our style is:
     only allowed for types?
 
 -   TODO: Lots more things to decide, give examples.
+
+## Alternatives considered
+
+-   [Don't sort diagnostics](/proposals/p6699.md#dont-sort-diagnostics)
+-   [Sort by line and column](/proposals/p6699.md#sort-by-line-and-column)
+-   [Sort by last processed token](/proposals/p6699.md#sort-by-last-processed-token)
+
+## References
+
+-   Proposal
+    [#6699: Sort diagnostics](https://github.com/carbon-language/carbon-lang/pull/6699)