3 years ago · 6accdfff77
--- a/toolchain/README.md
+++ b/toolchain/README.md
@@ -6,198 +6,6 @@ Exceptions. See /LICENSE for license information.
 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -->
 
-The toolchain represents the production portion of Carbon. At a high level, the
-toolchain's top priorities are:
-
--   Correctness.
--   Quality of generated code, including its performance.
--   Compilation performance.
--   Quality of diagnostics for incorrect or questionable code.
-
-TODO: Add an expanded document that fully explains the goals and priorities and
-link to it here.
-
-The compiler is organized into a collection of libraries that can be used
-independently. This includes the `//toolchain/driver` libraries that orchestrate
-the typical and expected compilation flow using the other libraries. The driver
-also includes the primary command-line tool: `//toolchain/driver:carbon`.
-
-The typical compilation flow of data is:
-
-1. Load the file into a [SourceBuffer](source/source_buffer.h).
-2. Lex a `SourceBuffer` into a [TokenizedBuffer](lexer/tokenized_buffer.h).
-3. Parse a `TokenizedBuffer` into a [ParseTree](parser/parse_tree.h).
-4. Transform a `ParseTree` into a [SemanticsIR](semantics/semantics_ir.h).
-5. This flow is still incomplete: code generation, using LLVM, is still
-   required.
-
-## Lexing
-
-The [TokenizedBuffer](lexer/tokenized_buffer.h) is the central point of lexing.
-
-The entire source buffer is converted into tokens before parsing begins. Tokens
-are referred to by an opaque handle, `TokenizedBuffer::Token`, which is
-represented as a dense integer index into the buffer. The tokenized buffer can
-be queried to discover information about a token, such as its token kind, its
-location in the source file, and its spelling.
-
-The lexer ensures that all forms of brackets are matched, and is intended to
-recover from missing brackets based on contextual cues such as indentation
-(although this is not yet implemented), inserting matching close bracket tokens
-where it thinks they belong. After the lexer completes, every opening bracket
-token has a matching closing bracket token.
-
-## Parsing
-
-The [ParseTree](parser/parse_tree.h) is the output of parsing, but most logic is
-in [Parser](parser/parser.h).
-
-The parse tree faithfully represents the tree structure of the source program,
-interpreted according to the Carbon grammar. No semantics are associated with
-the tree structure at this level, and no name lookup is performed.
-
-Each parse tree node has an expected structure, corresponding to the grammar of
-the Carbon language, and the parser ensures that a valid parse tree node always
-has a valid structure. However, any parse tree node can be marked as invalid,
-and an invalid parse tree node can contain child nodes of any kind in any order.
-This is intended to model the situation where parsing failed because the code
-did not match the grammar, but we were still able to parse some subexpressions,
-as an aid for non-compiler tools such as syntax highlighters or refactoring
-tools.
-
-The produced `ParseTree` is in postorder. For example, given the code:
-
-```carbon
-fn foo() -> f64 {
-  return 42;
-}
-```
-
-The node order is (with indentation to indicate nesting):
-
-```
-    {node_index: 0, kind: 'FunctionIntroducer', text: 'fn'}
-    {node_index: 1, kind: 'DeclaredName', text: 'foo'}
-      {node_index: 2, kind: 'ParameterListEnd', text: ')'}
-    {node_index: 3, kind: 'ParameterList', text: '(', subtree_size: 2}
-      {node_index: 4, kind: 'Literal', text: 'f64'}
-    {node_index: 5, kind: 'ReturnType', text: '->', subtree_size: 2}
-  {node_index: 6, kind: 'FunctionDefinitionStart', text: '{', subtree_size: 7}
-    {node_index: 7, kind: 'Literal', text: '42'}
-    {node_index: 8, kind: 'StatementEnd', text: ';'}
-  {node_index: 9, kind: 'ReturnStatement', text: 'return', subtree_size: 3}
-{node_index: 10, kind: 'FunctionDefinition', text: '}', subtree_size: 11}
-{node_index: 11, kind: 'FileEnd', text: ''}
-```
-
-This ordering is focused on efficient translation into the SemanticsIR.
-Non-template code should be type-checked as soon as nodes are encountered,
-decreasing SemanticsIR mutations.
-
-While sometimes the beginning of the grammatical construct will be the parent,
-where introducer keywords are used, it will often be the _end_ of the
-grammatical construct that is the parent: this is so that a postorder traversal
-of the tree can see the kind of grammatical construct being built first, and
-handle child nodes taking that into account.
-
-TODO: Document flow.
-
-## Semantics
-
-The [SemanticsIR](semantics/semantics_ir.h) is the output of semantic
-processing.
-
-The intent is that a `SemanticsIR` looks closer to a series of instructions than
-a tree. This is in order to better align with the LLVM IR structure which will
-be used for code generation.
-
-This phase should eventually include semantic checking of the SemanticsIR, but
-it's a work in progress.
-
-## Diagnostics
-
-### DiagnosticEmitter
-
-[DiagnosticEmitters](diagnostics/diagnostic_emitter.h) handle the main
-formatting of a message. It's parameterized on a location type, for which a
-`DiagnosticLocationTranslator` must be provided that can translate the location
-type into a standardized `DiagnosticLocation` of file, line, and column.
-
-When emitting, the resulting formatted message is passed to a
-`DiagnosticConsumer`.
-
-### DiagnosticConsumers
-
-`DiagnosticConsumers` handle output of diagnostic messages after they've been
-formatted by an `Emitter`. Important consumers are:
-
--   [ConsoleDiagnosticConsumer](diagnostics/diagnostic_emitter.h): prints
-    diagnostics to console.
--   [ErrorTrackingDiagnosticConsumer](diagnostics/diagnostic_emitter.h): counts
-    the number of errors produced, particularly so that it can be determined
-    whether any errors were encountered.
--   [SortingDiagnosticConsumer](diagnostics/sorting_diagnostic_consumer.h):
-    sorts diagnostics by line so that diagnostics are seen in terminal based on
-    their order in the file rather than the order they were produced.
--   [NullDiagnosticConsumer](diagnostics/null_diagnostics.h): suppresses
-    diagnostics, particularly for tests.
-
-### Producing diagnostics
-
-Diagnostics are used to surface issues from compilation. A simple diagnostic
-looks like:
-
-```cpp
-CARBON_DIAGNOSTIC(InvalidCode, Error, "Code is invalid");
-emitter.Emit(location, InvalidCode);
-```
-
-Here, `CARBON_DIAGNOSTIC` defines a static instance of a diagnostic named
-`InvalidCode` with the associated severity (`Error` or `Warning`).
-
-The `Emit` call produces a single instance of the diagnostic. When emitted,
-`"Code is invalid"` will be the message used. The type of `location` depends on
-the `DiagnosticEmitter`.
-
-A diagnostic with an argument looks like:
-
-```cpp
-CARBON_DIAGNOSTIC(InvalidCharacter, Error, "Invalid character `{0}`.", char);
-emitter.Emit(location, InvalidCharacter, invalid_char);
-```
-
-Here, the additional `char` argument to `CARBON_DIAGNOSTIC` specifies the type
-of an argument to expect for message formatting. The `invalid_char` argument to
-`Emit` provides the matching value. It's then passed along with the diagnostic
-message format to `llvm::formatv` in order to produce the final diagnostic
-message.
-
-#### Diagnostic registry
-
-There is a [registry](diagnostics/diagnostic_registry.def) which all diagnostics
-must be added to. Each diagnostic has a line like:
-
-```cpp
-CARBON_DIAGNOSTIC_KIND(InvalidCode)
-```
-
-This produces a central enumeration of all diagnostics. The eventual intent is
-to require tests for every diagnostic that can be produced, but that isn't
-currently implemented.
-
-#### `CARBON_DIAGNOSTIC` placement
-
-Idiomatically, `CARBON_DIAGNOSTIC` will be adjacent to the `Emit` call. However,
-this is only because many diagnostics can only be produced in one code location.
-If they can be produced in multiple locations, they will be at a higher scope so
-that multiple `Emit` calls can reference them. When in a function,
-`CARBON_DIAGNOSTIC` should be placed as close as possible to the usage so that
-it's easier to see the associated output.
-
-### Diagnostic context
-
-In the future, we'll want to provide additional context for errors. For example,
-if there's a function parameter mismatch, it may be useful to point both at the
-caller and function signature compared. However, at present the emitter only
-produces errors on one location. This is something that we need to consider
-further, and will probably involve further changes to diagnostic handling.
+A design is currently maintained in
+[Google Drive](https://docs.google.com/document/d/1RRYMm42osyqhI2LyjrjockYCutQ5dOf8Abu50kTrkX0/edit).
+It'll be migrated to markdown once we are confident in its stability.