README.md

Toolchain architecture

Goals

The toolchain represents the production portion of Carbon. At a high level, the toolchain's top priorities are:

  • Correctness.
  • Quality of generated code, including performance.
  • Compilation performance.
  • Quality of diagnostics for incorrect or questionable code.

TODO: Add an expanded document that details the goals and priorities and link to it here.

High-level architecture

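The main components are the driver, lex, parse, check, and lower, each described in its own document in this directory: driver.md, lex.md, parse.md, check, and lower.md.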

Design patterns

A few common design patterns are:

  • Distinct steps: Each step of processing produces an output structure, avoiding callbacks passing data between structures.

    • For example, the parser takes a Lex::TokenizedBuffer as input and produces a Parse::Tree as output.

    • Performance: Building each output structure in full should yield better locality than a callback approach.

    • Understandability: Each step has a clear input and output, versus callbacks which obscure the flow of data.

  • Vectorized storage: Data is stored in vectors and flyweights are passed around, avoiding the more typical approach of heap-allocating objects and passing pointers.

    • For example, the parse tree is stored as an llvm::SmallVector<Parse::Tree::NodeImpl> indexed by Parse::Node, which wraps an int32_t.

    • Performance: Vectorization both minimizes memory allocation overhead and enables better read caching because adjacent entries will be cached together.

  • Iterative processing: We rely on state stacks and iterative loops for parsing, avoiding recursive function calls.

    • For example, the parser has a Parse::State enum tracked in state_stack_, and loops in Parse::Tree::Parse.

    • Scalability: Complex code must not cause recursion issues. In Clang, we have seen stack frame recursion limits hit in unexpected ways, and non-recursive approaches largely avoid that risk.
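
The three patterns above can be sketched together in a small, self-contained example. This is an illustration only, not the toolchain's real code: the names `NodeId`, `NodeImpl`, `TokenizedBuffer`, `Tree`, `Lex`, and `Parse` are hypothetical stand-ins, `std::vector` is used in place of `llvm::SmallVector`, and the toy grammar (single-character leaves and parenthesized groups) is an assumption made to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// All names here are hypothetical; they only mirror the shape of the patterns.

// Vectorized storage: a flyweight handle wrapping a 32-bit index, passed
// around instead of a pointer to a heap-allocated node.
struct NodeId {
  int32_t index;
};

enum class NodeKind : uint8_t { Leaf, Group };

struct NodeImpl {
  NodeKind kind;
  // Number of nodes in this subtree, including the node itself.
  int32_t subtree_size;
};

// Distinct steps: the lexer's complete output structure.
struct TokenizedBuffer {
  std::vector<char> tokens;
};

// Distinct steps: the parser's complete output structure. Nodes are stored
// contiguously in postorder, so a group's children sit just before it.
struct Tree {
  std::vector<NodeImpl> nodes;

  auto Get(NodeId id) const -> const NodeImpl& {
    assert(id.index >= 0 &&
           static_cast<size_t>(id.index) < nodes.size());
    return nodes[static_cast<size_t>(id.index)];
  }
};

// Step 1: source text in, token buffer out. Spaces are dropped.
auto Lex(const std::string& source) -> TokenizedBuffer {
  TokenizedBuffer buffer;
  for (char c : source) {
    if (c != ' ') {
      buffer.tokens.push_back(c);
    }
  }
  return buffer;
}

// Step 2: token buffer in, tree out. Iterative processing: nesting is tracked
// on an explicit stack, so deep nesting cannot overflow the call stack.
auto Parse(const TokenizedBuffer& buffer) -> Tree {
  Tree tree;
  // Each entry is the node count at the point a group was opened.
  std::vector<int32_t> state_stack;
  for (char token : buffer.tokens) {
    if (token == '(') {
      state_stack.push_back(static_cast<int32_t>(tree.nodes.size()));
    } else if (token == ')') {
      if (state_stack.empty()) {
        continue;  // This sketch ignores an unmatched ')'.
      }
      int32_t start = state_stack.back();
      state_stack.pop_back();
      int32_t size = static_cast<int32_t>(tree.nodes.size()) - start + 1;
      tree.nodes.push_back({NodeKind::Group, size});
    } else {
      tree.nodes.push_back({NodeKind::Leaf, 1});
    }
  }
  return tree;
}
```

Calling `Parse(Lex("(a (b c))"))` yields five nodes in one contiguous vector, with the outer group stored last. Because the only per-group state is an integer on `state_stack`, an input with a hundred thousand nested groups grows a heap-allocated vector rather than the call stack.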

See also Idioms for abbreviations and more implementation techniques.

Adding features

We have a walkthrough for adding features.