Toolchain architecture

Goals

The toolchain represents the production portion of Carbon. At a high level, the toolchain's top priorities are:

  • Correctness.
  • Quality of generated code, including performance.
  • Compilation performance.
  • Quality of diagnostics for incorrect or questionable code.

TODO: Add an expanded document that details the goals and priorities and link to it here.

High-level architecture

The main components are:

Design patterns

A few common design patterns are:

  • Distinct steps: Each step of processing produces an output structure, avoiding callbacks passing data between structures.

    • For example, the parser takes a Lex::TokenizedBuffer as input and produces a Parse::Tree as output.

    • Performance: It should yield better memory locality than a callback approach.

    • Understandability: Each step has a clear input and output, versus callbacks which obscure the flow of data.

  • Vectorized storage: Data is stored in vectors and flyweights are passed around, avoiding more typical heap allocation with pointers.

    • For example, the parse tree is stored as an llvm::SmallVector<Parse::Tree::NodeImpl> indexed by Parse::Node, which wraps an int32_t.

    • Performance: Vectorization both minimizes memory allocation overhead and enables better read caching because adjacent entries will be cached together.

  • Iterative processing: We rely on state stacks and iterative loops for parsing, avoiding recursive function calls.

    • For example, the parser has a Parse::State enum tracked in state_stack_, and loops in Parse::Tree::Parse.

    • Scalability: Complex code must not cause recursion issues. In Clang, we have seen stack frame recursion limits hit in unexpected ways, and non-recursive approaches largely avoid that risk.
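
The three patterns above can be illustrated together with a minimal, self-contained sketch. It is not the toolchain's code: every name here (TokenBuffer, Tree, NodeId, NodeImpl, Lex, Parse) is hypothetical, std::vector stands in for llvm::SmallVector, and the "language" is just nested parentheses. The sketch assumes a postorder node layout purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the toolchain's real API.

// Flyweight handle: a typed wrapper around an index into the node vector.
struct NodeId {
  int32_t index;
};

enum class NodeKind : uint8_t { Leaf, Group };

struct NodeImpl {
  NodeKind kind;
  int32_t subtree_size;  // Number of nodes in this subtree, including itself.
};

// Distinct steps: the output structure of the lex step.
struct TokenBuffer {
  std::vector<char> tokens;
};

// Distinct steps: the output structure of the parse step.
// Vectorized storage: nodes live contiguously (in postorder), and callers
// hold NodeId handles rather than pointers to heap-allocated nodes.
struct Tree {
  std::vector<NodeImpl> nodes;
  bool has_error = false;

  auto Get(NodeId id) const -> const NodeImpl& { return nodes[id.index]; }
};

// Lex step: takes source text, produces a TokenBuffer. Spaces are skipped.
auto Lex(const std::string& source) -> TokenBuffer {
  TokenBuffer buffer;
  for (char c : source) {
    if (c != ' ') {
      buffer.tokens.push_back(c);
    }
  }
  return buffer;
}

// Parse step: takes a TokenBuffer, produces a Tree.
// Iterative processing: an explicit state stack replaces recursive calls,
// so nesting depth is bounded by heap memory, not by the call stack.
auto Parse(const TokenBuffer& buffer) -> Tree {
  Tree tree;
  // Each entry records how many nodes existed when a '(' was opened.
  std::vector<int32_t> state_stack;
  for (char token : buffer.tokens) {
    if (token == '(') {
      state_stack.push_back(static_cast<int32_t>(tree.nodes.size()));
    } else if (token == ')') {
      if (state_stack.empty()) {
        tree.has_error = true;  // Unmatched ')'.
        continue;
      }
      int32_t start = state_stack.back();
      state_stack.pop_back();
      // The group covers every node added since its '(' plus itself.
      int32_t size = static_cast<int32_t>(tree.nodes.size()) - start + 1;
      tree.nodes.push_back({NodeKind::Group, size});
    } else {
      tree.nodes.push_back({NodeKind::Leaf, 1});
    }
  }
  if (!state_stack.empty()) {
    tree.has_error = true;  // Unmatched '('.
  }
  return tree;
}
```

Calling Parse(Lex("(a (b c))")) in this sketch yields five nodes, with the outer group stored last and reporting a subtree size of 5. An input nested 100,000 levels deep only grows state_stack on the heap; a recursive-descent version of the same parser would need one call frame per level.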

See also Idioms for abbreviations and more implementation techniques.

Adding features

We have a walkthrough for adding features.

Design docs

We have design docs.

Videos

Talks

These talks are focused on implementation details of the toolchain, and can be helpful for learning how the toolchain internals work.

2025

Implementation walkthroughs

These are recordings of implementing PRs.

  • PR #4173: Parsing extern library syntax (video)
  • PR #4149: Implementing syntactic merge checks (video)