# Toolchain architecture

## Table of contents

- [Goals](#goals)
- [High-level architecture](#high-level-architecture)
    - [Design patterns](#design-patterns)
- [Adding features](#adding-features)

## Goals

The toolchain represents the production portion of Carbon. At a high level, the
toolchain's top priorities are:

- Correctness.
- Quality of generated code, including performance.
- Compilation performance.
- Quality of diagnostics for incorrect or questionable code.

TODO: Add an expanded document that details the goals and priorities and link
to it here.

## High-level architecture

The main components are:

- [Driver](driver.md): Provides commands and ties together compilation flow.
- [Diagnostics](diagnostics.md): Produces diagnostic output.
- Compilation flow:
    1. Source: Load the file into a
       [SourceBuffer](/toolchain/source/source_buffer.h).
    2. [Lex](lex.md): Transform a SourceBuffer into a
       [Lex::TokenizedBuffer](/toolchain/lex/tokenized_buffer.h).
    3. [Parse](parse.md): Transform a TokenizedBuffer into a
       [Parse::Tree](/toolchain/parse/tree.h).
    4. [Check](check): Transform a Tree to produce
       [SemIR::File](/toolchain/sem_ir/file.h).
    5. [Lower](lower.md): Transform the SemIR to an
       [LLVM Module](https://llvm.org/doxygen/classllvm_1_1Module.html).
    6. CodeGen: Transform the LLVM Module into an Object File.

### Design patterns

A few common design patterns are:

- Distinct steps: Each step of processing produces an output structure,
  avoiding callbacks passing data between structures.
    - For example, the parser takes a `Lex::TokenizedBuffer` as input and
      produces a `Parse::Tree` as output.
    - Performance: It should yield better locality versus a callback approach.
    - Understandability: Each step has a clear input and output, versus
      callbacks which obscure the flow of data.
- Vectorized storage: Data is stored in vectors and flyweights are passed
  around, avoiding more typical heap allocation with pointers.
    - For example, the parse tree is stored as an `llvm::SmallVector` indexed
      by `Parse::Node`, which wraps an `int32_t`.
    - Performance: Vectorization both minimizes memory allocation overhead and
      enables better read caching because adjacent entries will be cached
      together.
- Iterative processing: We rely on state stacks and iterative loops for
  parsing, avoiding recursive function calls.
    - For example, the parser has a `Parse::State` enum tracked in
      `state_stack_`, and loops in `Parse::Tree::Parse`.
    - Scalability: Complex code must not cause recursion issues. We have
      experience in Clang seeing stack frame recursion limits being hit in
      unexpected ways, and non-recursive approaches largely avoid that risk.

See also [Idioms](idioms.md) for abbreviations and more implementation
techniques.

## Adding features

We have a [walkthrough for adding features](adding_features.md).
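
When adding a feature, it can help to keep the vectorized storage and iterative processing patterns from [Design patterns](#design-patterns) in mind. The following is a minimal sketch of both, not the toolchain's actual code: `NodeId`, `Node`, `Tree`, `State`, and `Evaluate` are hypothetical names invented for illustration, and `std::vector` stands in for `llvm::SmallVector` so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flyweight handle: a 32-bit index into the tree's node vector.
struct NodeId {
  int32_t index;
};

enum class NodeKind : uint8_t { Literal, Add };

// Nodes are stored by value in one contiguous vector. Children are referenced
// by NodeId instead of by pointer.
struct Node {
  NodeKind kind;
  int32_t value;  // Only meaningful for Literal.
  NodeId lhs;     // Only meaningful for Add.
  NodeId rhs;     // Only meaningful for Add.
};

class Tree {
 public:
  auto AddLiteral(int32_t value) -> NodeId {
    nodes_.push_back(Node{NodeKind::Literal, value, NodeId{-1}, NodeId{-1}});
    return NodeId{static_cast<int32_t>(nodes_.size() - 1)};
  }

  auto AddSum(NodeId lhs, NodeId rhs) -> NodeId {
    nodes_.push_back(Node{NodeKind::Add, 0, lhs, rhs});
    return NodeId{static_cast<int32_t>(nodes_.size() - 1)};
  }

  auto Get(NodeId id) const -> const Node& {
    assert(id.index >= 0 && static_cast<size_t>(id.index) < nodes_.size());
    return nodes_[id.index];
  }

 private:
  std::vector<Node> nodes_;
};

// Hypothetical processing states, tracked on an explicit stack.
enum class State : uint8_t { Visit, Combine };

struct StackEntry {
  State state;
  NodeId node;
};

// Evaluates the expression rooted at `root` with a loop and a state stack, so
// deeply nested input grows a heap-allocated vector rather than the call
// stack.
auto Evaluate(const Tree& tree, NodeId root) -> int64_t {
  std::vector<StackEntry> state_stack;
  std::vector<int64_t> values;
  state_stack.push_back(StackEntry{State::Visit, root});

  while (!state_stack.empty()) {
    StackEntry entry = state_stack.back();
    state_stack.pop_back();
    const Node& node = tree.Get(entry.node);

    switch (entry.state) {
      case State::Visit:
        if (node.kind == NodeKind::Literal) {
          values.push_back(node.value);
        } else {
          // Come back to this node after both operands are processed. The
          // left operand is pushed last so that it is handled first.
          state_stack.push_back(StackEntry{State::Combine, entry.node});
          state_stack.push_back(StackEntry{State::Visit, node.rhs});
          state_stack.push_back(StackEntry{State::Visit, node.lhs});
        }
        break;
      case State::Combine: {
        int64_t rhs = values.back();
        values.pop_back();
        int64_t lhs = values.back();
        values.pop_back();
        values.push_back(lhs + rhs);
        break;
      }
    }
  }
  return values.back();
}
```

In this sketch, callers build a tree with `AddLiteral` and `AddSum`, then pass the returned `NodeId` to `Evaluate`. Handles stay valid when the vector reallocates, which a raw `Node*` would not, and a chain of many nested additions costs stack-vector entries instead of call frames.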