Explorer

explorer is an implementation of Carbon whose primary purpose is to act as a clear specification of the language. As an extension of that goal, it can also be used as a platform for prototyping and validating changes to the language. Consequently, it prioritizes straightforward, readable code over performance, diagnostic quality, and other conventional implementation priorities. In other words, its intended audience is people working on the design of Carbon, and it is not intended for real-world Carbon programming on any scale. See the toolchain directory for a separate implementation that's focused on the needs of Carbon users.

Overview

explorer represents Carbon code using an abstract syntax tree (AST), which is defined in the ast directory. The syntax directory contains the lexer and parser, which define how the AST is generated from Carbon code. The interpreter directory contains the remainder of the implementation.

explorer is an interpreter rather than a compiler, although it attempts to separate compile time from run time, since that separation is an important constraint on Carbon's design.

Programming conventions

The class hierarchies in explorer are built to support LLVM-style RTTI, and define a kind accessor that returns an enum identifying the concrete type. explorer typically relies less on virtual dispatch, and more on using kind as the key of a switch and then down-casting in the individual cases. As a result, adding a new derived class to a hierarchy requires updating existing code to handle it. It is generally better to avoid defining default cases for RTTI switches, so that the compiler can help ensure the code is updated when a new type is added.
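The kind-plus-switch convention can be illustrated with a minimal sketch. Shape, Circle, Square, and Describe are hypothetical names, not explorer types. To avoid depending on LLVM headers, the downcasts use static_cast where LLVM-style code would typically use llvm::cast. The classof hooks are what llvm::isa and llvm::cast would call.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical hierarchy; the enum identifies each concrete type.
enum class ShapeKind { Circle, Square };

class Shape {
 public:
  auto kind() const -> ShapeKind { return kind_; }

 protected:
  explicit Shape(ShapeKind kind) : kind_(kind) {}

 private:
  ShapeKind kind_;
};

class Circle : public Shape {
 public:
  explicit Circle(int radius) : Shape(ShapeKind::Circle), radius_(radius) {}
  // Hook used by llvm::isa / llvm::cast in LLVM-style RTTI.
  static auto classof(const Shape* shape) -> bool {
    return shape->kind() == ShapeKind::Circle;
  }
  auto radius() const -> int { return radius_; }

 private:
  int radius_;
};

class Square : public Shape {
 public:
  explicit Square(int side) : Shape(ShapeKind::Square), side_(side) {}
  static auto classof(const Shape* shape) -> bool {
    return shape->kind() == ShapeKind::Square;
  }
  auto side() const -> int { return side_; }

 private:
  int side_;
};

auto Describe(const Shape& shape) -> std::string {
  // No default case: if a new ShapeKind is added, -Wswitch flags this switch.
  switch (shape.kind()) {
    case ShapeKind::Circle:
      return "circle with radius " +
             std::to_string(static_cast<const Circle&>(shape).radius());
    case ShapeKind::Square:
      return "square with side " +
             std::to_string(static_cast<const Square&>(shape).side());
  }
  // Only reachable if kind() holds a value outside the enum.
  std::abort();
}
```

Because Describe has no default case, adding a third enumerator makes the compiler point at every switch that still needs a new case.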

explorer never uses plain pointer types directly. Instead, we use the Nonnull<T*> alias for pointers that are not nullable, or std::optional<Nonnull<T*>> for pointers that are nullable.
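Here is a minimal sketch of this convention. It assumes a simplified Nonnull alias that only restricts itself to pointer types. explorer's own definition may also add a compiler-specific non-null annotation. Declaration, NameOf, and NameOrPlaceholder are hypothetical names.

```cpp
#include <optional>
#include <string>
#include <type_traits>

// Simplified stand-in for explorer's alias: accepts only pointer types.
template <typename T,
          typename = std::enable_if_t<std::is_pointer_v<T>>>
using Nonnull = T;

struct Declaration {
  std::string name;
};

// The parameter is never null, so no check is needed.
auto NameOf(Nonnull<const Declaration*> decl) -> std::string {
  return decl->name;
}

// Absence is spelled std::nullopt rather than nullptr.
auto NameOrPlaceholder(std::optional<Nonnull<const Declaration*>> decl)
    -> std::string {
  if (!decl.has_value()) {
    return "<none>";
  }
  return NameOf(*decl);
}
```

With this split, a nullable pointer must be unwrapped from its std::optional before use, so the null check stays visible at the call site.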

Many of the most commonly-used objects in explorer have lifetimes that are tied to the lifespan of the entire Carbon program. We manage the lifetimes of those objects by allocating them through an Arena object, which can allocate objects of arbitrary types and retains ownership of them. As of this writing, all of explorer uses a single Arena object; we may introduce multiple Arenas for different lifetime groups in the future.
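The ownership model can be illustrated with a small arena. This is a hypothetical sketch, not explorer's Arena class. The New method name, the type-erased storage, and the simplified Nonnull alias are all assumptions made for the example. Each object is heap-allocated and owned by the arena, so the returned pointers stay valid until the arena itself is destroyed.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for explorer's non-nullable pointer alias.
template <typename T>
using Nonnull = T;

class Arena {
 public:
  // Constructs a T from args and keeps it alive for the arena's lifetime.
  template <typename T, typename... Args>
  auto New(Args&&... args) -> Nonnull<T*> {
    auto entry = std::make_unique<Entry<T>>(std::forward<Args>(args)...);
    Nonnull<T*> result = &entry->value;
    entries_.push_back(std::move(entry));
    return result;
  }

  auto size() const -> std::size_t { return entries_.size(); }

 private:
  // Type-erased base so objects of unrelated types share one container;
  // the virtual destructor runs the correct ~T() when the arena dies.
  struct EntryBase {
    virtual ~EntryBase() = default;
  };

  template <typename T>
  struct Entry : EntryBase {
    template <typename... Args>
    explicit Entry(Args&&... args) : value(std::forward<Args>(args)...) {}
    T value;
  };

  std::vector<std::unique_ptr<EntryBase>> entries_;
};
```

Individual objects are never freed early. All memory is reclaimed together when the Arena goes out of scope, which suits objects whose lifetimes span the whole program.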

For simplicity, explorer generally treats all errors as fatal. Errors caused by bugs in the user-provided Carbon code should be reported with the error builders in error_builders.h. Errors caused by bugs in explorer itself should be reported with CHECK or FATAL.

Decompose functions

Many of explorer's data structures provide a Decompose method, which allows simple data types to be generically decomposed into their fields. The Decompose function for a type takes a function and calls it with the fields of that type. For example:

class MyType {
 public:
  MyType(Type1 arg1, Type2 arg2) : arg1_(arg1), arg2_(arg2) {}

  template <typename F>
  auto Decompose(F f) const { return f(arg1_, arg2_); }

 private:
  Type1 arg1_;
  Type2 arg2_;
};

Where possible, a value equivalent to the original value should be created by passing the given arguments to the constructor of the type. For example, my_value.Decompose([](auto ...args) { return MyType(args...); }) should recreate the original value.
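To make the pattern concrete, the sketch below instantiates the MyType example with int and std::string in place of Type1 and Type2. It then uses Decompose both to rebuild a value and to drive a generic field-by-field comparison. FieldwiseEqual and Rebuild are hypothetical helpers written for this illustration. They show the kind of generic code that Decompose enables.

```cpp
#include <string>
#include <tuple>
#include <utility>

class MyType {
 public:
  MyType(int arg1, std::string arg2) : arg1_(arg1), arg2_(std::move(arg2)) {}

  template <typename F>
  auto Decompose(F f) const { return f(arg1_, arg2_); }

 private:
  int arg1_;
  std::string arg2_;
};

// Hypothetical generic helper: works for any type with a Decompose method by
// tying the fields into tuples of references and comparing those tuples.
template <typename T>
auto FieldwiseEqual(const T& a, const T& b) -> bool {
  auto tie_fields = [](const auto&... fields) { return std::tie(fields...); };
  return a.Decompose(tie_fields) == b.Decompose(tie_fields);
}

// Rebuilds a value by passing its fields back to the constructor.
inline auto Rebuild(const MyType& value) -> MyType {
  return value.Decompose([](auto... args) { return MyType(args...); });
}
```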

Example Programs (Regression Tests)

The testdata/ subdirectory includes some example programs with expected output.

These tests make use of GoogleTest with Bazel's cc_test rules. Tests have boilerplate at the top:

// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// AUTOUPDATE
// CHECK:STDOUT: result: 7

package ExplorerTest api;

fn Main() -> i32 {
  return (1 + 2) + 4;
}

To explain this boilerplate:

  • The standard copyright is expected.
  • The AUTOUPDATE line indicates that CHECK lines matching the output will be automatically inserted immediately below by the ./autoupdate_testdata.py script.
  • The CHECK lines indicate expected output.
    • Where a CHECK line contains text like {{.*}}, the double curly braces indicate a contained regular expression.
  • The package is required in all test files, per normal Carbon syntax rules.
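The {{...}} convention can be illustrated by translating a CHECK pattern into a regular expression. Text outside double braces is matched literally, and text inside them is used as a regex. CheckPatternToRegex and LineMatches are hypothetical helpers built on std::regex. The real test runner's matching logic may differ in details. For example, this sketch requires the whole output line to match, and that is an assumption.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: literal text is escaped, and {{...}} contents are
// kept as regex syntax inside a non-capturing group.
auto CheckPatternToRegex(const std::string& pattern) -> std::regex {
  const std::string special = "\\^$.|?*+()[]{}";
  std::string regex;
  std::size_t pos = 0;
  while (pos < pattern.size()) {
    std::size_t open = pattern.find("{{", pos);
    std::size_t literal_end =
        (open == std::string::npos) ? pattern.size() : open;
    for (std::size_t i = pos; i < literal_end; ++i) {
      if (special.find(pattern[i]) != std::string::npos) {
        regex += '\\';
      }
      regex += pattern[i];
    }
    if (open == std::string::npos) {
      break;
    }
    std::size_t close = pattern.find("}}", open + 2);
    if (close == std::string::npos) {
      throw std::invalid_argument("unterminated {{ in CHECK pattern");
    }
    regex += "(?:" + pattern.substr(open + 2, close - open - 2) + ")";
    pos = close + 2;
  }
  return std::regex(regex);
}

// This sketch requires the whole output line to match.
auto LineMatches(const std::string& pattern, const std::string& line) -> bool {
  return std::regex_match(line, CheckPatternToRegex(pattern));
}
```

For example, the pattern result: {{[0-9]+}} accepts result: 7 but rejects result: x, while a literal . outside the braces only matches a period.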

lit tests

The lit_testdata/ subdirectory includes other example programs.

These tests make use of LLVM's lit and FileCheck.

They share most of their header with those in testdata, with the addition of RUN lines:

// RUN: %{explorer-run}
// RUN: %{explorer-run-trace}

The RUN lines indicate two commands for lit to execute using the file: one without trace and debug output, one with.

  • When failure is expected, RUN: is followed by the not command, for example RUN: not %{explorer-run}.
  • The full command is in lit.cfg.py; it will run explorer and pass results to FileCheck.

Useful commands

  • ./autoupdate_testdata.py -- Updates expected output.
    • This can be combined with git diff to see changes in output.
  • ./autoupdate_lit_testdata.py -- Updates the expected output of lit tests.
  • bazel test ... --test_output=errors -- Runs tests and prints any errors.
  • bazel test //explorer:file_test.subset --test_arg=explorer/testdata/DIR/FILE.carbon -- Runs a specific test.
  • bazel run testdata/DIR/FILE.carbon.run -- Runs explorer on the file.
  • bazel run testdata/DIR/FILE.carbon.verbose -- Runs explorer on the file with tracing enabled.

Updating fuzzer logic after making AST changes

Please refer to the Fuzzer documentation.

Trace Program Execution

When tracing is turned on (using the --trace_file=... option or .verbose target), explorer prints the state of the program and each step that is performed during execution.

To print the trace directly to standard output, pass - in place of a file path: --trace_file=-.

To customize the trace output, pass one or more of the following values to the -trace_phase=... option, together with the --trace_file=... option:

  • source_program: Include trace output for the source program phase.
  • name_resolution: Include trace output for the name resolution phase.
  • control_flow_resolution: Include trace output for the control flow resolution phase.
  • type_checking: Include trace output for the type checking phase.
  • unformed_variables_resolution: Include trace output for the unformed variables resolution phase.
  • declarations: Include trace output for printing declarations.
  • execution: Include trace output for program execution.
  • timing: Include timing logs indicating the time taken by each phase.
  • all: Include trace output for all phases.

By default, only the execution trace is included in the trace output. To trace multiple program phases, combine these values as a comma-separated list.
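As an illustration, here is a hypothetical parser that turns a comma-separated -trace_phase value into a set of enabled phases. The TracePhase enum, ParseTracePhases, and the treatment of an empty value as "option not given" are assumptions for this sketch. Only the phase names, the all value, and the execution-only default come from the list above.

```cpp
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical enum mirroring the phase names listed above.
enum class TracePhase {
  SourceProgram,
  NameResolution,
  ControlFlowResolution,
  TypeChecking,
  UnformedVariablesResolution,
  Declarations,
  Execution,
  Timing,
};

struct PhaseName {
  const char* name;
  TracePhase phase;
};

constexpr PhaseName kPhaseNames[] = {
    {"source_program", TracePhase::SourceProgram},
    {"name_resolution", TracePhase::NameResolution},
    {"control_flow_resolution", TracePhase::ControlFlowResolution},
    {"type_checking", TracePhase::TypeChecking},
    {"unformed_variables_resolution", TracePhase::UnformedVariablesResolution},
    {"declarations", TracePhase::Declarations},
    {"execution", TracePhase::Execution},
    {"timing", TracePhase::Timing},
};

// Parses e.g. "type_checking,execution"; an empty value means the default.
auto ParseTracePhases(const std::string& value) -> std::set<TracePhase> {
  if (value.empty()) {
    return {TracePhase::Execution};
  }
  std::set<TracePhase> phases;
  std::istringstream stream(value);
  std::string item;
  while (std::getline(stream, item, ',')) {
    bool known = false;
    for (const PhaseName& entry : kPhaseNames) {
      // "all" selects every phase; otherwise match the name exactly.
      if (item == "all" || item == entry.name) {
        phases.insert(entry.phase);
        known = true;
      }
    }
    if (!known) {
      throw std::invalid_argument("unknown trace phase: " + item);
    }
  }
  return phases;
}
```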

State of the Program

The state of the program is printed in the following format, which consists of two components: (1) a stack of actions and (2) a memory.

{
stack: action1 ## action2 ## ...
memory: 0: valueA, 1: valueB, 2: valueC, ...
}

The memory is a mapping of addresses to values. The memory is used to represent both heap-allocated objects and also mutable parts of the procedure call stack, for example, for local variables. When an address is deallocated, it stays in memory but !! is printed before its value.
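The deallocation behavior can be modeled as a map from addresses to values plus a flag, as in the hypothetical Memory class below. The class, its sequential address allocation, and the use of strings for values are simplifications for illustration. The printed layout follows the memory: line shown above.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>

// A value plus a flag; deallocated cells keep their value.
struct Cell {
  std::string value;
  bool deallocated = false;
};

class Memory {
 public:
  auto Allocate(std::string value) -> int {
    int address = next_address_++;
    cells_[address] = Cell{std::move(value)};
    return address;
  }

  // The cell stays in memory; it is only marked as deallocated.
  void Deallocate(int address) { cells_.at(address).deallocated = true; }

  auto Print() const -> std::string {
    std::ostringstream out;
    out << "memory: ";
    bool first = true;
    for (const auto& [address, cell] : cells_) {
      if (!first) {
        out << ", ";
      }
      first = false;
      out << address << ": " << (cell.deallocated ? "!!" : "") << cell.value;
    }
    return out.str();
  }

 private:
  std::map<int, Cell> cells_;
  int next_address_ = 0;
};
```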

The stack is a list of actions separated by double pound signs (##). Each action has the format:

syntax .position. [[ results ]] { scope }

which can have up to four parts.

  1. The syntax for the part of the program to be executed such as an expression or statement.
  2. The position of execution (an integer) for this action (each action can take multiple steps to complete).
  3. The results from subexpressions of this part.
  4. The scope, consisting of the variables whose lifetimes are associated with this part of the program.

The stack always begins with a function call to Main.

In the special case of a function call, when the function call finishes, the result value appears at the end of the results.
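The action layout described above can be sketched as a formatting function over a simplified record. ActionRecord, FormatAction, and FormatStack are hypothetical names. Two choices here are assumptions: multiple results or scope entries are separated by ", ", and empty parts are omitted. The text above only says that an action has up to four parts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified view of one stack entry.
struct ActionRecord {
  std::string syntax;
  int position = 0;
  std::vector<std::string> results;
  std::vector<std::string> scope;
};

auto Join(const std::vector<std::string>& items, const std::string& separator)
    -> std::string {
  std::string out;
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i > 0) {
      out += separator;
    }
    out += items[i];
  }
  return out;
}

// syntax .position. [[ results ]] { scope }, omitting empty parts.
auto FormatAction(const ActionRecord& action) -> std::string {
  std::string out =
      action.syntax + " ." + std::to_string(action.position) + ".";
  if (!action.results.empty()) {
    out += " [[ " + Join(action.results, ", ") + " ]]";
  }
  if (!action.scope.empty()) {
    out += " { " + Join(action.scope, ", ") + " }";
  }
  return out;
}

// Actions are separated by " ## ", in the order given.
auto FormatStack(const std::vector<ActionRecord>& stack) -> std::string {
  std::vector<std::string> parts;
  for (const ActionRecord& action : stack) {
    parts.push_back(FormatAction(action));
  }
  return "stack: " + Join(parts, " ## ");
}
```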

Step of Execution

Each step of execution is printed in the following format:

--- step kind syntax .position. (file-location) --->
  • The syntax is the part of the program being executed.
  • The kind is the syntactic category of the part, such as exp, stmt, or decl.
  • The position says how far along explorer is in executing this action.
  • The file-location gives the filename and line number for the syntax.

Each step of execution can push new actions on the stack, pop actions, increment the position number of an action, and add result values to an action.
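The toy step loop below shows how these four kinds of changes combine when evaluating integer additions such as (1 + 2) + 4. It is a hypothetical model, not explorer's interpreter. Expr, EvalAction, and Evaluate are illustrative names, every node is treated as kind exp, and the file-location part of each step line is omitted. The meaning given to positions 0 to 2 is an assumption made for this example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression tree: a literal when lhs/rhs are null, else a sum.
struct Expr {
  int literal = 0;
  std::shared_ptr<Expr> lhs;
  std::shared_ptr<Expr> rhs;
};

auto Lit(int value) -> Expr {
  Expr e;
  e.literal = value;
  return e;
}

auto Add(Expr lhs, Expr rhs) -> Expr {
  Expr e;
  e.lhs = std::make_shared<Expr>(std::move(lhs));
  e.rhs = std::make_shared<Expr>(std::move(rhs));
  return e;
}

auto ToString(const Expr& e) -> std::string {
  if (!e.lhs) {
    return std::to_string(e.literal);
  }
  auto operand = [](const Expr& sub) {
    return sub.lhs ? "(" + ToString(sub) + ")" : ToString(sub);
  };
  return operand(*e.lhs) + " + " + operand(*e.rhs);
}

// One stack entry: the syntax, its position, and collected results.
struct EvalAction {
  const Expr* expr;
  int position = 0;
  std::vector<int> results;
};

auto Evaluate(const Expr& root, std::vector<std::string>& trace) -> int {
  std::vector<EvalAction> stack = {EvalAction{&root}};
  int final_value = 0;
  // Pop the finished action and add its value to the action below it.
  auto finish = [&](int value) {
    stack.pop_back();
    if (stack.empty()) {
      final_value = value;
    } else {
      stack.back().results.push_back(value);
    }
  };
  while (!stack.empty()) {
    EvalAction& top = stack.back();
    const Expr& e = *top.expr;
    trace.push_back("--- step exp " + ToString(e) + " ." +
                    std::to_string(top.position) + ". --->");
    if (!e.lhs) {
      finish(e.literal);  // Literal: the result is immediate.
    } else if (top.position == 0) {
      ++top.position;  // Increment the position, then push the left operand.
      stack.push_back(EvalAction{e.lhs.get()});
    } else if (top.position == 1) {
      ++top.position;
      stack.push_back(EvalAction{e.rhs.get()});
    } else {
      finish(top.results[0] + top.results[1]);  // Both operands are done.
    }
  }
  return final_value;
}
```

Evaluating Add(Add(Lit(1), Lit(2)), Lit(4)) returns 7 after nine steps. The first step is logged as --- step exp (1 + 2) + 4 .0. --->.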