# Explorer structured fuzzer

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

## Overview

Fuzz testing is based on generating a large number of random inputs for a
software component in order to trigger bugs and unexpected behavior. Basic
fuzzing uses randomly generated arrays of bytes as inputs, which works well for
some applications but is problematic for testing logic that operates on highly
structured data: most random inputs are immediately rejected as invalid before
any interesting parts of the code get a chance to run.

Structured fuzzing addresses this issue by ensuring the randomly generated data
is itself structured, and as such has a high chance of being a valid input.

`explorer_fuzzer` is a structured fuzzer based on
[libprotobuf-mutator](https://github.com/google/libprotobuf-mutator), a library
for randomly mutating
[protocol buffers](https://github.com/protocolbuffers/protobuf).

The input to the fuzzer is an instance of the `Carbon::Fuzzing::Carbon` proto,
randomly generated by the `libprotobuf-mutator` framework. `explorer_fuzzer`
converts the proto to a Carbon source code string, then tries to parse and
execute the code using the `explorer` implementation.

## Fuzzer data format

`libprotobuf-mutator` supports fuzzer inputs in either text or binary protocol
buffer format. `explorer_fuzzer` uses the text proto format, with the `Carbon`
proto message defined in `common/fuzzing/carbon.proto`.

## Incorporating AST changes into the fuzzer

The fuzzer's AST representation in
[carbon.proto](https://github.com/carbon-language/carbon-lang/blob/trunk/common/fuzzing/carbon.proto)
needs to be updated when changes are made to the AST, such as adding new AST
node classes or changing relevant data members of existing nodes.

There are two unit tests, which normally should not require direct changes
because both work off the Carbon test files in
[testdata](https://github.com/carbon-language/carbon-lang/tree/trunk/explorer/testdata).

-   [ast_to_proto_test.cpp](https://github.com/carbon-language/carbon-lang/blob/trunk/explorer/fuzzing/ast_to_proto_test.cpp)
    is a 'smoke' test which verifies that each field of the `Carbon` proto is
    populated at least once after converting all of the test Carbon files and
    merging the results into a single protocol buffer.

-   [proto_to_carbon_test.cpp](https://github.com/carbon-language/carbon-lang/blob/trunk/explorer/fuzzing/proto_to_carbon_test.cpp)
    uses a 'roundtrip' approach: it converts each parseable Carbon file to a
    proto representation, then back to Carbon source, parses this source into a
    second instance of an AST, and compares the second AST with the original
    AST using the `AST::Dump()` method. The goal of the test is to ensure that
    `carbon.proto` is able to represent ASTs correctly without information loss.

To incorporate AST changes into fuzzing logic:

1. Add appropriate AST information to
   [carbon.proto](https://github.com/carbon-language/carbon-lang/blob/trunk/common/fuzzing/carbon.proto).
   Use existing similar cases as examples.

1. Add logic to populate the proto to
   [ast_to_proto.cpp](https://github.com/carbon-language/carbon-lang/blob/trunk/explorer/fuzzing/ast_to_proto.cpp).

1. Make sure `ast_to_proto_test` passes with the new changes.

1. Modify
   [proto_to_carbon.cpp](https://github.com/carbon-language/carbon-lang/blob/trunk/common/fuzzing/proto_to_carbon.cpp),
   which handles printing a Carbon proto instance as a Carbon source string.
   For example, add code to print newly introduced proto fields.

1. Make sure `proto_to_carbon_test` passes after the changes.
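
The two test steps above can be run back to back. The following is a minimal
sketch, not part of the repository: `run_fuzzing_tests` is a hypothetical
helper, and the Bazel target labels are assumptions inferred from the test file
names linked above, so adjust them if your checkout differs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: runs the two fuzzing unit tests in the order used by
# the steps above and stops at the first failure.
# ASSUMPTION: the target labels below are inferred from the test file names;
# check the BUILD file in your checkout for the actual labels.
run_fuzzing_tests() {
  # Allow the Bazel command to be overridden through the BAZEL variable.
  local bazel="${BAZEL:-bazel}"
  local target
  for target in \
    //explorer/fuzzing:ast_to_proto_test \
    //explorer/fuzzing:proto_to_carbon_test; do
    echo "Running ${target}"
    "${bazel}" test --test_output=errors "${target}" || return 1
  done
}
```

After sourcing the snippet, calling `run_fuzzing_tests` from the repository
root runs both tests and returns a non-zero status as soon as one fails.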

## Running the fuzzer

The fuzzer can be run in 'unit test' mode, where the fuzzer executes on each
input file from the `fuzzer_corpus/` folder, or in 'fuzzing' mode, where the
fuzzer will keep generating random inputs and executing the logic on them until
a crash is triggered, or forever in a bug-free program ;).

To run in 'unit test' mode:

```bash
bazel test --config=proto-fuzzer --test_output=all //explorer/fuzzing:explorer_fuzzer
```

To run in 'fuzzing' mode:

```bash
bazel build --config=proto-fuzzer //explorer/fuzzing:explorer_fuzzer

bazel-bin/explorer/fuzzing/explorer_fuzzer
```

It's also possible to run the fuzzer on a single input:

```bash
bazel-bin/explorer/fuzzing/explorer_fuzzer /tmp/crash.textproto
```
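
When several inputs need checking, the single-input invocation can be wrapped
in a loop. This is a minimal sketch: `run_on_inputs` is a hypothetical helper,
and the `.textproto` extension is an assumption based on the example path
above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: runs a fuzzer binary on each *.textproto file in a
# directory, one input per invocation, and lists the inputs for which the
# binary exited with a non-zero status.
# ASSUMPTION: inputs use the .textproto extension, as in the example above.
run_on_inputs() {
  local fuzzer="$1" dir="$2"
  local input failed=0
  for input in "${dir}"/*.textproto; do
    [ -e "${input}" ] || continue  # The glob matched nothing.
    if ! "${fuzzer}" "${input}" >/dev/null 2>&1; then
      echo "FAILED: ${input}"
      failed=$((failed + 1))
    fi
  done
  echo "${failed} failing input(s)"
  # Succeed only if every input ran cleanly.
  [ "${failed}" -eq 0 ]
}
```

For example, `run_on_inputs bazel-bin/explorer/fuzzing/explorer_fuzzer /tmp/inputs`
would run the fuzzer on each text proto in `/tmp/inputs` (a placeholder
directory) and report the ones that fail.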

## Investigating a crash

To reproduce a crash, run the fuzzer on the crashing input as described above.

A separate tool, `fuzzverter`, can be used for tasks such as converting a
crashing input to Carbon source code, so that `explorer` can be run on the code
directly.

To convert a `Fuzzing::Carbon` text proto to Carbon source:

```bash
bazel-bin/explorer/fuzzing/fuzzverter --mode proto_to_carbon --input /tmp/crash.textproto
```
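
To keep the converted source for a later `explorer` run, the output can be
saved to a file. The sketch below is a hypothetical wrapper, not part of the
repository. It assumes that `fuzzverter` writes the Carbon source to standard
output when no `--output` flag is given, as the command above suggests; if that
does not hold in your build, the helper reports an empty result.

```bash
#!/usr/bin/env bash
# Hypothetical helper: converts a crashing text proto to a Carbon source file.
# ASSUMPTION: without --output, fuzzverter prints the Carbon source to stdout.
crash_to_carbon() {
  local fuzzverter="$1" textproto="$2" carbon_out="$3"
  if ! "${fuzzverter}" --mode proto_to_carbon --input "${textproto}" \
      > "${carbon_out}"; then
    echo "conversion failed: ${textproto}" >&2
    return 1
  fi
  # An empty file suggests the stdout assumption above does not hold.
  if [ ! -s "${carbon_out}" ]; then
    echo "no Carbon source written to ${carbon_out}" >&2
    return 1
  fi
  echo "wrote ${carbon_out}"
}
```

For example,
`crash_to_carbon bazel-bin/explorer/fuzzing/fuzzverter /tmp/crash.textproto /tmp/crash.carbon`
leaves the source in `/tmp/crash.carbon`, ready to be passed to `explorer`.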

## Generating new fuzzer corpus entries

The ability of the fuzzing framework to generate 'interesting' inputs can be
improved by providing 'seed' inputs known as the fuzzer corpus. Each input
needs to be a `Fuzzing::Carbon` text proto.

To generate a text proto from Carbon source:

```bash
bazel-bin/explorer/fuzzing/fuzzverter --mode carbon_to_proto --input /tmp/crash.carbon --output /tmp/crash.textproto
```
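
To seed the corpus from many Carbon files at once, the same command can be
applied to a whole directory. This is a minimal sketch: `carbon_dir_to_corpus`
is a hypothetical helper, and the `.carbon` and `.textproto` extensions are
assumptions taken from the example paths above. Files that fail to convert
(for example, ones that do not parse) are skipped with a message on standard
error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: converts every *.carbon file in src_dir to a text proto
# in corpus_dir, using the fuzzverter flags shown above.
# ASSUMPTION: file extensions follow the example paths in this document.
carbon_dir_to_corpus() {
  local fuzzverter="$1" src_dir="$2" corpus_dir="$3"
  local src name converted=0
  mkdir -p "${corpus_dir}"
  for src in "${src_dir}"/*.carbon; do
    [ -e "${src}" ] || continue  # The glob matched nothing.
    name="$(basename "${src}" .carbon)"
    if "${fuzzverter}" --mode carbon_to_proto --input "${src}" \
        --output "${corpus_dir}/${name}.textproto"; then
      converted=$((converted + 1))
    else
      echo "skipped (conversion failed): ${src}" >&2
    fi
  done
  echo "converted ${converted} file(s)"
}
```

For example,
`carbon_dir_to_corpus bazel-bin/explorer/fuzzing/fuzzverter /tmp/seeds /tmp/new_corpus`
writes one text proto per convertible file; both directories here are
placeholders.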