# Design direction for sum types

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/157)

## Table of contents

<!-- toc -->

## Table of contents

-   [Problem](#problem)
-   [Background](#background)
-   [Proposal](#proposal)
-   [Shareable storage](#shareable-storage)
-   [User-defined pattern matching](#user-defined-pattern-matching)
-   ["Bare" designator syntax](#bare-designator-syntax)
    -   [Alternatives considered](#alternatives-considered)
        -   [Placeholder keywords](#placeholder-keywords)
        -   [Designator types](#designator-types)
-   [Distinguishing pattern and expression semantics](#distinguishing-pattern-and-expression-semantics)
    -   [Alternatives considered](#alternatives-considered-1)
        -   [Separate syntaxes](#separate-syntaxes)
        -   [Disambiguation by fixed priority](#disambiguation-by-fixed-priority)
        -   [Disambiguation based on pattern content](#disambiguation-based-on-pattern-content)
-   [Choice types](#choice-types)
    -   [Alternatives considered](#alternatives-considered-2)
        -   [Different spelling for `choice`](#different-spelling-for-choice)
-   [Alternatives considered](#alternatives-considered-3)
    -   [`choice` types only](#choice-types-only)
    -   [Indexing by type](#indexing-by-type)
    -   [Pattern matching proxies](#pattern-matching-proxies)
    -   [Pattern functions](#pattern-functions)
-   [Rationale](#rationale)

<!-- tocstop -->

## Problem

Many important programming use cases involve values that are most naturally
represented as having one of several alternative forms (called _alternatives_
for short). For example,

-   Optional values, which are pervasive in computing, take the form of either
    values of some underlying type, or a special "not present" value.
-   Functions that cannot throw exceptions often use a return type that can
    represent either a successfully computed result, or some description of how
    the computation failed.
-   Nodes of a parse tree often take different forms depending on the grammar
    production that generated them. For example, a node of a parse tree for
    simple arithmetic expressions might represent either a sum or product
    expression with two child nodes representing the operands, or a
    parenthesized expression with a single child node representing the contents
    of the parentheses.
-   The error codes returned by APIs like POSIX have a fixed set of named
    values.

What unites these use cases is that the set of alternatives is fixed by the API,
it is possible for user code to determine which alternative is present, and
there is little or nothing you can usefully do with such a value without first
making that determination.

Carbon needs to support defining and working with types representing such
values. Following Carbon's principles, these types need to be easy to define,
understand, and use, and they need to be safe -- in ordinary usage, the type
system should ensure that user code cannot accidentally access the wrong
alternative. These types should be writeable as well as readable, and writing
should be type-safe and efficient. In particular, it should be possible to
mutate a single sub-field of an alternative, without having to overwrite the
entire alternative, and without a risk of accidentally doing so when that
alternative is not present.

Furthermore, it needs to be possible for type owners to customize the
representations of these types. For example:

-   Most sum types need a "discriminator" field to indicate which alternative is
    present, but since it typically has very few possible values, it can often
    be packed into padding, or even the low-order bits of a pointer.
-   Other sum types avoid an explicit discriminator, and instead reserve certain
    values to indicate separate alternatives. For example, a typical C-style
    pointer can be thought of as an optional type, with a special null value
    indicating that no pointer is present, because the platform guarantees that
    the null byte pattern is never the representation of a valid pointer.

It must be possible to implement such customizations without changing the type's
API, and hence without altering the static safety guarantees for users of the
type.

## Background

The terminology in this space is quite fragmented and inconsistent. This
proposal will use the term _sum types_ to refer to types of the kind described
in the problem statement. Note that "sum type" is not being proposed as a
specific Carbon feature, or even as a precisely defined term of art; it is
merely an informal way for this proposal to refer to its motivating use cases,
in much the same way that a structs proposal might refer to "value types".

Carbon as currently envisioned is already capable of approximating support for
sum types. In particular, [pattern matching](/docs/design/pattern_matching.md)
gives us a natural way to express querying which alternative is active, and then
performing computations on that active alternative, which as discussed above is
the primary way of interacting with a sum type. For example, a value-or-error
type `Result(T, Error)` could be implemented like so:

```
// Result(T, Error) holds either a successfully-computed value of type T,
// or metadata about a failure during that computation, or a singleton
// "cancelled" state indicating that the computation successfully complied with
// a request to halt before completion.
struct Result(Type:$$ T, Type:$$ Error) {
  // 0 if this represents a value, 1 if this represents an error, 2 if this
  // represents the cancelled state.
  var Int: discriminator;
  var T: value;
  var Error: error;

  fn Success(T: value) -> Result(T, Error) {
    return (.discriminator = 0, .value = value, .error = Error());
  }

  fn Failure(Error: error) -> Result(T, Error) {
    return (.discriminator = 1, .value = T(), .error = error);
  }

  var Result(T, Error):$$ Cancelled =
    (.discriminator = 2, .value = T(), .error = Error());
}
```

A typical usage might look like:

```
fn ParseAsInt(String: s) -> Result(Int, String) {
  var Int: result = 0;
  var auto: it = s.begin();
  while (it != s.end()) {
    if (*it < '0' || *it > '9') {
      return Result(Int, String).Failure("String contains non-digit");
    }
    result += *it - '0';
    result *= 10;
  }
  return Result(Int, String).Success(result);
}

fn GetIntFromUser() -> Int {
  while(True) {
    var String: s = UserPrompt("Please enter a number");
    match (ParseAsInt(s)) {
      case (.discriminator = 0, .value = Int: value, .error = String: _) => {
        return value;
      }
      case (.discriminator = 1, .value = Int: _, .error = String: error) => {
        Display(error);
      }
      case .Cancelled => {
        // We didn't request cancellation, so something is very wrong.
        Terminate();
      }
      default => {
        // Can't happen, because the above cases are exhaustive.
        Assert(False);
      }
    }
  }
}
```

However, this code has several functional deficiencies:

-   `.value` and `.error` must both be live throughout the `Result`'s lifetime,
    even when they are not meaningful. Consequently, `Success` must populate
    `.error` with a default-constructed dummy value (and so it won't work if
    `Error` is not default-constructible), `Failure` must do the same for
    `.value`, and `Cancelled` must do the same for both. Furthermore, `Result`
    is bloated by the fact that the two fields must have separately-allocated
    storage, even though at most one at a time actually stores any data.
-   The implementation details of `Result` are not encapsulated. This makes the
    `Result` API unsafe: nothing prevents client code from accessing `.value`
    even when `.discriminator` is not 0. This also makes the patterns extremely
    verbose.
-   `.discriminator` should never have any value other than 0, 1, or 2, but the
    compiler can't enforce that property when `Result`s are created, or exploit
    it when `Result`s are used. So, for example, the `match` must have a
    `default` case in order for the compiler and other tools to consider it
    exhaustive, even though that default case should never be entered.

It also has a couple of ergonomic problems:

-   The definition of `Result` is largely boilerplate. Conceptually, the only
    information needed to specify this type is the names and parameter types of
    the two factory functions, the name of the static member `Cancelled`, plus
    the fact that every possible value of `Result` is uniquely described by the
    name and parameter values of a call to one of those two functions, or else
    equal to `Cancelled`. Given that information, the compiler could easily
    generate the rest of the struct definition. This generated implementation
    may not always be as efficient as a hand-coded one could be, but in a lot of
    cases that may not matter.
-   The `return` statements in `ParseAsInt` are quite verbose, due to the need
    to explicitly qualify the function calls with `Result(Int, String)`. In
    fact, developers might prefer to avoid having a function call at all,
    especially in the success case, and instead rely on implicit conversions to
    write something like `return result;`.

## Proposal

To summarize, the previous section identified several missing features in
Carbon, which together would enable Carbon to support efficient and ergonomic
sum types:

-   There's no way to manually control the lifetimes of subobjects, or enable
    them to share storage.
-   There's no way for pattern matching to operate through an encapsulation
    boundary.
-   There's no way for a type to specify that a given set of patterns is
    exhaustive.
-   There's no concise way to define a sum type based on the form of its
    alternatives.
-   There's no way to return a specific alternative without restating the return
    type of the function.

I propose supporting sum types by introducing several language features to
supply the missing functionality identified above. These features are largely
separable, although there are some dependencies between them, so their detailed
design will be addressed in future proposals, and the details discussed here
should be considered provisional. This proposal merely establishes the overall
design direction for sum types, in the same way that [#83](p0083.md) established
the overall design direction for the language as a whole.

To support manual lifetime control and storage sharing, I propose introducing at
least one and preferably both of the following:

-   A `Storage` type, which represents a fixed-size buffer of untyped memory and
    provides operations for creating and destroying objects within it.
-   A typed `union` facility, such as the one described in proposal
    [0139](https://github.com/carbon-language/carbon-lang/pull/139).

To support encapsulation in pattern matching, I propose introducing a
`Matchable` interface, which a type can implement in order to specify how it
behaves in pattern matching, including what (if anything) constitutes an
exhaustive set of patterns for that type.

To allow users to concisely specify a sum type when they don't need to
micro-optimize the implementation details, I propose a `choice` syntax that
specifies a sum type purely in terms of its alternatives, which acts as a sugar
syntax for those lower-level features.

To avoid redundant boilerplate in functions that return sum types, I propose
allowing statements of the form `return .X;` and `return .F();`, which are
interpreted as if the function's return type appeared immediately prior to the
`.` character.

Cumulatively, these features will allow `Status` to be defined so that the usage
portions of the above example can be rewritten as follows:

```
fn ParseAsInt(String: s) -> Result(Int, String) {
  var Int: result = 0;
  var auto: it = s.begin();
  while (it != s.end()) {
    if (*it < '0' || *it > '9') {
      return .Failure("String contains non-digit");
    }
    result += *it - '0';
    result *= 10;
  }
  return .Success(result);
}

fn GetIntFromUser() -> Int {
  while(True) {
    var String: s = UserPrompt("Please enter a number");
    match (ParseAsInt(s)) {
      case .Success(var Int: value) => {
        return value;
      }
      case .Failure(var String: error) => {
        Display(error);
      }
      case .Cancelled => {
        // We didn't request cancellation, so something is very wrong.
        Terminate();
      }
    }
  }
}
```

> **Open question:** How will user-defined sum types (and pattern matching in
> general) support name bindings that allow you to mutate the underlying object?

## Shareable storage

This approach to sum types imposes relatively few requirements on the language
features used to implement shareable storage (meaning, storage that can be
inhabited by different objects at different times), and so this proposal doesn't
describe them in much detail. The primary options are typed unions along the
lines of proposal
[0139](https://github.com/carbon-language/carbon-lang/pull/139), and untyped
byte buffers along the lines described below. Typed unions are somewhat safer
and more readable, but less general. For example, they can't support use cases
like implementing a small-object-optimized version of
[`std::any`](https://en.cppreference.com/w/cpp/utility/any), because the set of
possible types is not known in advance.

This proposal takes the position that Carbon must have at least one of these two
features. Whether it should have only untyped byte buffers, only typed unions,
or both, is left as an **open question**, because the answer is orthogonal to
the overall design direction that is the focus of this proposal. Consequently, I
will not further discuss the tradeoffs between the two features. This proposal's
examples focus on untyped byte buffers because they are simpler to describe, and
aren't already covered by another proposal.

Regardless of the form that shareable storage takes, it won't be able to
intrinsically keep track of whether it currently holds any objects, or the types
or offsets of those objects, because that would require it to maintain
additional hidden storage, and a major goal of this design is to give the
developer explicit control of the object representation. Consequently, it is not
safe to copy, move, assign to, or destroy shareable storage unless it is known
not to be inhabited by an object.

This means that in the general case, the compiler will not be able to generate
safe default implementations for any special member functions of types that have
shareable storage members.

For purposes of illustration in this proposal, I will treat
`Storage(SizeT size, SizeT align)` as a library template representing an untyped
buffer of `size` bytes, aligned to `align`. It provides `Create`, `Read`, and
`Destroy` methods which create, access, and destroy an object of a specified
type within the buffer.

Using `Storage`, we can redefine `Status` as follows:

```
struct Result(Type:$$ T, Type:$$ Error) {
  var Int: discriminator;

  var Storage(Max(Sizeof(T), Sizeof(Error)),
              Max(Alignof(T), Alignof(Error))): storage;

  fn Success(T: value) -> Self {
    Self result = (.discriminator = 0);
    result.storage->Create(T, value);
    return result;
  }

  fn Failure(Error: error) -> Self {
    Self result = (.discriminator = 1);
    result.storage->Create(Error, error);
    return result;
  }

  var Self:$$ Cancelled = (.discriminator = 2);

  // Copy, move, assign, destroy, and similar operations need to be defined
  // explicitly, but are omitted for brevity.
}
```

## User-defined pattern matching

As seen in the example above, we want to allow pattern-matching on `Status` to
look like this:

```
match (ParseAsInt(s)) {
  case .Success(var Int: value) => {
    return value;
  }
  case .Failure(var String: error) => {
    Display(error);
  }
  case .Cancelled => {
    // We didn't request cancellation, so something is very wrong.
    Terminate();
  }
}
```

For this to work, `Status` needs to specify two things:

-   The set of all possible alternatives, including their names and parameter
    types, so that the compiler can typecheck the `match` body, identify any
    unreachable `case`s, and determine whether any `case`s are missing.
-   The algorithm that, given a `Status` object, determines which alternative is
    present, and specifies the values of its parameters.

Here's how `Status` can do that under this proposal:

```
struct Result(Type:$$ T, Type:$$ Error) {
  var Int: discriminator;

  var Storage(Max(Sizeof(T), Sizeof(Error)),
              Max(Alignof(T), Alignof(Error))): storage;

  interface MatchContinuation {
    var Type:$$ ReturnType;
    fn Success(T: value) -> ReturnType;
    fn Failure(Error: error) -> ReturnType;
    fn Cancelled() -> ReturnType;
  }

  impl Matchable(MatchContinuation) {
    method (Ptr(Self): this) Match[MatchContinuation:$ Continuation](
        Ptr(Continuation): continuation) -> Continuation.ReturnType {
      match (discriminator) {
        case 0 => {
          return continuation->Success(this->storage.Read(T));
        }
        case 1 => {
          return continuation->Failure(this->storage.Read(Error));
        }
        case 2 => {
          return continuation->Cancelled();
        }
        default => {
          Assert(false);
        }
      }
    }
  }

  // Success() and Failure() factory functions, and the Cancelled static
  // constant, are defined as above.

  // Copy, move, assign, destroy, and similar operations need to be defined
  // explicitly, but are omitted for brevity.
}
```

In this code, `Result` makes itself available for use in pattern matching by
declaring that it implements the `Matchable` interface. `Matchable` takes an
interface argument, `MatchContinuation` in this case, which specifies the set of
possible alternatives by declaring a method for each one. I'll call that
argument the _continuation interface_, for reasons that are about to become
clear.

The `Match` method of the `Matchable` interface. This method takes two
parameters: the value being matched against, and an instance of the continuation
interface, which the compiler generates from the `match` expression being
evaluated, with method bodies that correspond to the bodies of the corresponding
`case`s. Once `Match` has determined which alternative is present, and the
values of its parameters, it invokes the corresponding method of the
continuation object. `Match` is required to invoke exactly one continuation
method, and to do so exactly once.

This proposal assumes that Carbon will have support for defining and
implementing generic interfaces, including interfaces that take interface
parameters, and uses `interface`, `impl`, `method` etc. as **placeholder**
syntax. It can probably be revised to work if interfaces can't be parameterized
that way, or if we don't have a feature like this at all, but it might be
somewhat more awkward.

Notice that the names `Success`, `Failure`, and `Cancelled` are defined twice,
once as factory functions of `Result` and once as methods of
`MatchContinuation`, with the same parameter types in each case. The two
effectively act as inverses of each other: the factory functions compute a
`Result` from their parameters, and the methods are used to report the parameter
values that compute a given `Result`. This mirroring between expression and
pattern syntax is ultimately a design choice by the type author; there is no
language-level requirement that the alternatives correspond to the factory
functions.

This approach can be extended to support non-sum patterns as well. For example,
a type that wants to match a tuple-shaped pattern like `(Int: i, String: s)`
could define a continuation interface like

```
interface MatchContinuation {
  fn operator()(Int: i, String: s);
}
```

A type's continuation interface can also include a function named
`operator default`, which implements the `default` case of the `match`.
Consequently, any `match` on that type will be required to have a `default`
case. This is valuable because it protects the type author's ability to add new
alternatives in the future, without causing build failures in client code.

> **Open question:** Can `Match` actually invoke `operator default`? It might
> sometimes be useful to define a type that can't be matched exhaustively, and
> the mere possibility that the `default` case could actually run might help
> encourage client code to implement it robustly, rather than blindly providing
> something like `Assert(False)`. However, if there's a language-level guarantee
> that the `default` case is unreachable if all other alternatives are handled,
> then we can allow code in the same library as the type to omit the `default`
> case. That's desirable because code in the same library doesn't pose an
> evolutionary risk, and it's often valuable to have a build-time guarantee that
> code within the type's own API explicitly handles every case.

Note that `Match`'s continuation parameter type must be generic rather than
templated (`:$` rather than `:$$`). Template specialization is driven by the
concrete values of the template arguments, but the type of the continuation
parameter may depend on code generation details that aren't yet known when
template specialization takes place. By the same token, we won't support
overloading `Match`, because overload resolution is likewise driven by the
concrete types of the function arguments, which may not be known at that point.

## "Bare" designator syntax

A _designator_ is a token consisting of `.` followed by an identifier. The
canonical use case for designators is member access, as in `foo.bar`, where the
designator applies to the preceding expression. However, we expect Carbon will
also have some use cases for "bare" designators, where there is no preceding
expression. In particular, we expect to use bare designators to initialize named
tuple fields, as in `(.my_int = 42, .my_str = "Foo")`, which produces a tuple
with fields named `my_int` and `my_str`.

I propose to also permit using bare designators to refer to the alternatives of
a sum type, in cases where the sum type is clear from context. In particular, I
propose that in statements of the form `return R.Alt;` or
`return R.Alt(<args>);`, where `R` is the function return type, the `R` can be
omitted. Similarly, I propose that patterns of the form `S.Alt` or
`S.Alt(<subpatterns>)`, where `S` is the type being matched against, the `S` can
be omitted. Note that both of these shorthands are allowed only at top level,
not as subexpressions or subpatterns.

> **Open question:** Can we also permit these shorthands to be nested? This
> would be more consistent and less surprising, but could create ambiguity with
> the use of bare designators in tuple initialization: does `case (.Foo, .Bar)`
> match a tuple of two fields named `Foo` and `Bar`, or does it match a tuple of
> two positional fields, of sum types that respectively have `.Foo` and `.Bar`
> as alternatives? Furthermore, allowing such nested usages may conflict with
> the
> [proposed principle](https://github.com/carbon-language/carbon-lang/pull/103)
> that type information should propagate only from an expression to its
> enclosing context, and not vice-versa.
>
> The issue of type propagation is particularly acute if we want to allow
> nesting of alternatives, such as `case .Foo(.Bar(_))`. The problem there is
> that we are relying on the type of `Foo`'s parameter to tell us the type of
> the argument expression `.Bar(_)`, but if `Foo` is overloaded, we can't
> determine the type of the parameter until the overload is resolved, and to do
> that we first need to know the type of the argument expression. And even if
> `Foo` is not currently overloaded, an overload might be added in the future
> (at least if the sum type has an `operator default`). Adding overloads is a
> canonical example of the kind of software evolution that Carbon is intended to
> allow, so the validity of this code can't depend on whether or not `Foo` is
> overloaded.

### Alternatives considered

#### Placeholder keywords

Rather than allowing code to omit the type altogether, we could allow code to
replace the type with a placeholder keyword, for example `case auto.Foo`. This
would avoid the ambiguity with tuple initialization, but would still mean that
type information is propagating into the expression from its surrounding context
rather than vice-versa (which also means that `auto` may not be an appropriate
spelling). Furthermore, this could wind up feeling fairly boilerplate-heavy,
even if the keyword is very short.

#### Designator types

We could treat each bare designator as essentially defining its own type, which
may then implicitly convert to a suitable sum type. For example, we could think
of `.Some(42)` as having a type `DesignatedTuple("Some", (Int))`, and
`Optional(Int)` would define an implicit conversion from that type. This would
ensure that type information does not propagate into the expression from the
context, but would not resolve the ambiguity with tuple initialization.
Furthermore, this would mean that the names of bare designators do not need to
be declared before they are used, and in fact can't be meaningfully declared at
all.

This could have very surprising consequences. For example, a typo in a line of
code like `var auto: x = .Sone(42);` cannot be diagnosed at that line. Instead,
the problem can't be diagnosed until `x` is used, and even then it will show up
as an overload resolution error rather than a name lookup error. This is
particularly problematic because it is much harder to provide useful, actionable
diagnostics for overload resolution failure than for name lookup failure.
Relatedly, in the case of bare designators, IDEs would not be able to implement
tab-completion using Carbon's name lookup rules, but would effectively have to
invent their own ad hoc name lookup rules.

Note that this approach would work well with the "Indexing by type" alternative
discussed below, with instances of `DesignatedTuple` (or whatever we call it)
acting as tagged wrapper types.

## Distinguishing pattern and expression semantics

As discussed above, we expect a well-behaved sum type to define factory
functions that correspond to each of the alternatives in its continuation
interface. This creates some potential ambiguity about whether a given use of an
alternative name refers to the factory function or the continuation interface.
For example, consider the following code:

```
var Result(Int, String): r = ...;
match (r) {
  case Result(Int, String).Value(0) => ...
```

This could potentially be interpreted two ways:

-   `Result(Int, String).Value(0)` is evaluated as an ordinary function call,
    and the result is compared with `r` to see if they match, presumably using
    the `==` operator.
-   The whole `match` expression is evaluated by invoking
    `Result(Int, String)`'s implementation of `Matchable`, with a continuation
    object whose `.Value(Int)` method compares the `Int` parameter with 0.

This can also be thought of as a name lookup problem: is `.Value` looked up in
`Result(Int, String)`, or in `Result(Int, String)`'s implementation of
`Matchable`?

I propose to leave this choice unspecified, so that the compiler may validly
generate code either way. This gives the compiler more freedom to optimize, and
perhaps more importantly, it helps discourage sum type authors from
intentionally making its factory function behavior inconsistent with its pattern
matching behavior. By extension, I propose treating the shorthand syntax
`case .Value(0)` the same way.

### Alternatives considered

#### Separate syntaxes

We could instead avoid the ambiguity by providing separate syntaxes for the two
semantics. The more straightforward version of this would be to say that
`Result(Int, String).Value(0)` is always interpreted as an ordinary function
call, and patterns like `Result(Int, String).Value(Int: i)` are ill-formed
because they cannot be interpreted as function calls. However, this would
require us to have a syntax for matching alternatives that is disjoint from the
syntax for constructing alternatives. This would be at odds with existing
practice in languages like Rust, Swift, Haskell, and ML, all of which use the
same syntax for constructing and matching alternatives.

Note that it is tempting to use bare designators as the syntax for matching
alternatives, so that `Result(Int, String).Value(0)` is an expression, but
`.Value(0)` is a pattern. However, that is unlikely to be sufficient on its own,
because bare designators rely on type information that may not always be
available, especially in nested patterns. Hence, in order for this approach to
work, we would need to introduce a separate syntax for specifying the type of a
bare designator, such as `.Value(0) as Result(Int, String)`.

Alternatively, we could say that code in a pattern-matching context is always
interpreted as a pattern, even if it could otherwise be interpreted as an
expression. We would then need to introduce a pattern operator for explicitly
evaluating a subpattern as an expression, such as
`is Result(Int, String).Value(0)` or `== Result(Int, String).Value(0)`. However,
this would impose an educational and cognitive burden on users: FAQ entries like
"What's the difference between `case .Foo` and `case is .Foo`" and "How do I
choose between `case .Foo` and `case is .Foo`" seem inevitable, and would
require fairly nuanced answers. It would also add syntactic noise to the use
cases that correspond to C++ `switch` statements, where all of the cases are
fixed values.

#### Disambiguation by fixed priority

We could instead specify that, when `Result(Int, String).Value(0)` appears in a
context where a pattern is expected, the name `.Value` is looked up as both an
ordinary function call and as a use of the `Matchable` interface, and specify
one of the two as the "winner" in the case where both lookups succeed. This is
effectively a variant of the previous option, except that some usages that would
be build errors under that approach would be saved by the fallback
interpretation here. In particular, it would still require us to introduce a
second syntax for the case where the programmer wants the lower-priority of the
two behaviors. Thus, it would carry largely the same drawbacks as the previous
option, with the additional drawback that there wouldn't be a consistent
correspondence between syntax and semantics.

#### Disambiguation based on pattern content

We could instead specify that `Result(Int, String).Value(0)` is always
interpreted as an ordinary function call, but patterns like
`Result(Int, String).Value(Int: i)` are evaluated using the `Matchable`
interface, because no other implementation is possible. However, this would mean
that the name-lookup behavior of a function call depends on code that can be
arbitrarily deeply nested within it, which seems likely to be hostile to both
programmers and tools.

## Choice types

To allow users to define sum types without micromanaging the implementation
details, I propose introducing `choice` as a convenient syntax for defining a
sum type by specifying only the declarations of the set of alternatives. From
that information, the compiler generates an appropriate object representation,
and synthesizes definitions for the alternatives and special member functions.

Our manual implementation of `Result` above doesn't really benefit from having
direct control of the object representation, and doesn't seem to have any
additional API surfaces, so it's well-suited to being defined as a choice type
instead:

```
choice Result(Type:$$ T, Type:$$ Error) {
  Success(T: value),
  Failure(Error: error),
  Cancelled
}
```

The body of a `choice` type definition consists of a comma-separated list of
alternatives. These have the same syntax as a function declaration, but with
`fn` and the return type omitted. If there are no arguments, the parentheses may
also be omitted, as with `Cancelled`. `default` may also be included in the
list, with the same meaning as `operator default` in a continuation interface:
it means that pattern-matching operations on this type must be prepared to
handle alternatives other than those explicitly listed.

The choice type will have a static factory function corresponding to each of the
alternatives with parentheses, and a static data member corresponding to each
alternative with no parentheses. The choice type will also implement the
`Matchable` interface, and its continuation interface will have a method
corresponding to each alternative.

In short, this definition of `Result` as a choice type will have the same
semantics as the earlier definition of it as a struct. It will probably also
have the same implementation, with a discriminator field and a storage buffer
large enough to hold the argument values of the alternatives. Any alternative
parameter types that are incomplete (or have unknown size for any other reason)
will be represented using owning pointers; among other things, this will allow
users to define recursive choice types. The implementation will be hidden, of
course, and the compiler may be able to generate better code, but we will design
this feature to support at least that baseline implementation strategy.

One consequence is that although the alternatives of a choice type can be
overloaded (as in the `Variant` example below), they cannot be templates. More
precisely, the parameter types of a pattern function must be fixed without
knowing the values of any of the arguments. To see why, consider a choice type
like the following, which attempts to emulate `std::any`:

```
choice Any {
  Value[Type:$$ T](T: value)
}
```

The problem is that since `T` could be any type, and a single `Any` object could
hold values of different types throughout its lifetime, `Any` can't be
implemented using a storage buffer within the `Any` object. Instead, the storage
buffer for the `T` object would have to be allocated on the heap, but then the
compiler would need to decide whether to apply a small buffer optimization, and
if so what size threshold to use, etc. Allowing choice types to be implemented
in terms of heap allocation would make their performance far less predictable,
contrary to Carbon's performance goals, and would have little offsetting
benefit: these sorts of types appear to be rare, and when needed they should be
implemented in library code, where the performance tradeoffs are explicit and
under programmer control.

It may be possible to relax this restriction when and if we have a design for
supporting non-fixed-size types, although it's worth noting that even that would
not give us a way for `Any` to support assignment.

Carbon will probably have some mechanism for allowing a struct to have
compiler-generated default implementations of operations such as copy, move,
assignment, hashing, and equality comparison, so long as the struct's members
support those operations. Assuming that mechanism exists, choice types will
support it as well, with the parameter types of the pattern functions taking the
place of the member types. However, there are a couple of special cases:

-   choice types cannot be default constructible, unless we provide a separate
    mechanism for specifying which alternative is the default.
-   choice types can be assignable, regardless of whether the parameter types
    are assignable, because assigning to a choice type always destroys the
    existing alternative, rather than assigning to it.

A future proposal for this mechanism will need to consider whether to require an
explicit opt-in to generate these operations.

**Open question:** Should `choice` provide a way to directly access the
discriminator? Correspondingly, should it provide a way to specify the
discriminator type, and which discriminator values correspond to which
alternatives? These features would enable choice types to support all the same
use cases as C++ `enum`s, and permit zero-overhead conversion between the two at
language boundaries.

### Alternatives considered

#### Different spelling for `choice`

The Rust and Swift counterparts of `choice` are spelled `enum`. I have avoided
this because these types are not really "enumerated types" in the sense of all
values being explicitly enumerated in the code. I chose the spelling `choice`
because "choice type" is one of the only available synonyms for "sum type" that
doesn't have any potentially-misleading associations.

## Alternatives considered

### `choice` types only

Rather than layering `choice` types on top of lower level features, we could
make them a primitive language feature, and simply not provide a way for user
code to customize the representation of sum types. However, this would mean that
users who encounter performance problems with the compiler-generated code for a
`choice` type would have no way to address those problems without rewriting all
code that uses that type. This would be contrary to Carbon's performance and
evolvability goals. Furthermore, the rewritten code would probably be
substantially less readable, and less safe, because it wouldn't be able to use
pattern matching.

### Indexing by type

Rather than requiring each alternative to have a distinct name (or at least a
distinct function signature), we could pursue a design that requires each
alternative to have a distinct type. With this approach, which I'll call
"type-indexed" as opposed to "name-indexed", Carbon sum types would much more
closely resemble C++'s `std::variant`, rather than Swift and Rust's `enum` or
the sum types of various functional programming languages.

Either approach can be emulated in terms of the other. For example, we don't yet
have enough of a design for variadics to give an example of a Carbon counterpart
for `std::variant<Ts...>`, but a variant with exactly three alternative types
could be written like so:

```
choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) {
  Value(T1: value),
  Value(T2: value),
  Value(T3: value)
}
```

Conversely a type-indexed type like `std::variant` can model a name-indexed type
like `Result(T,E)` by introducing a wrapper type for each name, leading to
something like `std::variant<Value<T>, Error<E>, Cancelled>` (note that
`std::variant<T,E>` would not work, because `T` and `E` can be the same type).
In either case, emulating the other model introduces some syntactic overhead:
with name-indexing, `Variant`'s factory functions must be given a name (`Value`)
even though it doesn't really convey any information, and emulating
`Result(T,E)` in terms of type-indexing requires separately defining the tagged
wrapper templates `Value` and `Error`.

The distinction between these two models of sum types seems analogous the
distinction between the tuple and struct models of product types. Tuples and
type-indexed sum types treat the data structurally, in terms of types and
positional indices, but structs and name-indexed sum types require the
components of the data to have names, which contributes to both readability and
type-safety by attaching higher-level semantics to the data.

It is possible that both models of sum types could coexist in Carbon, just as
structs and tuples do. However, that seems unlikely to be a good idea: the
coexistence of tuples and structs is necessitated by the fact that it is quite
difficult to emulate either of them in terms of the other in a type-safe way,
but as we've seen, it's fairly straightforward to emulate either model of sum
types in terms of the other.

Use cases that work best with type-indexing appear to be quite rare, just as use
cases for tuples appear to be quite rare compared to use cases for structs.
Consequently, if Carbon has only one form of sum types, it should probably be
the name-indexed form, as proposed here.

> **Open question:** Should Carbon have a native syntax for pattern matching on
> the dynamic type of an object? If so, should types like `Variant` be able to
> use it, instead of having the `.Value` boilerplate in every pattern? Should
> this mechanism be aware of subtype relationships (so that a subtype pattern is
> a better match than a supertype pattern)? If so, how are those subtype
> relationships defined?

### Pattern matching proxies

As a variant of the previous approach, we could allow types to specify their
pattern-matching behavior in terms of a proxy type that Carbon "natively" knows
how to pattern-match against. In the case of a sum type, this proxy would be a
`choice` type, which means that `choice` needs to be a fundamental part of the
language, rather than syntactic sugar for a sum type struct. Returning once more
to the `Result` example:

```
struct Result(Type:$$ T, Type:$$ Error) {
  // Data members, factories, and special members same as above

  choice Choice {
    Success(T: value),
    Failure(Error: error),
    Cancelled
  }

  fn operator match(Ptr(Self): this) -> Choice {
    match (discriminator) {
      case 0 => {
        return .Success(this->storage.Read(T));
      }
      case 1 => {
        return .Failure(this->storage.Read(Error));
      }
      case 2 => {
        return .Cancelled;
      }
      default => {
        Assert(False);
      }
    }
  }
}
```

This approach has several advantages:

-   It's somewhat simpler, because it uses return values instead of
    continuation-passing.
-   It will be easier for the compiler to reason about, because of that
    simplicity and the somewhat narrower API surface. This may lead to better
    compiler performance, and better generated code.
-   It could generalize more easily to allow things like user-defined types that
    can match list patterns (if Carbon has those).

However, it also has several drawbacks:

-   It forces us to treat `choice` as a fundamental part of the language: in
    order to implement a sum type, you have to work with an object type whose
    layout and implementation is inherently opaque. This would be a substantial
    departure from C++, and it's difficult to foresee the consequences of that.
    Possibly the closest analogy in C++ is virtual calls, and especially virtual
    base classes, where fundamental operations like `->` and pointer casts can
    involve nontrivial generated code, and some aspects of object layout are
    required to be hidden from the user. However, Carbon seems to be moving away
    from C++ in precisely those ways.
-   Although both approaches carry a risk of confusion due to duplicate names,
    the risk is somewhat greater here: a naive reader might think that, for
    example, `.Success` in the body of `operator match` refers to `Self.Success`
    rather than `Self.Choice.Success`. Relatedly, there's some risk that the
    author may omit the leading `.`, and thereby invoke `Self.Success` instead
    of `Self.Choice.Success`. This will probably fail to build, but the errors
    may be confusing.
-   It has less expressive power, because there's no straightforward way for the
    library type to specify code that should run after pattern matching is
    complete. For example, the callback approach could allow `Match` to create a
    mutable local variable, pass it to the callback by pointer/reference, and
    then write the (possibly modified) contents of the variable back into the
    sum type object.
-   Extending this approach to support in-place mutation during pattern matching
    is likely to require Carbon to support reference _types_, whereas the
    primary proposal would probably only require reference _patterns_, which are
    substantially less problematic. This is a consequence of using a return type
    rather than a set of callback parameters (which are patterns) to define the
    type's pattern matching interface.

I very tentatively recommend the callback approach rather than this one,
primarily because of the last point above: the Carbon type system is likely to
be dramatically simplified if there are no reference types, but I think the
proxy approach will make reference types all but unavoidable.

### Pattern functions

Rather than require user types to define both the pattern-matching logic and the
factory functions, with the expectation that they will be inverses of each
other, we could instead enable them to define the set of alternatives as factory
functions that the compiler can invert automatically. This approach, like the
primary proposal, would consist of several parts.

A _pattern function_ is a function that can be invoked as part of a pattern,
even with arguments that contain placeholders. Pattern functions use the
introducer `pattern` instead of `fn`, and can only contain the sort of code that
could appear directly in a pattern. This lets us define reusable pattern
syntaxes that can do things like encapsulate hidden implementation details of
the object they're matching.

Next, we would introduce the concept of an `alternatives` block, which groups
together a set of factory functions and designates them as a set of
alternatives. They are required to be both exhaustive and unambiguous, meaning
that for any possible value of the type, it must be possible to obtain it as the
return value of exactly one of the alternatives. Alternatives that take no
arguments, which represent singleton states such as `Cancelled`, can instead be
written as static constants. An `alternatives` block can be marked `closed`,
which plays the same role as omitting `operator default` in the primary
proposal: it indicates that client code need not be prepared to accept
alternatives other than the ones currently present.

As with the primary proposal, `Storage` is used to represent a span of memory
that the user can create objects within. However, with this approach we also
need it to support initialization from a `ToStorage` factory function, because
pattern functions can't contain procedural code. Note that for the same reason,
`ToStorage` will probably need to be a language intrinsic, or implemented in
terms of one.

Using these features, the `Result(T, Error)` example can be written as follows:

```
struct Result(Type:$$ T, Type:$$ Error) {
  var Int: discriminator;

  var Storage(Max(Sizeof(T), Sizeof(Error)),
              Max(Alignof(T), Alignof(Error))): storage;

  closed alternatives {
    pattern Success(T: value) -> Self {
      return (.discriminator = 0, .storage = ToStorage(value));
    }

    pattern Failure(Error: error) -> Self {
      return (.discriminator = 1, .storage = ToStorage(error));
    }

    var Self:$$ Cancelled = (.discriminator = 2);
  }
}
```

Notice that with this approach, we do not need to define the special member
functions of `Result` manually. The compiler can infer appropriate definitions
in the same way that it infers how to invert these functions during pattern
matching.

As with the primary proposal, `choice` would be available as syntactic sugar,
but its syntax would mirror the syntax of an `alternatives` block:

```
closed choice Result(Type:$$ T, Type:$$ Error) {
  pattern Success(T: value) -> Self;
  pattern Failure(Error: error) -> Self;
  var Self:$$ Cancelled;
}
```

This ensures that a sum type's API is defined using essentially the same syntax,
regardless of how the type author chooses to implement it.

This approach is described in much more detail in an
[earlier draft](https://github.com/carbon-language/carbon-lang/blob/4dbd31d71e02895892f97a211df4b5fff8cae5c3/proposals/p0157.md)
of this document, where it was the primary proposal. It has a number of
advantages over the primary proposal:

-   Manually-defined sum types could probably be implemented with dramatically
    less code, because both the special member functions and the code that
    implements pattern matching can be generated automatically. This would not
    only make these types less tedious to implement, it would probably also
    reduce the risk of bugs, and avoid readability problems arising from the
    duplication of names between factory functions and continuation interface
    methods.
-   Programmers would be able to define functions that encapsulate complex
    pattern-matching logic behind simple interfaces.
-   Pattern matching would not require the special overload resolution rules
    that are needed to translate a pattern-matching operation into a `Match`
    call.
-   User-defined sum types would default to being open, whereas they default to
    being closed under the primary proposal. This is probably a better default,
    because an open sum type can always be closed, but a closed sum type can't
    be opened (much less extended with new alternatives) without the risk of
    breaking user code.

However, this approach also carries some substantial drawbacks:

-   Types can't use the full power of the Carbon language when defining their
    pattern-matching behavior, or the corresponding factory functions. Instead,
    they are restricted to a very narrow subset of the language that is valid in
    both patterns and procedural code. This forces us to introduce intrinsics
    like `ToStorage` to make that subset even minimally usable, and creates a
    substantial risk that some user-defined sum types just won't be able to
    support pattern matching.
-   The language rules are substantially more complicated and harder to explain,
    because we need to define the language subset that is usable in pattern
    functions, and define both "forward" and "reverse" semantics for it.
    Relatedly, it means that during pattern matching, there will be no
    straightforward correspondence between the Carbon code and the generated
    assembly, which could substantially complicate things like debugging.
-   Carbon's pattern language can't allow you to write _underconstrained_
    patterns, which are patterns where the values of all bindings aren't
    sufficient to uniquely determine the state of the object being matched. This
    rules out things like the `|` pattern operator, and even prevents us from
    using pattern matching with types that have non-salient state, like the
    capacity of a `std::vector`.

These issues, especially the overall complexity of this approach, leads me to
recommend against adopting it.

## Rationale

This proposal gives us solid ground on which to continue developing features
that rely on sum types and pattern matching thereof. The approach taken seems
plausible, and while there's a good chance that we will revise it significantly
before we finish our first pass over the complete Carbon design (for example, by
switching to pattern matching proxies or by supporting mutation of matched
elements), we don't think it will anchor us too much on this one particular
direction.

With regard to Carbon's goals:

-   Performance: while the approach taken herein potentially has some
    performance cost for a common operation that is likely to appear in
    performance-critical code (requiring an indirect call and the generation of
    continuations for user-defined pattern matching), such cost should be
    practically removable by inlining. We'll need to take care to ensure this
    abstraction penalty is reliably removed in common cases, but this seems
    sufficiently feasible to be worth attempting.
-   Evolution: software evolution is supported by allowing user-defined pattern
    matching to specify (by way of the presence/absence of operator default)
    whether the set of patterns is intended to be extended in the future.
-   Ergonomics: custom pattern matching for user-defined types promotes language
    consistency and removes boilerplate

Note: At the decision meeting, it was stated that geoffromer will update the
proposal to add a rationale to address austern's questions about the layered
approach.