Просмотр исходного кода

Design direction for sum types (#157)

Directional proposal for supporting sum types in Carbon
Geoff Romer 5 лет назад
Родитель
Сommit
5fade567c8
2 измененных файлов с 1061 добавлено и 0 удалено
  1. 2 0
      proposals/README.md
  2. 1059 0
      proposals/p0157.md

+ 2 - 0
proposals/README.md

@@ -49,6 +49,8 @@ request:
     -   [0143 - Decision](p0143_decision.md)
 -   [0149 - Change documentation style guide](p0149.md)
     -   [0149 - Decision](p0149_decision.md)
+-   [0157 - Design direction for sum types](p0157.md)
+    -   [0157 - Decision](p0157_decision.md)
 -   [0162 - Basic Syntax](p0162.md)
     -   [0162 - Decision](p0162_decision.md)
 -   [0175 - C++ interoperability goals](p0175.md)

+ 1059 - 0
proposals/p0157.md

@@ -0,0 +1,1059 @@
+# Design direction for sum types
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/157)
+
+## Table of contents
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Shareable storage](#shareable-storage)
+-   [User-defined pattern matching](#user-defined-pattern-matching)
+-   ["Bare" designator syntax](#bare-designator-syntax)
+    -   [Alternatives considered](#alternatives-considered)
+        -   [Placeholder keywords](#placeholder-keywords)
+        -   [Designator types](#designator-types)
+-   [Distinguishing pattern and expression semantics](#distinguishing-pattern-and-expression-semantics)
+    -   [Alternatives considered](#alternatives-considered-1)
+        -   [Separate syntaxes](#separate-syntaxes)
+        -   [Disambiguation by fixed priority](#disambiguation-by-fixed-priority)
+        -   [Disambiguation based on pattern content](#disambiguation-based-on-pattern-content)
+-   [Choice types](#choice-types)
+    -   [Alternatives considered](#alternatives-considered-2)
+        -   [Different spelling for `choice`](#different-spelling-for-choice)
+-   [Alternatives considered](#alternatives-considered-3)
+    -   [`choice` types only](#choice-types-only)
+    -   [Indexing by type](#indexing-by-type)
+    -   [Pattern matching proxies](#pattern-matching-proxies)
+    -   [Pattern functions](#pattern-functions)
+
+<!-- tocstop -->
+
+## Problem
+
+Many important programming use cases involve values that are most naturally
+represented as having one of several alternative forms (called _alternatives_
+for short). For example,
+
+-   Optional values, which are pervasive in computing, take the form of either
+    values of some underlying type, or a special "not present" value.
+-   Functions that cannot throw exceptions often use a return type that can
+    represent either a successfully computed result, or some description of how
+    the computation failed.
+-   Nodes of a parse tree often take different forms depending on the grammar
+    production that generated them. For example, a node of a parse tree for
+    simple arithmetic expressions might represent either a sum or product
+    expression with two child nodes representing the operands, or a
+    parenthesized expression with a single child node representing the contents
+    of the parentheses.
+-   The error codes returned by APIs like POSIX have a fixed set of named
+    values.
+
+What unites these use cases is that the set of alternatives is fixed by the API,
+it is possible for user code to determine which alternative is present, and
+there is little or nothing you can usefully do with such a value without first
+making that determination.
+
+Carbon needs to support defining and working with types representing such
+values. Following Carbon's principles, these types need to be easy to define,
+understand, and use, and they need to be safe -- in ordinary usage, the type
+system should ensure that user code cannot accidentally access the wrong
+alternative. These types should be writeable as well as readable, and writing
+should be type-safe and efficient. In particular, it should be possible to
+mutate a single sub-field of an alternative, without having to overwrite the
+entire alternative, and without a risk of accidentally doing so when that
+alternative is not present.
+
+Furthermore, it needs to be possible for type owners to customize the
+representations of these types. For example:
+
+-   Most sum types need a "discriminator" field to indicate which alternative is
+    present, but since it typically has very few possible values, it can often
+    be packed into padding, or even the low-order bits of a pointer.
+-   Other sum types avoid an explicit discriminator, and instead reserve certain
+    values to indicate separate alternatives. For example, a typical C-style
+    pointer can be thought of as an optional type, with a special null value
+    indicating that no pointer is present, because the platform guarantees that
+    the null byte pattern is never the representation of a valid pointer.
+
+It must be possible to implement such customizations without changing the type's
+API, and hence without altering the static safety guarantees for users of the
+type.
+
+## Background
+
+The terminology in this space is quite fragmented and inconsistent. This
+proposal will use the term _sum types_ to refer to types of the kind described
+in the problem statement. Note that "sum type" is not being proposed as a
+specific Carbon feature, or even as a precisely defined term of art; it is
+merely an informal way for this proposal to refer to its motivating use cases,
+in much the same way that a structs proposal might refer to "value types".
+
+Carbon as currently envisioned is already capable of approximating support for
+sum types. In particular, [pattern matching](/docs/design/pattern_matching.html)
+gives us a natural way to express querying which alternative is active, and then
+performing computations on that active alternative, which as discussed above is
+the primary way of interacting with a sum type. For example, a value-or-error
+type `Result(T, Error)` could be implemented like so:
+
+```
+// Result(T, Error) holds either a successfully-computed value of type T,
+// or metadata about a failure during that computation, or a singleton
+// "cancelled" state indicating that the computation successfully complied with
+// a request to halt before completion.
+struct Result(Type:$$ T, Type:$$ Error) {
+  // 0 if this represents a value, 1 if this represents an error, 2 if this
+  // represents the cancelled state.
+  var Int: discriminator;
+  var T: value;
+  var Error: error;
+
+  fn Success(T: value) -> Result(T, Error) {
+    return (.discriminator = 0, .value = value, .error = Error());
+  }
+
+  fn Failure(Error: error) -> Result(T, Error) {
+    return (.discriminator = 1, .value = T(), .error = error);
+  }
+
+  var Result(T, Error):$$ Cancelled =
+    (.discriminator = 2, .value = T(), .error = Error());
+}
+```
+
+A typical usage might look like:
+
+```
+fn ParseAsInt(String: s) -> Result(Int, String) {
+  var Int: result = 0;
+  var auto: it = s.begin();
+  while (it != s.end()) {
+    if (*it < '0' || *it > '9') {
+      return Result(Int, String).Failure("String contains non-digit");
+    }
+    result += *it - '0';
+    result *= 10;
+  }
+  return Result(Int, String).Success(result);
+}
+
+fn GetIntFromUser() -> Int {
+  while(True) {
+    var String: s = UserPrompt("Please enter a number");
+    match (ParseAsInt(s)) {
+      case (.discriminator = 0, .value = Int: value, .error = String: _) => {
+        return value;
+      }
+      case (.discriminator = 1, .value = Int: _, .error = String: error) => {
+        Display(error);
+      }
+      case .Cancelled => {
+        // We didn't request cancellation, so something is very wrong.
+        Terminate();
+      }
+      default => {
+        // Can't happen, because the above cases are exhaustive.
+        Assert(False);
+      }
+    }
+  }
+}
+```
+
+However, this code has several functional deficiencies:
+
+-   `.value` and `.error` must both be live throughout the `Result`'s lifetime,
+    even when they are not meaningful. Consequently, `Success` must populate
+    `.error` with a default-constructed dummy value (and so it won't work if
+    `Error` is not default-constructible), `Failure` must do the same for
+    `.value`, and `Cancelled` must do the same for both. Furthermore, `Result`
+    is bloated by the fact that the two fields must have separately-allocated
+    storage, even though at most one at a time actually stores any data.
+-   The implementation details of `Result` are not encapsulated. This makes the
+    `Result` API unsafe: nothing prevents client code from accessing `.value`
+    even when `.discriminator` is not 0. This also makes the patterns extremely
+    verbose.
+-   `.discriminator` should never have any value other than 0, 1, or 2, but the
+    compiler can't enforce that property when `Result`s are created, or exploit
+    it when `Result`s are used. So, for example, the `match` must have a
+    `default` case in order for the compiler and other tools to consider it
+    exhaustive, even though that default case should never be entered.
+
+It also has a couple of ergonomic problems:
+
+-   The definition of `Result` is largely boilerplate. Conceptually, the only
+    information needed to specify this type is the names and parameter types of
+    the two factory functions, the name of the static member `Cancelled`, plus
+    the fact that every possible value of `Result` is uniquely described by the
+    name and parameter values of a call to one of those two functions, or else
+    equal to `Cancelled`. Given that information, the compiler could easily
+    generate the rest of the struct definition. This generated implementation
+    may not always be as efficient as a hand-coded one could be, but in a lot of
+    cases that may not matter.
+-   The `return` statements in `ParseAsInt` are quite verbose, due to the need
+    to explicitly qualify the function calls with `Result(Int, String)`. In
+    fact, developers might prefer to avoid having a function call at all,
+    especially in the success case, and instead rely on implicit conversions to
+    write something like `return result;`.
+
+## Proposal
+
+To summarize, the previous section identified several missing features in
+Carbon, which together would enable Carbon to support efficient and ergonomic
+sum types:
+
+-   There's no way to manually control the lifetimes of subobjects, or enable
+    them to share storage.
+-   There's no way for pattern matching to operate through an encapsulation
+    boundary.
+-   There's no way for a type to specify that a given set of patterns is
+    exhaustive.
+-   There's no concise way to define a sum type based on the form of its
+    alternatives.
+-   There's no way to return a specific alternative without restating the return
+    type of the function.
+
+I propose supporting sum types by introducing several language features to
+supply the missing functionality identified above. These features are largely
+separable, although there are some dependencies between them, so their detailed
+design will be addressed in future proposals, and the details discussed here
+should be considered provisional. This proposal merely establishes the overall
+design direction for sum types, in the same way that [p0083](p0083.md)
+established the overall design direction for the language as a whole.
+
+To support manual lifetime control and storage sharing, I propose introducing at
+least one and preferably both of the following:
+
+-   A `Storage` type, which represents a fixed-size buffer of untyped memory and
+    provides operations for creating and destroying objects within it.
+-   A typed `union` facility, such as the one described in proposal
+    [0139](https://github.com/carbon-language/carbon-lang/pull/139).
+
+To support encapsulation in pattern matching, I propose introducing a
+`Matchable` interface, which a type can implement in order to specify how it
+behaves in pattern matching, including what (if anything) constitutes an
+exhaustive set of patterns for that type.
+
+To allow users to concisely specify a sum type when they don't need to
+micro-optimize the implementation details, I propose a `choice` syntax that
+specifies a sum type purely in terms of its alternatives, which acts as a sugar
+syntax for those lower-level features.
+
+To avoid redundant boilerplate in functions that return sum types, I propose
+allowing statements of the form `return .X;` and `return .F();`, which are
+interpreted as if the function's return type appeared immediately prior to the
+`.` character.
+
+Cumulatively, these features will allow `Status` to be defined so that the usage
+portions of the above example can be rewritten as follows:
+
+```
+fn ParseAsInt(String: s) -> Result(Int, String) {
+  var Int: result = 0;
+  var auto: it = s.begin();
+  while (it != s.end()) {
+    if (*it < '0' || *it > '9') {
+      return .Failure("String contains non-digit");
+    }
+    result += *it - '0';
+    result *= 10;
+  }
+  return .Success(result);
+}
+
+fn GetIntFromUser() -> Int {
+  while(True) {
+    var String: s = UserPrompt("Please enter a number");
+    match (ParseAsInt(s)) {
+      case .Success(var Int: value) => {
+        return value;
+      }
+      case .Failure(var String: error) => {
+        Display(error);
+      }
+      case .Cancelled => {
+        // We didn't request cancellation, so something is very wrong.
+        Terminate();
+      }
+    }
+  }
+}
+```
+
+> **Open question:** How will user-defined sum types (and pattern matching in
+> general) support name bindings that allow you to mutate the underlying object?
+
+## Shareable storage
+
+This approach to sum types imposes relatively few requirements on the language
+features used to implement shareable storage (meaning, storage that can be
+inhabited by different objects at different times), and so this proposal doesn't
+describe them in much detail. The primary options are typed unions along the
+lines of proposal
+[0139](https://github.com/carbon-language/carbon-lang/pull/139), and untyped
+byte buffers along the lines described below. Typed unions are somewhat safer
+and more readable, but less general. For example, they can't support use cases
+like implementing a small-object-optimized version of
+[`std::any`](https://en.cppreference.com/w/cpp/utility/any), because the set of
+possible types is not known in advance.
+
+This proposal takes the position that Carbon must have at least one of these two
+features. Whether it should have only untyped byte buffers, only typed unions,
+or both, is left as an **open question**, because the answer is orthogonal to
+the overall design direction that is the focus of this proposal. Consequently, I
+will not further discuss the tradeoffs between the two features. This proposal's
+examples focus on untyped byte buffers because they are simpler to describe, and
+aren't already covered by another proposal.
+
+Regardless of the form that shareable storage takes, it won't be able to
+intrinsically keep track of whether it currently holds any objects, or the types
+or offsets of those objects, because that would require it to maintain
+additional hidden storage, and a major goal of this design is to give the
+developer explicit control of the object representation. Consequently, it is not
+safe to copy, move, assign to, or destroy shareable storage unless it is known
+not to be inhabited by an object.
+
+This means that in the general case, the compiler will not be able to generate
+safe default implementations for any special member functions of types that have
+shareable storage members.
+
+For purposes of illustration in this proposal, I will treat
+`Storage(SizeT size, SizeT align)` as a library template representing an untyped
+buffer of `size` bytes, aligned to `align`. It provides `Create`, `Read`, and
+`Destroy` methods which create, access, and destroy an object of a specified
+type within the buffer.
+
+Using `Storage`, we can redefine `Status` as follows:
+
+```
+struct Result(Type:$$ T, Type:$$ Error) {
+  var Int: discriminator;
+
+  var Storage(Max(Sizeof(T), Sizeof(Error)),
+              Max(Alignof(T), Alignof(Error))): storage;
+
+  fn Success(T: value) -> Self {
+    Self result = (.discriminator = 0);
+    result.storage->Create(T, value);
+    return result;
+  }
+
+  fn Failure(Error: error) -> Self {
+    Self result = (.discriminator = 1);
+    result.storage->Create(Error, error);
+    return result;
+  }
+
+  var Self:$$ Cancelled = (.discriminator = 2);
+
+  // Copy, move, assign, destroy, and similar operations need to be defined
+  // explicitly, but are omitted for brevity.
+}
+```
+
+## User-defined pattern matching
+
+As seen in the example above, we want to allow pattern-matching on `Status` to
+look like this:
+
+```
+match (ParseAsInt(s)) {
+  case .Success(var Int: value) => {
+    return value;
+  }
+  case .Failure(var String: error) => {
+    Display(error);
+  }
+  case .Cancelled => {
+    // We didn't request cancellation, so something is very wrong.
+    Terminate();
+  }
+}
+```
+
+For this to work, `Status` needs to specify two things:
+
+-   The set of all possible alternatives, including their names and parameter
+    types, so that the compiler can typecheck the `match` body, identify any
+    unreachable `case`s, and determine whether any `case`s are missing.
+-   The algorithm that, given a `Status` object, determines which alternative is
+    present, and specifies the values of its parameters.
+
+Here's how `Status` can do that under this proposal:
+
+```
+struct Result(Type:$$ T, Type:$$ Error) {
+  var Int: discriminator;
+
+  var Storage(Max(Sizeof(T), Sizeof(Error)),
+              Max(Alignof(T), Alignof(Error))): storage;
+
+  interface MatchContinuation {
+    var Type:$$ ReturnType;
+    fn Success(T: value) -> ReturnType;
+    fn Failure(Error: error) -> ReturnType;
+    fn Cancelled() -> ReturnType;
+  }
+
+  impl Matchable(MatchContinuation) {
+    method (Ptr(Self): this) Match[MatchContinuation:$ Continuation](
+        Ptr(Continuation): continuation) -> Continuation.ReturnType {
+      match (discriminator) {
+        case 0 => {
+          return continuation->Success(this->storage.Read(T));
+        }
+        case 1 => {
+          return continuation->Failure(this->storage.Read(Error));
+        }
+        case 2 => {
+          return continuation->Cancelled();
+        }
+        default => {
+          Assert(false);
+        }
+      }
+    }
+  }
+
+  // Success() and Failure() factory functions, and the Cancelled static
+  // constant, are defined as above.
+
+  // Copy, move, assign, destroy, and similar operations need to be defined
+  // explicitly, but are omitted for brevity.
+}
+```
+
+In this code, `Result` makes itself available for use in pattern matching by
+declaring that it implements the `Matchable` interface. `Matchable` takes an
+interface argument, `MatchContinuation` in this case, which specifies the set of
+possible alternatives by declaring a method for each one. I'll call that
+argument the _continuation interface_, for reasons that are about to become
+clear.
+
+The `Match` method of the `Matchable` interface. This method takes two
+parameters: the value being matched against, and an instance of the continuation
+interface, which the compiler generates from the `match` expression being
+evaluated, with method bodies that correspond to the bodies of the corresponding
+`case`s. Once `Match` has determined which alternative is present, and the
+values of its parameters, it invokes the corresponding method of the
+continuation object. `Match` is required to invoke exactly one continuation
+method, and to do so exactly once.
+
+This proposal assumes that Carbon will have support for defining and
+implementing generic interfaces, including interfaces that take interface
+parameters, and uses `interface`, `impl`, `method` etc. as **placeholder**
+syntax. It can probably be revised to work if interfaces can't be parameterized
+that way, or if we don't have a feature like this at all, but it might be
+somewhat more awkward.
+
+Notice that the names `Success`, `Failure`, and `Cancelled` are defined twice,
+once as factory functions of `Result` and once as methods of
+`MatchContinuation`, with the same parameter types in each case. The two
+effectively act as inverses of each other: the factory functions compute a
+`Result` from their parameters, and the methods are used to report the parameter
+values that compute a given `Result`. This mirroring between expression and
+pattern syntax is ultimately a design choice by the type author; there is no
+language-level requirement that the alternatives correspond to the factory
+functions.
+
+This approach can be extended to support non-sum patterns as well. For example,
+a type that wants to match a tuple-shaped pattern like `(Int: i, String: s)`
+could define a continuation interface like
+
+```
+interface MatchContinuation {
+  fn operator()(Int: i, String: s);
+}
+```
+
+A type's continuation interface can also include a function named
+`operator default`, which implements the `default` case of the `match`.
+Consequently, any `match` on that type will be required to have a `default`
+case. This is valuable because it protects the type author's ability to add new
+alternatives in the future, without causing build failures in client code.
+
+> **Open question:** Can `Match` actually invoke `operator default`? It might
+> sometimes be useful to define a type that can't be matched exhaustively, and
+> the mere possibility that the `default` case could actually run might help
+> encourage client code to implement it robustly, rather than blindly providing
+> something like `Assert(False)`. However, if there's a language-level guarantee
+> that the `default` case is unreachable if all other alternatives are handled,
+> then we can allow code in the same library as the type to omit the `default`
+> case. That's desirable because code in the same library doesn't pose an
+> evolutionary risk, and it's often valuable to have a build-time guarantee that
+> code within the type's own API explicitly handles every case.
+
+Note that `Match`'s continuation parameter type must be generic rather than
+templated (`:$` rather than `:$$`). Template specialization is driven by the
+concrete values of the template arguments, but the type of the continuation
+parameter may depend on code generation details that aren't yet known when
+template specialization takes place. By the same token, we won't support
+overloading `Match`, because overload resolution is likewise driven by the
+concrete types of the function arguments, which may not be known at that point.
+
+## "Bare" designator syntax
+
+A _designator_ is a token consisting of `.` followed by an identifier. The
+canonical use case for designators is member access, as in `foo.bar`, where the
+designator applies to the preceding expression. However, we expect Carbon will
+also have some use cases for "bare" designators, where there is no preceding
+expression. In particular, we expect to use bare designators to initialize named
+tuple fields, as in `(.my_int = 42, .my_str = "Foo")`, which produces a tuple
+with fields named `my_int` and `my_str`.
+
+I propose to also permit using bare designators to refer to the alternatives of
+a sum type, in cases where the sum type is clear from context. In particular, I
+propose that in statements of the form `return R.Alt;` or
+`return R.Alt(<args>);`, where `R` is the function return type, the `R` can be
+omitted. Similarly, I propose that patterns of the form `S.Alt` or
+`S.Alt(<subpatterns>)`, where `S` is the type being matched against, the `S` can
+be omitted. Note that both of these shorthands are allowed only at top level,
+not as subexpressions or subpatterns.
+
+> **Open question:** Can we also permit these shorthands to be nested? This
+> would be more consistent and less surprising, but could create ambiguity with
+> the use of bare designators in tuple initialization: does `case (.Foo, .Bar)`
+> match a tuple of two fields named `Foo` and `Bar`, or does it match a tuple of
+> two positional fields, of sum types that respectively have `.Foo` and `.Bar`
+> as alternatives? Furthermore, allowing such nested usages may conflict with
+> the
+> [proposed principle](https://github.com/carbon-language/carbon-lang/pull/103)
+> that type information should propagate only from an expression to its
+> enclosing context, and not vice-versa.
+>
+> The issue of type propagation is particularly acute if we want to allow
+> nesting of alternatives, such as `case .Foo(.Bar(_))`. The problem there is
+> that we are relying on the type of `Foo`'s parameter to tell us the type of
+> the argument expression `.Bar(_)`, but if `Foo` is overloaded, we can't
+> determine the type of the parameter until the overload is resolved, and to do
+> that we first need to know the type of the argument expression. And even if
+> `Foo` is not currently overloaded, an overload might be added in the future
+> (at least if the sum type has an `operator default`). Adding overloads is a
+> canonical example of the kind of software evolution that Carbon is intended to
+> allow, so the validity of this code can't depend on whether or not `Foo` is
+> overloaded.
+
+### Alternatives considered
+
+#### Placeholder keywords
+
+Rather than allowing code to omit the type altogether, we could allow code to
+replace the type with a placeholder keyword, for example `case auto.Foo`. This
+would avoid the ambiguity with tuple initialization, but would still mean that
+type information is propagating into the expression from its surrounding context
+rather than vice-versa (which also means that `auto` may not be an appropriate
+spelling). Furthermore, this could wind up feeling fairly boilerplate-heavy,
+even if the keyword is very short.
+
+#### Designator types
+
+We could treat each bare designator as essentially defining its own type, which
+may then implicitly convert to a suitable sum type. For example, we could think
+of `.Some(42)` as having a type `DesignatedTuple("Some", (Int))`, and
+`Optional(Int)` would define an implicit conversion from that type. This would
+ensure that type information does not propagate into the expression from the
+context, but would not resolve the ambiguity with tuple initialization.
+Furthermore, this would mean that the names of bare designators do not need to
+be declared before they are used, and in fact can't be meaningfully declared at
+all.
+
+This could have very surprising consequences. For example, a typo in a line of
+code like `var auto: x = .Sone(42);` cannot be diagnosed at that line. Instead,
+the problem can't be diagnosed until `x` is used, and even then it will show up
+as an overload resolution error rather than a name lookup error. This is
+particularly problematic because it is much harder to provide useful, actionable
+diagnostics for overload resolution failure than for name lookup failure.
+Relatedly, in the case of bare designators, IDEs would not be able to implement
+tab-completion using Carbon's name lookup rules, but would effectively have to
+invent their own ad hoc name lookup rules.
+
+Note that this approach would work well with the "Indexing by type" alternative
+discussed below, with instances of `DesignatedTuple` (or whatever we call it)
+acting as tagged wrapper types.
+
+## Distinguishing pattern and expression semantics
+
+As discussed above, we expect a well-behaved sum type to define factory
+functions that correspond to each of the alternatives in its continuation
+interface. This creates some potential ambiguity about whether a given use of an
+alternative name refers to the factory function or the continuation interface.
+For example, consider the following code:
+
+```
+var Result(Int, String): r = ...;
+match (r) {
+  case Result(Int, String).Value(0) => ...
+```
+
+This could potentially be interpreted two ways:
+
+-   `Result(Int, String).Value(0)` is evaluated as an ordinary function call,
+    and the result is compared with `r` to see if they match, presumably using
+    the `==` operator.
+-   The whole `match` expression is evaluated by invoking
+    `Result(Int, String)`'s implementation of `Matchable`, with a continuation
+    object whose `.Value(Int)` method compares the `Int` parameter with 0.
+
+This can also be thought of as a name lookup problem: is `.Value` looked up in
+`Result(Int, String)`, or in `Result(Int, String)`'s implementation of
+`Matchable`?
+
+I propose to leave this choice unspecified, so that the compiler may validly
+generate code either way. This gives the compiler more freedom to optimize, and
+perhaps more importantly, it helps discourage sum type authors from
+intentionally making its factory function behavior inconsistent with its pattern
+matching behavior. By extension, I propose treating the shorthand syntax
+`case .Value(0)` the same way.
+
+### Alternatives considered
+
+#### Separate syntaxes
+
+We could instead avoid the ambiguity by providing separate syntaxes for the two
+semantics. The more straightforward version of this would be to say that
+`Result(Int, String).Value(0)` is always interpreted as an ordinary function
+call, and patterns like `Result(Int, String).Value(Int: i)` are ill-formed
+because they cannot be interpreted as function calls. However, this would
+require us to have a syntax for matching alternatives that is disjoint from the
+syntax for constructing alternatives. This would be at odds with existing
+practice in languages like Rust, Swift, Haskell, and ML, all of which use the
+same syntax for constructing and matching alternatives.
+
+Note that it is tempting to use bare designators as the syntax for matching
+alternatives, so that `Result(Int, String).Value(0)` is an expression, but
+`.Value(0)` is a pattern. However, that is unlikely to be sufficient on its own,
+because bare designators rely on type information that may not always be
+available, especially in nested patterns. Hence, in order for this approach to
+work, we would need to introduce a separate syntax for specifying the type of a
+bare designator, such as `.Value(0) as Result(Int, String)`.
+
+Alternatively, we could say that code in a pattern-matching context is always
+interpreted as a pattern, even if it could otherwise be interpreted as an
+expression. We would then need to introduce a pattern operator for explicitly
+evaluating a subpattern as an expression, such as
+`is Result(Int, String).Value(0)` or `== Result(Int, String).Value(0)`. However,
+this would impose an educational and cognitive burden on users: FAQ entries like
+"What's the difference between `case .Foo` and `case is .Foo`" and "How do I
+choose between `case .Foo` and `case is .Foo`" seem inevitable, and would
+require fairly nuanced answers. It would also add syntactic noise to the use
+cases that correspond to C++ `switch` statements, where all of the cases are
+fixed values.
+
+#### Disambiguation by fixed priority
+
+We could instead specify that, when `Result(Int, String).Value(0)` appears in a
+context where a pattern is expected, the name `.Value` is looked up as both an
+ordinary function call and as a use of the `Matchable` interface, and specify
+one of the two as the "winner" in the case where both lookups succeed. This is
+effectively a variant of the previous option, except that some usages that would
+be build errors under that approach would be saved by the fallback
+interpretation here. In particular, it would still require us to introduce a
+second syntax for the case where the programmer wants the lower-priority of the
+two behaviors. Thus, it would carry largely the same drawbacks as the previous
+option, with the additional drawback that there wouldn't be a consistent
+correspondence between syntax and semantics.
+
+#### Disambiguation based on pattern content
+
+We could instead specify that `Result(Int, String).Value(0)` is always
+interpreted as an ordinary function call, but patterns like
+`Result(Int, String).Value(Int: i)` are evaluated using the `Matchable`
+interface, because no other implementation is possible. However, this would mean
+that the name-lookup behavior of a function call depends on code that can be
+arbitrarily deeply nested within it, which seems likely to be hostile to both
+programmers and tools.
+
+## Choice types
+
+To allow users to define sum types without micromanaging the implementation
+details, I propose introducing `choice` as a convenient syntax for defining a
+sum type by specifying only the declarations of the set of alternatives. From
+that information, the compiler generates an appropriate object representation,
+and synthesizes definitions for the alternatives and special member functions.
+
+Our manual implementation of `Result` above doesn't really benefit from having
+direct control of the object representation, and doesn't seem to have any
+additional API surfaces, so it's well-suited to being defined as a choice type
+instead:
+
+```
+choice Result(Type:$$ T, Type:$$ Error) {
+  Success(T: value),
+  Failure(Error: error),
+  Cancelled
+}
+```
+
+The body of a `choice` type definition consists of a comma-separated list of
+alternatives. These have the same syntax as a function declaration, but with
+`fn` and the return type omitted. If there are no arguments, the parentheses may
+also be omitted, as with `Cancelled`. `default` may also be included in the
+list, with the same meaning as `operator default` in a continuation interface:
+it means that pattern-matching operations on this type must be prepared to
+handle alternatives other than those explicitly listed.
+
+The choice type will have a static factory function corresponding to each of the
+alternatives with parentheses, and a static data member corresponding to each
+alternative with no parentheses. The choice type will also implement the
+`Matchable` interface, and its continuation interface will have a method
+corresponding to each alternative.
+
+In short, this definition of `Result` as a choice type will have the same
+semantics as the earlier definition of it as a struct. It will probably also
+have the same implementation, with a discriminator field and a storage buffer
+large enough to hold the argument values of the alternatives. Any alternative
+parameter types that are incomplete (or have unknown size for any other reason)
+will be represented using owning pointers; among other things, this will allow
+users to define recursive choice types. The implementation will be hidden, of
+course, and the compiler may be able to generate better code, but we will design
+this feature to support at least that baseline implementation strategy.
+
+One consequence is that although the alternatives of a choice type can be
+overloaded (as in the `Variant` example below), they cannot be templates. More
+precisely, the parameter types of a pattern function must be fixed without
+knowing the values of any of the arguments. To see why, consider a choice type
+like the following, which attempts to emulate `std::any`:
+
+```
+choice Any {
+  Value[Type:$$ T](T: value)
+}
+```
+
+The problem is that since `T` could be any type, and a single `Any` object could
+hold values of different types throughout its lifetime, `Any` can't be
+implemented using a storage buffer within the `Any` object. Instead, the storage
+buffer for the `T` object would have to be allocated on the heap, but then the
+compiler would need to decide whether to apply a small buffer optimization, and
+if so what size threshold to use, etc. Allowing choice types to be implemented
+in terms of heap allocation would make their performance far less predictable,
+contrary to Carbon's performance goals, and would have little offsetting
+benefit: these sorts of types appear to be rare, and when needed they should be
+implemented in library code, where the performance tradeoffs are explicit and
+under programmer control.
+
+It may be possible to relax this restriction when and if we have a design for
+supporting non-fixed-size types, although it's worth noting that even that would
+not give us a way for `Any` to support assignment.
+
+Carbon will probably have some mechanism for allowing a struct to have
+compiler-generated default implementations of operations such as copy, move,
+assignment, hashing, and equality comparison, so long as the struct's members
+support those operations. Assuming that mechanism exists, choice types will
+support it as well, with the parameter types of the pattern functions taking the
+place of the member types. However, there are a couple of special cases:
+
+-   choice types cannot be default constructible, unless we provide a separate
+    mechanism for specifying which alternative is the default.
+-   choice types can be assignable, regardless of whether the parameter types
+    are assignable, because assigning to a choice type always destroys the
+    existing alternative, rather than assigning to it.
+
+A future proposal for this mechanism will need to consider whether to require an
+explicit opt-in to generate these operations.
+
+**Open question:** Should `choice` provide a way to directly access the
+discriminator? Correspondingly, should it provide a way to specify the
+discriminator type, and which discriminator values correspond to which
+alternatives? These features would enable choice types to support all the same
+use cases as C++ `enum`s, and permit zero-overhead conversion between the two at
+language boundaries.
+
+### Alternatives considered
+
+#### Different spelling for `choice`
+
+The Rust and Swift counterparts of `choice` are spelled `enum`. I have avoided
+this because these types are not really "enumerated types" in the sense of all
+values being explicitly enumerated in the code. I chose the spelling `choice`
+because "choice type" is one of the only available synonyms for "sum type" that
+doesn't have any potentially-misleading associations.
+
+## Alternatives considered
+
+### `choice` types only
+
+Rather than layering `choice` types on top of lower level features, we could
+make them a primitive language feature, and simply not provide a way for user
+code to customize the representation of sum types. However, this would mean that
+users who encounter performance problems with the compiler-generated code for a
+`choice` type would have no way to address those problems without rewriting all
+code that uses that type. This would be contrary to Carbon's performance and
+evolvability goals. Furthermore, the rewritten code would probably be
+substantially less readable, and less safe, because it wouldn't be able to use
+pattern matching.
+
+### Indexing by type
+
+Rather than requiring each alternative to have a distinct name (or at least a
+distinct function signature), we could pursue a design that requires each
+alternative to have a distinct type. With this approach, which I'll call
+"type-indexed" as opposed to "name-indexed", Carbon sum types would much more
+closely resemble C++'s `std::variant`, rather than Swift and Rust's `enum` or
+the sum types of various functional programming languages.
+
+Either approach can be emulated in terms of the other. For example, we don't yet
+have enough of a design for variadics to give an example of a Carbon counterpart
+for `std::variant<Ts...>`, but a variant with exactly three alternative types
+could be written like so:
+
+```
+choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) {
+  Value(T1: value),
+  Value(T2: value),
+  Value(T3: value)
+}
+```
+
+Conversely a type-indexed type like `std::variant` can model a name-indexed type
+like `Result(T,E)` by introducing a wrapper type for each name, leading to
+something like `std::variant<Value<T>, Error<E>, Cancelled>` (note that
+`std::variant<T,E>` would not work, because `T` and `E` can be the same type).
+In either case, emulating the other model introduces some syntactic overhead:
+with name-indexing, `Variant`'s factory functions must be given a name (`Value`)
+even though it doesn't really convey any information, and emulating
+`Result(T,E)` in terms of type-indexing requires separately defining the tagged
+wrapper templates `Value` and `Error`.
+
+The distinction between these two models of sum types seems analogous the
+distinction between the tuple and struct models of product types. Tuples and
+type-indexed sum types treat the data structurally, in terms of types and
+positional indices, but structs and name-indexed sum types require the
+components of the data to have names, which contributes to both readability and
+type-safety by attaching higher-level semantics to the data.
+
+It is possible that both models of sum types could coexist in Carbon, just as
+structs and tuples do. However, that seems unlikely to be a good idea: the
+coexistence of tuples and structs is necessitated by the fact that it is quite
+difficult to emulate either of them in terms of the other in a type-safe way,
+but as we've seen, it's fairly straightforward to emulate either model of sum
+types in terms of the other.
+
+Use cases that work best with type-indexing appear to be quite rare, just as use
+cases for tuples appear to be quite rare compared to use cases for structs.
+Consequently, if Carbon has only one form of sum types, it should probably be
+the name-indexed form, as proposed here.
+
+> **Open question:** Should Carbon have a native syntax for pattern matching on
+> the dynamic type of an object? If so, should types like `Variant` be able to
+> use it, instead of having the `.Value` boilerplate in every pattern? Should
+> this mechanism be aware of subtype relationships (so that a subtype pattern is
+> a better match than a supertype pattern)? If so, how are those subtype
+> relationships defined?
+
+### Pattern matching proxies
+
+As a variant of the previous approach, we could allow types to specify their
+pattern-matching behavior in terms of a proxy type that Carbon "natively" knows
+how to pattern-match against. In the case of a sum type, this proxy would be a
+`choice` type, which means that `choice` needs to be a fundamental part of the
+language, rather than syntactic sugar for a sum type struct. Returning once more
+to the `Result` example:
+
+```
+struct Result(Type:$$ T, Type:$$ Error) {
+  // Data members, factories, and special members same as above
+
+  choice Choice {
+    Success(T: value),
+    Failure(Error: error),
+    Cancelled
+  }
+
+  fn operator match(Ptr(Self): this) -> Choice {
+    match (discriminator) {
+      case 0 => {
+        return .Success(this->storage.Read(T));
+      }
+      case 1 => {
+        return .Failure(this->storage.Read(Error));
+      }
+      case 2 => {
+        return .Cancelled;
+      }
+      default => {
+        Assert(False);
+      }
+    }
+  }
+}
+```
+
+This approach has several advantages:
+
+-   It's somewhat simpler, because it uses return values instead of
+    continution-passing.
+-   It will be easier for the compiler to reason about, because of that
+    simplicity and the somewhat narrower API surface. This may lead to better
+    compiler performance, and better generated code.
+-   It could generalize more easily to allow things like user-defined types that
+    can match list patterns (if Carbon has those).
+
+However, it also has several drawbacks:
+
+-   It forces us to treat `choice` as a fundamental part of the language: in
+    order to implement a sum type, you have to work with an object type whose
+    layout and implementation is inherently opaque. This would be a substantial
+    departure from C++, and it's difficult to foresee the consequences of that.
+    Possibly the closest analogy in C++ is virtual calls, and especially virtual
+    base classes, where fundamental operations like `->` and pointer casts can
+    involve nontrivial generated code, and some aspects of object layout are
+    required to be hidden from the user. However, Carbon seems to be moving away
+    from C++ in precisely those ways.
+-   Although both approaches carry a risk of confusion due to duplicate names,
+    the risk is somewhat greater here: a naive reader might think that, for
+    example, `.Success` in the body of `operator match` refers to `Self.Success`
+    rather than `Self.Choice.Success`. Relatedly, there's some risk that the
+    author may omit the leading `.`, and thereby invoke `Self.Success` instead
+    of `Self.Choice.Success`. This will probably fail to build, but the errors
+    may be confusing.
+-   It has less expressive power, because there's no straightforward way for the
+    library type to specify code that should run after pattern matching is
+    complete. For example, the callback approach could allow `Match` to create a
+    mutable local variable, pass it to the callback by pointer/reference, and
+    then write the (possibly modified) contents of the variable back into the
+    sum type object.
+-   Extending this approach to support in-place mutation during pattern matching
+    is likely to require Carbon to support reference _types_, whereas the
+    primary proposal would probably only require reference _patterns_, which are
+    substantially less problematic. This is a consequence of using a return type
+    rather than a set of callback parameters (which are patterns) to define the
+    type's pattern matching interface.
+
+I very tentatively recommend the callback approach rather than this one,
+primarily because of the last point above: the Carbon type system is likely to
+be dramatically simplified if there are no reference types, but I think the
+proxy approach will make reference types all but unavoidable.
+
+### Pattern functions
+
+Rather than require user types to define both the pattern-matching logic and the
+factory functions, with the expectation that they will be inverses of each
+other, we could instead enable them to define the set of alternatives as factory
+functions that the compiler can invert automatically. This approach, like the
+primary proposal, would consist of several parts.
+
+A _pattern function_ is a function that can be invoked as part of a pattern,
+even with arguments that contain placeholders. Pattern functions use the
+introducer `pattern` instead of `fn`, and can only contain the sort of code that
+could appear directly in a pattern. This lets us define reusable pattern
+syntaxes that can do things like encapsulate hidden implementation details of
+the object they're matching.
+
+Next, we would introduce the concept of an `alternatives` block, which groups
+together a set of factory functions and designates them as a set of
+alternatives. They are required to be both exhaustive and unambiguous, meaning
+that for any possible value of the type, it must be possible to obtain it as the
+return value of exactly one of the alternatives. Alternatives that take no
+arguments, which represent singleton states such as `Cancelled`, can instead be
+written as static constants. An `alternatives` block can be marked `closed`,
+which plays the same role as omitting `operator default` in the primary
+proposal: it indicates that client code need not be prepared to accept
+alternatives other than the ones currently present.
+
+As with the primary proposal, `Storage` is used to represent a span of memory
+that the user can create objects within. However, with this approach we also
+need it to support initialization from a `ToStorage` factory function, because
+pattern functions can't contain procedural code. Note that for the same reason,
+`ToStorage` will probably need to be a language intrinsic, or implemented in
+terms of one.
+
+Using these features, the `Result(T, Error)` example can be written as follows:
+
+```
+struct Result(Type:$$ T, Type:$$ Error) {
+  var Int: discriminator;
+
+  var Storage(Max(Sizeof(T), Sizeof(Error)),
+              Max(Alignof(T), Alignof(Error))): storage;
+
+  closed alternatives {
+    pattern Success(T: value) -> Self {
+      return (.discriminator = 0, .storage = ToStorage(value));
+    }
+
+    pattern Failure(Error: error) -> Self {
+      return (.discriminator = 1, .storage = ToStorage(error));
+    }
+
+    var Self:$$ Cancelled = (.discriminator = 2);
+  }
+}
+```
+
+Notice that with this approach, we do not need to define the special member
+functions of `Result` manually. The compiler can infer appropriate definitions
+in the same way that it infers how to invert these functions during pattern
+matching.
+
+As with the primary proposal, `choice` would be available as syntactic sugar,
+but its syntax would mirror the syntax of an `alternatives` block:
+
+```
+closed choice Result(Type:$$ T, Type:$$ Error) {
+  pattern Success(T: value) -> Self;
+  pattern Failure(Error: error) -> Self;
+  var Self:$$ Cancelled;
+}
+```
+
+This ensures that a sum type's API is defined using essentially the same syntax,
+regardless of how the type author chooses to implement it.
+
+This approach is described in much more detail in an
+[earlier draft](https://github.com/carbon-language/carbon-lang/blob/4dbd31d71e02895892f97a211df4b5fff8cae5c3/proposals/p0157.md)
+of this document, where it was the primary proposal. It has a number of
+advantages over the primary proposal:
+
+-   Manually-defined sum types could probably be implemented with dramatically
+    less code, because both the special member functions and the code that
+    implements pattern matching can be generated automatically. This would not
+    only make these types less tedious to implement, it would probably also
+    reduce the risk of bugs, and avoid readability problems arising from the
+    duplication of names between factory functions and continuation interface
+    methods.
+-   Programmers would be able to define functions that encapsulate complex
+    pattern-matching logic behind simple interfaces.
+-   Pattern matching would not require the special overload resolution rules
+    that are needed to translate a pattern-matching operation into a `Match`
+    call.
+-   User-defined sum types would default to being open, whereas they default to
+    being closed under the primary proposal. This is probably a better default,
+    because an open sum type can always be closed, but a closed sum type can't
+    be opened (much less extended with new alternatives) without the risk of
+    breaking user code.
+
+However, this approach also carries some substantial drawbacks:
+
+-   Types can't use the full power of the Carbon language when defining their
+    pattern-matching behavior, or the corresponding factory functions. Instead,
+    they are restricted to a very narrow subset of the language that is valid in
+    both patterns and procedural code. This forces us to introduce intrinsics
+    like `ToStorage` to make that subset even minimally usable, and creates a
+    substantial risk that some user-defined sum types just won't be able to
+    support pattern matching.
+-   The language rules are substantially more complicated and harder to explain,
+    because we need to define the language subset that is usable in pattern
+    functions, and define both "forward" and "reverse" semantics for it.
+    Relatedly, it means that during pattern matching, there will be no
+    straightforward correspondence between the Carbon code and the generated
+    assembly, which could substantially complicate things like debugging.
+-   Carbon's pattern language can't allow you to write _underconstrained_
+    patterns, which are patterns where the values of all bindings aren't
+    sufficient to uniquely determine the state of the object being matched. This
+    rules out things like the `|` pattern operator, and even prevents us from
+    using pattern matching with types that have non-salient state, like the
+    capacity of a `std::vector`.
+
+These issues, especially the overall complexity of this approach, leads me to
+recommend against adopting it.