Many important programming use cases involve values that are most naturally represented as having one of several alternative forms (called alternatives for short). For example,
What unites these use cases is that the set of alternatives is fixed by the API, it is possible for user code to determine which alternative is present, and there is little or nothing you can usefully do with such a value without first making that determination.
Carbon needs to support defining and working with types representing such values. Following Carbon's principles, these types need to be easy to define, understand, and use, and they need to be safe -- in ordinary usage, the type system should ensure that user code cannot accidentally access the wrong alternative. These types should be writeable as well as readable, and writing should be type-safe and efficient. In particular, it should be possible to mutate a single sub-field of an alternative, without having to overwrite the entire alternative, and without a risk of accidentally doing so when that alternative is not present.
Furthermore, it needs to be possible for type owners to customize the representations of these types. For example:
It must be possible to implement such customizations without changing the type's API, and hence without altering the static safety guarantees for users of the type.
The terminology in this space is quite fragmented and inconsistent. This proposal will use the term sum types to refer to types of the kind described in the problem statement. Note that "sum type" is not being proposed as a specific Carbon feature, or even as a precisely defined term of art; it is merely an informal way for this proposal to refer to its motivating use cases, in much the same way that a structs proposal might refer to "value types".
Carbon as currently envisioned is already capable of approximating support for
sum types. In particular, pattern matching
gives us a natural way to express querying which alternative is active, and then
performing computations on that active alternative, which as discussed above is
the primary way of interacting with a sum type. For example, a value-or-error
type Result(T, Error) could be implemented like so:
// Result(T, Error) holds either a successfully-computed value of type T,
// or metadata about a failure during that computation, or a singleton
// "cancelled" state indicating that the computation successfully complied with
// a request to halt before completion.
struct Result(Type:$$ T, Type:$$ Error) {
// 0 if this represents a value, 1 if this represents an error, 2 if this
// represents the cancelled state.
var Int: discriminator;
var T: value;
var Error: error;
fn Success(T: value) -> Result(T, Error) {
return (.discriminator = 0, .value = value, .error = Error());
}
fn Failure(Error: error) -> Result(T, Error) {
return (.discriminator = 1, .value = T(), .error = error);
}
var Result(T, Error):$$ Cancelled =
(.discriminator = 2, .value = T(), .error = Error());
}
A typical usage might look like:
fn ParseAsInt(String: s) -> Result(Int, String) {
var Int: result = 0;
var auto: it = s.begin();
while (it != s.end()) {
if (*it < '0' || *it > '9') {
return Result(Int, String).Failure("String contains non-digit");
}
result += *it - '0';
result *= 10;
}
return Result(Int, String).Success(result);
}
fn GetIntFromUser() -> Int {
while(True) {
var String: s = UserPrompt("Please enter a number");
match (ParseAsInt(s)) {
case (.discriminator = 0, .value = Int: value, .error = String: _) => {
return value;
}
case (.discriminator = 1, .value = Int: _, .error = String: error) => {
Display(error);
}
case .Cancelled => {
// We didn't request cancellation, so something is very wrong.
Terminate();
}
default => {
// Can't happen, because the above cases are exhaustive.
Assert(False);
}
}
}
}
However, this code has several functional deficiencies:
.value and .error must both be live throughout the Result's lifetime,
even when they are not meaningful. Consequently, Success must populate
.error with a default-constructed dummy value (and so it won't work if
Error is not default-constructible), Failure must do the same for
.value, and Cancelled must do the same for both. Furthermore, Result
is bloated by the fact that the two fields must have separately-allocated
storage, even though at most one at a time actually stores any data.Result are not encapsulated. This makes the
Result API unsafe: nothing prevents client code from accessing .value
even when .discriminator is not 0. This also makes the patterns extremely
verbose..discriminator should never have any value other than 0, 1, or 2, but the
compiler can't enforce that property when Results are created, or exploit
it when Results are used. So, for example, the match must have a
default case in order for the compiler and other tools to consider it
exhaustive, even though that default case should never be entered.It also has a couple of ergonomic problems:
Result is largely boilerplate. Conceptually, the only
information needed to specify this type is the names and parameter types of
the two factory functions, the name of the static member Cancelled, plus
the fact that every possible value of Result is uniquely described by the
name and parameter values of a call to one of those two functions, or else
equal to Cancelled. Given that information, the compiler could easily
generate the rest of the struct definition. This generated implementation
may not always be as efficient as a hand-coded one could be, but in a lot of
cases that may not matter.return statements in ParseAsInt are quite verbose, due to the need
to explicitly qualify the function calls with Result(Int, String). In
fact, developers might prefer to avoid having a function call at all,
especially in the success case, and instead rely on implicit conversions to
write something like return result;.To summarize, the previous section identified several missing features in Carbon, which together would enable Carbon to support efficient and ergonomic sum types:
I propose supporting sum types by introducing several language features to supply the missing functionality identified above. These features are largely separable, although there are some dependencies between them, so their detailed design will be addressed in future proposals, and the details discussed here should be considered provisional. This proposal merely establishes the overall design direction for sum types, in the same way that #83 established the overall design direction for the language as a whole.
To support manual lifetime control and storage sharing, I propose introducing at least one and preferably both of the following:
Storage type, which represents a fixed-size buffer of untyped memory and
provides operations for creating and destroying objects within it.union facility, such as the one described in proposal
0139.To support encapsulation in pattern matching, I propose introducing a
Matchable interface, which a type can implement in order to specify how it
behaves in pattern matching, including what (if anything) constitutes an
exhaustive set of patterns for that type.
To allow users to concisely specify a sum type when they don't need to
micro-optimize the implementation details, I propose a choice syntax that
specifies a sum type purely in terms of its alternatives, which acts as a sugar
syntax for those lower-level features.
To avoid redundant boilerplate in functions that return sum types, I propose
allowing statements of the form return .X; and return .F();, which are
interpreted as if the function's return type appeared immediately prior to the
. character.
Cumulatively, these features will allow Status to be defined so that the usage
portions of the above example can be rewritten as follows:
fn ParseAsInt(String: s) -> Result(Int, String) {
var Int: result = 0;
var auto: it = s.begin();
while (it != s.end()) {
if (*it < '0' || *it > '9') {
return .Failure("String contains non-digit");
}
result += *it - '0';
result *= 10;
}
return .Success(result);
}
fn GetIntFromUser() -> Int {
while(True) {
var String: s = UserPrompt("Please enter a number");
match (ParseAsInt(s)) {
case .Success(var Int: value) => {
return value;
}
case .Failure(var String: error) => {
Display(error);
}
case .Cancelled => {
// We didn't request cancellation, so something is very wrong.
Terminate();
}
}
}
}
Open question: How will user-defined sum types (and pattern matching in general) support name bindings that allow you to mutate the underlying object?
This approach to sum types imposes relatively few requirements on the language
features used to implement shareable storage (meaning, storage that can be
inhabited by different objects at different times), and so this proposal doesn't
describe them in much detail. The primary options are typed unions along the
lines of proposal
0139, and untyped
byte buffers along the lines described below. Typed unions are somewhat safer
and more readable, but less general. For example, they can't support use cases
like implementing a small-object-optimized version of
std::any, because the set of
possible types is not known in advance.
This proposal takes the position that Carbon must have at least one of these two features. Whether it should have only untyped byte buffers, only typed unions, or both, is left as an open question, because the answer is orthogonal to the overall design direction that is the focus of this proposal. Consequently, I will not further discuss the tradeoffs between the two features. This proposal's examples focus on untyped byte buffers because they are simpler to describe, and aren't already covered by another proposal.
Regardless of the form that shareable storage takes, it won't be able to intrinsically keep track of whether it currently holds any objects, or the types or offsets of those objects, because that would require it to maintain additional hidden storage, and a major goal of this design is to give the developer explicit control of the object representation. Consequently, it is not safe to copy, move, assign to, or destroy shareable storage unless it is known not to be inhabited by an object.
This means that in the general case, the compiler will not be able to generate safe default implementations for any special member functions of types that have shareable storage members.
For purposes of illustration in this proposal, I will treat
Storage(SizeT size, SizeT align) as a library template representing an untyped
buffer of size bytes, aligned to align. It provides Create, Read, and
Destroy methods which create, access, and destroy an object of a specified
type within the buffer.
Using Storage, we can redefine Status as follows:
struct Result(Type:$$ T, Type:$$ Error) {
var Int: discriminator;
var Storage(Max(Sizeof(T), Sizeof(Error)),
Max(Alignof(T), Alignof(Error))): storage;
fn Success(T: value) -> Self {
Self result = (.discriminator = 0);
result.storage->Create(T, value);
return result;
}
fn Failure(Error: error) -> Self {
Self result = (.discriminator = 1);
result.storage->Create(Error, error);
return result;
}
var Self:$$ Cancelled = (.discriminator = 2);
// Copy, move, assign, destroy, and similar operations need to be defined
// explicitly, but are omitted for brevity.
}
As seen in the example above, we want to allow pattern-matching on Status to
look like this:
match (ParseAsInt(s)) {
case .Success(var Int: value) => {
return value;
}
case .Failure(var String: error) => {
Display(error);
}
case .Cancelled => {
// We didn't request cancellation, so something is very wrong.
Terminate();
}
}
For this to work, Status needs to specify two things:
match body, identify any
unreachable cases, and determine whether any cases are missing.Status object, determines which alternative is
present, and specifies the values of its parameters.Here's how Status can do that under this proposal:
struct Result(Type:$$ T, Type:$$ Error) {
var Int: discriminator;
var Storage(Max(Sizeof(T), Sizeof(Error)),
Max(Alignof(T), Alignof(Error))): storage;
interface MatchContinuation {
var Type:$$ ReturnType;
fn Success(T: value) -> ReturnType;
fn Failure(Error: error) -> ReturnType;
fn Cancelled() -> ReturnType;
}
impl Matchable(MatchContinuation) {
method (Ptr(Self): this) Match[MatchContinuation:$ Continuation](
Ptr(Continuation): continuation) -> Continuation.ReturnType {
match (discriminator) {
case 0 => {
return continuation->Success(this->storage.Read(T));
}
case 1 => {
return continuation->Failure(this->storage.Read(Error));
}
case 2 => {
return continuation->Cancelled();
}
default => {
Assert(false);
}
}
}
}
// Success() and Failure() factory functions, and the Cancelled static
// constant, are defined as above.
// Copy, move, assign, destroy, and similar operations need to be defined
// explicitly, but are omitted for brevity.
}
In this code, Result makes itself available for use in pattern matching by
declaring that it implements the Matchable interface. Matchable takes an
interface argument, MatchContinuation in this case, which specifies the set of
possible alternatives by declaring a method for each one. I'll call that
argument the continuation interface, for reasons that are about to become
clear.
The Match method of the Matchable interface. This method takes two
parameters: the value being matched against, and an instance of the continuation
interface, which the compiler generates from the match expression being
evaluated, with method bodies that correspond to the bodies of the corresponding
cases. Once Match has determined which alternative is present, and the
values of its parameters, it invokes the corresponding method of the
continuation object. Match is required to invoke exactly one continuation
method, and to do so exactly once.
This proposal assumes that Carbon will have support for defining and
implementing generic interfaces, including interfaces that take interface
parameters, and uses interface, impl, method etc. as placeholder
syntax. It can probably be revised to work if interfaces can't be parameterized
that way, or if we don't have a feature like this at all, but it might be
somewhat more awkward.
Notice that the names Success, Failure, and Cancelled are defined twice,
once as factory functions of Result and once as methods of
MatchContinuation, with the same parameter types in each case. The two
effectively act as inverses of each other: the factory functions compute a
Result from their parameters, and the methods are used to report the parameter
values that compute a given Result. This mirroring between expression and
pattern syntax is ultimately a design choice by the type author; there is no
language-level requirement that the alternatives correspond to the factory
functions.
This approach can be extended to support non-sum patterns as well. For example,
a type that wants to match a tuple-shaped pattern like (Int: i, String: s)
could define a continuation interface like
interface MatchContinuation {
fn operator()(Int: i, String: s);
}
A type's continuation interface can also include a function named
operator default, which implements the default case of the match.
Consequently, any match on that type will be required to have a default
case. This is valuable because it protects the type author's ability to add new
alternatives in the future, without causing build failures in client code.
Open question: Can
Matchactually invokeoperator default? It might sometimes be useful to define a type that can't be matched exhaustively, and the mere possibility that thedefaultcase could actually run might help encourage client code to implement it robustly, rather than blindly providing something likeAssert(False). However, if there's a language-level guarantee that thedefaultcase is unreachable if all other alternatives are handled, then we can allow code in the same library as the type to omit thedefaultcase. That's desirable because code in the same library doesn't pose an evolutionary risk, and it's often valuable to have a build-time guarantee that code within the type's own API explicitly handles every case.
Note that Match's continuation parameter type must be generic rather than
templated (:$ rather than :$$). Template specialization is driven by the
concrete values of the template arguments, but the type of the continuation
parameter may depend on code generation details that aren't yet known when
template specialization takes place. By the same token, we won't support
overloading Match, because overload resolution is likewise driven by the
concrete types of the function arguments, which may not be known at that point.
A designator is a token consisting of . followed by an identifier. The
canonical use case for designators is member access, as in foo.bar, where the
designator applies to the preceding expression. However, we expect Carbon will
also have some use cases for "bare" designators, where there is no preceding
expression. In particular, we expect to use bare designators to initialize named
tuple fields, as in (.my_int = 42, .my_str = "Foo"), which produces a tuple
with fields named my_int and my_str.
I propose to also permit using bare designators to refer to the alternatives of
a sum type, in cases where the sum type is clear from context. In particular, I
propose that in statements of the form return R.Alt; or
return R.Alt(<args>);, where R is the function return type, the R can be
omitted. Similarly, I propose that patterns of the form S.Alt or
S.Alt(<subpatterns>), where S is the type being matched against, the S can
be omitted. Note that both of these shorthands are allowed only at top level,
not as subexpressions or subpatterns.
Open question: Can we also permit these shorthands to be nested? This would be more consistent and less surprising, but could create ambiguity with the use of bare designators in tuple initialization: does
case (.Foo, .Bar)match a tuple of two fields namedFooandBar, or does it match a tuple of two positional fields, of sum types that respectively have.Fooand.Baras alternatives? Furthermore, allowing such nested usages may conflict with the proposed principle that type information should propagate only from an expression to its enclosing context, and not vice-versa.The issue of type propagation is particularly acute if we want to allow nesting of alternatives, such as
case .Foo(.Bar(_)). The problem there is that we are relying on the type ofFoo's parameter to tell us the type of the argument expression.Bar(_), but ifFoois overloaded, we can't determine the type of the parameter until the overload is resolved, and to do that we first need to know the type of the argument expression. And even ifFoois not currently overloaded, an overload might be added in the future (at least if the sum type has anoperator default). Adding overloads is a canonical example of the kind of software evolution that Carbon is intended to allow, so the validity of this code can't depend on whether or notFoois overloaded.
Rather than allowing code to omit the type altogether, we could allow code to
replace the type with a placeholder keyword, for example case auto.Foo. This
would avoid the ambiguity with tuple initialization, but would still mean that
type information is propagating into the expression from its surrounding context
rather than vice-versa (which also means that auto may not be an appropriate
spelling). Furthermore, this could wind up feeling fairly boilerplate-heavy,
even if the keyword is very short.
We could treat each bare designator as essentially defining its own type, which
may then implicitly convert to a suitable sum type. For example, we could think
of .Some(42) as having a type DesignatedTuple("Some", (Int)), and
Optional(Int) would define an implicit conversion from that type. This would
ensure that type information does not propagate into the expression from the
context, but would not resolve the ambiguity with tuple initialization.
Furthermore, this would mean that the names of bare designators do not need to
be declared before they are used, and in fact can't be meaningfully declared at
all.
This could have very surprising consequences. For example, a typo in a line of
code like var auto: x = .Sone(42); cannot be diagnosed at that line. Instead,
the problem can't be diagnosed until x is used, and even then it will show up
as an overload resolution error rather than a name lookup error. This is
particularly problematic because it is much harder to provide useful, actionable
diagnostics for overload resolution failure than for name lookup failure.
Relatedly, in the case of bare designators, IDEs would not be able to implement
tab-completion using Carbon's name lookup rules, but would effectively have to
invent their own ad hoc name lookup rules.
Note that this approach would work well with the "Indexing by type" alternative
discussed below, with instances of DesignatedTuple (or whatever we call it)
acting as tagged wrapper types.
As discussed above, we expect a well-behaved sum type to define factory functions that correspond to each of the alternatives in its continuation interface. This creates some potential ambiguity about whether a given use of an alternative name refers to the factory function or the continuation interface. For example, consider the following code:
var Result(Int, String): r = ...;
match (r) {
case Result(Int, String).Value(0) => ...
This could potentially be interpreted two ways:
Result(Int, String).Value(0) is evaluated as an ordinary function call,
and the result is compared with r to see if they match, presumably using
the == operator.match expression is evaluated by invoking
Result(Int, String)'s implementation of Matchable, with a continuation
object whose .Value(Int) method compares the Int parameter with 0.This can also be thought of as a name lookup problem: is .Value looked up in
Result(Int, String), or in Result(Int, String)'s implementation of
Matchable?
I propose to leave this choice unspecified, so that the compiler may validly
generate code either way. This gives the compiler more freedom to optimize, and
perhaps more importantly, it helps discourage sum type authors from
intentionally making its factory function behavior inconsistent with its pattern
matching behavior. By extension, I propose treating the shorthand syntax
case .Value(0) the same way.
We could instead avoid the ambiguity by providing separate syntaxes for the two
semantics. The more straightforward version of this would be to say that
Result(Int, String).Value(0) is always interpreted as an ordinary function
call, and patterns like Result(Int, String).Value(Int: i) are ill-formed
because they cannot be interpreted as function calls. However, this would
require us to have a syntax for matching alternatives that is disjoint from the
syntax for constructing alternatives. This would be at odds with existing
practice in languages like Rust, Swift, Haskell, and ML, all of which use the
same syntax for constructing and matching alternatives.
Note that it is tempting to use bare designators as the syntax for matching
alternatives, so that Result(Int, String).Value(0) is an expression, but
.Value(0) is a pattern. However, that is unlikely to be sufficient on its own,
because bare designators rely on type information that may not always be
available, especially in nested patterns. Hence, in order for this approach to
work, we would need to introduce a separate syntax for specifying the type of a
bare designator, such as .Value(0) as Result(Int, String).
Alternatively, we could say that code in a pattern-matching context is always
interpreted as a pattern, even if it could otherwise be interpreted as an
expression. We would then need to introduce a pattern operator for explicitly
evaluating a subpattern as an expression, such as
is Result(Int, String).Value(0) or == Result(Int, String).Value(0). However,
this would impose an educational and cognitive burden on users: FAQ entries like
"What's the difference between case .Foo and case is .Foo" and "How do I
choose between case .Foo and case is .Foo" seem inevitable, and would
require fairly nuanced answers. It would also add syntactic noise to the use
cases that correspond to C++ switch statements, where all of the cases are
fixed values.
We could instead specify that, when Result(Int, String).Value(0) appears in a
context where a pattern is expected, the name .Value is looked up as both an
ordinary function call and as a use of the Matchable interface, and specify
one of the two as the "winner" in the case where both lookups succeed. This is
effectively a variant of the previous option, except that some usages that would
be build errors under that approach would be saved by the fallback
interpretation here. In particular, it would still require us to introduce a
second syntax for the case where the programmer wants the lower-priority of the
two behaviors. Thus, it would carry largely the same drawbacks as the previous
option, with the additional drawback that there wouldn't be a consistent
correspondence between syntax and semantics.
We could instead specify that Result(Int, String).Value(0) is always
interpreted as an ordinary function call, but patterns like
Result(Int, String).Value(Int: i) are evaluated using the Matchable
interface, because no other implementation is possible. However, this would mean
that the name-lookup behavior of a function call depends on code that can be
arbitrarily deeply nested within it, which seems likely to be hostile to both
programmers and tools.
To allow users to define sum types without micromanaging the implementation
details, I propose introducing choice as a convenient syntax for defining a
sum type by specifying only the declarations of the set of alternatives. From
that information, the compiler generates an appropriate object representation,
and synthesizes definitions for the alternatives and special member functions.
Our manual implementation of Result above doesn't really benefit from having
direct control of the object representation, and doesn't seem to have any
additional API surfaces, so it's well-suited to being defined as a choice type
instead:
choice Result(Type:$$ T, Type:$$ Error) {
Success(T: value),
Failure(Error: error),
Cancelled
}
The body of a choice type definition consists of a comma-separated list of
alternatives. These have the same syntax as a function declaration, but with
fn and the return type omitted. If there are no arguments, the parentheses may
also be omitted, as with Cancelled. default may also be included in the
list, with the same meaning as operator default in a continuation interface:
it means that pattern-matching operations on this type must be prepared to
handle alternatives other than those explicitly listed.
The choice type will have a static factory function corresponding to each of the
alternatives with parentheses, and a static data member corresponding to each
alternative with no parentheses. The choice type will also implement the
Matchable interface, and its continuation interface will have a method
corresponding to each alternative.
In short, this definition of Result as a choice type will have the same
semantics as the earlier definition of it as a struct. It will probably also
have the same implementation, with a discriminator field and a storage buffer
large enough to hold the argument values of the alternatives. Any alternative
parameter types that are incomplete (or have unknown size for any other reason)
will be represented using owning pointers; among other things, this will allow
users to define recursive choice types. The implementation will be hidden, of
course, and the compiler may be able to generate better code, but we will design
this feature to support at least that baseline implementation strategy.
One consequence is that although the alternatives of a choice type can be
overloaded (as in the Variant example below), they cannot be templates. More
precisely, the parameter types of a pattern function must be fixed without
knowing the values of any of the arguments. To see why, consider a choice type
like the following, which attempts to emulate std::any:
choice Any {
Value[Type:$$ T](T: value)
}
The problem is that since T could be any type, and a single Any object could
hold values of different types throughout its lifetime, Any can't be
implemented using a storage buffer within the Any object. Instead, the storage
buffer for the T object would have to be allocated on the heap, but then the
compiler would need to decide whether to apply a small buffer optimization, and
if so what size threshold to use, etc. Allowing choice types to be implemented
in terms of heap allocation would make their performance far less predictable,
contrary to Carbon's performance goals, and would have little offsetting
benefit: these sorts of types appear to be rare, and when needed they should be
implemented in library code, where the performance tradeoffs are explicit and
under programmer control.
It may be possible to relax this restriction when and if we have a design for
supporting non-fixed-size types, although it's worth noting that even that would
not give us a way for Any to support assignment.
Carbon will probably have some mechanism for allowing a struct to have compiler-generated default implementations of operations such as copy, move, assignment, hashing, and equality comparison, so long as the struct's members support those operations. Assuming that mechanism exists, choice types will support it as well, with the parameter types of the pattern functions taking the place of the member types. However, there are a couple of special cases:
A future proposal for this mechanism will need to consider whether to require an explicit opt-in to generate these operations.
Open question: Should choice provide a way to directly access the
discriminator? Correspondingly, should it provide a way to specify the
discriminator type, and which discriminator values correspond to which
alternatives? These features would enable choice types to support all the same
use cases as C++ enums, and permit zero-overhead conversion between the two at
language boundaries.
choiceThe Rust and Swift counterparts of choice are spelled enum. I have avoided
this because these types are not really "enumerated types" in the sense of all
values being explicitly enumerated in the code. I chose the spelling choice
because "choice type" is one of the only available synonyms for "sum type" that
doesn't have any potentially-misleading associations.
choice types onlyRather than layering choice types on top of lower level features, we could
make them a primitive language feature, and simply not provide a way for user
code to customize the representation of sum types. However, this would mean that
users who encounter performance problems with the compiler-generated code for a
choice type would have no way to address those problems without rewriting all
code that uses that type. This would be contrary to Carbon's performance and
evolvability goals. Furthermore, the rewritten code would probably be
substantially less readable, and less safe, because it wouldn't be able to use
pattern matching.
Rather than requiring each alternative to have a distinct name (or at least a
distinct function signature), we could pursue a design that requires each
alternative to have a distinct type. With this approach, which I'll call
"type-indexed" as opposed to "name-indexed", Carbon sum types would much more
closely resemble C++'s std::variant, rather than Swift and Rust's enum or
the sum types of various functional programming languages.
Either approach can be emulated in terms of the other. For example, we don't yet
have enough of a design for variadics to give an example of a Carbon counterpart
for std::variant<Ts...>, but a variant with exactly three alternative types
could be written like so:
choice Variant(Type:$$ T1, Type:$$ T2, Type:$$ T3) {
Value(T1: value),
Value(T2: value),
Value(T3: value)
}
Conversely a type-indexed type like std::variant can model a name-indexed type
like Result(T,E) by introducing a wrapper type for each name, leading to
something like std::variant<Value<T>, Error<E>, Cancelled> (note that
std::variant<T,E> would not work, because T and E can be the same type).
In either case, emulating the other model introduces some syntactic overhead:
with name-indexing, Variant's factory functions must be given a name (Value)
even though it doesn't really convey any information, and emulating
Result(T,E) in terms of type-indexing requires separately defining the tagged
wrapper templates Value and Error.
The distinction between these two models of sum types seems analogous the distinction between the tuple and struct models of product types. Tuples and type-indexed sum types treat the data structurally, in terms of types and positional indices, but structs and name-indexed sum types require the components of the data to have names, which contributes to both readability and type-safety by attaching higher-level semantics to the data.
It is possible that both models of sum types could coexist in Carbon, just as structs and tuples do. However, that seems unlikely to be a good idea: the coexistence of tuples and structs is necessitated by the fact that it is quite difficult to emulate either of them in terms of the other in a type-safe way, but as we've seen, it's fairly straightforward to emulate either model of sum types in terms of the other.
Use cases that work best with type-indexing appear to be quite rare, just as use cases for tuples appear to be quite rare compared to use cases for structs. Consequently, if Carbon has only one form of sum types, it should probably be the name-indexed form, as proposed here.
Open question: Should Carbon have a native syntax for pattern matching on the dynamic type of an object? If so, should types like
Variantbe able to use it, instead of having the.Valueboilerplate in every pattern? Should this mechanism be aware of subtype relationships (so that a subtype pattern is a better match than a supertype pattern)? If so, how are those subtype relationships defined?
As a variant of the previous approach, we could allow types to specify their
pattern-matching behavior in terms of a proxy type that Carbon "natively" knows
how to pattern-match against. In the case of a sum type, this proxy would be a
choice type, which means that choice needs to be a fundamental part of the
language, rather than syntactic sugar for a sum type struct. Returning once more
to the Result example:
struct Result(Type:$$ T, Type:$$ Error) {
// Data members, factories, and special members same as above
choice Choice {
Success(T: value),
Failure(Error: error),
Cancelled
}
fn operator match(Ptr(Self): this) -> Choice {
match (discriminator) {
case 0 => {
return .Success(this->storage.Read(T));
}
case 1 => {
return .Failure(this->storage.Read(Error));
}
case 2 => {
return .Cancelled;
}
default => {
Assert(False);
}
}
}
}
This approach has several advantages:
However, it also has several drawbacks:
choice as a fundamental part of the language: in
order to implement a sum type, you have to work with an object type whose
layout and implementation is inherently opaque. This would be a substantial
departure from C++, and it's difficult to foresee the consequences of that.
Possibly the closest analogy in C++ is virtual calls, and especially virtual
base classes, where fundamental operations like -> and pointer casts can
involve nontrivial generated code, and some aspects of object layout are
required to be hidden from the user. However, Carbon seems to be moving away
from C++ in precisely those ways..Success in the body of operator match refers to Self.Success
rather than Self.Choice.Success. Relatedly, there's some risk that the
author may omit the leading ., and thereby invoke Self.Success instead
of Self.Choice.Success. This will probably fail to build, but the errors
may be confusing.Match to create a
mutable local variable, pass it to the callback by pointer/reference, and
then write the (possibly modified) contents of the variable back into the
sum type object.I very tentatively recommend the callback approach rather than this one, primarily because of the last point above: the Carbon type system is likely to be dramatically simplified if there are no reference types, but I think the proxy approach will make reference types all but unavoidable.
Rather than require user types to define both the pattern-matching logic and the factory functions, with the expectation that they will be inverses of each other, we could instead enable them to define the set of alternatives as factory functions that the compiler can invert automatically. This approach, like the primary proposal, would consist of several parts.
A pattern function is a function that can be invoked as part of a pattern,
even with arguments that contain placeholders. Pattern functions use the
introducer pattern instead of fn, and can only contain the sort of code that
could appear directly in a pattern. This lets us define reusable pattern
syntaxes that can do things like encapsulate hidden implementation details of
the object they're matching.
Next, we would introduce the concept of an alternatives block, which groups
together a set of factory functions and designates them as a set of
alternatives. They are required to be both exhaustive and unambiguous, meaning
that for any possible value of the type, it must be possible to obtain it as the
return value of exactly one of the alternatives. Alternatives that take no
arguments, which represent singleton states such as Cancelled, can instead be
written as static constants. An alternatives block can be marked closed,
which plays the same role as omitting operator default in the primary
proposal: it indicates that client code need not be prepared to accept
alternatives other than the ones currently present.
As with the primary proposal, Storage is used to represent a span of memory
that the user can create objects within. However, with this approach we also
need it to support initialization from a ToStorage factory function, because
pattern functions can't contain procedural code. Note that for the same reason,
ToStorage will probably need to be a language intrinsic, or implemented in
terms of one.
Using these features, the Result(T, Error) example can be written as follows:
struct Result(Type:$$ T, Type:$$ Error) {
var Int: discriminator;
var Storage(Max(Sizeof(T), Sizeof(Error)),
Max(Alignof(T), Alignof(Error))): storage;
closed alternatives {
pattern Success(T: value) -> Self {
return (.discriminator = 0, .storage = ToStorage(value));
}
pattern Failure(Error: error) -> Self {
return (.discriminator = 1, .storage = ToStorage(error));
}
var Self:$$ Cancelled = (.discriminator = 2);
}
}
Notice that with this approach, we do not need to define the special member
functions of Result manually. The compiler can infer appropriate definitions
in the same way that it infers how to invert these functions during pattern
matching.
As with the primary proposal, choice would be available as syntactic sugar,
but its syntax would mirror the syntax of an alternatives block:
closed choice Result(Type:$$ T, Type:$$ Error) {
pattern Success(T: value) -> Self;
pattern Failure(Error: error) -> Self;
var Self:$$ Cancelled;
}
This ensures that a sum type's API is defined using essentially the same syntax, regardless of how the type author chooses to implement it.
This approach is described in much more detail in an earlier draft of this document, where it was the primary proposal. It has a number of advantages over the primary proposal:
Match
call.However, this approach also carries some substantial drawbacks:
ToStorage to make that subset even minimally usable, and creates a
substantial risk that some user-defined sum types just won't be able to
support pattern matching.| pattern operator, and even prevents us from
using pattern matching with types that have non-salient state, like the
capacity of a std::vector.These issues, especially the overall complexity of this approach, leads me to recommend against adopting it.
This proposal gives us solid ground on which to continue developing features that rely on sum types and pattern matching thereof. The approach taken seems plausible, and while there's a good chance that we will revise it significantly before we finish our first pass over the complete Carbon design (for example, by switching to pattern matching proxies or by supporting mutation of matched elements), we don't think it will anchor us too much on this one particular direction.
With regard to Carbon's goals:
Note: At the decision meeting, it was stated that geoffromer will update the proposal to add a rationale to address austern's questions about the layered approach.