# Pattern matching syntax and semantics [Pull request](https://github.com/carbon-language/carbon-lang/pull/2188) ## Table of contents - [Abstract](#abstract) - [Problem](#problem) - [Background](#background) - [Proposal](#proposal) - [Details](#details) - [Expressions versus proper patterns](#expressions-versus-proper-patterns) - [Expression patterns](#expression-patterns) - [Bindings](#bindings) - [Name bindings](#name-bindings) - [Wildcard](#wildcard) - [Generic bindings](#generic-bindings) - [`auto` and type deduction](#auto-and-type-deduction) - [`var`](#var) - [Tuple patterns](#tuple-patterns) - [Struct patterns](#struct-patterns) - [Alternative patterns](#alternative-patterns) - [Templates](#templates) - [Guards](#guards) - [Refutability, overlap, usefulness, and exhaustiveness](#refutability-overlap-usefulness-and-exhaustiveness) - [Rationale](#rationale) - [Alternatives considered](#alternatives-considered) - [Shorthand for `auto`](#shorthand-for-auto) - [Struct pattern syntax](#struct-pattern-syntax) - [Type pattern matching](#type-pattern-matching) - [Introducer syntax for expression patterns](#introducer-syntax-for-expression-patterns) - [Allow guards on arbitrary patterns](#allow-guards-on-arbitrary-patterns) - [Treat expression patterns as exhaustive if they cover all possible values](#treat-expression-patterns-as-exhaustive-if-they-cover-all-possible-values) - [Allow non-exhaustive `match` statements](#allow-non-exhaustive-match-statements) - [Future work](#future-work) - [Or patterns](#or-patterns) - [As patterns](#as-patterns) - [Matching classes by dynamic type](#matching-classes-by-dynamic-type) - [User-defined pattern matching](#user-defined-pattern-matching) - [Matching classes with struct patterns](#matching-classes-with-struct-patterns) - [Matching by reference](#matching-by-reference) - [Type deduction](#type-deduction) - [Match expressions](#match-expressions) - [Examples](#examples) - [Examples from P0095R1](#examples-from-p0095r1) - [Figure 1. Declaration of a command data structure](#figure-1-declaration-of-a-command-data-structure) - [Figure 2. Implementation of an output operator](#figure-2-implementation-of-an-output-operator) - [Figure 3. Switching an enum](#figure-3-switching-an-enum) - [Figure 4. Expression datatype](#figure-4-expression-datatype) - [Figure 5. `struct` inspection](#figure-5-struct-inspection) - [Example from P1371R5: Red-black tree rebalancing](#example-from-p1371r5-red-black-tree-rebalancing) - [With P1371R5 pattern matching](#with-p1371r5-pattern-matching) - [With this proposal](#with-this-proposal) - [With this proposal plus #2187](#with-this-proposal-plus-2187) ## Abstract This paper proposes concrete syntax and semantic choices for Carbon patterns. ## Problem Carbon uses patterns wherever a value should be given a name, decomposed, or matched against: in function parameters, variable declarations, `match` statement `case`s, `for` statement loop variables, and so on. Simplified forms of patterns, required to be just a simple name binding, appear in additional contexts, such as fields in classes and implicit parameter lists. While we have syntax specified for some of these constructs, we do not have an approved proposal describing the syntax or semantics of patterns. ## Background See [pattern matching](https://en.wikipedia.org/wiki/Pattern_matching) on wikipedia for a broad overview of the subject. We refer to the value being matched by a pattern as the _scrutinee_. ## Proposal Patterns in Carbon are a generalization of the expression grammar. Compared to expressions, patterns add: - Bindings, of the form `name: type`, which give a name for the scrutinee. - `var` _pattern_, which creates a separate object to hold the value of the scrutinee, and causes any nested bindings to be mutable lvalues instead of immutable rvalues. - Additional syntax to make matching against structs more convenient. ## Details ### Expressions versus proper patterns Expressions are patterns, as described below. A pattern that is not an expression, because it contains pattern-specific syntax such as a binding, is a _proper pattern_. Many expression forms, such as arbitrary function calls, are not permitted as proper patterns, so cannot contain bindings. - _pattern_ ::= _proper-pattern_ ``` fn F(n: i32) -> i32 { return n; } match (F(42)) { // ❌ Error: binding can't appear in a function call. case (F(n: i32)) => {} } ``` ### Expression patterns An expression is a pattern. - _pattern_ ::= _expression_ The pattern is compared with the expression using the `==` operator: _pattern_ `==` _scrutinee_. ``` fn F(n: i32) { match (n) { // ✅ Results in an `n == 5` comparison. // OK despite `n` and `5` having different types. case 5 => {} } } ``` Any `==` operations performed by a pattern match occur in lexical order, but for repeated matches against the same _pattern_, later comparisons may be skipped by reusing the result from an earlier comparison: ``` class ChattyIntMatcher { external impl as EqWith(i32) { fn Eq[me: ChattyIntMatcher](other: i32) { Print("Matching {0}", other); return other == 1; } } } fn F() { // Prints `Matching 1` then `Matching 2`, // may or may not then print `Matching 1` again. match ((1, 2)) { case ({} as ChattyIntMatcher, 0) => {} case (1, {} as ChattyIntMatcher) => {} case ({} as ChattyIntMatcher, 2) => {} } } ``` ### Bindings #### Name bindings A name binding is a pattern. - _binding-pattern_ ::= `unused`? _identifier_ `:` _expression_ - _proper-pattern_ ::= _binding-pattern_ The type of the _identifier_ is specified by the _expression_. The scrutinee is implicitly converted to that type if necessary. ``` fn F() -> i32 { match (5) ( // ✅ `5` is implicitly converted to `i32`. // Returns `5 as i32`. case n: i32 => { return n; } } } ``` When a new object needs to be created for the binding, the lifetime of the bound value matches the scope of the binding. ``` class NoisyDestructor { fn Make() -> Self { return {}; } external impl i32 as ImplicitAs(NoisyDestructor) { fn Convert[me: i32]() -> Self { return Make(); } } destructor { Print("Destroyed!"); } } fn G() { // Does not print "Destroyed!". let n: NoisyDestructor = NoisyDestructor.Make(); Print("Body of G"); // Prints "Destroyed!" here. } fn H(n: i32) { // Does not print "Destroyed!". let (v: NoisyDestructor, w: i32) = (n, n); Print("Body of H"); // Prints "Destroyed!" here. } ``` As specified in [#2022](/proposals/p2022.md#the-behavior-of-unused-name-bindings), the `unused` keyword indicates that the binding is intended to not be used. #### Wildcard A syntax like a binding but with `_` in place of an identifier can be used to ignore part of a value. - _binding-pattern_ ::= `_` `:` _expression_ See [#2022](/proposals/p2022.md) for details. The behavior is similar to that of an `unused` binding with a unique name. ``` fn F(n: i32) { match (n) { // ✅ Matches and discards the value of `n`. case _: i32 => {} // ❌ Error: unreachable. default => {} } } ``` As specified in [#1084](/proposals/p1084.md), function redeclarations may replace named bindings with wildcards but may not use different names. ``` fn G(n: i32); fn H(n: i32); fn J(n: i32); // ✅ Does not use `n`. fn G(_: i32) {} // ❌ Error: name of parameter does not match declaration. fn H(m: i32) {} // ✅ Does not use `n`. fn J(unused n: i32); ``` #### Generic bindings A `:!` can be used in place of `:` for a binding that is usable at compile time. - _generic-pattern_ ::= `unused`? `template`? _identifier_ `:!` _expression_ - _generic-pattern_ ::= `template`? `_` `:!` _expression_ - _proper-pattern_ ::= _generic-pattern_ ``` // ✅ `F` takes a generic type parameter `T` and a parameter `x` of type `T`. fn F(T:! Type, x: T) { var v: T = x; } ``` The `template` keyword indicates the binding is introducing a template parameter, so name lookups into the parameter should be deferred until its value is known. #### `auto` and type deduction The `auto` keyword is a placeholder for a unique deduced type. - _expression_ ::= `auto` ``` fn F(n: i32) { var v: auto = SomeComplicatedExpression(n); // Equivalent to: var w: T = SomeComplicatedExpression(n); // ... where `T` is the type of the initializer. } ``` The `auto` keyword is only permitted in specific contexts. Currently these are: - As the return type of a function. - As the type of a binding. It is anticipated that `auto` may be permitted in more contexts in the future, for example as a generic argument in a parameterized type that appears in a context where `auto` is allowed, such as `Vector(auto)` or `auto*`. When the type of a binding requires type deduction, the type is deduced against the type of the scrutinee and deduced values are substituted back into the type before pattern matching is performed. ``` fn G[T:! Type](p: T*); class X { external impl as ImplicitAs(i32*); } // ✅ Deduces `T = i32` then implicitly and // trivially converts `p` to `i32*`. fn H1(p: i32*) { G(p); } // ❌ Error, can't deduce `T*` from `X`. fn H2(p: X) { G(p); } ``` The above is only an illustration; the behavior of type deduction is not specified in this proposal. ### `var` A `var` prefix indicates that a pattern provides mutable storage for the scrutinee. - _proper-pattern_ ::= `var` _proper-pattern_ A `var` pattern matches when its nested pattern matches. The type of the storage is the resolved type of the nested _pattern_. Any bindings within the nested pattern refer to portions of the corresponding storage rather than to the scrutinee. ``` fn F(p: i32*); fn G() { match ((1, 2)) { // `n` is a mutable `i32`. case (var n: i32, 1) => { F(&n); } // `n` and `m` are the elements of a mutable `(i32, i32)`. case var (n: i32, m: i32) => { F(if n then &n else &m); } } } ``` Pattern matching precedes the initialization of the storage for any `var` patterns. An introduced variable is only initialized if the complete pattern matches. ``` class X { destructor { Print("Destroyed!"); } } fn F(x: X) { match ((x, 1 as i32)) { case (var y: X, 0) => {} case (var z: X, 1) => {} // Prints "Destroyed!" only once, when `z` is destroyed. } } ``` A `var` pattern cannot be nested within another `var` pattern. The declaration syntax `var` _pattern_ `=` _expresson_ `;` is equivalent to `let` `var` _pattern_ `=` _expression_ `;`. ### Tuple patterns A tuple of patterns can be used as a pattern. - _tuple-pattern_ ::= `(` [_expression_ `,`]\* _proper-pattern_ [`,` _pattern_]\* `,`? `)` - _proper-pattern_ ::= _tuple-pattern_ A _tuple-pattern_ containing no commas is treated as grouping parens: the contained _proper-pattern_ is matched directly against the scrutinee. Otherwise, the behavior is as follows. A tuple pattern is matched left-to-right. The scrutinee is required to be of tuple type. Note that a tuple pattern must contain at least one _proper-pattern_. Otherwise, it is a tuple-valued expression. However, a tuple pattern and a corresponding tuple-valued expression are matched in the same way because `==` for a tuple compares fields left-to-right. ### Struct patterns A struct can be matched with a struct pattern. - _proper-pattern_ ::= `{` [_field-init_ `,`]\* _proper-field-pattern_ [`,` _field-pattern_]\* `}` - _proper-pattern_ ::= `{` [_field-pattern_ `,`]+ `_` `}` - _field-init_ ::= _designator_ `=` _expression_ - _proper-field-pattern_ ::= _designator_ `=` _proper-pattern_ - _proper-field-pattern_ ::= _binding-pattern_ - _field-pattern_ ::= _field-init_ - _field-pattern_ ::= _proper-field-pattern_ A struct pattern resembles a struct literal, with at least one field initialized with a proper pattern: ``` match ({.a = 1, .b = 2}) { // Struct literal as an expression pattern. case {.b = 2, .a = 1} => {} // Struct pattern. case {.b = n: i32, .a = m: i32} => {} } ``` The scrutinee is required to be of struct type, and to have the same set of field names as the pattern. The pattern is matched left-to-right, meaning that matching is performed in the field order specified in the pattern, not in the field order of the scrutinee. This is consistent with the behavior of matching against a struct-valued expression, where the expression pattern becomes the left operand of the `==` and so determines the order in which `==` comparisons for fields are performed. In the case where a field will be bound to an identifier with the same name, a shorthand syntax is available: `a: T` is synonymous with `.a = a: T`. ``` match ({.a = 1, .b = 2}) { case {a: i32, b: i32} => { return a + b; } } ``` If some fields should be ignored when matching, a trailing `, _` can be added to specify this: ``` match ({.a = 1, .b = 2}) { case {.a = 1, _} => { return 1; } case {b: i32, _} => { return b; } } ``` This is valid even if all fields are actually named in the pattern. ### Alternative patterns An alternative pattern is used to match one alternative of a choice type. - _proper-pattern_ ::= _callee-expression_ _tuple-pattern_ - _proper-pattern_ ::= _designator_ _tuple-pattern_? Here, _callee-expression_ is syntactically an expression that is valid as the callee in a function call expression, and an alternative pattern is syntactically a function call expression whose argument list contains at least one _proper-pattern_. If a _callee-expression_ is provided, it is required to name a choice type alternative that has a parameter list, and the scrutinee is implicitly converted to that choice type. Otherwise, the scrutinee is required to be of some choice type, and the designator is looked up in that type and is required to name an alternative with a parameter list if and only if a _tuple-pattern_ is specified. The pattern matches if the active alternative in the scrutinee is the specified alternative, and the arguments of the alternative match the given tuple pattern (if any). ``` choice Optional(T:! Type) { None, Some(T) } match (Optional(i32).None) { // ✅ `.None` resolved to `Optional(i32).None`. case .None => {} // ✅ `.Some` resolved to `Optional(i32).Some`. case .Some(n: i32) => { Print("{0}", n); } // ❌ Error, no such alternative exists. case .Other => {} } class X { external impl as ImplicitAs(Optional(i32)); } match ({} as X) { // ✅ OK, but expression pattern. case Optional(i32).None => {} // ✅ OK, implicitly converts to `Optional(i32)`. case Optional(i32).Some(n: i32) => { Print("{0}", n); } } ``` Note that a pattern of the form `Optional(T).None` is an expression pattern and is compared using `==`. ### Templates Any checking of the type of the scrutinee against the type of the pattern that cannot be performed because the type of the scrutinee involves a template parameter is deferred until the template parameter's value is known. During instantiation, patterns that are not meaningful due to a type error are instead treated as not matching. This includes cases where an `==` fails because of a missing `EqWith` implementation. ``` fn TypeName[template T:! Type](x: T) -> String { match (x) { // ✅ OK, the type of `x` is a template parameter. case _: i32 => { return "int"; } case _: bool => { return "bool"; } case _: auto* => { return "pointer"; } default => { return "unknown"; } } } ``` Cases where the match is invalid for reasons not involving the template parameter are rejected when type-checking the template: ``` fn MeaninglessMatch[template T:! Type](x: T*) { match (*x) { // ✅ OK, `T` could be a tuple. case (_: auto, _: auto) => {} default => {} } match (x->y) { // ✅ OK, `T.y` could be a tuple. case (_: auto, _: auto) => {} default => {} } match (x) { // ❌ Error, tuple pattern cannot match value of non-tuple type `T*`. case (_: auto, _: auto) => {} default => {} } } ``` ### Guards We allow `case`s within a `match` statement to have _guards_. These are not part of pattern syntax, but instead are specific to `case` syntax: - _case_ ::= `case` _pattern_ [`if` _expression_]? `=>` _block_ A guard indicates that a `case` only matches if some predicate holds. The bindings in the pattern are in scope in the guard: ``` match (x) { case (m: i32, n: i32) if m + n < 5 => { return m - n; } } ``` For consistency, this facility is also available for `default` clauses, so that `default` remains equivalent to `case _: auto`. ### Refutability, overlap, usefulness, and exhaustiveness Some definitions: - A pattern _P_ is _useful_ in the context of a set of patterns _C_ if there exists a value that _P_ can match that no pattern in _C_ matches. - A set of patterns _C_ is _exhaustive_ if it matches all possible values. Equivalently, _C_ is exhaustive if the pattern `_: auto` is not useful in the context of _C_. - A pattern _P_ is _refutable_ if there are values that it does not match, that is, if the pattern `_: auto` is useful in the context of {_P_}. Equivalently, the pattern _P_ is _refutable_ if the set of patterns {_P_} is not exhaustive. - A set of patterns _C_ is _overlapping_ if there exists any value that is matched by more than one pattern in _C_. For the purpose of these terms, expression patterns that match a constant tuple, struct, or choice value are treated as if they were tuple, struct, or alternative patterns, respectively, and `bool` is treated like a choice type. Any expression patterns that remain after applying this rule are considered to match a single value from an infinite set of values so that a set of expression patterns is never exhaustive: ``` fn IsEven(n: u8) -> bool { // Not considered exhaustive. match (n) { case 0 => { return true; } case 1 => { return false; } ... case 255 => { return false; } } // Code here is considered to be reachable. } ``` ``` fn IsTrue(b: bool) -> bool { // Considered exhaustive. match (b) { case false => { return false; } case true => { return true; } } // Code here is considered to be unreachable. } ``` When determining whether a pattern is useful, no attempt is made to determine the value of any guards, and instead a worst-case assumption is made: a guard on that pattern is assumed to evaluate to true and a guard on any pattern in the context set is assumed to evaluate to false. We will diagnose the following situations: - A pattern is not useful in the context of prior patterns. In a `match` statement, this happens if a pattern or `default` cannot match because all cases it could cover are handled by prior cases or a prior `default`. For example: ``` choice Optional(T:! Type) { None, Some(T) } fn F(a: Optional(i32), b: Optional(i32)) { match ((a, b)) { case (.Some(a: i32), _: auto) => {} // ✅ OK, but only matches values of the form `(None, Some)`, // because `(Some, Some)` is matched by the previous pattern. case (_: auto, .Some(b: i32)) => {} // ✅ OK, matches all remaining values. case (.None, .None) => {} // ❌ Error, this pattern never matches. case (_: auto, _: auto) => {} } } ``` - A pattern match is not exhaustive and the program doesn't explicitly say what to do when no pattern matches. For example: - If the patterns in a `match` are not exhaustive and no `default` is provided. ``` fn F(n: i32) -> i32 { // ❌ Error, this `match` is not exhaustive. match (n) { case 0 => { return 2; } case 1 => { return 3; } case 2 => { return 5; } case 3 => { return 7; } case 4 => { return 11; } } } ``` - If a refutable pattern appears in a context where only one pattern can be specified, such as a `let` or `var` declaration, and there is no fallback behavior. This currently includes all pattern matching contexts other than `match` statements, but the `var`/`let`-`else` feature in [#1871](https://github.com/carbon-language/carbon-lang/pull/1871) would introduce a second context permitting refutable matches, and overloaded functions might introduce a third context. ``` fn F(n: i32) { // ❌ Error, refutable expression pattern `5` used in context // requiring an irrefutable pattern. var 5 = n; } // ❌ Error, refutable expression pattern `5` used in context // requiring an irrefutable pattern. fn G(n: i32, 5); ``` - When a set of patterns have no ordering or tie-breaker, it is an error for them to overlap unless there is a unique best match for any value that matches more than one pattern. However, this situation does not apply to any current language rule: - For `match` statements, patterns are matched top-down, so overlap is permitted. - We do not yet have an approved design for overloaded functions, but it is anticipated that declaration order will be used in that case too. - For a set of `impl`s that match a given `impl` lookup, argument deduction is used rather than pattern matching, but `impl`s with the same type structure are an error unless a `match_first` declaration is used to order the `impl`s. (This is a pre-existing rule and is unchanged by this proposal.) ## Rationale - [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) - The `, _` syntax for struct patterns enables a style where adding a struct member is not a breaking change. - The requirement that matches be exhaustive makes it easier to add new cases to a choice type, by requiring the compiler to detect places where the new value is not handled. - [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) - Pattern syntax makes it easier to match complex values. - Modeling pattern syntax after expressions eases the burden of learning a new sub-language for pattern-matching: patterns are an extension of expressions, and expressions are a special case of patterns. - Requiring exhaustiveness for matches makes control flow easier to understand as there is never a value for which the `match` skips all cases. - [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) - The rules for matching a templated value can be used to replace `if constexpr` in many cases. ## Alternatives considered ### Shorthand for `auto` We could provide a shorter syntax for `name: auto`. [Proposal #851](https://github.com/carbon-language/carbon-lang/blob/trunk/proposals/p0851.md#elide-the-type-instead-of-using-auto) considered the following shorthands and decided against using them: ``` var n: _ = init; var n = init; ``` A novel suggestion that avoids some of the disadvantages of those syntaxes would be to use: ``` var n:= init; ``` Advantages: - Shorter syntax for variables with a deduced type. - Potentially allows removal of the `auto` keyword. Disadvantages: - Appears to introduce a `:=` syntax, but that only arises in cases where an initializer immediately follows the name. - Cases such as `var (a:, b:) = my_pair;` would either be invalid or would not use the `:=` syntax. - If we accept such cases, there is a risk of grammar ambiguities. - If we reject such cases, we may still want to keep `auto` around for them, creating inconsistency. - Not a complete replacement for `auto` if we want to also allow things like `v: Vector(auto)`. C++ doesn't allow the equivalent syntax currently, but it was part of the Concepts TS and seems likely to return at some point. - No syntactic difference between accidentally omitting a type entirely and requesting type deduction. However, the mistake of omitting a type but retaining the `:` seems unlikely, and the `:` followed by the absence of a type is a signal that something is happening, so this seems to be less of a concern than for the `var n = init;` syntax. See discussion topics [1](https://github.com/carbon-language/carbon-lang/discussions/1495) and [2](https://github.com/carbon-language/carbon-lang/discussions/1988). ### Struct pattern syntax We could omit the `, _` syntax. This would simplify struct patterns, but at the cost of removing a feature that can be useful for reducing verbosity and making library evolution easier. We could always allow a struct pattern to match a struct with more fields, without requiring a `, _` suffix. This would aid evolution by reducing the cases where adding a field to a struct can be a breaking change, but such cases would still exist. Further, this would make matches against a struct-valued expression inconsistent with matches against a struct pattern. We could use a different syntax instead of `, _`. Other options that were explicitly considered: - `, ...` seems visually evocative of "and more stuff", but risks conflicting with variadic syntax or at least being confusing when used in a variadic context, given that variadics are expected to claim `...` for pack expansion. - `, ._` suggests matching a field without specifying a name, but might create an impression of matching just one field, and three low punctuation characters in a row seems to be pushing the limits of readability. On balance, `, _` harmonizes well with the use of `_` to introduce a wildcard, without being too visually confusing given that the other use of `_` has a following `:`. We could remove the `{field: type}` shorthand and require `{.field = field: type}`. This may avoid encouraging reusing the field name even when it is not appropriate, and would remove some syntactic sugar that's not formally necessary. However, we expect this to be a common case whose ergonomics are important. We could use a different syntax for `{field: type}` that is less divergent from other struct syntaxes: - `{.field: type}` seems like a contender, but doesn't work because that is already recognized as a struct type literal. Also, the use of normal binding syntax means that every locally-introduced name is always introduced by `name: type` where `name` is not preceded by `.`. - `{.=field: type}` might be a reasonable mnemonic shorthand for `{.field = field: type}`, but looks a little surprising. This is probably the best choice if concerns are found with `{field: type}` syntax. ### Type pattern matching We could treat type deduction as a form of pattern matching. For example, we could allow ``` fn F(a: Vector(T:! Type)) -> T { return a[0]; } fn G() -> i32 { let v: Vector(i32) = (1, 2, 3); // Deduces `T = i32`. return F(v); } ``` where the value of `T` is determined by pattern-matching `Vector(T:! Type)` against the supplied type `Vector(i32)` of `v`. And symmetrically: ``` fn H[m: i32, n: i32](k: i32, (m, n)) { return m + n; } fn I() { // Deduces `m = 2`, `n = 3`. H(1, (2, 3)); } ``` This would ensure consistency between pattern matching and deduction, potentially reducing the number of rules that Carbon developers need to learn. We find that use of pattern-matching in type position can harm readability. For example, the first of these two examples may be easier to read due to having less nesting: ``` fn F[T:! Type](x: T); fn F(x: (T:! Type)); ``` Having distinct syntax for type-level matching and value-level matching helps guide the reader to the correct interpretation, even though the underlying matching process is expected to be similar or identical. As a result, we keep type deduction syntax and pattern matching syntax separate for now: - In pattern matching, bindings and wildcards are introduced by nested `:` / `:!` patterns, and the right-hand side of a binding pattern is never a proper pattern. - In type deduction, deduced values are specified separately and an expression written in terms of those bindings describes the type. There are some contexts where there is no syntactic location for introducing deduced values, such as in `case` labels. For syntactic consistency, such cases should be addressed by adding `forall` syntax, if there is motivation to support deduction: ``` match (templated_value) { case forall [template T:! Type] (p: T*) => { heap.Delete(p); } } ``` ### Introducer syntax for expression patterns We could have some separate introducer syntax to distinguish expression patterns from other kinds of patterns: ``` match ((a, b)) { case is (1, 2) => {} case (is 3, n: i32) => {} case (m: i32, is 4) => {} } ``` This would reduce the chance of confusion in cases where an expression and a similar-looking pattern are treated differently: ``` class TupleLike { external impl as (i32, i32); } fn MatchTupleLike(t: TupleLike) { match (t) { // ✅ OK, expression pattern; // `t` implicitly converted to `(i32, i32)` by // built-in `impl EqWith` for tuples. case (1, 2) => {} // ❌ Error, `t` is not a tuple. case (n: i32, 3) => {} } } ``` This would also permit the same, or overlapping, syntax to be used in expressions and patterns with different meanings, for example: ``` fn F() { // Here, `[...]` is an array type. var array: [i32; 5]; match (&array) { // Here, `[...]` could introduce deduced parameters. case [T:! Type] x: T* => {} } } ``` Similarly, if we added a `&PATT` pattern to match the address of a value, a construct like `&n` would be ambiguous without in introducer: ``` var n: i32 = 5; fn MatchPointerOrPointee(p: i32*) { match (p) { case is &n => { Print("given pointer to n"); } // Here, the pattern `&PATT` could match the address of // a value that matches `PATT`. case &(is n) => { Print("given pointer to i32 whose value equals the value of n"); } case &(m: i32) => { Print("given pointer to value {0}", m); } } } ``` Such an introducer would also denote the portions of a pattern that are evaluated at runtime rather than at compile time, benefitting Carbon's goals of readability and of predictable performance, and would also add syntactic separation between the parts of a pattern that have full exhaustiveness checking and the parts that do not. However, this would also introduce additional ceremony for the common case where part of a pattern is a specific value. This could be mitigated by permitting certain kinds of value as patterns without an introducer, such as numeric literals and `true` and `false`, at the cost of introducing more complexity and more confusion over which cases require `is` and which do not. We have also not identified a good choice for the introducer syntax, should we pursue this direction. `is` is not an ideal choice, because elsewhere in Carbon syntax, it is a relation between a value and its type, so `is T` may be misread as matching values whose type is `T`. `==` _expression_ has been suggested, but that would imply the opposite operand order of that in this proposal -- _scrutinee_ `==` _expression_ rather than _expression_ `==` _scrutinee_ -- which would compare struct fields in a surprising order that diverges from the order of comparison for a struct pattern. A prefix `==` may also be visually surprising. For now, we do not add such an introducer, but this decision is expected to be revisited by a future proposal. See: - [2022-09-16 Discord discussion in #syntax](https://discord.com/channels/655572317891461132/709488742942900284/1020559500597538886) - [2022-12-05 open discussion](https://docs.google.com/document/d/1tEt4iM6vfcY0O0DG0uOEMIbaXcZXlNREc2ChNiEtn_w/edit#bookmark=id.utvzosvvfg80) ### Allow guards on arbitrary patterns We could treat guards as part of pattern syntax instead of as part of `case` syntax. However, since guards make a pattern refutable, this wouldn't allow them anywhere other than in `case`s in the current language design. It would allow them to be nested within cases: ``` match (x) { case (n: i32 if n > 5, "some string") => { ... } } ``` Such nesting might allow an expensive later check to be avoided. For example, in the above case we can avoid an `==` comparison on a string if a cheaper comparison of `n > 5` fails. However, this would introduce complexity into the grammar, and it's not clear that this feature would add sufficient value to justify that complexity. An additional concern is that if we add `let`...`else` syntax, this would presumably permit things like: ``` let n: i32 if n > 5 = 20 else { return 0; }; ``` ... where it would be easy to misparse the `if ... else` as being a single construct, where the intended parse would be: ``` let ((n: i32) if n > 5) = 20 else { return 0; }; ``` See also a [related Discord discussion](https://discord.com/channels/655572317891461132/748959784815951963/981691016040030279). ### Treat expression patterns as exhaustive if they cover all possible values We could do more work to treat a set of expression patterns as being exhaustive, if each pattern has a constant value and between those constant values, all possible values of the type are covered. The advantage of this would be that we improve the precision of our language rules. This change in rules has some disadvantages and problems: - It would add some complexity to the rules and to implementations in order to track whether all possible values have been created. - For even the simplest types where this would apply, such as `i8`, it seems unlikely that a `match` covering all possible values would be written, due to the large number of patterns required. - In many cases, the value being matched will carry an invariant so that matching a subset of the representable values would match all meaningful values. We would still have imprecise rules in those cases. - Expression patterns are matched with `==`, and for an arbitrary `==` it is not computable in general to tell whether a given set of values is exhaustive, so we would only be able to apply this in some subset of cases. This restriction is straightforward to work around by adding a final unreachable `default`, or by replacing the final value match with a `default` or `_`. ### Allow non-exhaustive `match` statements We could permit `match` statements that are not exhaustive, and execute none of the case blocks if none of the patterns match. This is a very common choice in the design of such language features. However, it is also a common source of errors, for example when matching a sum type and an alternative is missed. By making this a language rule, we ensure that developers can rely on such mistakes being caught. There is an easy syntactic way to disable this check, by adding an explicit `default => {}` case. If we made the opposite choice, we would not automatically have an easy way to request that the check be performed. We could add syntax to request it, but such syntax would likely be forgotten and the default behavior would be that errors are silently permitted. This seems sufficient to outweigh the potential ergonomic cost of requiring the `default => {}` case to be written explicitly. Another motivation for requiring exhaustiveness is that it simplifies other language rules. For example, when determining whether control flow can reach the end of a function with a declared return type, a separate exhaustiveness analysis is not necesasry. One concern with exhaustiveness checking is that it will cause the addition of an alternative to a choice type to be a breaking change by default. However, this is also one of the main advantages, and the design of choice types is intended to eventually provide a mechanism to specify that a choice type is extensible, which if used would mean that a set of patterns for that choice type would only be considered exhaustive if it includes a wildcard pattern. ## Future work ### Or patterns We could provide "or patterns", allowing matching of one pattern or another with the same handler: ``` match (x) { case (m: i32, 0) | (0, m: i32) => { return m; } } ``` See the [red-black tree rebalancing example](#example-from-p1371r5-red-black-tree-rebalancing) for a real-world example where this would result in a simplification. ### As patterns We could provide ["as-patterns"](https://en.wikibooks.org/wiki/Haskell/Pattern_matching#As-patterns) as a convenient way to give a name to a value while still matching parts of that value. Following Haskell, we could use: - _pattern_ ::= _identifier_ `@` _pattern_ For example: ``` match (x) { // `s` names the first element of the tuple. case (s@{.a = n: i32, .b = 12}, 4) if n > 1 => { return s; } } ``` ### Matching classes by dynamic type We could provide a way to match polymorphic class objects based on their dynamic type. This might be the default when matching a polymorphic class, or might require opt-in. The behavior in this proposal is that only the static type of the operand is considered. For example, we could default to matching the static type, and allow a `dyn` _pattern_ syntax for matching a pointer to a polymorphic class type, meaning that we match the dynamic type rather than the static type of the pointer: ``` abstract class Base { virtual fn F[me: Self](); } class Derived1 extends Base {} class Derived2 extends Base {} fn PrintType(b: Base*) { match (b) { // `case d1: Derived*` would be invalid here, // because it could never match. case dyn d1: Derived1* => { Print("Derived1"); } case dyn d2: Derived2* => { Print("Derived2"); } default => { Print("Unknown derived class"); } } } fn PrintTemplateType[template T:! Type](p: T*) { match (p) { // OK, dispatch is based on the static type. case b: Base* => { Print("Base"); } case d1: Derived1* => { Print("Derived1"); } case d2: Derived2* => { Print("Derived2"); } default => { Print("Unknown class"); } } } ``` However, at this time we do not have a design for a checked down-cast, so it's not clear how this matching operation would fit into the design of classes, either syntactically or semantically. ### User-defined pattern matching We plan to provide a mechanism for allowing a user-defined type to specify how it can be matched by patterns. See [proposal #157](/proposals/p0157.md) for details. ### Matching classes with struct patterns We could allow a class to be matched by a struct pattern that matches its fields. This would make sense especially for data classes, and would be consistent with the behavior of `==` in the case where a struct is implicitly convertible to the class type. However, without a design for user-defined pattern matching and matching on dynamic type, there is a significant risk that this would conflict with the rules there, so it is deferred for now. ### Matching by reference When the scrutinee is an lvalue, it is sometimes desirable to form a mutable binding to it. For example, see the [`struct` inspection example](#figure-5-struct-inspection) below. We currently support this only for the `me` binding in a method, using the `addr` keyword, but could allow this more generally. ```carbon fn takeDamage(p: player*) { match (*p) { case {.hitpoints = 0, .lives = 0, _} => { gameOver(); } case {.hitpoints = addr hp: i32*, .lives = addr l: i32*, _} if *hp == 0 => { *hp = 10; --*l; } case {.hitpoints = addr hp: i32*, _} if *hp <= 3 => { --*hp; messageAlmostDead(); } case {.hitpoints = addr hp: i32*, _} => { --*hp; } } } ``` Work in this area will need to consider whether we can provide this feature ergonomically without introducing reference-like behavior. ### Type deduction This proposal does not cover type deduction, instead considering it to be a separate topic from pattern matching syntax, even though the semantic behavior of the two may be quite similar or identical. We will need a proposal to explore type deduction and describe its functioning. ### Match expressions As demonstrated in the [switching an enum](#figure-3-switching-an-enum) example below, it would be valuable to have an expression `match` syntax in addition to the statement `match` syntax. We could follow the same approach as for `if` statements, and say that a `match` that appears at the start of a statement is a statement `match` and any other `match` is an expression `match`. As candidate syntax, braced `case` bodies could be replaced by an expression followed by a comma. For example: ``` let opengl_color: Vec3 = match (c) { case .red => Vec3.Make(1.0, 0.0, 0.0), case .yellow => Vec3.Make(1.0, 1.0, 0.0), case .green => Vec3.Make(0.0, 1.0, 0.0), case .blue => Vec3.Make(0.0, 0.0, 1.0) }; ``` ## Examples These examples are translations of examples in WG21 paper [P0095R1](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0095r1.html) and [P1371R3](https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1371r3.pdf), with permission from the author of those examples, David Sankel. Thank you, David! ### Examples from P0095R1 #### Figure 1. Declaration of a command data structure
C++P0095R1This proposal
```c++ struct set_score { std::size_t value; }; struct fire_missile {}; struct fire_laser { unsigned intensity; }; struct rotate { double amount; }; struct command { std::variant< set_score, fire_missile, fire_laser, rotate > value; }; ``` ```c++ lvariant command { std::size_t set_score; std::monostate fire_missile; unsigned fire_laser; double rotate; }; ``` ```carbon choice command { set_score(u64), fire_missile, fire_laser(u32), rotate(f64) } ```
#### Figure 2. Implementation of an output operator
C++P0095R1This proposal
```c++ namespace { struct Output { std::ostream& operator()(std::ostream& stream, const set_score& ss) const { return stream << "Set the score to " << ss.value << ".\n"; } std::ostream& operator()(std::ostream& stream, const fire_missile&) const { return stream << "Fire a missile.\n"; } std::ostream& operator()(std::ostream& stream, const fire_laser& fl) const { return stream << "Fire a laser with " << fl.intensity << " intensity.\n"; } std::ostream& operator()(std::ostream& stream, const rotate& r) const { return stream << "Rotate by " << r.degrees << " degrees.\n" } }; } std::ostream& operator<<(std::ostream& stream, const command& cmd) { return std::visit(std::bind(Output(), std::ref(stream), std::placeholders::_1), cmd.value); } ``` ```c++ std::ostream& operator<<(std::ostream& stream, const command& cmd) { return inspect(cmd) { set_score value => stream << "Set the score to " << value << ".\n" fire_missile _ => stream << "Fire a missile.\n" fire_laser intensity => stream << "Fire a laser with " << intensity << " intensity.\n" rotate degrees => stream << "Rotate by " << degrees << " degrees.\n" } } ``` ```carbon impl command as Printable { fn Print[me: Self](stream: Cpp.std.ostream*) { match (me) { case .set_score(value: u64) => { stream->Print("Set the score to {0}.\n", value); } case .fire_missile => { stream->Print("Fire a missile.\n"); } case .fire_laser(intensity: u32) => { stream->Print("Fire a laser with {0} intensity.\n", intensity); } case .rotate(degrees: f64) => { stream->Print("Rotate by {0} degrees.\n", degrees); } } } } ```
#### Figure 3. Switching an enum
C++P0095R1This proposal
```c++ enum color { red, yellow, green, blue }; ``` ```carbon choice color { red, yellow, green, blue } ```
```c++ const Vec3 opengl_color = [&c] { switch(c) { case red: return Vec3(1.0, 0.0, 0.0); break; case yellow: return Vec3(1.0, 1.0, 0.0); break; case green: return Vec3(0.0, 1.0, 0.0); break; case blue: return Vec3(0.0, 0.0, 1.0); break; default: std::abort(); }(); ``` ```c++ const Vec3 opengl_color = inspect(c) { red => Vec3(1.0, 0.0, 0.0) yellow => Vec3(1.0, 1.0, 0.0) green => Vec3(0.0, 1.0, 0.0) blue => Vec3(0.0, 0.0, 1.0) }; ``` ```carbon // Carbon has neither expression-match nor lambdas yet, // so this can't easily be done in line. fn GetOpenGLColor(c: color) -> Vec3 { match (c) { case .red => { return Vec3.Make(1.0, 0.0, 0.0); } case .yellow => { return Vec3.Make(1.0, 1.0, 0.0); } case .green => { return Vec3.Make(0.0, 1.0, 0.0); } case .blue => { return Vec3.Make(0.0, 0.0, 1.0); } } } let opengl_color: Vec3 = GetOpenGLColor(c); ```
#### Figure 4. Expression datatype
C++P0095R1This proposal
```c++ struct expression; struct sum_expression { std::unique_ptr left_hand_side; std::unique_ptr right_hand_side; }; struct expression { std::variant value; }; expression simplify(const expression & exp) { if(sum_expression const * const sum = std::get_if(&exp)) { if( int const * const lhsInt = std::get_if( sum->left_hand_side.get() ) && *lhsInt == 0 ) { return simplify(*sum->right_hand_side); } else if( int const * const rhsInt = std::get_if( sum->right_hand_side.get() ) && *rhsInt == 0 ) { return simplify(*sum->left_hand_side); } else { return {sum_expression{ std::make_unique(simplify(*sum->left_hand_side)), std::make_unique(simplify(*sum->right_hand_side))}} } } return exp; } void simplify2(expression & exp) { if(sum_expression * const sum = std::get_if(&exp)) { if( int * const lhsInt = std::get_if( sum->left_hand_side.get() ) && *lhsInt == 0 ) { expression tmp(std::move(*sum->right_hand_side)); exp = std::move(tmp); simplify(exp); } else if( int * const rhsInt = std::get_if( sum->right_hand_side.get() ) && *rhsInt == 0 ) { expression tmp(std::move(*sum->left_hand_side)); exp = std::move(tmp); simplify(exp); } else { simplify(*sum->left_hand_side); simplify(*sum->right_hand_side); } } return exp; } ``` ```c++ lvariant expression; struct sum_expression { std::unique_ptr left_hand_side; std::unique_ptr right_hand_side; }; lvariant expression { sum_expression sum; int literal; std::string var; }; expression simplify(const expression & exp) { return inspect(exp) { sum {*(literal 0), *rhs} => simplify(rhs) sum {*lhs , *(literal 0)} => simplify(lhs) sum {*lhs , *rhs} => expression::sum{ std::make_unique(simplify(lhs)), std::make_unique(simplify(rhs))}; _ => exp }; } void simplify2(expression & exp) { inspect(exp) { sum {*(literal 0), *rhs} => { expression tmp(std::move(rhs)); exp = std::move(tmp); simplify2(exp); } sum {*lhs , *(literal 0)} => { expression tmp(std::move(lhs)); exp = std::move(tmp); simplify2(exp); } sum {*lhs , *rhs} => { simplify2(lhs); simplify2(rhs); } _ => ; }; } ``` ```carbon choice expression { sum(UniquePtr(expression), UniquePtr(expression)), literal(i32), var: String } // This assumes that UniquePtr provides the matching // functionality described in #2187. fn simplify(exp: expression) -> expression { match (exp) { case .sum(.PtrTo(.literal(0)), .PtrTo(rhs: expression)) => { return simplify(rhs); } case .sum(.PtrTo(lhs: expression), .PtrTo(.literal(0))) => { return simplify(lhs); } case .sum(.PtrTo(lhs: expression), .PtrTo(rhs: expression)) => { return expression.sum(MakeUnique(simplify(lhs)), MakeUnique(simplify(rhs))); } default => { return exp; } } } ```
#### Figure 5. `struct` inspection
C++P0095R1This proposal
```c++ struct player { std::string name; int hitpoints; int lives; }; ``` ```carbon class player { var name: String; var hitpoints: i32; var lives: i32; } ```
```c++ void takeDamage(player &p) { if(p.hitpoints == 0 && p.lives == 0) gameOver(); else if(p.hitpoints == 0) { p.hitpoints = 10; p.lives--; } else if(p.hitpoints <= 3) { p.hitpoints--; messageAlmostDead(); } else { p.hitpoints--; } } ``` ```c++ void takeDamage(player &p) { inspect(p) { {hitpoints: 0, lives:0} => gameOver(); {hitpoints:hp@0, lives:l} => hp=10, l--; {hitpoints:hp} if (hp <= 3) => { hp--; messageAlmostDead(); } {hitpoints:hp} => hp--; } } ``` ```carbon fn takeDamage(p: player*) { match (*p) { case {.hitpoints = 0, .lives = 0, _} => { gameOver(); } case {.hitpoints = 0, _} => { p->hitpoints = 10; --p->lives; } case {.hitpoints = hp: i32, _} if hp <= 3 => { --p->hitpoints; messageAlmostDead(); } default => { --p->hitpoints; } } } ```
### Example from P1371R5: Red-black tree rebalancing #### With P1371R5 pattern matching ```c++ enum Color { Red, Black }; template struct Node { void balance(); Color color; std::shared_ptr lhs; T value; std::shared_ptr rhs; }; template void Node::balance() { *this = inspect (*this) { [case Black, (*?) [case Red, (*?) [case Red, a, x, b], y, c], z, d] => Node{Red, std::make_shared(Black, a, x, b), y, std::make_shared(Black, c, z, d)}; [case Black, (*?) [case Red, a, x, (*?) [case Red, b, y, c]], z, d] // left-right case => Node{Red, std::make_shared(Black, a, x, b), y, std::make_shared(Black, c, z, d)}; [case Black, a, x, (*?) [case Red, (*?) [case Red, b, y, c], z, d]] // right-left case => Node{Red, std::make_shared(Black, a, x, b), y, std::make_shared(Black, c, z, d)}; [case Black, a, x, (*?) [case Red, b, y, (*?) [case Red, c, z, d]]] // right-right case => Node{Red, std::make_shared(Black, a, x, b), y, std::make_shared(Black, c, z, d)}; self => self; }; } ``` #### With this proposal ```carbon choice Color { Red, Black } class Node(T:! Type) { fn balance[addr me: Self*](); var color: Color; var lhs: SharedPtr(Node); var value: T; var rhs: SharedPtr(Node); } fn MakeBalanced[T:! Type](a: SharedPtr(Node(T)), x: T, b: SharedPtr(Node(T)), y: T, c: SharedPtr(Node(T)), z: T, d: SharedPtr(Node(T))) -> Node(T) { return {.color = Color.Red, .lhs = MakeShared({.color = Color.Black, .lhs = a, .value = x, .rhs = b}), .value = y, .rhs = MakeShared({.color = Color.Black, .lhs = c, .value = z, .rhs = d})}; } fn Node(T:! Type).balance[addr me: Self*]() { match (*me) { {.color = .Black, .lhs = .PtrTo( {.color = .Red, .lhs = .PtrTo( {.color = .Red, .lhs = a: auto, .value = x: T, .rhs = b: auto}), .value = y: T, .rhs = c: auto}), .value = z: T, .rhs = d: auto} => { *me = MakeBalanced(a, x, b, y, c, z, d); } {.color = .Black, .lhs = .PtrTo( {.color = .Red, .lhs = a: auto, .value = x: T, .rhs = .PtrTo( {.color = .Red, .lhs = b: auto, .value = y: T, .rhs = c: auto})}), .value = z: T, .rhs = d: auto} => { *me = MakeBalanced(a, x, b, y, c, z, d); } {.color = .Black, .lhs = a: auto, .value = x: T, .rhs = .PtrTo( {.color = .Red, .lhs = .PtrTo( {.color = .Red, .lhs = b: auto, .value = y: T, .rhs = c: auto}), .value = z: T, .rhs = d: auto})} => { *me = MakeBalanced(a, x, b, y, c, z, d); } {.color = .Black, .lhs = a: auto, .value = x: T, .rhs = .PtrTo( {.color = .Red, .lhs = b: auto, .value = y: T, .rhs = .PtrTo( {.color = .Red, .lhs = c: auto, .value = z: T, .rhs = d: auto})})} => { *me = MakeBalanced(a, x, b, y, c, z, d); } default => {} }; } ``` #### With this proposal plus #2187 ```carbon choice Color { Red, Black } class Node(T:! Type) { fn balance[addr me: Self*](); var color: Color; var lhs: SharedPtr(Self); var value: T; var rhs: SharedPtr(Self); external impl as Match { interface Continuation { extends Match.BaseContinuation; fn Red[addr me: Self*](lhs: SharedPtr(Self), value: T, rhs: SharedPtr(Self)) -> ReturnType; fn Black[addr me: Self*](lhs: SharedPtr(Self), value: T, rhs: SharedPtr(Self)) -> ReturnType; } fn Op[me: Self, C:! Continuation](continuation: C*) -> C.ReturnType { match (me.color) { case .Red => { return continuation->Red(me.lhs, me.value, me.rhs); } case .Black => { return continuation->Black(me.lhs, me.value, me.rhs); } } } } } fn MakeBalanced[T:! Type](a: SharedPtr(Node(T)), x: T, b: SharedPtr(Node(T)), y: T, c: SharedPtr(Node(T)), z: T, d: SharedPtr(Node(T))) -> Node(T) { return {.color = Color.Red, .lhs = MakeShared({.color = Color.Black, .lhs = a, .value = x, .rhs = b}), .value = y, .rhs = MakeShared({.color = Color.Black, .lhs = c, .value = z, .rhs = d})}; } fn Node(T:! Type).balance[addr me: Self*]() { match (*me) { .Black(.PtrTo(.Red(.PtrTo(.Red(a: auto, x: T, b: auto)), y: T, c: auto)), z: T, d: auto) => { *me = MakeBalanced(a, x, b, y, c, z, d); } .Black(.PtrTo(.Red(a: auto, x: T, .PtrTo(.Red(b: auto, y: T, c: auto)))), z: T, d: auto) => { *me = MakeBalanced(a, x, b, y, c, z, d); } .Black(a: auto, x: T, .PtrTo(.Red(.PtrTo(.Red(b: auto, y: T, c: auto)), z: T, d: auto))) => { *me = MakeBalanced(a, x, b, y, c, z, d); } .Black(a: auto, x: T, .PtrTo(.Red(b: auto, y: T, .PtrTo(.Red(c: auto, z: T, d: auto))))) => { *me = MakeBalanced(a, x, b, y, c, z, d); } default => {} }; } ```