
Expression form basics (#5545)

This proposal introduces the concept of a _form_, which is a
generalization of
"type" that encompasses all of the information about an expression
that's
visible to the type system, including type and expression category.
Forms can be
composed into _tuple forms_ and _struct forms_, which lets us track the
categories of individual tuple and struct literal elements.

The proposal PR also adds `ref` bindings to the pattern matching
documentation,
but that is not part of the proposal itself; it's just bringing the
documentation
up to date with proposal
[#5434](https://github.com/carbon-language/carbon-lang/pull/5434).

---------

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
Co-authored-by: Carbon Infra Bot <carbon-external-infra@google.com>
Geoff Romer, 2 months ago
parent commit 2d5e5e9692

+ 15 - 13
docs/design/classes.md

@@ -601,19 +601,21 @@ Assert(different_order.y == 2);
 ```
 
 Initialization and assignment occur field-by-field. The order of fields is
-determined from the target on the left side of the `=`. This rule matches what
-we expect for classes with encapsulation more generally.
-
-**Open question:** What operations and in what order happen for assignment and
-initialization?
-
--   Is assignment just destruction followed by initialization? Is that
-    destruction completed for the whole object before initializing, or is it
-    interleaved field-by-field?
--   When initializing to a literal value, is a temporary containing the literal
-    value constructed first or are the fields initialized directly? The latter
-    approach supports types that can't be moved or copied, such as mutex.
--   Perhaps some operations are _not_ ordered with respect to each other?
+determined by the source on the right side of the `=`, and individual operations
+are generally interleaved field-by-field. See [here](values.md#type-conversions)
+for details about the semantics, and
+[here](pattern_matching.md#evaluation-order) for details about the order of
+operations.
+
+> **Open question:** Do we need a way for a class to require the source order to
+> match? Should that be the default, with an opt out?
+
+> **Open question:** What operations and in what order happen for assignment?
+>
+> -   Is assignment just destruction followed by initialization? Is that
+>     destruction completed for the whole object before initializing, or is it
+>     interleaved field-by-field?
+> -   Perhaps some operations are _not_ ordered with respect to each other?
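A minimal sketch of the new rule, reusing the `different_order` struct from the example earlier in this section: the source on the right of the `=` determines the field order, so `.y` is initialized before `.x` even though `.x` is declared first.

```carbon
// Hypothetical sketch: initialization proceeds field-by-field in the order
// of the source literal on the right, so `.y` is initialized first, then `.x`.
var different_order: {.x: i32, .y: i32} = {.y = 2, .x = 1};
Assert(different_order.x == 1);
Assert(different_order.y == 2);
```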
 
 ### Operations performed field-wise
 

+ 5 - 0
docs/design/expressions/implicit_conversions.md

@@ -20,6 +20,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Same type](#same-type)
     -   [Pointer conversions](#pointer-conversions)
     -   [Facet types](#facet-types)
+    -   [Struct, tuple, and array types](#struct-tuple-and-array-types)
 -   [Consistency with `as`](#consistency-with-as)
 -   [Extensibility](#extensibility)
 -   [Alternatives considered](#alternatives-considered)
@@ -189,6 +190,10 @@ implicitly converted to the facet type `TT2` if `T`
 [satisfies the requirements](../generics/details.md#subtyping-between-facet-types)
 of `TT2`.
 
+### Struct, tuple, and array types
+
+See [here](/docs/design/values.md#type-conversions).
+
 ## Consistency with `as`
 
 An implicit conversion of an expression `E` of type `T` to type `U`, when

+ 31 - 16
docs/design/expressions/member_access.md

@@ -13,7 +13,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -   [Overview](#overview)
 -   [Member resolution](#member-resolution)
     -   [Package and namespace members](#package-and-namespace-members)
-    -   [Types and facets](#types-and-facets)
+    -   [Types, forms, and facets](#types-forms-and-facets)
     -   [Tuple indexing](#tuple-indexing)
     -   [Values](#values)
     -   [Facet binding](#facet-binding)
@@ -127,13 +127,18 @@ A member access expression is processed using the following steps:
 The process of _member resolution_ determines which member `M` a member access
 expression is referring to.
 
-For a simple member access, if the first operand is a type, facet, package, or
-namespace, a search for the member name is performed in the first operand.
-Otherwise, a search for the member name is performed in the type of the first
-operand. In either case, the search must succeed. In the latter case, if the
-result is an instance member, then [instance binding](#instance-binding) is
+For a simple member access, if the first operand is a type, form, facet,
+package, or namespace, a search for the member name is performed in the first
+operand. Otherwise, a search for the member name is performed in the type of the
+first operand. In either case, the search must succeed. In the latter case, if
+the result is an instance member, then [instance binding](#instance-binding) is
 performed on the first operand.
 
+A search for a name within a form searches for the name in its
+[type component](/docs/design/values.md#expression-forms). Note that this means
+that the form of an expression never affects simple member access into that
+expression, except through its type component.
+
 For a compound member access, the second operand is evaluated as a compile-time
 constant to determine the member being accessed. The evaluation is required to
 succeed and to result in a member of a type, interface, or non-type facet, or a
@@ -189,11 +194,12 @@ class Bar {
 }
 ```
 
-### Types and facets
+### Types, forms, and facets
 
-If the first operand is a type or facet, it must be a compile-time constant.
-This disallows member access into a type except during compile-time, see leads
-issue [#1293](https://github.com/carbon-language/carbon-lang/issues/1293).
+If the first operand is a type, form, or facet, it must be a compile-time
+constant. This disallows member access into a type except during compile-time,
+see leads issue
+[#1293](https://github.com/carbon-language/carbon-lang/issues/1293).
 
 Like the previous case, types (including
 [facet types](/docs/design/generics/terminology.md#facet-type)) have member
@@ -228,6 +234,9 @@ class Avatar {
 Simple member access `(Avatar as Cowboy).Draw` finds the `Cowboy.Draw`
 implementation for `Avatar`, ignoring `Renderable.Draw`.
 
+Similarly, a form has members, specifically the members of the form's type
+component.
+
 ### Tuple indexing
 
 Tuple types have member names that are *integer-literal*s, not *word*s.
@@ -273,9 +282,9 @@ let n: i32 = p->(e);
 
 ### Values
 
-If the first operand is not a type, package, namespace, or facet, it does not
-have member names, and a search is performed into the type of the first operand
-instead.
+If the first operand is not a type, form, package, namespace, or facet, it does
+not have member names, and a search is performed into the type of the first
+operand instead.
 
 ```carbon
 interface Printable {
@@ -723,16 +732,22 @@ fn SumIntegers(v: Vector(Integer)) -> Integer {
 ## Instance binding
 
 Next, _instance binding_ may be performed. This associates an expression with a
-particular object instance. For example, this is the value bound to `self` when
-calling a method.
+particular object or value instance. For example, this is the value bound to
+`self` when calling a method.
 
 For the simple member access syntax `x.y`, if `x` is an entity that has member
 names, such as a namespace or a type, then `y` is looked up within `x`, and
 instance binding is not performed. Otherwise, `y` is looked up within the type
 of `x` and instance binding is performed if an instance member is found.
 
-If instance binding is performed:
+If instance binding is to be performed, the result of instance binding depends
+on what instance member `M` was found:
 
+-   For a field member of a struct type or tuple type, `x` is converted to a
+    struct or tuple form by
+    [form decomposition](/docs/design/values.md#category-conversions), and the
+    `.f` element of the result of that conversion becomes the result of `x.f`.
+    All other elements are [discarded](/docs/design/values.md#form-conversions).
 -   For a field member in class `C`, `x` is required to be of type `C` or of a
     type derived from `C`. The result is the corresponding subobject within `x`.
     If `x` is an

+ 6 - 0
docs/design/functions.md

@@ -97,6 +97,9 @@ possible syntaxes:
         `fn Sleep(seconds: i64) -> ();`.
     -   `()` is similar to a `void` return type in C++.
 
+> **TODO:** Update this section to cover return forms, as discussed
+> [here](values.md#function-calls-and-returns).
+
 ### `return` statements
 
 The [`return` statement](control_flow/return.md) is essential to function
@@ -111,6 +114,9 @@ When the return clause is provided, including when it is `-> ()`, the `return`
 statement must have an expression that is convertible to the return type, and a
 `return` statement must be used to end control flow of the function.
 
+> **TODO:** Update this section to cover the requirements on the form of the
+> expression.
+
 ## Function declarations
 
 Functions may be declared separate from the definition by providing only a

+ 259 - 144
docs/design/pattern_matching.md

@@ -18,7 +18,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
         -   [Name binding patterns](#name-binding-patterns)
         -   [Anonymous bindings](#anonymous-bindings)
             -   [Alternatives considered](#alternatives-considered-1)
-        -   [Compile-time bindings](#compile-time-bindings)
         -   [`auto` and type deduction](#auto-and-type-deduction)
         -   [Alternatives considered](#alternatives-considered-2)
     -   [`var`](#var)
@@ -36,10 +35,12 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
         -   [Alternatives considered](#alternatives-considered-6)
         -   [Guards](#guards)
     -   [Pattern matching in local variables](#pattern-matching-in-local-variables)
+-   [Evaluation order](#evaluation-order)
+    -   [Alternatives considered](#alternatives-considered-7)
 -   [Open questions](#open-questions)
     -   [Slice or array nested value pattern matching](#slice-or-array-nested-value-pattern-matching)
     -   [Pattern matching as function overload resolution](#pattern-matching-as-function-overload-resolution)
--   [Alternatives considered](#alternatives-considered-7)
+-   [Alternatives considered](#alternatives-considered-8)
 -   [References](#references)
 
 <!-- tocstop -->
@@ -60,14 +61,13 @@ of the full pattern as well.
 
 ## Pattern Syntax and Semantics
 
-Expressions are patterns, as described below. A pattern that is not an
+All expressions are patterns, but they may be tuple patterns, struct patterns,
+or expression patterns, as described below. A pattern that is not an
 expression, because it contains pattern-specific syntax such as a binding
 pattern, is a _proper pattern_. Many expression forms, such as arbitrary
 function calls, are not permitted as proper patterns, so cannot contain binding
 patterns.
 
--   _pattern_ ::= _proper-pattern_
-
 ```carbon
 fn F(n: i32) -> i32 { return n; }
 
@@ -81,10 +81,11 @@ match (F(42)) {
 
 An expression is a pattern.
 
--   _pattern_ ::= _expression_
+-   _expression-pattern_ ::= _expression_
+-   _pattern_ ::= _expression-pattern_
 
-The pattern is compared with the expression using the `==` operator: _pattern_
-`==` _scrutinee_.
+The scrutinee is compared with the expression using the `==` operator:
+_expression_ `==` _scrutinee_.
 
 ```carbon
 fn F(n: i32) {
@@ -96,31 +97,30 @@ fn F(n: i32) {
 }
 ```
 
-Any `==` operations performed by a pattern match occur in lexical order, but for
-repeated matches against the same _pattern_, later comparisons may be skipped by
-reusing the result from an earlier comparison:
+As defined here, _expression-pattern_ is ambiguous with _tuple-pattern_,
+_struct-pattern_, and _alternative-pattern_. In the case of _tuple-pattern_ and
+_struct-pattern_, the ambiguity is resolved in their favor, meaning that a tuple
+or struct literal in a pattern context is not interpreted as an expression
+pattern, but as a tuple or struct pattern whose elements are expression
+patterns. For example:
 
 ```carbon
-class ChattyIntMatcher {
-  impl as EqWith(i32) {
-    fn Eq[self: ChattyIntMatcher](other: i32) {
-      Print("Matching {0}", other);
-      return other == 1;
-    }
-  }
-}
-
-fn F() {
-  // Prints `Matching 1` then `Matching 2`,
-  // may or may not then print `Matching 1` again.
-  match ((1, 2)) {
-    case ({} as ChattyIntMatcher, 0) => {}
-    case (1, {} as ChattyIntMatcher) => {}
-    case ({} as ChattyIntMatcher, 2) => {}
-  }
+match (0, 1, 2) {
+  case (F(), 0, G()) => ...
 }
 ```
 
+Here `(F(), 0, G())` is not an expression, but three separate expressions in a
+tuple pattern. As a result, this code will call `F()` but not `G()`, because the
+mismatch between the middle tuple elements will cause pattern matching to fail
+before reaching `G()`. Other than this short-circuiting behavior, a tuple
+pattern of expression patterns behaves the same as if it were a single
+expression pattern.
+
+The resolution of the _alternative-pattern_ ambiguity is not specified, because
+_alternative-pattern_ is specified to behave the same way an expression pattern
+would, in the cases where they overlap.
+
 #### Alternatives considered
 
 -   [Introducer syntax for expression patterns](/proposals/p2188.md#introducer-syntax-for-expression-patterns)
@@ -131,14 +131,16 @@ fn F() {
 
 A name binding pattern is a pattern.
 
--   _binding-pattern_ ::= _identifier_ `:` _expression_
--   _proper-pattern_ ::= _binding-pattern_
+-   _binding-pattern_ ::= `ref`? (_identifier_ | `self`) `:` _expression_
+-   _binding-pattern_ ::= `template`? _identifier_ `:!` _expression_
+-   _pattern_ ::= _binding-pattern_
 
 A name binding pattern declares a _binding_ with a name specified by the
 _identifier_, which can be used as an expression. If the binding pattern is
-enclosed by a `var` pattern, it is a _reference binding pattern_, and the
-binding is a durable reference expression. Otherwise, it is a _value binding
-pattern_, and the binding is a value expression.
+prefixed with `ref` or enclosed by a `var` pattern, it is a _reference binding
+pattern_, and otherwise it is a _value binding pattern_. A binding pattern
+enclosed by a `var` pattern cannot have a `ref` prefix, because it would be
+redundant.
 
 A _variable binding pattern_ is a special kind of reference binding pattern,
 which is the immediate subpattern of its enclosing `var` pattern.
@@ -147,15 +149,36 @@ which is the immediate subpattern of its enclosing `var` pattern.
 > expected to be the only difference between variable binding patterns and other
 > reference binding patterns.
 
-The type of the binding is specified by the _expression_. If the pattern is a
-value binding pattern, the scrutinee is implicitly converted to a value
-expression of that type if necessary, and the binding is _bound_ to the
-converted value. If the pattern is a reference binding pattern, the enclosing
-`var` pattern will ensure that the scrutinee is already a durable reference
-expression with the specified type, and the binding is bound directly to it.
-
-A use of a value binding is a value expression of the declared type, and a use
-of a reference binding is a durable reference expression of the declared type.
+If the pattern syntax uses `:` it is a _runtime binding pattern_. If it uses
+`:!`, it is a _compile-time binding pattern_, and it cannot appear inside a
+`var` pattern. A compile-time binding pattern is either a _symbolic binding
+pattern_ or a _template binding pattern_, depending on whether it is prefixed
+with `template`.
+
+The binding declared by a binding pattern has a
+[primitive form](values.md#expression-forms) with the following components:
+
+-   The type is _expression_.
+-   The category is "value" if the pattern is a value binding pattern, "durable
+    entire reference" if it's a variable binding pattern, or "durable non-entire
+    reference" if it's a non-variable reference binding pattern.
+-   The phase is "runtime", "symbolic", or "template" depending on whether the
+    pattern is a runtime, symbolic, or template binding pattern.
+
+During pattern matching, the scrutinee is implicitly converted as needed to have
+the same form, and the binding is _bound_ to (and consumes) the result of these
+conversions. This makes a runtime or template binding a kind of reusable alias
+for the converted scrutinee expression, with the same form and value. Symbolic
+bindings are more complex: the binding will have the same type, category, and
+phase as the converted scrutinee expression, but its constant value is an opaque
+symbol introduced by the binding, which the type system knows to be equal to the
+converted scrutinee expression.
+
+Note that there is no way to implicitly convert to a durable reference
+expression from any other category, so the scrutinee of a reference binding
+pattern must already be a durable reference. `var` pattern matching ensures that
+this is the case for the bindings nested inside it, but for `ref` binding
+patterns the user-provided scrutinee must meet this requirement itself.
 
 ```carbon
 fn F() -> i32 {
@@ -170,41 +193,23 @@ fn F() -> i32 {
 }
 ```
 
-When a new object needs to be created for the binding, the lifetime of the bound
-value matches the scope of the binding.
-
-```carbon
-class NoisyDestructor {
-  fn Make() -> Self { return {}; }
-  impl i32 as ImplicitAs(NoisyDestructor) {
-    fn Convert[self: i32]() -> Self { return Make(); }
-  }
-  destructor {
-    Print("Destroyed!");
-  }
-}
-
-fn G() {
-  // Does not print "Destroyed!".
-  let n: NoisyDestructor = NoisyDestructor.Make();
-  Print("Body of G");
-  // Prints "Destroyed!" here.
-}
-
-fn H(n: i32) {
-  // Does not print "Destroyed!".
-  let (v: NoisyDestructor, w: i32) = (n, n);
-  Print("Body of H");
-  // Prints "Destroyed!" here.
-}
-```
+When `self` is used instead of an identifier, the pattern must appear in the
+implicit parameter list of a method (as discussed [here](classes.md#methods)).
+During pattern matching in a method call, the parameter pattern containing
+`self` is matched with the object that the method was invoked on. In all other
+respects, the `self` pattern behaves just like an ordinary binding pattern,
+introducing a binding named `self` into scope, just as if `self` were an
+identifier rather than a keyword.
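A minimal sketch of a `self` binding pattern in a method's implicit parameter list (the class and method names here are hypothetical):

```carbon
class Counter {
  var count: i32;
  // `self: Counter` is a binding pattern in the implicit parameter list.
  // During a method call, it is matched with the object the method was
  // invoked on, introducing a binding named `self` into scope.
  fn Get[self: Counter]() -> i32 { return self.count; }
}
```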
 
 #### Anonymous bindings
 
-A syntax like a binding but with `_` in place of an identifier can be used to
-ignore part of a value.
+A syntax like a binding but with `_` in place of an identifier is an anonymous
+binding. It does not participate in name lookup (so there can be multiple such
+patterns in the same scope), and in all other respects it behaves as if it were
+wrapped in an [`unused` pattern](#unused).
 
 -   _binding-pattern_ ::= `_` `:` _expression_
+-   _binding-pattern_ ::= `template`? `_` `:!` _expression_
 
 ```carbon
 fn F(n: i32) {
@@ -239,27 +244,6 @@ fn H(m: i32) {}
 -   [Anonymous, named identifiers](/proposals/p2022.md#anonymous-named-identifiers)
 -   [Attributes](/proposals/p2022.md#attributes)
 
-#### Compile-time bindings
-
-A `:!` can be used in place of `:` for a binding that is usable at compile time.
-
--   _compile-time-pattern_ ::= `template`? _identifier_ `:!` _expression_
--   _compile-time-pattern_ ::= `template`? `_` `:!` _expression_
--   _compile-time-pattern_ ::= `unused` `template`? _identifier_ `:!`
-    _expression_
--   _proper-pattern_ ::= _compile-time-pattern_
-
-```carbon
-// ✅ `F` takes a symbolic facet parameter `T` and a parameter `x` of type `T`.
-fn F(T:! type, x: T) {
-  var v: T = x;
-}
-```
-
-The `template` keyword indicates the binding pattern is introducing a template
-binding, so name lookups into the binding will not be fully resolved until its
-value is known.
-
 #### `auto` and type deduction
 
 The `auto` keyword is a placeholder for a unique deduced type.
@@ -310,12 +294,18 @@ specified.
 A `var` prefix indicates that a pattern provides mutable storage for the
 scrutinee.
 
--   _proper-pattern_ ::= `var` _proper-pattern_
+-   _pattern_ ::= `var` _pattern_
 
-A `var` pattern matches when its nested pattern matches. The type of the storage
-is the resolved type of the nested _pattern_. Any binding patterns within the
-nested pattern are reference binding patterns, and their bindings refer to
-portions of the corresponding storage rather than to the scrutinee.
+The scrutinee is expected to have the same type as the resolved type of the
+nested _pattern_, and it is expected to be a runtime-phase ephemeral entire
+reference expression, which therefore refers to a newly-allocated temporary
+object. The scrutinee expression is converted as needed to satisfy those
+expectations, and the `var` pattern takes ownership of the referenced object,
+promotes it to a _durable_ entire reference expression, and matches the nested
+_pattern_ with it.
+
+The lifetime of the allocated object extends to the end of the scope of the
+`var` pattern (that is, the scope that any bindings declared within it would
+have).
 
 ```carbon
 fn F(p: i32*);
@@ -382,56 +372,48 @@ fn G() {
 
 A tuple of patterns can be used as a pattern.
 
--   _tuple-pattern_ ::= `(` [_expression_ `,`]\* _proper-pattern_ [`,`
-    _pattern_]\*
-    `,`? `)`
--   _proper-pattern_ ::= _tuple-pattern_
-
-A _tuple-pattern_ containing no commas is treated as grouping parens: the
-contained _proper-pattern_ is matched directly against the scrutinee. Otherwise,
-the behavior is as follows.
+-   _tuple-pattern_ ::= `(` [_pattern_ `,` [_pattern_ [`,` _pattern_]\* `,`? ] ]
+    `)`
+-   _pattern_ ::= _tuple-pattern_
 
-A tuple pattern is matched left-to-right. The scrutinee is required to be of
-tuple type.
-
-Note that a tuple pattern must contain at least one _proper-pattern_. Otherwise,
-it is a tuple-valued expression. However, a tuple pattern and a corresponding
-tuple-valued expression are matched in the same way because `==` for a tuple
-compares fields left-to-right.
+The scrutinee is required to be of tuple type, with the same arity as the number
+of nested _patterns_. It is converted to a tuple form by
+[form decomposition](values.md#form-conversions), and then each nested _pattern_
+is matched against the corresponding element of the converted scrutinee's
+[result](values.md#expression-forms). The tuple pattern matches if all of these
+sub-matches succeed.
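A minimal sketch of this element-wise matching (the values are chosen arbitrarily): each nested pattern is matched against the corresponding element of the decomposed scrutinee, and the whole pattern matches only if every sub-match succeeds.

```carbon
match ((1, 2)) {
  // The arities match; `n` binds to the first element, and the tuple
  // pattern matches only if the second element compares equal to 2.
  case (n: i32, 2) => { Print("{0}", n); }
  // Anonymous bindings always match, so this case is a fallback.
  case (_: i32, _: i32) => {}
}
```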
 
 ### Struct patterns
 
 A struct can be matched with a struct pattern.
 
--   _proper-pattern_ ::= `{` [_field-init_ `,`]\* _proper-field-pattern_ [`,`
-    _field-pattern_]\*
-    `}`
--   _proper-pattern_ ::= `{` [_field-pattern_ `,`]+ `_` `}`
--   _field-init_ ::= _designator_ `=` _expression_
--   _proper-field-pattern_ ::= _designator_ `=` _proper-pattern_
--   _proper-field-pattern_ ::= _binding-pattern_
--   _field-pattern_ ::= _field-init_
--   _field-pattern_ ::= _proper-field-pattern_
+-   _struct-pattern_ ::= `{` [_field-pattern_ [`,` _field-pattern_ ]\* ] `}`
+-   _struct-pattern_ ::= `{` [_field-pattern_ `,`]+ `_` `}`
+-   _field-pattern_ ::= _designator_ `=` _pattern_
+-   _field-pattern_ ::= _binding-pattern_
+-   _pattern_ ::= _struct-pattern_
 
-A struct pattern resembles a struct literal, with at least one field initialized
-with a proper pattern:
+A struct pattern resembles a struct literal, except that the initializers can be
+patterns.
 
 ```carbon
 match ({.a = 1, .b = 2}) {
-  // Struct literal as an expression pattern.
+  // Struct literal as a pattern.
   case {.b = 2, .a = 1} => {}
-  // Struct pattern.
+  // Proper struct pattern.
   case {.b = n: i32, .a = m: i32} => {}
 }
 ```
 
-The scrutinee is required to be of struct type, and to have the same set of
-field names as the pattern. The pattern is matched left-to-right, meaning that
-matching is performed in the field order specified in the pattern, not in the
-field order of the scrutinee. This is consistent with the behavior of matching
-against a struct-valued expression, where the expression pattern becomes the
-left operand of the `==` and so determines the order in which `==` comparisons
-for fields are performed.
+The scrutinee is required to be of struct type, and every field name in the
+pattern must be a field name in the scrutinee. It is converted to a struct form
+by [form decomposition](values.md#form-conversions) and then each
+_field-pattern_ is matched with the same-named element of the converted
+scrutinee's [result](values.md#expression-forms). If the scrutinee result has
+any field names not present in the pattern, those sub-results are
+[discarded](values.md#form-conversions) in lexical order if the pattern has a
+trailing `_` (as in `{.a = 1, _}`), or diagnosed as an error if it does not. The
+struct pattern matches if all of these sub-matches succeed.
 
 In the case where a field will be bound to an identifier with the same name, a
 shorthand syntax is available: `a: T` is synonymous with `.a = a: T`.
@@ -442,6 +424,9 @@ match ({.a = 1, .b = 2}) {
 }
 ```
 
+Likewise, `ref a: T` is synonymous with `.a = ref a: T`, and `var a: T` is
+synonymous with `.a = var a: T`.
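A minimal sketch of the `var` form of this shorthand (a `ref` shorthand would look the same, but would additionally require the scrutinee to already be a durable reference):

```carbon
match ({.a = 1, .b = 2}) {
  // `var a: i32` is shorthand for `.a = var a: i32`, giving `a` its own
  // mutable storage initialized from the `.a` field of the scrutinee.
  case {var a: i32, .b = n: i32} => { a = a + n; }
}
```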
+
 If some fields should be ignored when matching, a trailing `, _` can be added to
 specify this:
 
@@ -462,19 +447,22 @@ This is valid even if all fields are actually named in the pattern.
 
 An alternative pattern is used to match one alternative of a choice type.
 
--   _proper-pattern_ ::= _callee-expression_ _tuple-pattern_
--   _proper-pattern_ ::= _designator_ _tuple-pattern_?
+-   _alternative-pattern_ ::= _callee-expression_ _tuple-pattern_?
+-   _alternative-pattern_ ::= _designator_ _tuple-pattern_?
+-   _pattern_ ::= _alternative-pattern_
 
 Here, _callee-expression_ is syntactically an expression that is valid as the
 callee in a function call expression, and an alternative pattern is
-syntactically a function call expression whose argument list contains at least
-one _proper-pattern_.
+syntactically a function call expression whose argument list may contain proper
+patterns.
 
-If a _callee-expression_ is provided, it is required to name a choice type
-alternative that has a parameter list, and the scrutinee is implicitly converted
-to that choice type. Otherwise, the scrutinee is required to be of some choice
-type, and the designator is looked up in that type and is required to name an
-alternative with a parameter list if and only if a _tuple-pattern_ is specified.
+Semantically, if the argument list contains no proper patterns, it behaves like
+an expression pattern. Otherwise, if a _callee-expression_ is provided, it is
+required to name a choice type alternative that has a parameter list, and the
+scrutinee is implicitly converted to that choice type. Otherwise, the scrutinee
+is required to be of some choice type, and the designator is looked up in that
+type and is required to name an alternative with a parameter list if and only if
+a _tuple-pattern_ is specified.
 
 The pattern matches if the active alternative in the scrutinee is the specified
 alternative, and the arguments of the alternative match the given tuple pattern
@@ -750,8 +738,10 @@ In order to match a value, whatever is specified in the pattern must match.
 Using `auto` for a type will always match, making `_: auto` the wildcard
 pattern.
 
-Any initializing expressions in the scrutinee of a `match` statement are
-[materialized](values.md#temporary-materialization) before pattern matching
+If the scrutinee expression's [form](values.md#expression-forms) contains any
+primitive forms with category "initializing", they are converted to ephemeral
+non-entire reference expressions by
+[materialization](values.md#temporary-materialization) before pattern matching
 begins, so that the result can be reused by multiple `case`s. However, the
 objects created by `var` patterns are not reused by multiple `case`s:
 
@@ -813,6 +803,131 @@ fn Foo() -> i32 {
 This extracts the first value from the result of calling `Bar()` and binds it to
 a local variable named `p` which is then returned.
 
+## Evaluation order
+
+A pattern matching operation's potentially-observable side effects are a series
+of calls to functions that might be user-defined. This includes function calls
+and operators in the scrutinee and in expression patterns, and also type
+conversions and category conversions. Note that category conversions on tuple
+and struct types, and type conversions between tuple and struct types, are not
+modeled as function calls, but are broken down into function calls on their
+elements. Note also that for function and operator calls in expressions, we are
+only considering top-level calls, that is, calls that aren't inputs to other
+calls within the expression, because the entire sub-expression of a top-level
+call acts as a single unit for purposes of evaluation ordering.
+
+For example, suppose `A` is implicitly convertible to `C` and `B` is
+implicitly convertible to `D`, but both conversions produce value expressions
+(rather than initializing expressions), and consider the following code:
+
+```carbon
+fn MakeA() -> A;
+fn MakeB() -> B;
+
+var cd: (C, D) = (MakeA(), MakeB());
+```
+
+Evaluation of the last line involves 6 function calls:
+
+1. Call `MakeA`.
+2. Call `A.(Core.ImplicitAsPrimitive(C)).Convert`, to convert the `A` object to
+   a `C` value, as part of type conversion.
+3. Call `A.(Core.Copy).Op` to copy the `C` value into the storage for `cd.0`, as
+   part of category conversion.
+4. Call `MakeB`.
+5. Call `B.(Core.ImplicitAsPrimitive(D)).Convert`.
+6. Call `B.(Core.Copy).Op`.
+
+> **Note:** These `Core` interfaces haven't been specified yet, and their
+> details may change.
+
+To define the evaluation order of these calls, we have to consider the
+dependencies between them, which we'll model as a DAG, with function calls as
+nodes, and edges representing data dependencies. It will also be useful to
+include leaf patterns (that is, patterns that have no subpatterns) as nodes in
+the graph; they don't have side effects as such, so they aren't part of the
+evaluation order, but they do constrain the evaluation order.
+
+```mermaid
+%%{init: {'themeVariables': {'fontFamily': 'monospace'}}}%%
+flowchart BT
+  2["A.(Core.ImplicitAsPrimitive(C))"]-->1[/"MakeA"\]
+  3["A.(Core.Copy)"]-->2
+  5["B.(Core.ImplicitAsPrimitive(D))"]-->4[/"MakeB"\]
+  6["B.(Core.Copy)"]-->5
+  7[\"cd: (C, D)"/]-->3
+  7-->6
+```
+
+This DAG will always have a few key properties:
+
+-   The sources are the primitive patterns. Only the sources can have multiple
+    out-edges.
+-   The sinks are function calls in the scrutinee expression, and in expression
+    patterns. Only scrutinee expression sinks can have multiple in-edges.
+-   The interior nodes always have one edge in and one edge out, forming a set
+    of paths that connect a source to a sink.
+-   The paths to calls in expression patterns are trivial: they consist of a
+    single edge from an expression pattern source to a function call sink.
+    Furthermore, a given source or sink has at most one such edge.
+-   Each path to the scrutinee connects a type in the pattern to a type in the
+    scrutinee, and together the paths uniquely cover the entire pattern and
+    scrutinee types. Furthermore, they are minimal, in the sense that unless the
+    path is a single edge, its source and sink types won't both be tuple or
+    struct types.
+
+> **Future work:** this design needs to be reconciled with the design for
+> [user-defined sum types](sum_types.md#user-defined-sum-types), because
+> `Match.Op` can violate this topology. This should probably be folded into a
+> broader redesign of sum type customization, which we expect to be necessary
+> for other reasons.
+
+The order of evaluation is determined by a depth-first postorder traversal of
+this DAG: while visiting a node, we recursively visit all its children, and
+a call occurs when we finish visiting the corresponding node (revisiting a node
+is a no-op). By eagerly consuming the result of each function call as soon as
+possible, this minimizes the number of simultaneously-live temporaries, which
+enables more efficient code generation.
+
+When visiting a pattern, we visit its out-paths in the scrutinee type's
+left-to-right source code order (recall that each path is associated with a
+unique part of the scrutinee type). An edge to an expression pattern call, if
+any, is visited last. The patterns themselves are visited in their own
+left-to-right source code order. So, returning to our earlier example, the 6
+function calls will be evaluated in the order we listed them.
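+
+Concretely, the DAG above could arise from a declaration like the following
+sketch (assuming `MakeA() -> A` and `MakeB() -> B`, with `A` implicitly
+convertible to `C` and `B` to `D`):
+
+```carbon
+let cd: (C, D) = (MakeA(), MakeB());
+// Depth-first postorder evaluation order:
+//   1. MakeA()
+//   2. A.(Core.ImplicitAsPrimitive(C))
+//   3. A.(Core.Copy)
+//   4. MakeB()
+//   5. B.(Core.ImplicitAsPrimitive(D))
+//   6. B.(Core.Copy)
+```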
+
+In some cases, visiting the patterns in their own order may lead to visiting the
+types within a scrutinee call out of order, but if it would lead to visiting the
+scrutinee calls themselves out of order, the program is ill-formed. For example:
+
+```carbon
+// ❌ Error: visiting `c: C` first leads to evaluating `MakeA()` before
+// `MakeB()`
+var {c: C, d: D} = {.d = MakeB(), .c = MakeA()};
+
+// ✅ OK: only one pattern, and we use scrutinee order to visit its children.
+var cd: {.c: C, .d: D} = {.d = MakeB(), .c = MakeA()};
+
+// ✅ OK: only one scrutinee call, so it can't be out of order.
+fn MakeAB() -> {.d: B, .c: A};
+var {c: C, d: D} = MakeAB();
+```
+
+As a result, the overall evaluation order is always consistent with the written
+order of the patterns, and with the written order of the scrutinee expressions.
+Within those constraints, the order of the scrutinee types acts as a
+tie-breaker. Note in particular that this means the fields of a struct-type
+binding are not necessarily initialized in declaration order.
+
+Note that generally speaking, pattern-match evaluation stops as soon as it's
+known that the match will fail, in which case only a prefix of the full
+evaluation order will be evaluated.
+
+### Alternatives considered
+
+-   [Breadth-first evaluation order](/proposals/p5545.md#breadth-first-evaluation-order)
+-   [Depth-first evaluation with a different "horizontal" order](/proposals/p5545.md#depth-first-evaluation-with-a-different-horizontal-order)
+
 ## Open questions
 
 ### Slice or array nested value pattern matching

+ 9 - 0
docs/design/tuples.md

@@ -12,6 +12,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
 -   [Overview](#overview)
 -   [Element access](#element-access)
+-   [Conversion](#conversion)
     -   [Empty tuples](#empty-tuples)
     -   [Trailing commas and single-element tuples](#trailing-commas-and-single-element-tuples)
     -   [Tuple of types and tuple types](#tuple-of-types-and-tuple-types)
@@ -64,6 +65,14 @@ fn Choose(template N:! i32) -> i32 {
 }
 ```
 
+## Conversion
+
+A tuple type `Source` can be converted to a tuple type `Dest` if they have the
+same number of elements, and each element type of `Source` is convertible to the
+corresponding element type of `Dest`, and the conversion is implicit if all of
+the element type conversions are implicit. See
+[here](values.md#type-conversions) for full details.
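+
+For example (a sketch; integer widening such as `i32` to `i64` is an implicit
+conversion, so the tuple conversion is also implicit):
+
+```carbon
+fn TakeWide(pair: (i64, i64));
+
+fn F() {
+  var narrow: (i32, i32) = (1, 2);
+  // OK: each element converts implicitly, so the whole tuple does too.
+  TakeWide(narrow);
+}
+```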
+
 ### Empty tuples
 
 `()` is the empty tuple. This is used in other parts of the design, such as

+ 543 - 81
docs/design/values.md

@@ -20,6 +20,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Local variables](#local-variables)
     -   [Consuming function parameters](#consuming-function-parameters)
 -   [Reference expressions](#reference-expressions)
+    -   [Entire reference expressions](#entire-reference-expressions)
     -   [Durable reference expressions](#durable-reference-expressions)
     -   [Ephemeral reference expressions](#ephemeral-reference-expressions)
 -   [Value expressions](#value-expressions)
@@ -31,6 +32,11 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
     -   [Function calls and returns](#function-calls-and-returns)
         -   [Deferred initialization from values and references](#deferred-initialization-from-values-and-references)
         -   [Declared `returned` variable](#declared-returned-variable)
+-   [Expression forms](#expression-forms)
+    -   [Initializing results](#initializing-results)
+    -   [Form conversions](#form-conversions)
+        -   [Type conversions](#type-conversions)
+        -   [Category conversions](#category-conversions)
 -   [Pointers](#pointers)
     -   [Reference types](#reference-types)
     -   [Pointer syntax](#pointer-syntax)
@@ -64,21 +70,21 @@ itself.
 
 ### Expression categories
 
-There are three expression categories in Carbon:
+There are three primary expression categories in Carbon:
 
 -   [_Value expressions_](#value-expressions) produce abstract, read-only
     _values_ that cannot be modified or have their address taken.
 -   [_Reference expressions_](#reference-expressions) refer to _objects_ with
     _storage_ where a value may be read or written and the object's address can
     be taken.
--   [_Initializing expressions_](#initializing-expressions) require storage to
-    be provided implicitly when evaluating the expression. The expression then
-    initializes an object in that storage. These are used to model function
-    returns, which can construct the returned value directly in the caller's
-    storage.
+-   [_Initializing expressions_](#initializing-expressions) require a result
+    location to be provided implicitly when evaluating the expression. The
+    expression then initializes an object in that location. These are used to
+    model function returns, which can construct the returned value directly in
+    the caller's storage.
 
-Expressions in one category can be converted to any other category when needed.
-The primitive conversion steps used are:
+Expressions in one category can be implicitly converted to any other primary
+category when needed. The primitive conversion steps used are:
 
 -   [_Value acquisition_](#value-acquisition) forms a value expression from the
     current value of the object referenced by a reference expression.
@@ -97,13 +103,36 @@ These conversion steps combine to provide the transitive conversion table:
 |    to **reference** | direct init + materialize | ==        | materialize           |
 | to **initializing** | direct init               | copy init | ==                    |
 
-Reference expressions formed through temporary materialization are called
-[_ephemeral reference expressions_](#ephemeral-reference-expressions) and have
-restrictions on how they are used. In contrast, reference expressions that refer
-to declared storage are called
-[_durable reference expressions_](#durable-reference-expressions). Beyond the
-restrictions on what is valid, there is no distinction in their behavior or
-semantics.
+Reference expressions are divided along two axes into four sub-categories:
+they can be either
+[_ephemeral_](#ephemeral-reference-expressions) or
+[_durable_](#durable-reference-expressions), and either _entire_ or
+_non-entire_.
+
+Ephemeral reference expressions are formed through temporary materialization,
+and have restrictions on how they are used. In contrast, durable reference
+expressions refer to storage that outlives the expression, and typically has a
+declared name. Entire reference expressions can only refer to complete objects,
+whereas non-entire reference expressions can refer to both complete objects and
+sub-objects (such as class fields and base class sub-objects). As a consequence,
+only entire reference expressions can be destructively moved.
+
+> **Future work:** This means that pointer-dereference expressions are
+> non-entire, but we will presumably want to be able to destructively move from
+> them. We need to figure out how to support that without violating the
+> invariant that a live object has live fields.
+
+Value binding and copy initialization can be applied to any reference
+expression, but materialization only produces ephemeral entire reference
+expressions. An entire reference expression can be implicitly converted to
+non-entire; this has no run-time effect because it merely discards static
+object-completeness information. Non-entire reference expressions can only be
+converted to entire reference expressions by round-tripping through
+copy-initialization and materialization. Expressions that are not durable
+references cannot be implicitly converted to durable reference expressions at
+all.
+
+> **TODO:** Determine how these reference sub-categories relate to memory-safety
+> properties like uniqueness, and make sure their names are aligned with
+> memory-safety terminology.
 
 #### Value acquisition
 
@@ -157,9 +186,20 @@ fn Sum(x: i32, y: i32) -> i32 {
 Value bindings require the matched expression to be a _value expression_,
 converting it into one as necessary.
 
-A _variable pattern_ is introduced with the `var` keyword. It declares storage
-for a new object, and initializes it from the matched expression, which must be
-an initializing expression.
+A _variable pattern_ is introduced with the `var` keyword. The matched
+expression must be an ephemeral entire reference expression (which typically
+requires the matched expression to be materialized); the `var` pattern takes
+ownership of the newly-allocated temporary storage it refers to, which extends
+its lifetime to the end of the enclosing scope. The subpattern is then matched
+against a _durable_ entire reference expression to the object in that storage.
+
+> **Open question:** This implies that `var field: T = F().field;` doesn't
+> perform any copies or moves on `T`. This, in turn, implies that the storage
+> for `field` must be laid out as part of a complete `typeof(F())` object
+> layout, which is initialized by the call to `F()`. All other members of that
+> layout are immediately destroyed, and their storage is theoretically reusable
+> after that point, but it's unclear if this is the right default, or how to
+> enable user code to override that default when it's the wrong tradeoff.
 
 A _reference binding pattern_ is a binding pattern that is nested under a `var`
 pattern. It introduces a name called a _reference binding_ that is a
@@ -174,8 +214,10 @@ fn Example() {
   let x: i64 = 1;
 
   // `2` also starts as a value expression, but the variable pattern requires it
-  // to be converted to an initializing expression by using the value `2` to
-  // initialize the provided variable storage that `y` will refer to.
+  // to be converted to an ephemeral entire reference expression by using the
+  // value `2` to initialize temporary storage, which the variable pattern
+  // takes ownership of. The reference binding pattern is then bound to a
+  // durable reference to the newly-initialized object.
   var y: i64 = 2;
 
   // Allowed to take the address and mutate `y` as it is a durable reference
@@ -214,8 +256,10 @@ inner `var` pattern here:
 fn DestructuringExample() {
   // Both `1` and `2` start as value expressions. The `x` binding directly
   // matches `1`. For `2`, the variable pattern requires it to be converted to
-  // an initializing expression by using the value `2` to initialize the
-  // provided variable storage that `y` will refer to.
+  // an ephemeral entire reference expression by using the value `2` to
+  // initialize temporary storage, which the variable pattern takes ownership
+  // of. The reference binding `y` is then bound to a durable reference to the
+  // newly-initialized object.
   let (x: i64, var y: i64) = (1, 2);
 
   // Just like above, we can take the address and mutate `y`:
@@ -250,9 +294,9 @@ This allows us to model an important special case of function inputs -- those
 that are _consumed_ by the function, either through local processing or being
 moved into some persistent storage. Marking these in the pattern and thus
 signature of the function changes the expression category required for arguments
-in the caller. These arguments are required to be _initializing expressions_,
-potentially being converted into such an expression if necessary, that directly
-initialize storage dedicated-to and owned-by the function parameter.
+in the caller. These arguments are required to be _ephemeral entire reference
+expressions_, potentially being converted into such an expression if necessary,
+whose storage will be dedicated-to and owned-by the function parameter.
 
 This pattern serves the same purpose as C++'s pass-by-value when used with types
 that have non-trivial resources attached to pass ownership into the function and
@@ -264,9 +308,60 @@ makes this a use case that requires a special marking on the declaration.
 _Reference expressions_ refer to _objects_ with _storage_ where a value may be
 read or written and the object's address can be taken.
 
-There are two sub-categories of reference expressions: _durable_ and
-_ephemeral_. These refine the _lifetime_ of the underlying storage and provide
-safety restrictions reflecting that lifetime.
+Reference expressions can be either _durable_ or _ephemeral_. These refine the
+_lifetime_ of the underlying storage and provide safety restrictions reflecting
+that lifetime. Reference expressions can also be either _entire_ or
+_non-entire_, depending on whether the referenced object is known to be complete
+(rather than a sub-object of another object).
+
+### Entire reference expressions
+
+An _entire reference expression_ is one that is statically known to refer to a
+complete object. Other references are _non-entire_. Durable and ephemeral
+reference expressions can both be either entire or non-entire (although
+non-entire ephemeral references are rare). Unless otherwise specified, an
+expression or operation that produces a reference produces a non-entire
+reference.
+
+Note that a non-entire reference expression still _might_ refer to a complete
+object; the language rules just don't _guarantee_ that it does. As a result, an
+entire reference can be implicitly converted to a non-entire reference (with the
+same durability), because this merely discards the knowledge that the object is
+complete. By the same token, there is no context that requires a non-entire
+reference; there are only contexts that accept both, contexts that accept
+only entire references, and contexts that don't accept references at all.
+
+Currently, the only context that requires an entire reference is the scrutinee
+of a `var` pattern, which is required to be an entire ephemeral reference (and
+is [converted](#category-conversions) to that category if necessary).
+
+> **Note:** This extends the lifetime of the reference, so it must be possible
+> to determine _which_ temporary an ephemeral entire reference refers to, so
+> that the implementation knows which lifetime to extend. Under the current
+> language rules, this can be done statically.
+
+> **Open question:** Should we extend the language in ways that would force that
+> determination to be dynamic? For example, should we allow
+> `if c then r1 else r2` to be an entire ephemeral reference expression if `r1`
+> and `r2` are? As a more extreme example, should we support functions that take
+> and return entire ephemeral references?
+
+There are several kinds of expressions that produce entire references. For
+example:
+
+-   The name of an object introduced with a
+    [variable binding pattern](pattern_matching.md#name-binding-patterns) (in
+    other words, a name that was declared with `var <name> : <type>`) is a
+    durable entire reference.
+-   A member access expression `x.member` or `x.(member)` is an entire reference
+    if `x` is an initializing or entire ephemeral reference expression with a
+    struct or tuple type.
+-   The result of materialization is an entire ephemeral reference.
+-   When a [tuple pattern](pattern_matching.md#tuple-patterns) or
+    [struct pattern](pattern_matching.md#struct-patterns) is matched with an
+    ephemeral entire reference scrutinee, that scrutinee is destructured into
+    ephemeral entire references to its elements, which are then matched with the
+    corresponding subpatterns.
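+
+A sketch combining several of these rules:
+
+```carbon
+fn MakePair() -> (i32, i32);
+
+fn F() {
+  // The call result is materialized into a temporary, yielding an ephemeral
+  // entire reference that the `var` pattern takes ownership of. `pair` itself
+  // is then a durable entire reference expression.
+  var pair: (i32, i32) = MakePair();
+  // `pair.0` is a durable reference, but not an entire one: `pair` is durable
+  // rather than initializing or entire ephemeral, and `pair.0` refers to a
+  // sub-object.
+  pair.0 = 1;
+}
+```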
 
 ### Durable reference expressions
 
@@ -274,7 +369,8 @@ _Durable reference expressions_ are those where the object's storage outlives
 the full expression and the address could be meaningfully propagated out of it
 as well.
 
-There are two contexts that require a durable reference expression in Carbon:
+There are several contexts where durable reference expressions are required. For
+example:
 
 -   [Assignment statements](/docs/design/assignment.md) require the
     left-hand-side of the `=` to be a durable reference. This stronger
@@ -282,9 +378,14 @@ There are two contexts that require a durable reference expression in Carbon:
     the `Carbon.Assign.Op` interface method.
 -   [Address-of expressions](#pointer-syntax) require their operand to be a
     durable reference and compute the address of the referenced object.
+-   [`ref` binding patterns](pattern_matching.md#name-binding-patterns) require
+    their scrutinee to be a durable reference.
+-   If a function's [return form](#function-calls-and-returns) contains `ref`
+    tags, `return` statements require the corresponding parts of the operand to
+    be durable reference expressions.
 
-There are several kinds of expressions that produce durable references in
-Carbon:
+There are also several kinds of expressions that produce durable references. For
+example:
 
 -   Names of objects introduced with a
     [reference binding](#binding-patterns-and-local-variables-with-let-and-var):
@@ -295,6 +396,8 @@ Carbon:
 -   [Indexing](/docs/design/expressions/indexing.md) into a type similar to
     C++'s `std::span` that implements `IndirectIndexWith`, or indexing into any
     type with a durable reference expression such as `local_array[i]`.
+-   Calls to functions whose [return forms](#function-calls-and-returns) contain
+    `ref`.
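+
+A sketch of the last case (the function name and the `ref` parameter are
+hypothetical):
+
+```carbon
+fn First(ref pair: (i32, i32)) -> ref i32 {
+  // The return form contains `ref`, so the operand must be a durable
+  // reference expression.
+  return pair.0;
+}
+
+fn G() {
+  var pair: (i32, i32) = (1, 2);
+  // The call is a durable reference expression, so it can be assigned to.
+  First(pair) = 42;
+}
+```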
 
 Durable reference expressions can only be produced _directly_ by one of these
 expressions. They are never produced by converting one of the other expression
@@ -305,22 +408,34 @@ categories into a reference expression.
 We call the reference expressions formed through
 [temporary materialization](#temporary-materialization) _ephemeral reference
 expressions_. They still refer to an object with storage, but it may be storage
-that will not outlive the full expression. Because the storage is only
-temporary, we impose restrictions on where these reference expressions can be
-used: their address can only be taken implicitly as part of a method call whose
-`self` parameter is marked with the `ref` specifier.
-
-**Future work:** The current design allows directly requiring an ephemeral
-reference for `ref`-methods because this replicates the flexibility in C++ --
-very few C++ methods are L-value-ref-qualified which would have a similar effect
-to `ref`-methods requiring a durable reference expression. This is leveraged
-frequently in C++ for builder APIs and other patterns. However, Carbon provides
-more tools in this space than C++ already, and so it may be worth evaluating
-whether we can switch `ref`-methods to the same restrictions as assignment and
-`&`. Temporaries would never have their address escaped (in a safe way) in that
-world and there would be fewer different kinds of entities. But this is reserved
-for future work as we should be very careful about the expressivity hit being
-tolerable both for native-Carbon API design and for migrated C++ code.
+that will not outlive the full expression, and so it can't be used where a
+durable reference is expected.
+
+> **Future work:** The current design does not support mutating ephemeral
+> references (or initializing expressions): assigning to an ephemeral reference
+> is disallowed directly, and invoking mutating methods is disallowed because
+> the `ref self` parameter can only bind to a durable reference. In C++ it's
+> unusual but not rare to intentionally mutate a temporary, such as in a
+> builder-style method chain (for example `MakeFoo().SetBar().AddBaz()`), so
+> Carbon will need to provide some interop and migration target for that kind of
+> code.
+
+There is one context that requires an ephemeral reference expression in Carbon:
+the scrutinee of a
+[`var` pattern](#binding-patterns-and-local-variables-with-let-and-var) (which
+also requires the reference to be entire).
+
+There are only a few ways to produce an ephemeral reference expression. Most
+notably:
+
+-   The result of materialization is an entire ephemeral reference.
+-   A member access expression `x.member` or `x.(member)` is an ephemeral
+    reference if `x` is an initializing or ephemeral reference.
+-   When a [tuple pattern](pattern_matching.md#tuple-patterns) or
+    [struct pattern](pattern_matching.md#struct-patterns) is matched with an
+    initializing or ephemeral reference scrutinee, that scrutinee is
+    destructured into ephemeral references to its elements, which are then
+    matched with the corresponding subpatterns.
 
 ## Value expressions
 
@@ -487,8 +602,9 @@ The specific tradeoff here is covered in a proposal
 ## Initializing expressions
 
 Storage in Carbon is initialized using _initializing expressions_. Their
-evaluation produces an initialized object in the storage, although that object
-may still be _unformed_.
+evaluation takes a _result location_ as an implicit input, and produces an
+initialized object at that location, although that object may still be
+_unformed_.
 
 **Future work:** More details on initialization and unformed objects should be
 added to the design from the proposal
@@ -506,29 +622,37 @@ the provided storage.
 **Future work:** The design should be expanded to fully cover how copying is
 managed and linked to from here.
 
-The first place where an initializing expression is _required_ is to satisfy
-[_variable patterns_](#binding-patterns-and-local-variables-with-let-and-var).
-These require the expression they match to be an initializing expression for the
-storage they create. The simplest example is the expression after the `=` in a
-local `var` declaration.
-
-The next place where a Carbon expression requires an initializing expression is
-the expression operand to `return` statements. We expand more completely on how
-return statements interact with expressions, values, objects, and storage
-[below](#function-calls-and-returns).
-
-The last path that requires forming an initializing expression in Carbon is when
-attempting to convert a non-reference expression into an ephemeral reference
-expression: the expression is first converted to an initializing expression if
-necessary, and then temporary storage is materialized to act as its output, and
-as the referent of the resulting ephemeral reference expression.
+There are no syntactic contexts in Carbon that always require an initializing
+expression, and no expression syntax that always produces an initializing
+expression. By default, function call expressions are initializing expressions,
+and correspondingly the operand of `return` is required to be an initializing
+expression, but this default can be overridden by the
+[function signature](#function-calls-and-returns).
+
+Initializing expressions can also be created implicitly, when attempting to
+convert an expression into an ephemeral entire reference expression
+(particularly to match a `var` pattern): the expression is first converted to an
+initializing expression if necessary, and then temporary storage is materialized
+to act as its output, and as the referent of the resulting ephemeral reference
+expression.
 
 ### Function calls and returns
 
-Function calls in Carbon are modeled directly as initializing expressions --
-they require storage as an input and when evaluated cause that storage to be
-initialized with an object. This means that when a function call is used to
-initialize some variable pattern as here:
+The [result](#expression-forms) of a function call can have an almost arbitrary
+form. The return clause of a function signature consists of `->` followed by a
+_return form_, an expression-like syntax that specifies not only the type but
+also the form of the function call's result. `return` expressions in the
+function body are expected to have that form, and are converted to it if
+necessary. When a function is declared without a return clause, it behaves from
+the caller's point of view as if the return clause were `-> ()`, but `return`
+statements in the function body don't take operands (and can be omitted at the
+end of the function).
+
+In the common case, the return form is a type expression, in which case calls
+are modeled directly as initializing expressions -- they require storage as an
+input and when evaluated cause that storage to be initialized with an object.
+This means that when a function call is used to initialize some variable pattern
+as here:
 
 ```carbon
 fn CreateMyObject() -> MyType {
@@ -541,18 +665,64 @@ var x: MyType = CreateMyObject();
 The `<return-expression>` in the `return` statement actually initializes the
 storage provided for `x`. There is no "copy" or other step.
 
-> **Future work:** Extend this to also apply when a variable pattern is
-> initialized from a tuple/struct literal, or a tuple/struct pattern with
-> variable subpatterns is initialized from a single function call.
+In the body of such a function, all `return` statement expressions are required
+to be initializing expressions and in fact initialize the storage provided to
+the function's call expression. This in turn causes the property to hold
+_transitively_ across an arbitrary number of function calls and returns. The
+storage is forwarded at each stage and initialized exactly once.
+
+More generally, the syntax and semantics of a return form are as follows:
+
+-   _return-clause_ ::= `->` _return-form_
+-   _return-form_ ::= _nesting-return-form_ | _auto-return-form_
+-   _nesting-return-form_ ::= _expression-return-form_ | _proper-return-form_
+
+Return forms can usually be nested, but syntaxes involving `auto` can only occur
+at top level. We further divide nesting return forms into expressions and
+"proper" return forms, but this is just a technical means of avoiding formal
+ambiguity in the grammar; it has no greater significance.
+
+-   _category-tag_ ::= `val` | `ref` | `var`
+
+These tags are used to specify "value", "non-entire durable reference", or
+"initializing" expression category (respectively). Note that there is no way to
+express an entire or ephemeral reference category in a return form.
+
+-   _auto-return-form_ ::= _category-tag_? `auto`
+
+This denotes a primitive form with runtime phase and deduced type. The category
+is determined by _category-tag_ if present, or "initializing" otherwise.
+
+-   _proper-return-form_ ::= _category-tag_ _expression_
+
+This denotes a primitive form with runtime phase, category _category-tag_, and
+type "_expression_ `as type`".
+
+-   _expression-return-form_ ::= _expression_
+
+An expression with no _category-tag_ is equivalent to "`var` _expression_".
+
+-   _proper-return-form_ ::= `(` [_expression-return-form_ `,`]\* _proper-return-form_
+    [`,` _nesting-return-form_]\* `,`? `)`
 
-All `return` statement expressions are required to be initializing expressions
-and in fact initialize the storage provided to the function's call expression.
-This in turn causes the property to hold _transitively_ across an arbitrary
-number of function calls and returns. The storage is forwarded at each stage and
-initialized exactly once.
+A tuple literal of return forms denotes a tuple form whose sub-forms are
+specified by the comma-separated elements. To avoid formal ambiguity, this
+grammar rule requires at least one of the sub-forms to be proper.
 
-Note that functions without a specified return type work exactly the same as
-functions with a `()` return type for the purpose of expression categories.
+-   _expression-field-form_ ::= _designator_ `:` _expression-return-form_
+-   _proper-field-form_ ::= _designator_ `:` _proper-return-form_
+-   _field-form_ ::= _expression-field-form_
+-   _field-form_ ::= _proper-field-form_
+-   _proper-return-form_ ::= `{` [_expression-field-form_ `,`]\* _proper-field-form_
+    [`,` _field-form_]\* `}`
+
+A struct literal of return forms denotes a struct form whose field names and
+their forms are specified by the comma-separated field forms. To avoid formal
+ambiguity, this grammar rule requires at least one of the field forms to be
+proper.
+
+> **Open question:** Should there be a way to specify symbolic or template phase
+> in return forms?
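+
+Putting these rules together, a few sketched return clauses (the function
+names and signatures are hypothetical):
+
+```carbon
+// A plain type expression defaults to `var`: the call is an initializing
+// expression.
+fn MakeWidget() -> Widget;
+
+// A `ref` tag: the call is a durable reference expression.
+fn Front(ref pair: (i32, i32)) -> ref i32;
+
+// A tuple form mixing categories: the first element is produced as a value,
+// the second as a durable reference.
+fn Find(ref table: (i32, i32), key: i32) -> (val bool, ref i32);
+```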
 
 #### Deferred initialization from values and references
 
@@ -625,6 +795,295 @@ The model of initialization of returns also facilitates the use of
 [`returned var` declarations](control_flow/return.md#returned-var). These
 directly observe the storage provided for initialization of a function's return.
 
+## Expression forms
+
+We typically treat the category and type of an expression as independent
+properties. However, in some cases we need to deal with them as an integrated
+whole. The _form_ of an expression captures all of the information about it that
+is visible to the type system, while abstracting away all other information
+about it. Thus, forms are a generalization of types: what we conventionally call
+"types" are really the types of objects and values, whereas forms are the types
+of expressions and patterns.
+
+A _primitive form_ currently consists of a type, an expression category, an
+expression phase, and optionally a constant value (which is present if and only
+if the expression phase is not "runtime"). When dealing with primitive forms,
+which is the common case, we can treat each of those properties as independent.
+For convenience, in this section we will use the notation `<T, C, P, V>` to
+represent a primitive form with type `T`, category `C`, phase `P` and value `V`,
+but this is not Carbon syntax.
+
+Other forms are called _composite forms_, and there are two kinds:
+
+A _tuple form_ can be thought of as a tuple of forms, just as a tuple type can
+be thought of as a tuple of types. The form of a tuple literal is a tuple form,
+whose elements are the forms of the literal elements.
+
+> **TODO:** Extend this to support variadic forms.
+
+A _struct form_ can be thought of as a struct whose fields are forms, just as a
+struct type can be thought of as a struct whose fields are types. The form of a
+struct literal is a struct form with the same field names, whose values are the
+forms of the corresponding fields of the struct literal.
+
+The _type component_ of a form is defined as follows:
+
+-   The type component of a primitive form `<T, C, P, V>` is `T`.
+-   The type component of a tuple form is a tuple of the type components of its
+    elements.
+-   The type component of a struct form is a struct whose field names are the
+    field names of the struct form and whose field types are the type components
+    of the corresponding fields.
+
+The _category component_ and _phase component_ of a form are defined likewise.
+The category component of a struct form is called a _struct category_, and the
+category component of a tuple form is called a _tuple category_.
+
+The type of an expression is the type component of the expression's form.
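+
+For example, using the `<T, C, P>` notation from above (omitting the value
+component, which is absent at runtime phase), the tuple literal `(x, F())`,
+where `x` names a `var`-declared `i32` and a hypothetical `F` returns a `C`,
+has a tuple form along the lines of:
+
+```
+(<i32, durable reference, runtime>, <C, initializing, runtime>)
+```
+
+Its type component is `(i32, C)`, and its category component is the tuple
+category `(durable reference, initializing)`.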
+
+Evaluating an expression produces a _result_. It can be defined recursively in
+terms of the expression's form:
+
+-   The result of an initializing expression is an
+    [initializing result](#initializing-results).
+-   The result of a value expression is a value.
+-   The result of a reference expression is a reference of the same kind.
+-   The result of an expression with tuple form is a tuple of results.
+-   The result of an expression with struct form is a struct of results.
+
+An expression and its result always have the same form.
+
+The code that accesses the result of an expression is said to _consume_ that
+result, and every primitive-form result is consumed exactly once (except in
+certain narrow contexts where the result is known not to be initializing). If a
+result isn't explicitly accessed, such as when the expression is used as a
+statement, it is said to be _discarded_, which consumes it in the absence of an
+explicit consumer. Discarding an initializing result materializes and then
+immediately destroys it. Discarding an entire ephemeral reference destroys the
+object it refers to. Discarding a value or any other kind of reference is a
+no-op.
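+
+A hypothetical sketch of these discard rules (the class and functions are
+assumed for illustration):
+
+```carbon
+class D {}
+fn MakeD() -> D;
+fn PassThrough(ref d: D) -> ref D;
+
+fn UseD(ref d: D) {
+  // Discards an initializing result: the result is materialized into a
+  // temporary, which is then immediately destroyed.
+  MakeD();
+  // Discards a durable reference result, which is a no-op.
+  PassThrough(d);
+}
+```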
+
+### Initializing results
+
+As discussed earlier, evaluation of an initializing expression takes as an input
+the result location that it initializes, which is implicitly provided by the
+context in which the evaluation takes place. In some cases, the context may
+obtain the location from its own context, and so on. For example:
+
+```carbon
+class C {
+  private var i: i32;
+
+  fn Make() -> C {
+    return {.i = 0};
+  }
+}
+
+fn F() -> C {
+  return C.Make();
+}
+
+fn G() {
+  var c: C = F();
+}
+```
+
+By default, a function call is an initializing expression, and a `return`
+statement initializes the call's result location (which is passed as a hidden
+output parameter). So when the declaration of `c` is evaluated, its storage is
+implicitly passed into `F()` as an output parameter, which is initialized by the
+`return` statement inside `F`. When that `return` statement is evaluated to
+initialize the result location, it likewise implicitly passes the storage into
+`C.Make()` as an output parameter, which is initialized by the `return`
+statement inside `C.Make`. Finally, that `return` statement initializes the
+result location (which is still the location of `c`'s storage) by
+[direct initialization](#direct-initialization) from the value expression
+`{.i = 0}`.
+
+Notice that the implicit storage parameter propagates "backwards", into an
+expression from the code that uses its result. In order to simplify the
+description of the language, we usually won't explicitly discuss the result
+locations of initializing expressions, or how they're propagated. Instead, this
+propagation is encapsulated inside the _initializing result_, which is the
+notional result of an initializing expression.
+
+Whenever an initializing result is consumed, that implicitly means that the
+consumer passes a result location into the evaluation of the initializing
+expression. The source of that location depends on the consumer:
+
+-   If the consumer is a temporary materialization conversion, the result
+    location is newly-allocated temporary storage (which the consumer may
+    subsequently lifetime-extend to durable storage).
+-   If the consumer is a `return` statement, and the initializing result
+    corresponds to an initializing sub-form of the function's return form, the
+    result location is the implicit output parameter corresponding to that
+    initializing sub-form.
+
+### Form conversions
+
+A conversion between forms can be broken down into up to three steps: type
+conversion, category conversion, and phase conversion. These convert the form to
+a particular target type, category, and phase component (respectively). These
+steps aren't fully orthogonal: type conversions can change the category and
+phase components as a byproduct, and category conversions can change the phase
+component. However, category conversions can't change the type component, and
+phase conversions can't change either of the other two, so converting the type,
+then category, then phase, ensures that we converge on the desired result.
+
+Any of these steps may be omitted, depending on whether the context imposes
+requirements on the corresponding component. Most commonly, an operand position
+requires its operand to have a primitive form with a particular category,
+usually with a particular type, and sometimes with a particular phase.
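+
+For example, in this hypothetical sketch, passing a variable to a by-value
+parameter of a wider type involves the first two steps but not the third:
+
+```carbon
+fn TakeValue(n: i64);
+
+fn Caller() {
+  var v: i32 = 1;
+  // `v` is a durable reference expression of type `i32`, but the operand
+  // position requires a value of type `i64`: a type conversion (`i32` to
+  // `i64`) is followed by a category conversion (reference to value). No
+  // phase conversion is needed.
+  TakeValue(v);
+}
+```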
+
+Phase conversions cannot change the form structure; they can only apply
+primitive phase conversions to primitive sub-forms. Type and category
+conversions are more complex, and are covered in the next two sections.
+
+Note that these rules will implicitly convert between primitive and composite
+forms in both directions (except that a composite containing references cannot
+be converted to a primitive form). As a result, although the difference between
+primitive and composite forms is observable by way of overloading, it can't
+reliably carry any higher-level meaning, and should be used only as an
+optimization tool.
+
+Note that this section describes the _logical structure_ of form conversions. As
+such, it primarily describes them "breadth-first", as a sequence of operations
+that each applies to the whole expression by recursively operating on its parts.
+However, the _physical execution_ of these conversions is actually depth-first,
+applying as many operations as possible to a minimal subexpression before moving
+on to the next one. The details of that process are described
+[here](pattern_matching.md#evaluation-order).
+
+#### Type conversions
+
+See [here](expressions/implicit_conversions.md) for overall information about
+type conversions. Conversions involving struct, tuple, and array types are
+described here because of their unique interactions with expression forms.
+
+> **TODO:** A forthcoming proposal is expected to update the type conversion
+> interfaces to permit user-defined conversions to depend on the form of the
+> input, and customize the form of the output. Once that is done, these "built
+> in" conversions should be presented as implementations of those interfaces,
+> possibly with some "magic" for things like introspecting on struct field
+> names.
+
+Each of the conversions described in this section is explicit if and only if it
+invokes another explicit type conversion. Otherwise, it is implicit.
+
+A type conversion of a primitive-form expression to a
+[compatible type](generics/terminology.md#compatible-types) just re-interprets
+the expression's result with a new type, so it requires no run-time work, and
+has the same category as the input expression.
+
+A result `source` that has a struct type can be converted to a struct type
+`Dest` if they have the same set of field names:
+
+-   If the type of `source` is `Dest`, return `source`.
+-   If `source` is a struct result, for each field name `F` in `Dest`,
+    type-convert `source.F` to `Dest.F`. Return a struct result where each field
+    `F` is set to the result of the corresponding conversion.
+-   If `source` is a primitive result, convert it to a struct result by
+    [form decomposition](#category-conversions), then type-convert that struct
+    result to `Dest` and return the result.
+
+Note that the sub-conversions invoked here are not necessarily defined; if any
+of them is not defined, neither is the conversion itself.
+
+There is a conversion to a class type `Dest` from a result `source` that has a
+struct type, if there is a conversion from `source` to a struct type that has
+the same field names as `Dest` (including a `.base` field if `Dest` is a derived
+class), with the same types, in the same order. The conversion type-converts
+`source` to that struct type, category-converts that to an initializing
+expression of the struct type, and then reinterprets it as an initializing
+expression of `Dest` (which is layout-compatible with the struct type by
+construction).
+
+Note that some fields of an object may be initialized directly by the evaluation
+of the source expression, while others may be initialized by the conversions
+described here. The conversions initialize fields in their declaration order,
+but the evaluation of the source expression always happens before any of the
+conversions, and happens in the source expression's lexical order, so the fields
+of an object are not necessarily initialized in declaration order.
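+
+For example, in this hypothetical sketch (assuming `MakeA()` and `MakeB()` are
+initializing expressions that initialize the field storage directly):
+
+```carbon
+class Pair {
+  var a: A;
+  var b: B;
+}
+
+fn MakePair() -> Pair {
+  // The struct literal is evaluated in lexical order, so `MakeB()`
+  // initializes `.b` before `MakeA()` initializes `.a`, even though `.a`
+  // is declared first.
+  return {.b = MakeB(), .a = MakeA()};
+}
+```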
+
+Conversions between tuple types are defined in the same way, treating tuples as
+structs that have fields named `.0`, `.1`, etc., in numerical order.
+
+There is a conversion to `array(T, N)` from any expression with a tuple form of
+exactly `N` elements, each of whose type components is convertible to `T`. The
+conversion is an initializing expression, which type-converts each source
+element to `T`, and initializes the corresponding array element from the result
+of that conversion.
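+
+For example:
+
+```carbon
+// Each element of the tuple literal is type-converted to `i32`, and the
+// result of that conversion initializes the corresponding array element.
+var a: array(i32, 3) = (1, 2, 3);
+```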
+
+#### Category conversions
+
+_Form composition_ converts an expression of composite form with consistent
+category to a primitive form as follows (where `min` as applied to phases uses
+the ordering "runtime" < "symbolic" < "template"):
+
+-   An expression of tuple form
+    `(<T1, C, P1, V1>, <T2, C, P2, V2>, ... <TN, C, PN, VN>)` can be converted
+    to a primitive form
+    `<(T1, T2, ..., TN), C, min(P1, P2, ..., PN), (V1, V2, ... VN)>`.
+-   An expression of struct form
+    `{.a = <Ta, C, Pa, Va>, .b = <Tb, C, Pb, Vb>, ... .z = <Tz, C, Pz, Vz>}` can
+    be converted to a primitive form
+    `<{.a = Ta, .b = Tb, ... .z = Tz}, C, min(Pa, Pb, ... Pz), {.a = Va, .b = Vb, ... .z = Vz}>`.
+
+When `C` is "value", composition forms a value representation of the aggregate
+from value representations of the elements. When `C` is "initializing", it
+transforms initializing expressions for each element into a single initializing
+expression that initializes the whole aggregate. `C` cannot be a reference
+category, because an aggregate of references to independent objects can't be
+replaced by a reference to a single aggregate object in a single step.
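+
+For example, in this hypothetical sketch, form composition is what allows a
+tuple of value expressions to be passed to a parameter of tuple type:
+
+```carbon
+fn TakePair(p: (i32, i32));
+
+fn Caller(a: i32, b: i32) {
+  // `(a, b)` has the tuple form `(<i32, value, ...>, <i32, value, ...>)`.
+  // Form composition converts it to the primitive form
+  // `<(i32, i32), value, ...>`, which matches the parameter.
+  TakePair((a, b));
+}
+```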
+
+_Form decomposition_ is the inverse of form composition. It converts a
+primitive-form expression to a composite form as follows:
+
+-   An expression with primitive form `<(T0, T1, ..., TN), C, P, V>` can be
+    converted to a tuple form
+    `(<T0, CC, P, V.0>, <T1, CC, P, V.1>, ... <TN, CC, P, V.N>)`.
+-   An expression with primitive form
+    `<{.a = Ta, .b = Tb, ... .z = Tz}, C, P, V>` can be converted to a struct
+    form
+    `{.a = <Ta, CC, P, V.a>, .b = <Tb, CC, P, V.b>, ... .z = <Tz, CC, P, V.z>}`.
+
+The category `CC` of the resulting sub-forms is the same as `C`, with two
+exceptions:
+
+-   If `C` is "durable entire reference", `CC` will be "durable non-entire
+    reference", because the sub-forms don't refer to complete objects. This
+    doesn't apply to ephemeral entire references, because in that case form
+    decomposition implicitly ends the lifetime of the original aggregate,
+    promoting its elements to complete objects with independent lifetimes.
+-   If `C` is "initializing", the original expression is materialized before it
+    is decomposed, so `CC` will be "ephemeral entire reference".
+
+By convention, form decomposition is a no-op when applied to an expression with
+struct or tuple form.
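+
+Conversely, in this hypothetical sketch, form decomposition is what allows a
+primitive-form tuple value to be destructured:
+
+```carbon
+fn Sum(t: (i32, i32)) -> i32 {
+  // `t` is a primitive-form value expression. Matching it against a tuple
+  // pattern decomposes it into the tuple form
+  // `(<i32, value, ...>, <i32, value, ...>)`.
+  let (a: i32, b: i32) = t;
+  return a + b;
+}
+```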
+
+_Category conversion_ converts an expression to have a given category component
+without changing its type. The conversion works by combining form composition
+and decomposition with primitive category conversions, and is defined
+recursively:
+
+-   If the target category component is a tuple, the source form must have a
+    tuple type with the same arity. Convert the source to a tuple form by form
+    decomposition, and then category-convert each source sub-form to the
+    corresponding target sub-category.
+-   If the target category component is a struct, the source form must have a
+    struct type with the same set of field names in the same order. Convert the
+    source to a struct form by form decomposition, and then category-convert
+    each source sub-form to the corresponding target sub-category.
+-   If the target category is a primitive category `C`:
+    -   If the source form is primitive, convert to `C` by applying primitive
+        category conversions.
+    -   If the source form is composite and `C` is a reference category,
+        category-convert the source form to "initializing", and then convert the
+        result to `C` by applying primitive category conversions.
+    -   If the source form is composite and `C` is not a reference category,
+        category-convert each source sub-form to `C`, and then convert the
+        aggregate result of these conversions to `C` by form composition.
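+
+For example, in this hypothetical sketch, initializing a tuple variable from a
+tuple of value expressions applies the last rule:
+
+```carbon
+fn Init(a: i32, b: i32) {
+  // The target category is "initializing". Each element of `(a, b)` is
+  // category-converted from value to initializing, and form composition
+  // then combines the results into a single initializing expression for
+  // the whole tuple.
+  var t: (i32, i32) = (a, b);
+}
+```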
+
 ## Pointers
 
 Pointers in Carbon are the primary mechanism for _indirect access_ to storage
@@ -859,8 +1318,7 @@ functionality already proposed here or for [classes](/docs/design/classes.md):
     the most appealing as it _doesn't_ have the combinatorial explosion. But it
     is also very limited as it only applies to the implicit object parameter.
 -   Allow overloading between `var` and non-`var` parameters.
--   Expand the `ref` technique from object parameters to all parameters, and
-    allow overloading based on it.
+-   Allow overloading between `ref` and non-`ref` parameters in general.
 
 Perhaps more options will emerge as well. Again, the goal isn't to completely
 preclude pursuing this direction, but instead to try to ensure it is only
@@ -1040,6 +1498,8 @@ itself.
 -   [Exclusively using references](/proposals/p2006.md#exclusively-using-references)
 -   [Alternative pointer syntaxes](/proposals/p2006.md#alternative-pointer-syntaxes)
 -   [Alternative syntaxes for locals](/proposals/p2006.md#alternative-syntaxes-for-locals)
+-   [Mixed expression categories](/proposals/p5545.md#mixed-expression-categories)
+-   [Don't implicitly convert to less-primitive forms](/proposals/p5545.md#dont-implicitly-convert-to-less-primitive-forms)
 
 ## References
 
@@ -1048,9 +1508,11 @@ itself.
 -   [Proposal #618: `var` ordering][p0618]
 -   [Proposal #851: auto keyword for vars][p0851]
 -   [Proposal #2006: Values, variables, and pointers][p2006]
+-   [Proposal #5545: Expression form basics][p5545]
 
 [p0257]: /proposals/p0257.md
 [p0339]: /proposals/p0339.md
 [p0618]: /proposals/p0618.md
 [p0851]: /proposals/p0851.md
 [p2006]: /proposals/p2006.md
+[p5545]: /proposals/p5545.md

+ 477 - 0
proposals/p5545.md

@@ -0,0 +1,477 @@
+# Expression form basics
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/5545)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Mixed expression categories](#mixed-expression-categories)
+    -   [Don't implicitly convert to less-primitive forms](#dont-implicitly-convert-to-less-primitive-forms)
+    -   [Breadth-first evaluation order](#breadth-first-evaluation-order)
+    -   [Depth-first evaluation with a different "horizontal" order](#depth-first-evaluation-with-a-different-horizontal-order)
+    -   [Support binding `ref self` to ephemeral references](#support-binding-ref-self-to-ephemeral-references)
+
+<!-- tocstop -->
+
+## Abstract
+
+This proposal introduces the concept of a _form_, which is a generalization of
+"type" that encompasses all of the information about an expression that's
+visible to the type system, including type and expression category. Forms can be
+composed into _tuple forms_ and _struct forms_, which lets us track the
+categories of individual tuple and struct literal elements.
+
+## Problem
+
+It's unclear what expression category tuple and struct literals should have. For
+example, this code can only compile if the tuple literal is an initializing
+expression:
+
+```carbon
+var t: (NonMovable, NonMovable) = (MakeNonMovable(), MakeNonMovable());
+```
+
+But this code can only compile if the tuple literal is a value expression:
+
+```carbon
+let x: NonCopyable = MakeNonCopyable();
+let t: (NonCopyable, NonCopyable) = (x, MakeNonCopyable());
+```
+
+And there's plausible code that can't compile if the tuple literal has _any_
+single expression category:
+
+```carbon
+let x: NonCopyable = MakeNonCopyable();
+let (a: NonCopyable, var b: NonMovable) = (x, MakeNonMovable());
+```
+
+At present it's always possible to rewrite examples like that to avoid the
+problem by disaggregating the tuple patterns into separate statements. However,
+when the copy and move operations in question are expensive rather than outright
+disabled, those examples will result in silent inefficiency rather than a noisy
+build failure, which is less harmful but easier to overlook.
+
+## Background
+
+The Carbon toolchain already implements a solution to this problem: it treats
+tuple and struct literals as having a "mixed" expression category, and when
+individual elements of the literal are accessed (such as during pattern
+matching), the element's original category is propagated.
+
+Proposal [#5434](https://github.com/carbon-language/carbon-lang/pull/5434)
+introduces plausible use cases that cannot compile if we assign any single
+expression category to a tuple or struct literal, and there is no way to avoid
+the problem by rewriting. For example:
+
+```carbon
+fn F() -> (ref NonCopyable, NonMovable);
+let (a: NonCopyable, var b: NonMovable) = F();
+```
+
+## Proposal
+
+This proposal solves that problem by introducing the concept of a _form_, which
+is a generalization of "type" that encompasses all of the information about an
+expression that's visible to the type system, including type and expression
+category. In the common case, an expression has a _primitive form_ which
+consists of a type, an expression category, and a few other properties. However,
+a tuple literal has a _tuple form_, which is a tuple of the forms of its
+elements. This allows us to directly represent the fact that different elements
+have different categories, and propagate that difference into operations that
+access those elements.
+
+In order to help describe the semantics of forms, this proposal also formalizes
+the concept of the _result_ of an expression evaluation. Results are a
+generalization of values and references in the same way that forms are a
+generalization of types. In this proposal they are primarily a descriptive
+convenience, but they are also intended to function as the thing that a
+form-generic binding binds to, when that is proposed.
+
+The results of initializing expressions, called _initializing results_, have
+somewhat subtle semantics. Results present an idealized model of expression
+evaluation where information flows from each expression to the context where it
+is used, but initializing expressions require information to flow in both
+directions: the context supplies a storage location, and then the expression
+supplies the contents of that storage location. We finesse this "impedance
+mismatch" by saying that the initializing result represents an obligation on the
+context to supply a storage location (somewhat like a callback or
+`std::promise`), which it must fulfill by either materializing or transferring
+the result. Furthermore, even though this formally happens after the expression
+is evaluated, it is constrained in such a way that it can actually be computed
+beforehand and passed to the expression's hidden output parameter.
+
+This proposal also integrates type, category, and phase conversions into a
+unified set of rules for form conversions. Notably, those rules call for
+conversions to be evaluated depth-first (along with the expressions they convert
+from and the pattern-matches they feed into), and for struct conversions to use
+the field order of the source, not the target.
+
+Finally, this proposal splits the reference expression categories into _entire_
+and _non-entire_ references, where an entire reference is known to refer to a
+complete object. This lets us decouple materialization (which now produces an
+ephemeral entire reference) from `var` binding (which now expects an ephemeral
+entire reference), which lets us resolve a TODO to allow an initializing
+expression to be destructured into multiple `var` bindings.
+
+In the process of doing this, it became clear that the special case that allowed
+`ref self` patterns to match ephemeral references was not internally consistent,
+so that special case has been removed. We will need some way of supporting the
+use cases that were intended to be covered by that rule, but that is being left
+as future work.
+
+## Details
+
+See the edits in the
+[pull request](https://github.com/carbon-language/carbon-lang/pull/5545)
+associated with this proposal, particularly in `values.md`.
+
+## Rationale
+
+This proposal supports
+[performance-critical software](/docs/project/goals.md#performance-critical-software)
+and making
+[code easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+by ensuring that tuple and struct literals don't introduce unnecessary category
+conversions (which may cause build failures and performance overhead).
+
+## Alternatives considered
+
+### Mixed expression categories
+
+This proposal models an expression like `(x, MakeNonMovable())` from the earlier
+example as having a tuple form consisting of primitive forms with `NonCopyable`
+and `NonMovable` types (respectively) and "value" and "initializing" categories
+(respectively). We could instead support composite expression categories, so
+that it has type `(NonCopyable, NonMovable)` and expression category
+`(value, initializing)`.
+
+This would avoid the need to introduce the concept of "form", and preserve the
+existing separation between types and categories. That separation would be
+somewhat superficial (for example, an expression couldn't have a tuple
+expression category if it doesn't have a tuple type), but no more so than the
+separation between types and values.
+
+However, we expect to need an explicit syntax to express these properties of
+expressions, for example to define functions that return tuples whose elements
+have different categories. A syntax consisting of separate type and category
+tuples will be much less ergonomic, and much easier to misuse, than a syntax
+that combines both in a single tuple (for example,
+[#5434](https://github.com/carbon-language/carbon-lang/pull/5434) represents the
+form of `(x, MakeNonMovable())` as `(val NonCopyable, NonMovable)`).
+
+Furthermore, we anticipate needing to support code that is generic with respect
+to forms, not just with respect to types. We plan to achieve that with
+parameters of a special "form" type together with ways of deducing and using
+them. It might be possible to instead support category parameters that are
+deduced and used in conjunction with type parameters, but that would be
+syntactically onerous, and oblige users to keep each category correctly paired
+with the corresponding type, in order to bring them together at the point of
+use. Those challenges will be further compounded when/if it becomes possible to
+manipulate types and categories by way of metaprogramming.
+
+Given that we need to present forms as an integrated whole at the syntactic and
+metaprogramming levels, there is very little to be gained by decoupling them at
+the level of language semantics.
+
+### Don't implicitly convert to less-primitive forms
+
+Consider the following possible ways of initializing an array, where there is an
+implicit conversion from integer literals to `X`:
+
+```carbon
+impl Y as Core.ImplicitAs(X);
+let a: array(X, 3) = (MakeY(), MakeY(), MakeY());
+
+let x_tuple: (X, X, X) = (MakeY(), MakeY(), MakeY());
+let b: array(X, 3) = x_tuple;
+```
+
+Under this proposal, both `a` and `b` are valid. However, some people's
+intuition is that a tuple is different enough from an array that there should
+not be implicit conversions between them in general. In this mental model, a
+tuple-form expression wouldn't represent a tuple per se; rather, it would
+abstractly represent a sequence of results, which can be used to initialize
+either a tuple or an array (or a user-defined type). Similarly, a struct-form
+expression wouldn't represent a struct per se, but rather a more abstract
+sequence of _named_ results. Consequently, there would be no implicit conversion
+from a tuple value to a tuple form, and so `array` could disallow `b` by
+requiring its initializer to have a tuple form.
+
+Perhaps more importantly, making this distinction could simplify C++ interop,
+because we could map C++ braced initializer lists to tuple and struct forms,
+while mapping C++ `std::pair`s and `std::tuple`s to primitive-form Carbon
+tuples, and mapping C++ structs to primitive-form Carbon structs. We cannot do
+that under the status quo, because Carbon's implicit conversions from primitive
+to composite forms would have no C++ counterpart, and so overload resolution
+behavior would change too radically when crossing the language boundary.
+
+However, consider the following examples:
+
+```carbon
+let c: array(X, 3) = (MakeY(), MakeY(), MakeY()) as (X, X, X);
+
+let y_tuple: (Y, Y, Y) = (MakeY(), MakeY(), MakeY());
+let d: array(X, 3) = y_tuple as (X, X, X);
+```
+
+Presumably we would want the declaration of `c` to be valid, because it just
+makes the implicit type conversion from `a` explicit. That result emerges pretty
+naturally, because the `as` conversion operates element-wise on its tuple-form
+input, so its output would likewise have a tuple form. On the other hand, the
+declaration of `d` should not be valid, because it just makes the implicit type
+conversion from `b` explicit. That result does not emerge naturally; instead, we
+have to add a final form composition step at the end of the type conversion, to
+ensure the result has a primitive form (but only when the input had a primitive
+form).
+
+But that means we need to choose the category of that primitive form, that is,
+whether the conversion from `(Y, Y, Y)` to `(X, X, X)` is a value expression or
+an initializing expression (form composition can't produce reference
+expressions). Upcoming proposals are expected to enable the user to define the
+implicit conversion from `Y` to `X` to have any category, so suppose that the
+conversion from `Y` to `X` is a reference expression. In that case, the
+requirement to convert to a primitive form breaks some use cases that would
+otherwise be valid:
+
+```carbon
+let (ref s1: X, ref s2: X, ref s3: X) = y_tuple as (X, X, X);
+```
+
+Even for usages that are not broken outright, this conversion may add
+substantial inefficiency, depending on which category we convert to:
+
+```carbon
+let (p1: X, p2: X, p3: X) = y_tuple as (X, X, X);
+```
+
+This requires 3 value acquisitions in either case, but if we convert to an
+initializing expression it also requires 3 copy-init and materialization steps.
+
+```carbon
+let (var q1: X, var q2: X, var q3: X) = y_tuple as (X, X, X);
+```
+
+In either case we're starting and ending with the object representation of `X`,
+passing through the initializing representation (which is typically very closely
+tied to the object representation), but if the type conversion produces a value
+expression we also pass through the value representation (which can be much less
+trivial to convert to and from).
+
+Finally, examples like this one are less efficient if we have to convert to a
+primitive form, regardless of which category we convert to:
+
+```carbon
+let (r1: X, var r2: X, var r3: X) = y_tuple as (X, X, X);
+```
+
+Some but not all of these problems can be mitigated by making the target
+category an input to the type conversion, but that has a number of unwelcome
+consequences:
+
+-   It adds complexity to the conversion APIs that user-defined types will
+    almost never need.
+-   We would need to change the syntax for `x as T` so that it specifies the
+    category as well as the type.
+-   It would enable user-defined conversions to produce different values
+    depending on the target category, which we definitely don't want.
+-   It makes overload resolution more complicated and more surprising.
+
+This alternative was considered and rejected in
+[leads issue #6160](https://github.com/carbon-language/carbon-lang/issues/6160).
+
+### Breadth-first evaluation order
+
+Under this proposal, the observable effects of a pattern-matching operation take
+place in depth-first order (with respect to tuple and struct elements), but
+breadth-first order would have several advantages:
+
+-   It would help enable us to define tuple type conversions as ordinary impls
+    of the type conversion APIs, because modeling conversion as a function call
+    is inherently breadth-first. Opting for depth-first evaluation seems to rule
+    that out.
+-   It would give us more options for supporting struct and class conversions
+    where the field orders doesn't match. For example, the previously-preferred
+    approach was to evaluate the source expression in its own field order, but
+    then perform conversions in the target's field order. This seemed to provide
+    a good balance of efficiency and ergonomics, but it's inherently
+    breadth-first. With the depth-first approach, we have to use a single field
+    order.
+-   It would simplify the specification, because the logical semantics of
+    conversion and pattern matching are much more naturally described
+    breadth-first, so opting for depth-first creates an "impedance mismatch"
+    between the logical and physical semantics.
+
+However, breadth-first evaluation order has a crucial efficiency cost: in all
+but the most trivial use cases, it forces (and in some sense even maximizes)
+lifetime overlap between the results of the function calls that make up the
+pattern-matching operation. That means it requires more temporary storage, and
+more work to manage that storage (particularly at the register level).
+
+For example, consider this
+[generated code](https://cpp.compiler-explorer.com/z/Pxaar4edb) for C++
+approximating the two options. The `LayerWise` function templates model the
+breadth-first evaluation order, while the `ElementWise` model the proposed
+depth-first order. Looking at the two element case for ARM:
+
+```asm
+void LayerWise<S1, S1>(S1, S1):
+        stp     x29, x30, [sp, #-32]!
+        stp     x20, x19, [sp, #16]
+        mov     x29, sp
+        mov     x19, x1
+        bl      Convert1(S1)
+        mov     x20, x0
+        mov     x0, x19
+        bl      Convert1(S1)
+        mov     x19, x0
+        mov     x0, x20
+        bl      Convert2(S2)
+        mov     x20, x0
+        mov     x0, x19
+        bl      Convert2(S2)
+        mov     x1, x0
+        mov     x0, x20
+        ldp     x20, x19, [sp, #16]
+        ldp     x29, x30, [sp], #32
+        b       void Target<S1, S1>(S1, S1)
+
+void ElementWise<S1, S1>(S1, S1):
+        stp     x29, x30, [sp, #-32]!
+        stp     x20, x19, [sp, #16]
+        mov     x29, sp
+        mov     x19, x1
+        bl      Convert1(S1)
+        bl      Convert2(S2)
+        mov     x20, x0
+        mov     x0, x19
+        bl      Convert1(S1)
+        bl      Convert2(S2)
+        mov     x1, x0
+        mov     x0, x20
+        ldp     x20, x19, [sp, #16]
+        ldp     x29, x30, [sp], #32
+        b       void Target<S1, S1>(S1, S1)
+```
+
+The breadth-first approach forces significantly more data movement (the extra
+moves between `x20` and `x19`) and uses more temporary registers. This
+difference persists in the versions with more elements, where the problem
+becomes more and more severe. This is an inherent cost, not something we can
+realistically expect an optimizing compiler to recover.
+
+Furthermore, we don't see any general way for developers to work around this
+problem, and achieve the performance they would get automatically with
+depth-first evaluation. On the other hand, with depth-first evaluation within
+pattern-matching operations, developers can still ensure a breadth-first
+evaluation order relatively easily, by expressing each "layer" as a separate
+pattern-matching operation (for example with a sequence of statements, or a
+chain of function calls). In some cases, this may involve move operations that
+could be elided in a language-native breadth-first evaluation, but that depends
+on the to-be-determined design of move semantics.
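+
+For example, using the `Convert1`/`Convert2` names from the benchmark above
+(the types and exact signatures here are illustrative assumptions), a developer
+could force breadth-first order by making each layer a separate statement:
+
+```carbon
+// Each statement is one "layer": both `Convert1` calls complete before
+// either `Convert2` call begins, even though evaluation within each
+// statement is depth-first.
+var layer1: (S2, S2) = (Convert1(x), Convert1(y));
+Target(Convert2(layer1.0), Convert2(layer1.1));
+```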
+
+In short, the ergonomic costs of depth-first evaluation appear to be manageable,
+whereas the performance cost of breadth-first evaluation is unavoidable (and
+carries more weight, because supporting
+[performance-critical software](/docs/project/goals.md#performance-critical-software)
+is our top goal for the language).
+
+This alternative was considered and rejected in
+[leads issue #6456](https://github.com/carbon-language/carbon-lang/issues/6456).
+
+### Depth-first evaluation with a different "horizontal" order
+
+To order operations that don't have a dependency relationship, this proposal
+uses a hybrid scheme that follows both the lexical order of the primitive
+patterns and the lexical order of the scrutinee function calls (diagnosing an
+error if they conflict), and also follows the lexical order of the scrutinee's
+declared type to the extent that it doesn't conflict with the primitive pattern
+order.
+
+One potential drawback of this rule is that, unlike C++, Carbon doesn't
+guarantee that the fields of an object are initialized in declaration order (or
+in any fixed order), and so it can't guarantee that they'll be destroyed in
+reverse order of initialization.
+
+We could solve that problem by guaranteeing to evaluate in the scrutinee type's
+field order. However, that would mean evaluating function calls out of lexical
+order in cases like this:
+
+```carbon
+var ab: {.a: A, .b: B} = {.b = MakeB(), .a = MakeA()};
+```
+
+Evaluating `MakeA()` before `MakeB()` risks causing surprises, or even bugs, and
+it would mean that the evaluation order within a single expression depends on
+how it's used.
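+
+To illustrate (a hypothetical sketch): under that alternative, the same
+initializer expression would evaluate in different orders depending on the
+declared type it's used to initialize:
+
+```carbon
+// Type field order `.a, .b` would force `MakeA()` to run first...
+var ab: {.a: A, .b: B} = {.b = MakeB(), .a = MakeA()};
+// ...while field order `.b, .a` would run `MakeB()` first, even though
+// the initializer expression is identical.
+var ba: {.b: B, .a: A} = {.b = MakeB(), .a = MakeA()};
+```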
+
+Instead, we adopt a rule that ensures side effects are evaluated in lexical
+order within both the pattern and the scrutinee. We expect that to be sufficient
+for struct types, where destruction order is unlikely to be an issue. Class
+types may be more sensitive to these ordering issues, so we may also need some
+way for the class to disallow initializers whose field order doesn't match the
+class, but this is left as future work.
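+
+Concretely, under the adopted rule the earlier `ab` example evaluates its side
+effects in source order:
+
+```carbon
+// `MakeB()` runs before `MakeA()`, following the lexical order of the
+// scrutinee, even though the declared type lists `.a` before `.b` (so
+// `ab.a` may be initialized after `ab.b`).
+var ab: {.a: A, .b: B} = {.b = MakeB(), .a = MakeA()};
+```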
+
+### Support binding `ref self` to ephemeral references
+
+`ref` patterns can only match durable reference expressions, but prior to this
+proposal, `ref self` patterns could match ephemeral references as a special-case
+exception. This was intended to support certain C++ idioms that rely on
+materializing a temporary and then mutating it in place, such as fluent
+builders. For example:
+
+```carbon
+class FooBuilder {
+   // These methods mutate `self` and then return a reference to it.
+   fn SetBar[ref self: Self]() -> ref Self;
+   fn SetBaz[ref self: Self]() -> ref Self;
+
+   fn Build[ref self: Self]() -> Foo;
+}
+fn MakeFoo() -> FooBuilder;
+
+let foo: Foo = MakeFoo().SetBar().SetBaz().Build();
+```
+
+Prior to this proposal, this code would be valid: `MakeFoo()` is an initializing
+expression, but when it is matched with `ref self: Self` as part of the `SetBar`
+call, it is implicitly converted to an ephemeral reference, and then the
+special-case rule allows `ref self: Self` to bind to the materialized temporary.
+
+However, those rules also imply that code like this would be valid:
+
+```carbon
+let builder: FooBuilder = MakeFoo();
+builder.SetBar();
+builder.SetBaz();
+let foo: Foo = builder.Build();
+```
+
+Here the programmer has accidentally used `let` instead of `var`, so `builder`
+is an immutable value. But a value expression can be implicitly converted to an
+initializing expression by direct initialization, and as we already saw, it's
+valid to call `SetBar()` and `SetBaz()` on an initializing expression. So this
+code repeatedly materializes, mutates, and then discards a copy of `builder`,
+and then ultimately initializes `foo` with the state returned by `MakeFoo()`,
+which is surely not what the programmer intended. This sort of "lost mutation"
+bug is exactly what the distinction between durable and ephemeral references was
+intended to prevent, but the `ref self` special case combines with the
+transitivity of category conversions to defeat that protection.
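+
+The version the programmer presumably intended uses `var`, making `builder` a
+durable reference expression so that each mutation persists:
+
+```carbon
+var builder: FooBuilder = MakeFoo();
+builder.SetBar();
+builder.SetBaz();
+// `foo` now reflects the `SetBar` and `SetBaz` mutations.
+let foo: Foo = builder.Build();
+```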
+
+A proper resolution of this issue seems beyond the scope of this proposal, so
+this proposal removes that special case without replacement, leaving the problem
+of supporting idioms like fluent builders as future work.