3 years ago · 4885a2bf8e
--- a/proposals/p2274.md
+++ b/proposals/p2274.md
@@ -0,0 +1,405 @@
 
				+# Subscript syntax and semantics
			
 
				+
			
 
				+<!--
			
 
				+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
			
 
				+Exceptions. See /LICENSE for license information.
			
 
				+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
			
 
				+-->
			
 
				+
			
 
				+[Pull request](https://github.com/carbon-language/carbon-lang/pull/2274)
			
 
				+
			
 
				+<!-- toc -->
			
 
				+
			
 
				+## Table of contents
			
 
				+
			
 
				+-   [Abstract](#abstract)
			
 
				+-   [Problem](#problem)
			
 
				+-   [Background](#background)
			
 
				+-   [Proposal](#proposal)
			
 
				+-   [Details](#details)
			
 
				+    -   [Examples](#examples)
			
 
				+-   [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
			
 
				+-   [Alternatives considered](#alternatives-considered)
			
 
				+    -   [Different subscripting syntaxes](#different-subscripting-syntaxes)
			
 
				+    -   [Multiple indices](#multiple-indices)
			
 
				+    -   [Read-only subscripting](#read-only-subscripting)
			
 
				+    -   [Rvalue-only subscripting](#rvalue-only-subscripting)
			
 
				+    -   [Map-like subscripting](#map-like-subscripting)
			
 
				+        -   [Usage-based rewriting](#usage-based-rewriting)
			
 
				+
			
 
				+<!-- tocstop -->
			
 
				+
			
 
				+## Abstract
			
 
				+
			
 
				+Adds support for subscripting using the conventional square-bracket syntax, with
			
 
				+support for both array-like and slice-like semantics.
			
 
				+
			
 
				+Cloned from [#1356](https://github.com/carbon-language/carbon-lang/pull/1356),
			
 
				+which was cloned from
			
 
				+[#1059](https://github.com/carbon-language/carbon-lang/pull/1059).
			
 
				+
			
 
				+## Problem
			
 
				+
			
 
				+Carbon needs a convenient way of indexing into objects like arrays and slices.
			
 
				+
			
 
				+## Background
			
 
				+
			
 
				+For purposes of this proposal, a "slice" is an object that refers to some region
			
 
				+of an array, and provides access to it, such as a `std::string_view` or
			
 
				+`std::span` in C++. Slices have subtly different indexing API semantics than
			
 
				+arrays: indexing into an array can produce an l-value only if the array itself
			
 
				+is an l-value, but indexing into a slice can produce an l-value regardless of
			
 
				+the slice's value category.
			
 
				+
			
 
				+Interfaces are Carbon's
			
 
				+[only static open extension mechanism](/docs/project/principles/static_open_extension.md),
			
 
				+so user-defined indexing must be defined in terms of interfaces. In order to
			
 
				+support
			
 
				+[definition checking of generic functions](/docs/design/generics/goals.md#generics-will-be-checked-when-defined),
			
 
				+it must be possible to fully typecheck indexing operations using only the
			
 
				+information exposed by those interfaces. And in order to provide
			
 
				+[coherence](/docs/design/generics/goals.md#coherence), a given indexing
			
 
				+expression must execute the same code regardless of how much type information we
			
 
				+have about it, so long as we have enough type information to establish that the
			
 
				+expression is valid.
			
 
				+
			
 
				+The primary problem this proposal needs to solve is how to support both
			
 
				+array-like and slice-like indexing semantics while satisfying those
			
 
				+requirements.
			
 
				+
			
 
				+## Proposal
			
 
				+
			
 
				+Carbon will support indexing using the conventional `a[i]` subscript syntax.
			
 
				+When `a` is an l-value, the result of subscripting will always be an l-value,
			
 
				+but when `a` is an r-value, the result can be an l-value or an r-value,
			
 
				+depending on which interface the type implements:
			
 
				+
			
 
				+-   If subscripting an r-value produces an r-value result, as with an array, the
			
 
				+    type should implement `IndexWith`.
			
 
				+-   If subscripting an r-value produces an l-value result, as with a slice, the
			
 
				+    type should implement `IndirectIndexWith`.
			
 
				+
			
 
				+`IndirectIndexWith` is a subtype of `IndexWith`, and subscript expressions are
			
 
				+rewritten to method calls on `IndirectIndexWith` if the type is known to
			
 
				+implement that interface, or to method calls on `IndexWith` otherwise.
			
 
				+
			
 
				+A type can directly implement at most one of these two interfaces. `IndirectIndexWith`
			
 
				+provides a final blanket `impl` of `IndexWith`, which ensures coherence.
			
 
				+
			
 
				+One implication of this design is that the requirement of coherence must apply
			
 
				+to value categories as well as types (and for those purposes, "l-value" is a
			
 
				+subcategory of "r-value"). This implies that for any valid expression containing
			
 
				+an r-value sub-expression, replacing that sub-expression with an l-value
			
 
				+expression cannot change the validity or semantics of the outer expression, so
			
 
				+long as the l-value expression has the same type and evaluates to the same
			
 
				+value. This, in turn, ensures that once we know a value implements `IndexWith`,
			
 
				+additionally learning that it implements `IndirectIndexWith` can't invalidate
			
 
				+code that previously typechecked, or change its semantics.
			
 
				+
			
 
				+## Details
			
 
				+
			
 
				+A subscript expression has the form "_lhs_ `[` _index_ `]`". As in C++, this
			
 
				+syntax has the same precedence as `.`, `->`, and function calls, and associates
			
 
				+left-to-right with all of them.
			
 
				+
			
 
				+Its semantics are defined in terms of the following interfaces:
			
 
				+
			
 
				+```
			
 
				+interface IndexWith(SubscriptType:! Type) {
			
 
				+  let ElementType:! Type;
			
 
				+  fn At[me: Self](subscript: SubscriptType) -> ElementType;
			
 
				+  fn Addr[addr me: Self*](subscript: SubscriptType) -> ElementType*;
			
 
				+}
			
 
				+
			
 
				+interface IndirectIndexWith(SubscriptType:! Type) {
			
 
				+  impl as IndexWith(SubscriptType);
			
 
				+  fn Addr[me: Self](subscript: SubscriptType) -> ElementType*;
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+A subscript expression where _lhs_ has type `T` and _index_ has type `I` is
			
 
				+rewritten based on the value category of _lhs_ and whether `T` is known to
			
 
				+implement `IndirectIndexWith(I)`:
			
 
				+
			
 
				+-   If `T` implements `IndirectIndexWith(I)`, the expression is rewritten to
			
 
				+    "`*((` _lhs_ `).(IndirectIndexWith(I).Addr)(` _index_ `))`".
			
 
				+-   Otherwise, if _lhs_ is an l-value, the expression is rewritten to "`*((`
			
 
				+    _lhs_ `).(IndexWith(I).Addr)(` _index_ `))`".
			
 
				+-   Otherwise, the expression is rewritten to "`(` _lhs_ `).(IndexWith(I).At)(`
			
 
				+    _index_ `)`".
			
 
				+
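For readers coming from C++, the three rewrite cases can be loosely approximated with ref-qualified `operator[]` overloads. This is only an analogy under stated assumptions, not part of the proposal: `Array`, `Slice`, and `WriteThroughTemporarySlice` are hypothetical names, and C++ overload resolution is not the mechanism Carbon would use.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical array-like type that owns its elements (compare IndexWith).
struct Array {
  int data[3];
  // l-value operand: yields a reference, like `*lhs.Addr(index)`.
  int& operator[](std::size_t i) & { return data[i]; }
  // r-value operand: yields a copy, like `lhs.At(index)`.
  int operator[](std::size_t i) && { return data[i]; }
};

// Hypothetical slice-like type that refers to elements owned elsewhere
// (compare IndirectIndexWith).
struct Slice {
  int* begin;
  // Any operand: yields a reference, like `Addr` taking `me: Self`.
  int& operator[](std::size_t i) const { return begin[i]; }
};

// The result category depends on the operand category for Array only.
static_assert(
    std::is_same<decltype(std::declval<Array&>()[0]), int&>::value,
    "l-value array yields a reference");
static_assert(std::is_same<decltype(std::declval<Array>()[0]), int>::value,
              "r-value array yields a value");
static_assert(std::is_same<decltype(std::declval<Slice>()[0]), int&>::value,
              "r-value slice still yields a reference");

// Writes through a temporary slice, then reads the array element back.
inline int WriteThroughTemporarySlice() {
  Array a{{1, 2, 3}};
  Slice{a.data}[1] = 20;  // r-value slice, l-value element
  return a[1];
}
```

The analogy breaks down in one important way: in C++ the two `Array` overloads are unrelated functions, whereas the proposal relies on the final blanket `impl` to keep the array-like and slice-like paths consistent.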
			
 
				+On their own, these rules would oblige `IndirectIndexWith` types to define three
			
 
				+methods to support a single syntax. Worse, they would permit those types to
			
 
				+define those methods in inconsistent ways, which would violate coherence. To
			
 
				+avoid those problems, `IndirectIndexWith` provides a blanket `final impl` for
			
 
				+`IndexWith`:
			
 
				+
			
 
				+```
			
 
				+final external impl forall
			
 
				+    [SubscriptType:! Type, T:! IndirectIndexWith(SubscriptType)]
			
 
				+    T as IndexWith(SubscriptType) {
			
 
				+  let ElementType:! Type = T.(IndirectIndexWith(SubscriptType)).ElementType;
			
 
				+  fn At[me: Self](subscript: SubscriptType) -> ElementType {
			
 
				+    return *(me.(IndirectIndexWith(SubscriptType).Addr)(subscript));
			
 
				+  }
			
 
				+  fn Addr[addr me: Self*](subscript: SubscriptType) -> ElementType* {
			
 
				+    return me->(IndirectIndexWith(SubscriptType).Addr)(subscript);
			
 
				+  }
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+Thus, a type that implements `IndirectIndexWith` need not, and cannot, provide
			
 
				+its own definitions of `IndexWith.At` and `IndexWith.Addr`.
			
 
				+
			
 
				+### Examples
			
 
				+
			
 
				+An array type could implement subscripting like so:
			
 
				+
			
 
				+```
			
 
				+class Array(template T:! Type) {
			
 
				+  external impl as IndexWith(like i64) {
			
 
				+    let ElementType:! Type = T;
			
 
				+    fn At[me: Self](subscript: i64) -> T;
			
 
				+    fn Addr[addr me: Self*](subscript: i64) -> T*;
			
 
				+  }
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+And a slice type could look like this:
			
 
				+
			
 
				+```
			
 
				+class Slice(T:! Type) {
			
 
				+  external impl as IndirectIndexWith(like i64) {
			
 
				+    let ElementType:! Type = T;
			
 
				+    fn Addr[me: Self](subscript: i64) -> T*;
			
 
				+  }
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+We can even approximate support for indexing on tuple types. We don't have
			
 
				+enough of a design for metaprogramming or variadics to specify it generically,
			
 
				+but we can illustrate how it would look for a particular tuple type, such as
			
 
				+`(i8, f64)`:
			
 
				+
			
 
				+```
			
 
				+external impl (i8, f64) as IndexWith(typeof(0)) {
			
 
				+  let ElementType:! Type = i8;
			
 
				+  fn At[me: Self](0) -> i8;
			
 
				+  fn Addr[addr me: Self*](0) -> i8*;
			
 
				+}
			
 
				+
			
 
				+external impl (i8, f64) as IndexWith(typeof(1)) {
			
 
				+  let ElementType:! Type = f64;
			
 
				+  fn At[me: Self](1) -> f64;
			
 
				+  fn Addr[addr me: Self*](1) -> f64*;
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+However, this approach presupposes that every integer literal has a distinct
			
 
				+type, which is not yet settled. It also only supports indexing with integer
			
 
				+literals, not with constant values of ordinary integer types. This is somewhat
			
 
				+contrary to Carbon's metaprogramming philosophy: Carbon treats types as values
			
 
				+in order to make metaprogramming more like ordinary programming, but here we're
			
 
				+doing the reverse, treating values as types. Consequently, the eventual design
			
 
				+for tuple-style indexing may look very different.
			
 
				+
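The "values as types" pattern described here has a familiar C++ counterpart: encoding the index in the type of the subscript argument. The sketch below is an illustration only; `IndexableTuple` and `SecondElement` are hypothetical names, and nothing here is proposed for Carbon.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical wrapper that lets a tuple be subscripted with an index
// carried in the argument's type rather than in its value.
template <typename... Ts>
struct IndexableTuple {
  std::tuple<Ts...> items;

  // Each distinct index type selects a distinct element type.
  template <std::size_t I>
  auto operator[](std::integral_constant<std::size_t, I>) ->
      typename std::tuple_element<I, std::tuple<Ts...>>::type& {
    return std::get<I>(items);
  }
};

// Overwrites and then reads element 1 of an (i8-like, f64-like) tuple.
inline double SecondElement() {
  IndexableTuple<signed char, double> t{
      std::make_tuple(static_cast<signed char>(1), 2.5)};
  t[std::integral_constant<std::size_t, 1>{}] = 4.0;
  return t[std::integral_constant<std::size_t, 1>{}];
}
```

As in the Carbon illustration, an ordinary runtime integer cannot be used as the subscript here, which is the limitation the paragraph above points out.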
			
 
				+## Rationale based on Carbon's goals
			
 
				+
			
 
				+Indexing is a fundamental operation in procedural programming, and is especially
			
 
				+common in performance-critical code, so supporting it with a concise, familiar
			
 
				+syntax helps us make
			
 
				+[code easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write),
			
 
				+and support
			
 
				+[performance-critical software](/docs/project/goals.md#performance-critical-software).
			
 
				+Using the same syntax as C++, even for slice-like types, supports
			
 
				+[interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code).
			
 
				+
			
 
				+## Alternatives considered
			
 
				+
			
 
				+### Different subscripting syntaxes
			
 
				+
			
 
				+The reason that slices have different semantics from arrays is that a slice
			
 
				+represents an indirection, like a pointer. Indexing behaves the same for slice
			
 
				+l-values and r-values for the same reason that dereferencing behaves the same
			
 
				+for pointer l-values and r-values. Thus, our attempts to make a single
			
 
				+subscripting syntax work for both arrays and slices are arguably analogous to
			
 
				+trying to make the same method-call syntax work for both pointers and objects.
			
 
				+Or, to put it another way, the difficulty with slices is that they behave like
			
 
				+_references_.
			
 
				+
			
 
				+From that point of view, it would be more consistent to have different
			
 
				+subscripting syntaxes for the two cases. The most naive version of that would be
			
 
				+to say that only array-like types support subscripting, and slice-like types
			
 
				+should instead support dereferencing, so indexing into a slice would look like
			
 
				+`(*s)[i]`. However, that would be syntactically onerous, and it's not clear how
			
 
				+to define the type of `*s` in a way that doesn't beg the question.
			
 
				+
			
 
				+Instead, we could define a distinct syntax for slice subscripting, such as
			
 
				+`s*[i]`, and define separate interfaces for the two subscript syntaxes. The two
			
 
				+interfaces would be very similar to `IndexWith` and `IndirectIndexWith`, but
			
 
				+neither would extend the other, and the rewrite from subscript syntax to method
			
 
				+calls would not depend on type information (although it would depend on value
			
 
				+category). However, any such syntax would be quite novel and unfamiliar, and
			
 
				+there seem to be few good choices for how to spell it. `s*[i]` is almost the
			
 
				+only syntax that seems both viable and somewhat mnemonic, but it would add
			
 
				+further complexity to the already-fraught parsing of `*` (for humans as well as
			
 
				+tools).
			
 
				+
			
 
				+### Multiple indices
			
 
				+
			
 
				+This proposal does not support multiple comma-separated indices (such as
			
 
				+`a[i, j]`), which is desirable for types like multidimensional arrays. We do
			
 
				+support syntax like `a[(i, j)]`, which is a single index whose type is a tuple,
			
 
				+but the extra parens are syntactically noisy. We could add those parens
			
 
				+implicitly, but that would effectively move the syntactic noise to the
			
 
				+implementation, even for the single-index case (for example
			
 
				+`impl Foo as IndexWith((i64,))`). It should be much cleaner to make the
			
 
				+interfaces and their methods variadic, once we have a design for variadics.
			
 
				+
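The `a[(i, j)]` form, a single index whose type is a tuple, has a rough C++ counterpart in an `operator[]` that takes a `std::pair`. The sketch below uses a hypothetical `Grid` type (and a hypothetical `GridRoundTrip` helper) to show both that the approach works and where the extra punctuation ends up at the use site.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical row-major 2D grid indexed by a single (row, column) pair.
class Grid {
 public:
  Grid(std::size_t rows, std::size_t cols)
      : cols_(cols), cells_(rows * cols, 0) {}

  // One subscript argument whose type is a pair, mirroring `a[(i, j)]`.
  int& operator[](std::pair<std::size_t, std::size_t> rc) {
    return cells_[rc.first * cols_ + rc.second];
  }

 private:
  std::size_t cols_;
  std::vector<int> cells_;
};

// Stores a value at (1, 2) and reads it back through the same subscript.
inline int GridRoundTrip() {
  Grid g(2, 3);
  g[{1, 2}] = 7;  // The braces play the role of the extra parentheses.
  return g[{1, 2}];
}
```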
			
 
				+### Read-only subscripting
			
 
				+
			
 
				+This proposal does not provide an obvious path for supporting types like C++'s
			
 
				+`std::string_view` or `std::span<const T>`, whose subscripting operations expose
			
 
				+read-only access to their contents. It is tempting to try to extend this
			
 
				+proposal to support those use cases, both because of their inherent importance
			
 
				+and because it already has to deal with read-only versus read-write access in
			
 
				+order to support array r-values.
			
 
				+
			
 
				+However, there's a fundamental difference between the true immutability of
			
 
				+something like an array r-value, and the contextual lack of mutable _access_
			
 
				+provided by something like `string_view`. While value categories can express the
			
 
				+former, they are not well-suited to expressing the latter. To address these use
			
 
				+cases, Carbon will probably need something like C++'s `const` type system, but
			
 
				+that should be largely orthogonal to this proposal.
			
 
				+
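The distinction between immutability and a lack of mutable access can be seen in C++ itself. The sketch below, which assumes C++17 and uses a hypothetical helper named `ObserveThroughView`, shows that a `std::string_view` exposes only `const` references, yet the characters it refers to can still change underneath it.

```cpp
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Subscripting a view yields a reference to const, even when the view
// itself is a non-const l-value: the restriction is on access, not on
// the value category of the view.
static_assert(
    std::is_same_v<decltype(std::declval<std::string_view&>()[0]),
                   const char&>);

// Mutates the string directly, then observes the change through the view.
inline char ObserveThroughView() {
  std::string s = "abc";
  std::string_view view = s;
  s[0] = 'x';      // The underlying object is still mutable.
  return view[0];  // The view sees the new character.
}
```

An array r-value, by contrast, has no other name through which it could be modified, which is why value categories alone can model it.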
			
 
				+### Rvalue-only subscripting
			
 
				+
			
 
				+This proposal does not support subscripting operations that can't produce
			
 
				+l-values. In particular, this means it does not support using subscript syntax
			
 
				+to form slices, as in Python's `a[i:j]` or Swift's `a[i...j]`. To support this,
			
 
				+we would need a separate pair of interfaces that return by value:
			
 
				+
			
 
				+```
			
 
				+interface RValueIndexWith(SubscriptType:! Type) {
			
 
				+  let ElementType:! Type;
			
 
				+  fn Addr[addr me: Self*](subscript: SubscriptType) -> ElementType;
			
 
				+}
			
 
				+
			
 
				+interface RValueIndirectIndexWith(SubscriptType:! Type) {
			
 
				+  extends RValueIndexWith(SubscriptType);
			
 
				+  let ElementType:! Type;
			
 
				+  fn Addr[me: Self](subscript: SubscriptType) -> ElementType;
			
 
				+}
			
 
				+
			
 
				+final external impl [SubscriptType:! Type,
			
 
				+                     T:! RValueIndirectIndexWith(SubscriptType)]
			
 
				+    T as RValueIndexWith(SubscriptType) {
			
 
				+  let ElementType:! Type =
			
 
				+      T.(RValueIndirectIndexWith(SubscriptType)).ElementType;
			
 
				+  fn Addr[addr me: Self*](subscript: SubscriptType) -> ElementType {
			
 
				+    return me->(RValueIndirectIndexWith(SubscriptType).Addr)(subscript);
			
 
				+  }
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+Note that we still need a pair of interfaces, and a blanket final `impl` to
			
 
				+enforce consistency, because arrays and slices have different semantics in this
			
 
				+context as well: taking a slice of an r-value array is invalid, because taking a
			
 
				+slice is equivalent to (and presumably implemented in terms of) taking an
			
 
				+address. On the other hand, taking a slice of an r-value slice is valid and
			
 
				+should be supported.
			
 
				+
			
 
				+We would likewise need to extend the rewrite rules for subscript syntax to
			
 
				+detect and use implementations of these interfaces. This should not lead to a
			
 
				+combinatorial explosion of cases, though; if `T` implements both `IndexWith(I)`
			
 
				+and `RValueIndexWith(I)`, the program should be ill-formed due to ambiguity.
			
 
				+
			
 
				+However, it's not clear if this would provide enough benefit to justify the
			
 
				+added complexity.
			
 
				+
			
 
				+### Map-like subscripting
			
 
				+
			
 
				+This proposal does not support subscripting operations that insert new elements
			
 
				+into a collection, as in C++'s `std::map` and `std::unordered_map`, because it
			
 
				+requires subscriptable types to support subscripting of r-values, which are
			
 
				+immutable. We could support this with an additional interface for types that are
			
 
				+subscriptable only as l-values, and a corresponding extension to the rewrite
			
 
				+rules.
			
 
				+
			
 
				+However, it's debatable whether such insertion behavior is desirable; it has not
			
 
				+been a clear-cut success in C++. Code like `x = m[i];` looks like it reads from
			
 
				+`m`, and the fact that it can write to `m` is surprising, and easy for even
			
 
				+experienced programmers to overlook. Even for wary readers, it doesn't convey
			
 
				+the author's intent, because it's not clear whether the author assumed `i` is
			
 
				+present, or is relying on the implicit insertion. Furthermore, the implicit
			
 
				+insertion means that `x = m[i];` won't compile when `m` is const. On the other
			
 
				+hand, while it's relatively unsurprising that `m[i] = x;` might insert, that
			
 
				+insertion is also potentially inefficient, since it must default-construct a new
			
 
				+value before assigning `x` to it.
			
 
				+
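The surprising behavior described above is easy to reproduce in C++. The following sketch uses two hypothetical helper functions to contrast `operator[]`, which inserts on a miss, with `find`, which does not.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Returns the map's size after a subscript expression that looks like a read.
inline std::size_t SizeAfterSubscriptRead() {
  std::map<std::string, int> m;
  int x = m["missing"];  // Inserts {"missing", 0} as a side effect.
  (void)x;
  return m.size();
}

// Returns the map's size after a lookup with find(), which never inserts.
inline std::size_t SizeAfterFind() {
  std::map<std::string, int> m;
  bool present = m.find("missing") != m.end();
  (void)present;
  return m.size();
}
```

Because `operator[]` is a non-const member function, the first helper would not compile if `m` were declared `const`.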
			
 
				+We could fix most of these problems by giving special treatment to syntax of the
			
 
				+form `m[i] = x;`, and defining separate methods for it:
			
 
				+
			
 
				+```
			
 
				+interface IndexWith(SubscriptType:! Type) {
			
 
				+  let ElementType:! Type;
			
 
				+  fn At[me: Self](subscript: SubscriptType) -> ElementType;
			
 
				+  fn Addr[addr me: Self*](subscript: SubscriptType) -> ElementType*;
			
 
				+  fn Assign[addr me: Self*](subscript: SubscriptType, element: ElementType) {
			
 
				+    (*me->Addr(subscript)) = element;
			
 
				+  }
			
 
				+}
			
 
				+
			
 
				+interface IndirectIndexWith(SubscriptType:! Type) {
			
 
				+  extends IndexWith(SubscriptType);
			
 
				+  let ElementType:! Type;
			
 
				+  fn Addr[me: Self](subscript: SubscriptType) -> ElementType*;
			
 
				+  fn Assign[me: Self](subscript: SubscriptType, element: ElementType) {
			
 
				+    *(me.Addr(subscript)) = element;
			
 
				+  }
			
 
				+}
			
 
				+
			
 
				+final external impl [SubscriptType:! Type, T:! IndirectIndexWith(SubscriptType)]
			
 
				+    T as IndexWith(SubscriptType) {
			
 
				+  ...
			
 
				+  fn Assign[addr me: Self*](subscript: SubscriptType, element: ElementType) {
			
 
				+    me->(IndirectIndexWith(SubscriptType).Assign)(subscript, element);
			
 
				+  }
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+With this approach, expressions of the form `m[i] = x` would be rewritten to
			
 
				+call the appropriate `Assign` method, while all other subscript expressions are
			
 
				+rewritten as in the primary proposal. However, this does add some complexity to
			
 
				+both the interfaces and the rewrite rules (for example, we presumably want
			
 
				+`(m[i]) = x` to be treated the same as `m[i] = x`), so I'm leaving this as a
			
 
				+potential future extension.
			
 
				+
			
 
				+#### Usage-based rewriting
			
 
				+
			
 
				+As an alternative way of getting `std::map`-like behavior, we could rewrite
			
 
				+subscript expressions to use `At` instead of `Addr` when the usage context only
			
 
				+needs an r-value, even if the operand is an l-value, in much the same way that
			
 
				+Rust rewrites subscript expressions depending on whether the usage expects the
			
 
				+result to be mutable. This would enable type authors to make `m[i] = x;`
			
 
				+potentially-inserting while ensuring that `x = m[i];` is not, by making `Addr`
			
 
				+but not `At` potentially-inserting.
			
 
				+
			
 
				+However, this would mean that `m[i].Method()` is potentially inserting if
			
 
				+`Method` takes `me` by pointer, and would have the same performance drawback as
			
 
				+in C++. Furthermore, it would violate the principle (which we have discussed but
			
 
				+not yet adopted) that type information should flow from subexpressions to the
			
 
				+parent expression, but never the reverse.
			
 
				+
			
 
				+A variant of this approach would be to allow the compiler to choose between `At`
			
 
				+and `Addr` when the operand is an l-value but the usage context only needs an
			
 
				+r-value. This might enable the compiler to make the generated code more
			
 
				+efficient and/or safer, and might create pressure for libraries to ensure `At`
			
 
				+and `Addr` are consistent with each other.
			
 
				+
			
 
				+However, if the compiler predictably chooses `At` under those conditions, that
			
 
				+could create a temptation for library authors to leverage that implementation
			
 
				+detail to try to implement a `std::map`-like API, by having `Addr` implicitly
			
 
				+insert while `At` does not. That would have all the same drawbacks as if we
			
 
				+supported it intentionally, plus the evolutionary drawbacks of having user code
			
 
				+depend on unsupported implementation details.