
The Core.Array type for direct-storage immutably-sized buffers (#4682)

We propose to add `Core.Array(T, N)` as a library type in the `Core`
package. Since arrays are a very frequent type, we propose to privilege
use of this type by including it in the `prelude` library of the
package.

We would like to see a shorthand where `Core.Array` is automatically
imported into the file scope, and this proposal includes future work to
this effect.
Dana Jansens 1 year ago
parent
commit
4b45caa8e0
1 changed file with 481 additions and 0 deletions
  1. 481 0
      proposals/p4682.md

+ 481 - 0
proposals/p4682.md

@@ -0,0 +1,481 @@
+# The Core.Array type for direct-storage immutably-sized buffers
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/4682)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+    -   [Rust](#rust)
+    -   [Swift](#swift)
+    -   [Safe C++](#safe-c)
+    -   [Goals](#goals)
+        -   [Privileging the most common type names](#privileging-the-most-common-type-names)
+        -   [Absence of syntax should make clear defaults](#absence-of-syntax-should-make-clear-defaults)
+        -   [Avoiding confusion with other languages](#avoiding-confusion-with-other-languages)
+        -   [Avoiding confusion with other domains](#avoiding-confusion-with-other-domains)
+    -   [No predeclared identifiers](#no-predeclared-identifiers)
+-   [Proposal](#proposal)
+-   [Rationale](#rationale)
+-   [Future work](#future-work)
+    -   [Namespacing the `Core` package](#namespacing-the-core-package)
+-   [Alternatives considered](#alternatives-considered)
+    -   [`[T; N]` builtin syntax](#t-n-builtin-syntax)
+    -   [`array [T; N]` builtin syntax](#array-t-n-builtin-syntax)
+    -   [Just the `Core.Array(T, N)` library type](#just-the-corearrayt-n-library-type)
+    -   [Implicitly importing `Core.Array(T, N)` to the file scope](#implicitly-importing-corearrayt-n-to-the-file-scope)
+
+<!-- tocstop -->
+
+## Abstract
+
+We propose to add `Core.Array(T, N)` as a library type in the `prelude` library
+of the `Core` package. Since arrays are a very frequent type, we propose to
+privilege use of this type by providing a builtin keyword `array(T, N)` that
+resolves to the `Core.Array(T, N)` type.
+
+## Problem
+
+Carbon's current syntax for a fixed-size, direct storage array (hereafter called
+"array") is the provisional `[T; N]` and there is no syntax yet for a
+mutably-sized indirect storage buffer (hereafter called "heap-buffer").
+
+Arrays and heap-buffers are some of the most commonly used types, after
+fundamental types. The syntax, whatever it is, will be incredibly frequent in
+Carbon source code.
+
+We explore and propose a new syntax for arrays that addresses design issues with
+the provisional syntax, allows each of the following to be written clearly
+(slice, compile-time sized slice, array, and pointer to array), and leaves clear
+room for a sibling indirect-storage type.
+
+## Background
+
+We have developed a matrix for enumerating and describing the vocabulary of
+owning array and buffer types. Direct refers to an in-place storage buffer, as
+with arrays. Indirect refers to heap allocation, where the type itself holds
+storage of a pointer to the buffer, as with heap-buffers.
+
+Here we are discussing the location of storage (direct vs indirect) as a way to
+categorize types. An indirect-storage type may, for small payloads, store state
+directly in its fields (such as with the Small String Optimization), but this is
+an optimization for specific payloads and the category of the type remains an
+indirect-storage type.
+
+To provide familiarity, here is the table for the C++ language as a baseline:
+
+| Owning type              | Runtime Sized          | Compile-time Sized          |
+| ------------------------ | ---------------------- | --------------------------- |
+| Direct, Immutable Size   | -                      | `T[N]` / `std::array<T, N>` |
+| Indirect, Immutable Size | `std::unique_ptr<T[]>` | `std::unique_ptr<T[N]>`     |
+| Indirect, Mutable Size   | `std::vector<T>`       | -                           |
+
+### Rust
+
+The Rust vocabulary is as follows:
+
+| Owning type              | Runtime Sized | Compile-time Sized |
+| ------------------------ | ------------- | ------------------ |
+| Direct, Immutable Size   | -             | `[T; N]`           |
+| Indirect, Immutable Size | `Box<[T]>`    | `Box<[T; N]>`      |
+| Indirect, Mutable Size   | `Vec<T>`      | -                  |
+
+There are a few things of note when comparing to C++:
+
+-   The Rust `Box` and `Vec` types are part of `std` but are imported into the
+    current scope automatically, so they do not need any prefix.
+-   The `[T]` type represents a fixed-runtime-size buffer. The type itself is
+    not instantiable since its size is not known at compile time. `Box` is
+    specialized for the type to store a runtime size in its own type.
+-   The array type syntax matches the Carbon provisional syntax.
+-   The heap-buffer type name matches the C++ `vector` type, but it is
+    privileged with a shorter name. The `Vec` type name is at most the same
+    length as an array type name (for the same `T`).
+
+### Swift
+
+The Swift vocabulary is significantly smaller, to support automatic refcounting:
+
+| Owning type              | Runtime Sized      | Compile-time Sized |
+| ------------------------ | ------------------ | ------------------ |
+| Direct, Immutable Size   | -                  | `InlineArray<T>`   |
+| Indirect, Immutable Size | -                  | -                  |
+| Indirect, Mutable Size   | `Array<T>` / `[T]` | -                  |
+
+Because there was historically no direct storage option, only one name was
+needed, and "Array" was used to refer to a heap-buffer.
+
+On
+[Feb 5 2025](https://forums.swift.org/t/accepted-with-modifications-se-0453-inlinearray-formerly-vector-a-fixed-size-array/77678),
+a proposal was accepted to
+[introduce `InlineArray<T>` for the direct storage immutably sized array type](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0453-vector.md).
+Because "Array" is already taken, the original proposal called this new type
+"Vector" in reference to mathematical vectors. The choice of name was
+[heavily discussed](https://forums.swift.org/t/second-review-se-0453-vector-a-fixed-size-array/76412)
+however, due to the confusion with C++'s `std::vector` and Rust's
+`std::vec::Vec`. It was
+[provisionally renamed to `Slab`](https://github.com/swiftlang/swift/pull/76438)
+but settled on `InlineArray`.
+
+### Safe C++
+
+The [Safe C++ proposal](https://safecpp.org/draft.html#tuples-arrays-and-slices)
+introduces array syntax very similar to Rust:
+
+| Owning type              | Runtime Sized         | Compile-time Sized  |
+| ------------------------ | --------------------- | ------------------- |
+| Direct, Immutable Size   | -                     | `[T; N]`            |
+| Indirect, Immutable Size | `std2::box<[T; dyn]>` | `std2::box<[T; N]>` |
+| Indirect, Mutable Size   | `std2::vector<T>`     | -                   |
+
+There are a few things of note:
+
+-   While Rust omits a size to indicate the size is known only at runtime, Safe
+    C++ uses a `dyn` keyword to indicate the same.
+-   The heap-buffer type name is unchanged from C++, sticking with `vector`.
+
+### Goals
+
+It will help to establish some goals against which to weigh alternatives.
+These goals are based on the
+[open discussion from 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?usp=sharing&resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA#heading=h.h0tg34pzq5yz),
+where we discussed the
+[Pointers, Arrays, Slices](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?usp=sharing)
+document.
+
+The goals here are largely informed by and trying to achieve the top-level goal
+of
+["Code that is easy to read, understand, and write"](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
+We define some more specific targets here as they relate to the specifics of
+the array syntax.
+
+#### Privileging the most common type names
+
+-   "Explicitness must be balanced against conciseness, as verbosity and
+    ceremony add cognitive overhead for the reader, while explicitness reduces
+    the amount of outside context the reader must have or assume."
+
+The more common it will be for a type to be used, the shorter we would like the
+name to be. This follows from the presumption that we weigh conciseness as
+increasingly valuable for types that will appear more frequently in Carbon code.
+
+We expect the ordering of frequency in Carbon code to be:
+
+-   fundamental types ≈ tuples >> heap-buffers > arrays >> everything else[^1].
+
+Where fundamental types are: machine-sized integers (8 bit, 16 bit, etc.),
+machine-sized floating points, and pointers including slices[^2]. Function
+parameters/arguments are an example of tuples.
+
+From this, we derive that we want:
+
+-   Fundamental types and tuples to have the most concise names.
+    -   We can lean on special syntax or keywords as needed to make them concise
+        but descriptive.
+-   Heap-buffers to have a concise name, even more so than arrays.
+    -   We could use special syntax or keywords if needed to achieve
+        conciseness.
+-   Arrays to have a concise name, but they do not need to be comparably concise
+    to fundamental types and tuples.
+    -   We should try to avoid special syntax.
+-   Everything else should be written as idiomatic types with descriptive names.
+
+[^1]:
+    "[chandlerc] Prioritize: slices first, then [resizable storage], then compile-time
+    sized storage, then everything else is vastly less common. Between those three,
+    the difference in frequency between the first two is the biggest." from [open discussion on 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA&tab=t.0)
+
+[^2]:
+    Slices are included with fundamental types for simplicity, since they will
+    take the place of many pointers in C++, giving them similar frequency to
+    pointers, and can be logically thought of as a bounded pointer.
+
+#### Absence of syntax should make clear defaults
+
+One way to write arrays and compile-time-sized slices is like we see in Rust:
+`[T; N]` and `&[T; N]`. This suggests a relationship where array is like slice,
+and the default form. But they are very different types, rather than a
+modification of a single type, and this can be confusing[^3] for developers
+learning the language.
+
+[^3]: https://fire.asta.lgbt/notes/a1iay7r3e7or0a59 (content-warning: swearing)
+
+We want to avoid the situation where
+[absence of syntax](https://www.youtube.com/watch?v=-Hb-9TUyjoo), such as a
+missing pointer indicator, changes the entire meaning of the remaining syntax or
+is otherwise confusing.
+
+#### Avoiding confusion with other languages
+
+The most general meaning of "array" is a range of consecutive values in memory.
+
+However, in many languages it is used, either formally or informally, to refer
+to a direct-storage, immutably-sized memory range:
+
+-   C,
+    [colloquial](https://en.wikibooks.org/wiki/C_Programming/Arrays_and_strings)
+-   C++, colloquial (from C) and
+    [`std::array`](https://en.cppreference.com/w/cpp/container/array)
+-   Go, [colloquial](https://go.dev/tour/moretypes/6)
+-   Rust, [colloquial](https://doc.rust-lang.org/std/primitive.array.html)[^4]
+
+In particular, this is the usage in the languages with which Carbon will most
+frequently interoperate, and/or from which code will be migrated to Carbon and
+thus comments and variable names would use these terms in this way.
+
+[^4]:
+    Maybe this is more formal than colloquial, but the name is not part of the
+    typename/syntax.
+
+Languages which require shared ownership _don't have direct-storage arrays_, so
+the same term gets used for indirect storage:
+
+-   Swift, [`Array`](https://developer.apple.com/documentation/swift/array)
+    -   As noted earlier, Swift is in the process of adding a direct-storage
+        array called
+        [`InlineArray`](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0453-vector.md).
+        Backwards compatibility prevents the use of `Array` for this.
+-   JavaScript,
+    [`Array`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array)
+-   Java and Kotlin,
+    [`ArrayList`](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html)
+
+And some languages use array to refer to both direct and indirect storage types.
+
+-   Dlang has direct-storage arrays
+    [colloquially](https://dlang.org/spec/arrays.html) and the indirect-storage
+    [`Array`](https://dlang.org/phobos/std_container_array.html) type.
+-   Pascal uses the presence or absence of a size to determine if
+    [`Array`](https://www.freepascal.org/docs-html/ref/refsu14.html) uses direct
+    (immutably-sized) or indirect (mutably-sized) storage.
+
+In sum, languages which have direct-storage immutably-sized arrays use the term
+"array" to refer to those, and most then use a separate name for the
+indirect-storage type.
+
+#### Avoiding confusion with other domains
+
+The term "vector" in mathematics refers to a fixed-size set of numbers. This
+leads to confusion with the C++ type `std::vector` since it holds a
+mutably-sized set of values. Developers coming from other domains must learn a
+new and contradictory term of art. The Rust language chose naming that derives
+from C++, with `std::vec::Vec`.
+
+These type names conflict with names in mathematical and graphics libraries,
+which want to use vector in its mathematical sense. In Rust this leads to `Vec`
+for a mutably-sized array, and
+[`Vec3`](https://docs.rs/bevy/latest/bevy/prelude/struct.Vec3.html),
+[`Vec4`](https://docs.rs/bevy/latest/bevy/prelude/struct.Vec4.html), and so on
+for fixed-size mathematical vectors. While not fatal, this does create ambiguity
+that must be overcome by developers.
+
+### No predeclared identifiers
+
+Recently the proposal
+[p4864: No predeclared identifiers, Core is a keyword](https://docs.carbon-lang.dev/proposals/p4864.html)
+clarified a direction for the Carbon language, wherein there will not be
+implicit imports from the `Core` library. Anything accessible directly in the
+language, rather than through a package name, is done so through a builtin
+keyword. This ensures that raw identifier syntax is always available for those
+same names in Carbon code.
+
+## Proposal
+
+The
+[All APIs are library APIs principle](/docs/project/principles/library_apis_only.md)
+states:
+
+> In Carbon, every public function is declared in some Carbon API file.
+
+As such, we propose a `Core` library type for a direct-storage immutably-sized
+array, and then a builtin shorthand for referring to that library type.
+
+In line with other languages surveyed above, given the presence of a
+direct-storage immutably-sized array in Carbon, we will reserve the unqualified
+name "array" for this type. In full, its name is `Core.Array(T, N)`, where `T`
+is the type of elements in the array, and `N` is the number of elements. Notably
+this leaves room for supporting multi-dimensional arrays by adding further
+optional size parameters, either in the `Array` type or in a similar sibling
+type.
+
+Here is a provisional vocabulary table to compare with other languages:
+
+| Owning type              | Runtime Sized | Compile-time Sized                 |
+| ------------------------ | ------------- | ---------------------------------- |
+| Direct, Immutable Size   | -             | `array(T, N)` / `Core.Array(T, N)` |
+| Indirect, Immutable Size | ?             | `Core.Box(Array(T, N))`            |
+| Indirect, Mutable Size   | `Core.Buf(T)` | -                                  |
+
+Carbon does not have proposed names for heap-allocated storage, so we use some
+placeholders here, in order to show where `Array` fits into the picture:
+
+-   `Box(T)` for a heap-allocated `T` value.
+-   `Buf(T)` for a heap-buffer of `T` values.
+
+An indirect, immutably-sized buffer does not have a clearly expressible syntax
+at the moment. `Box([T])` is the closest fit with the current provisional syntax
+for slices. But `[T]` is a sized pointer, which would make this type a
+heap-allocated sized pointer, rather than a heap-allocated fixed-size array.
+This is in contrast with Rust where `&[T]` is a slice, and `[T]` is a fixed-size
+buffer; so it then follows that `Box<[T]>` is a heap-allocated fixed-size
+buffer.
+
+Because arrays will be very common in Carbon code, we want to privilege their
+usage. There are at least two ways in which we can do so. The first is to
+include them in the `prelude` library of the `Core` package. This ensures they
+are available in every Carbon file as `Core.Array(T, N)`. The second is by
+making the type available through a shorthand without being qualified by the
+`Core` package name. We propose the `array(T, N)` builtin keyword as that
+shorthand.
+
+## Rationale
+
+As this proposal is addressing the question of introducing a new `prelude`
+library type in `Core`, it is mostly focused on the goal
+[Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).
+
+This proposal aims to make code easy to understand by using a name that is
+consistent across systems programming languages, and avoiding names that have
+conflicting meaning. It also uses a standard type syntax, with a type in the
+`Core` package, making the type and its documentation maximally discoverable
+without requiring special-casing.
+
+We introduced some more specific sub-goals above:
+
+1. Privileging the most common type names
+
+This proposal privileges `Core.Array` as it will appear frequently in code, by
+placing it in the `prelude` library. This avoids the need for developers to
+`import` another `Core` library in order to access the type.
+
+Since array types are expected to be very frequent, we also propose an `array`
+builtin keyword as a shorthand. This is in line with the
+[No predeclared identifiers](https://docs.carbon-lang.dev/proposals/p4864.html)
+proposal, and uses a lowercase spelling to mark the word as a builtin. The
+spelling of `array(T, N)` will resolve to the library type `Core.Array(T, N)`.
+
+In this proposal, we avoid introducing additional syntax (such as with `[T; N]`
+or `(1, 2)`) because the frequency of use of arrays will be lower than that of
+fundamental types and tuples.
+
+2. Absence of syntax should make clear defaults
+
+We introduce a type name, with a keyword that has a clear relationship to the
+generic type name, rather than making arrays look more like slices but without
+being a pointer. This is meant to avoid the confusion raised when removing
+syntax changes the meaning significantly, and especially in ways that differ
+from defaults/options for a single language concept.
+
+3. Avoiding confusion with other languages
+
+We propose using the `Array` type name, and `array` shorthand, in line with how
+other languages use the same term. When a direct-storage array type is part of
+the language, it's consistently referred to as an "array" without
+qualifications.
+
+Most importantly, the name is consistent with the meaning in C++ and its
+standard library (`std::array<T, N>`) as well as with Rust, the languages which
+we expect Carbon code to interact with the most.
+
+4. Avoiding confusion with other domains
+
+The name `Vector` is a possible choice for a fixed-length set of values, due to
+its mathematical meaning, as was originally proposed for the direct-storage
+immutably-sized array type in Swift. However any use of the name `Vector` in a
+core systems programming language construct is fraught. The name is liable to be
+confused either with a mathematical vector or with a C++ `std::vector`. We
+avoid the confusion by avoiding this name.
+
+## Future work
+
+### Namespacing the `Core` package
+
+At this time, the `Core` package remains small, but there will come a time when
+the names within need to be split into smaller namespaces. Then the name
+`Core.Array`, among others, will become longer and the act of privileging the
+name through the `array` keyword will become more pronounced and helpful. At
+this time, we don't propose to put `Array` into a namespace in `Core` as there's
+no such existing structure to point to yet.
+
+## Alternatives considered
+
+### `[T; N]` builtin syntax
+
+This is the current syntax used by the toolchain, however it had the following
+problems raised:
+
+-   It's very similar to the syntax for slices, which is `[T]`, but very
+    different in nature, being storage instead of a reference to storage.
+-   Given `[T]` is a slice, `[T; N]` would better suit a compile-time-sized
+    slice.
+
+The syntax for a slice may also be changed; we discussed
+[adding a pointer annotation](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?tab=t.0#heading=h.fahgww8db6f0)
+to it, such as `[T]*` and `[T; N]*`. Some downsides remained:
+
+-   The `[T; N]*` syntax would be a fixed-size slice, rather than a pointer to
+    an array. This leaves no room for writing a pointer to an array, which can
+    indicate a different intent, that it always includes the full memory range
+    of the array. Without this distinction, we can't model both
+    `std::span<T, N>` and `std::array<T, N>*` in code migrated from C++ to
+    Carbon and would need to collapse these to a single type.
+-   Removing the pointer annotation would change the meaning of the type
+    expression more than we'd like, since it would change from a slice into an
+    array, rather than pointer-to-an-array into an array.
+
+### `array [T; N]` builtin syntax
+
+This introduces a keyword as a modifier of a fixed-size slice, rather than a
+builtin forwarding type. While arrays will be very common, it's not clear that
+they rise to the level of requiring breaking the language's naming rules (using a
+lowercase name) in order to provide a shorthand. And the shorthand is longer in
+the end than the `Array(T, N)` being proposed here. So this uses a larger
+weirdness budget for privileging the type while achieving less conciseness.
+
+This has a similar issue as with `[T; N]` but in the reverse. Removing the
+`array` modifier keyword changes the meaning of the type expression in ways that
+are larger than a default/modifier relationship. Fixed-size slices are not the
+more-default array.
+
+The use of a lowercase keyword also costs us by preventing users from using the
+word `array` in variables, a name which is quite common.
+
+### Just the `Core.Array(T, N)` library type
+
+Providing just the library type is possible, but arrays will be one of the most
+common types in Carbon code, as described earlier. Privileging them with a
+shorthand that avoids `Core.` will help make Carbon code significantly more
+concise, due to the frequency, without hurting understandability. This makes it
+worth the tradeoff of putting a name into the file scope (by way of a builtin
+type).
+
+### Implicitly importing `Core.Array(T, N)` to the file scope
+
+We considered a direction where some subset of names in the `Core` package's
+`prelude` library are imported automatically to the file scope. This would be
+similar to the
+[Rust `std::prelude` module](https://doc.rust-lang.org/stable/std/prelude/index.html),
+which names aliases that are pulled into the global scope.
+
+Importing names poses challenges for migrating code to Carbon. Names imported
+from a library may allow shadowing in other scopes, which would create ambiguity
+about the meaning of these names. And names imported from a library would not be
+avoidable using the raw identifier syntax: `Array` and `r#Array` would both
+refer to the same imported `Core.Array` type. This would present a challenge for
+code migrated to Carbon which uses the name `Array` in its own types or
+variables. Whereas with a builtin keyword, `array` and `r#array` refer to
+different things: The first is the keyword, and the second is a name that the
+developer can use freely for other purposes.
+
+The
+[No predeclared identifiers, Core is a keyword](https://docs.carbon-lang.dev/proposals/p4864.html)
+proposal discusses in more detail why this approach was not taken.