# The Core.Array type for direct-storage immutably-sized buffers

[Pull request](https://github.com/carbon-language/carbon-lang/pull/4682)

## Table of contents

- [Abstract](#abstract)
- [Problem](#problem)
- [Background](#background)
  - [Rust](#rust)
  - [Swift](#swift)
  - [Safe C++](#safe-c)
  - [Goals](#goals)
    - [Privileging the most common type names](#privileging-the-most-common-type-names)
    - [Absence of syntax should make clear defaults](#absence-of-syntax-should-make-clear-defaults)
    - [Avoiding confusion with other languages](#avoiding-confusion-with-other-languages)
    - [Avoiding confusion with other domains](#avoiding-confusion-with-other-domains)
  - [No predeclared identifiers](#no-predeclared-identifiers)
- [Proposal](#proposal)
- [Rationale](#rationale)
- [Future work](#future-work)
  - [Namespacing the `Core` package](#namespacing-the-core-package)
- [Alternatives considered](#alternatives-considered)
  - [`[T; N]` builtin syntax](#t-n-builtin-syntax)
  - [`array [T; N]` builtin syntax](#array-t-n-builtin-syntax)
  - [Just the `Core.Array(T, N)` library type](#just-the-corearrayt-n-library-type)
  - [Implicitly importing `Core.Array(T, N)` to the file scope](#implicitly-importing-corearrayt-n-to-the-file-scope)

## Abstract

We propose to add `Core.Array(T, N)` as a library type in the `prelude` library of the `Core` package. Since arrays are a very frequent type, we propose to privilege use of this type by providing a builtin keyword `array(T, N)` that resolves to the `Core.Array(T, N)` type.

## Problem

Carbon's current syntax for a fixed-size, direct storage array (hereafter called "array") is the provisional `[T; N]` and there is no syntax yet for a mutably-sized indirect storage buffer (hereafter called "heap-buffer"). Arrays and heap-buffers are some of the most commonly used types, after fundamental types. The syntax, whatever it is, will be incredibly frequent in Carbon source code.
We explore and propose a new syntax for arrays that addresses design issues with the provisional syntax, that allows each of the following to be written clearly: slice, compile-time sized slice, array, and pointer to array, and that leaves clear room for a sibling indirect-storage type.

## Background

We have developed a matrix for enumerating and describing the vocabulary of owning array and buffer types.

Direct refers to an in-place storage buffer, as with arrays. Indirect refers to heap allocation, where the type itself holds storage of a pointer to the buffer, as with heap-buffers.

Here we are discussing the location of storage (direct vs indirect) as a way to categorize types. Indirect-storage types may, for small payloads, store state directly in their fields (such as with the Small String Optimization), but this is an optimization for specific payloads and the category of the type remains an indirect-storage type.

To provide familiarity, here is the table for the C++ language as a baseline:

| Owning type              | Runtime Sized     | Compile-time Sized    |
| ------------------------ | ----------------- | --------------------- |
| Direct, Immutable Size   | -                 | `T[N]` / `std::array` |
| Indirect, Immutable Size | `std::unique_ptr` | `std::unique_ptr`     |
| Indirect, Mutable Size   | `std::vector`     | -                     |

### Rust

The Rust vocabulary is as follows:

| Owning type              | Runtime Sized | Compile-time Sized |
| ------------------------ | ------------- | ------------------ |
| Direct, Immutable Size   | -             | `[T; N]`           |
| Indirect, Immutable Size | `Box<[T]>`    | `Box<[T; N]>`      |
| Indirect, Mutable Size   | `Vec`         | -                  |

There are a few things of note when comparing to C++:

- The Rust `Box` and `Vec` types are part of `std` but are imported into the current scope automatically, so they do not need any prefix.
- The `[T]` type represents a fixed-runtime-size buffer. The type itself is not instantiable since its size is not known at compile time.
  `Box` is specialized for the type to store a runtime size in its own type.
- The array type syntax matches the Carbon provisional syntax.
- The heap-buffer type name matches the C++ `vector` type, but it is privileged with a shorter name. The `Vec` type name is at most the same length as an array type name (for the same `T`).

### Swift

The Swift vocabulary is significantly smaller, to support automatic refcounting:

| Owning type              | Runtime Sized   | Compile-time Sized |
| ------------------------ | --------------- | ------------------ |
| Direct, Immutable Size   | `InlineArray`   | -                  |
| Indirect, Immutable Size | -               | -                  |
| Indirect, Mutable Size   | `Array` / `[T]` | -                  |

Because there was historically no direct storage option, only one name was needed, and "Array" was used to refer to a heap-buffer.

On [Feb 5 2025](https://forums.swift.org/t/accepted-with-modifications-se-0453-inlinearray-formerly-vector-a-fixed-size-array/77678), a proposal was accepted to [introduce `InlineArray` for the direct storage immutably sized array type](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0453-vector.md). Because "Array" is already taken, the original proposal called this new type "Vector" in reference to mathematical vectors. The choice of name was [heavily discussed](https://forums.swift.org/t/second-review-se-0453-vector-a-fixed-size-array/76412), however, due to the confusion with C++'s `std::vector` and Rust's `std::vec::Vec`. The type was [provisionally renamed to `Slab`](https://github.com/swiftlang/swift/pull/76438), but the name finally settled on was `InlineArray`.
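
To make the Rust vocabulary table earlier in this section concrete, here is a minimal Rust sketch that builds one value of each owning type it lists. The function name `lengths` and the element values are hypothetical, chosen only for illustration.

```rust
/// Builds one value of each owning type from the Rust table and returns
/// their lengths. (`lengths` is a hypothetical helper for illustration.)
fn lengths() -> (usize, usize, usize, usize) {
    // Direct storage, compile-time sized.
    let direct: [u8; 3] = [1, 2, 3];
    // Indirect storage, compile-time sized.
    let boxed_fixed: Box<[u8; 3]> = Box::new([1, 2, 3]);
    // Indirect storage, size fixed at runtime when the box is created.
    let boxed_runtime: Box<[u8]> = vec![1, 2, 3, 4].into_boxed_slice();
    // Indirect storage, mutable size.
    let mut growable: Vec<u8> = Vec::new();
    growable.push(1);
    growable.push(2);
    (
        direct.len(),
        boxed_fixed.len(),
        boxed_runtime.len(),
        growable.len(),
    )
}

fn main() {
    println!("{:?}", lengths());
}
```

Only `Vec` can change its length after construction; the two `Box` forms differ in whether the length is part of the type or stored alongside the pointer.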
### Safe C++

The [Safe C++ proposal](https://safecpp.org/draft.html#tuples-arrays-and-slices) introduces array syntax very similar to Rust:

| Owning type              | Runtime Sized         | Compile-time Sized  |
| ------------------------ | --------------------- | ------------------- |
| Direct, Immutable Size   | -                     | `[T; N]`            |
| Indirect, Immutable Size | `std2::box<[T; dyn]>` | `std2::box<[T; N]>` |
| Indirect, Mutable Size   | `std2::vector`        | -                   |

There are a few things of note:

- While Rust omits a size to indicate the size is known only at runtime, Safe C++ uses a `dyn` keyword to indicate the same.
- The heap-buffer type name is unchanged from C++, sticking with `vector`.

### Goals

It will help to establish some goals against which to weigh alternatives. These goals are based on the [open discussion from 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?usp=sharing&resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA#heading=h.h0tg34pzq5yz), where we discussed the [Pointers, Arrays, Slices](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?usp=sharing) document.

The goals here are largely informed by, and try to achieve, the top-level goal of ["Code that is easy to read, understand, and write"](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write). We define some more specific targets here as they relate to the specifics of the array syntax.

#### Privileging the most common type names

- "Explicitness must be balanced against conciseness, as verbosity and ceremony add cognitive overhead for the reader, while explicitness reduces the amount of outside context the reader must have or assume."

The more common it will be for a type to be used, the shorter we would like the name to be. This follows from the presumption that we weigh conciseness as increasingly valuable for types that will appear more frequently in Carbon code.
We expect the ordering of frequency in Carbon code to be:

- fundamental types ≈ tuples >> heap-buffers > arrays >> everything else[^1].

Here, fundamental types are: machine-sized integers (8 bit, 16 bit, etc.), machine-sized floating-point numbers, and pointers including slices[^2]. Function parameters/arguments are an example of tuples.

From this, we derive that we want:

- Fundamental types and tuples to have the most concise names.
  - We can lean on special syntax or keywords as needed to make them concise but descriptive.
- Heap-buffers to have a concise name, even more so than arrays.
  - We could use special syntax or keywords if needed to achieve conciseness.
- Arrays to have a concise name, but they do not need to be comparably concise to fundamental types and tuples.
  - We should try to avoid special syntax.
- Everything else should be written as idiomatic types with descriptive names.

[^1]: "[chandlerc] Prioritize: slices first, then [resizable storage], then compile-time sized storage, then everything else is vastly less common. Between those three, the difference in frequency between the first two is the biggest." from [open discussion on 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA&tab=t.0)

[^2]: Slices are included with fundamental types for simplicity, since they will take the place of many pointers in C++, giving them similar frequency to pointers, and can be logically thought of as a bounded pointer.

#### Absence of syntax should make clear defaults

One way to write arrays and compile-time-sized slices is as we see in Rust: `[T; N]` and `&[T; N]`. This suggests a relationship in which an array is like a slice, and is the default form. But they are very different types, rather than modifications of a single type, and this can be confusing[^3] for developers learning the language.
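
The difference described above can be observed in Rust itself. The sketch below, using a helper we call `sizes` (a hypothetical name), compares the size of an array with the sizes of a reference to an array and of a slice. Removing the `&` changes a pointer-sized value into the full element storage, even though the remaining syntax looks the same.

```rust
use std::mem::size_of;

/// Returns the sizes in bytes of `[i32; 4]`, `&[i32; 4]`, and `&[i32]`.
/// (`sizes` is a hypothetical helper for illustration.)
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<[i32; 4]>(),  // direct storage: four 4-byte elements
        size_of::<&[i32; 4]>(), // thin pointer: the length is part of the type
        size_of::<&[i32]>(),    // fat pointer: address plus runtime length
    )
}

fn main() {
    let storage: [i32; 4] = [1, 2, 3, 4];
    let fixed: &[i32; 4] = &storage; // reference to the whole array
    let slice: &[i32] = &storage; // slice view of the same storage
    assert_eq!(fixed.len(), slice.len());

    let (array, array_ref, slice_ref) = sizes();
    println!("{array} {array_ref} {slice_ref}");
}
```

On a 64-bit target this prints `16 8 16`: the array occupies its elements, while the two reference forms occupy one and two pointer-sized words respectively.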

[^3]: https://fire.asta.lgbt/notes/a1iay7r3e7or0a59 (content-warning: swearing)

We want to avoid the situation where [absence of syntax](https://www.youtube.com/watch?v=-Hb-9TUyjoo), such as a missing pointer indicator, changes the entire meaning of the remaining syntax or is otherwise confusing.

#### Avoiding confusion with other languages

The most general meaning of "array" is a range of consecutive values in memory. However, in many languages it is used, either formally or informally, to refer to a direct-storage, immutably-sized memory range:

- C, [colloquial](https://en.wikibooks.org/wiki/C_Programming/Arrays_and_strings)
- C++, colloquial (from C) and [`std::array`](https://en.cppreference.com/w/cpp/container/array)
- Go, [colloquial](https://go.dev/tour/moretypes/6)
- Rust, [colloquial](https://doc.rust-lang.org/std/primitive.array.html)[^4]

In particular, this is the usage in the languages with which Carbon will most frequently interoperate, and/or from which code will be migrated to Carbon, so comments and variable names would use these terms in this way.

[^4]: Maybe this is more formal than colloquial, but the name is not part of the typename/syntax.

Languages which require shared ownership _don't have direct-storage arrays_, so the same term gets used for indirect storage:

- Swift, [`Array`](https://developer.apple.com/documentation/swift/array)
  - As noted earlier, Swift is in the process of adding a direct-storage array called [`InlineArray`](https://github.com/swiftlang/swift-evolution/blob/main/proposals/0453-vector.md). Backwards compatibility prevents the use of `Array` for this.
- JavaScript, [`Array`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array)
- Java and Kotlin, [`ArrayList`](https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html)

And some languages use array to refer to both direct and indirect storage types:
- Dlang has direct-storage arrays [colloquially](https://dlang.org/spec/arrays.html) and the indirect-storage [`Array`](https://dlang.org/phobos/std_container_array.html) type.
- Pascal uses the presence or absence of a size to determine if [`Array`](https://www.freepascal.org/docs-html/ref/refsu14.html) uses direct (immutably-sized) or indirect (mutably-sized) storage.

In sum, languages which have direct-storage immutably-sized arrays use the term "array" to refer to those, and most then use a separate name for the indirect-storage type.

#### Avoiding confusion with other domains

The term "vector" in mathematics refers to a fixed-size set of numbers. This leads to confusion with the C++ type `std::vector` since it holds a mutably-sized set of values. Developers coming from other domains must learn a new and contradictory term of art.

The Rust language chose naming that derives from C++, with `std::vec::Vec`. These type names conflict with names in mathematical and graphics libraries, which want to use vector in its mathematical sense. In Rust this leads to `Vec` for a mutably-sized array, and [`Vec3`](https://docs.rs/bevy/latest/bevy/prelude/struct.Vec3.html), [`Vec4`](https://docs.rs/bevy/latest/bevy/prelude/struct.Vec4.html), and so on for fixed-size mathematical vectors. While not fatal, this does create ambiguity that must be overcome by developers.

### No predeclared identifiers

Recently the proposal [p4864: No predeclared identifiers, Core is a keyword](https://docs.carbon-lang.dev/proposals/p4864.html) clarified a direction for the Carbon language, wherein there will not be implicit imports from the `Core` library. Anything accessible directly in the language, rather than through a package name, is made available through a builtin keyword. This ensures that raw identifier syntax is always available for those same names in Carbon code.
## Proposal

The [All APIs are library APIs principle](/docs/project/principles/library_apis_only.md) states:

> In Carbon, every public function is declared in some Carbon API file.

As such, we propose a `Core` library type for a direct-storage immutably-sized array, and then a builtin shorthand for referring to that library type.

In line with other languages surveyed above, given the presence of a direct-storage immutably-sized array in Carbon, we will reserve the unqualified name "array" for this type. In full, its name is `Core.Array(T, N)`, where `T` is the type of elements in the array, and `N` is the number of elements. Notably this leaves room for supporting multi-dimensional arrays by adding further optional size parameters, either in the `Array` type or in a similar sibling type.

Here is a provisional vocabulary table to compare with other languages:

| Owning type              | Runtime Sized | Compile-time Sized                 |
| ------------------------ | ------------- | ---------------------------------- |
| Direct, Immutable Size   | -             | `array(T, N)` / `Core.Array(T, N)` |
| Indirect, Immutable Size | ?             | `Core.Box(Array(T, N))`            |
| Indirect, Mutable Size   | `Core.Buf(T)` | -                                  |

Carbon does not have proposed names for heap-allocated storage, so we use some placeholders here, in order to show where `Array` fits into the picture:

- `Box(T)` for a heap-allocated `T` value.
- `Buf(T)` for a heap-buffer of `T` values.

An indirect, immutably-sized buffer does not have a clearly expressible syntax at the moment. `Box([T])` is the closest fit with the current provisional syntax for slices. But `[T]` is a sized pointer, which would make this type a heap-allocated sized pointer, rather than a heap-allocated fixed-size array. This is in contrast with Rust where `&[T]` is a slice, and `[T]` is a fixed-size buffer; so it then follows that `Box<[T]>` is a heap-allocated fixed-size buffer.

Because arrays will be very common in Carbon code, we want to privilege their usage.
There are at least two ways in which we can do so. The first is to include them in the `prelude` library of the `Core` package. This ensures they are available in every Carbon file as `Core.Array(T, N)`. The second is to make the type available through a shorthand without being qualified by the `Core` package name. We propose the `array(T, N)` builtin keyword as that shorthand.

## Rationale

As this proposal is addressing the question of introducing a new `prelude` library type in `Core`, it is mostly focused on the goal [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write).

This proposal aims to make code easy to understand by using a name that is consistent across systems programming languages, and avoiding names that have conflicting meaning. It also uses a standard type syntax, with a type in the `Core` package, making the type and its documentation maximally discoverable without requiring special-casing.

We introduced some more specific sub-goals above:

1. Privileging the most common type names

    This proposal privileges `Core.Array` as it will appear frequently in code, by placing it in the `prelude` library. This avoids the need for developers to `import` another `Core` library in order to access the type.

    Since array types are expected to be very frequent, we also propose an `array` builtin keyword as a shorthand. This is in line with the [No predeclared identifiers](https://docs.carbon-lang.dev/proposals/p4864.html) proposal, and uses a lowercase spelling to mark the word as a builtin. The spelling of `array(T, N)` will resolve to the library type `Core.Array(T, N)`.

    In this proposal, we avoid introducing additional syntax (such as with `[T; N]` or `(1, 2)`) because the frequency of use of arrays will be lower than that of fundamental types and tuples.

2.
    Absence of syntax should make clear defaults

    We introduce a type name, with a keyword that has a clear relationship to the generic type name, rather than making arrays look more like slices but without being a pointer. This is meant to avoid the confusion raised when removing syntax changes the meaning significantly, and especially in ways that differ from defaults/options for a single language concept.

3. Avoiding confusion with other languages

    We propose using the `Array` type name, and `array` shorthand, in line with how other languages use the same term. When a direct-storage array type is part of the language, it's consistently referred to as an "array" without qualifications. Most importantly, the name is consistent with the meaning in C++ and its standard library (`std::array`) as well as with Rust, the languages which we expect Carbon code to interact with the most.

4. Avoiding confusion with other domains

    The name `Vector` is a possible choice for a fixed-length set of values, due to its mathematical meaning, as was originally proposed for the direct-storage immutably-sized array type in Swift. However, any use of the name `Vector` in a core systems programming language construct is fraught. The name is liable to be confused either with a mathematical vector or with a C++ `std::vector`. We avoid the confusion by avoiding this name.

## Future work

### Namespacing the `Core` package

At this time, the `Core` package remains small, but there will come a time when the names within need to be split into smaller namespaces. Then the name `Core.Array`, among others, will become longer and the act of privileging the name through the `array` keyword will become more pronounced and helpful. At this time, we don't propose to put `Array` into a namespace in `Core` as there's no such existing structure to point to yet.
## Alternatives considered

### `[T; N]` builtin syntax

This is the current syntax used by the toolchain; however, the following problems were raised:

- It's very similar to the syntax for slices, which is `[T]`, but very different in nature, being storage instead of a reference to storage.
- Given `[T]` is a slice, `[T; N]` would better suit a compile-time-sized slice.

The syntax for a slice may also be changed; we discussed [adding a pointer annotation](https://docs.google.com/document/d/1hdYyCLmzEOj9gDulm7Eo1SVNc0pY7zbMvFmEzenMhYE/edit?tab=t.0#heading=h.fahgww8db6f0) to it, such as `[T]*` and `[T; N]*`. Some downsides remained:

- The `[T; N]*` syntax would be a fixed-size slice, rather than a pointer to an array. This leaves no room for writing a pointer to an array, which can indicate a different intent: that it always includes the full memory range of the array. Without this distinction, we can't model both `std::span` and `std::array*` in code migrated from C++ to Carbon and would need to collapse these to a single type.
- Removing the pointer annotation would change the meaning of the type expression more than we'd like, since it would change from a slice into an array, rather than pointer-to-an-array into an array.

### `array [T; N]` builtin syntax

This introduces a keyword as a modifier of a fixed-size slice, rather than a builtin forwarding type.

While arrays will be very common, it's not clear that they rise to the level of requiring breaking the language's naming rules (using a lowercase name) in order to provide a shorthand. And the shorthand is longer in the end than the `Array(T, N)` being proposed here. So this uses a larger weirdness budget for privileging the type while achieving less conciseness.

This has a similar issue as with `[T; N]` but in the reverse. Removing the `array` modifier keyword changes the meaning of the type expression in ways that are larger than a default/modifier relationship. Fixed-size slices are not the more-default array.
The use of a lowercase keyword also costs us by preventing users from using the word `array` as a variable name, which is quite common.

### Just the `Core.Array(T, N)` library type

Providing just the library type is possible, but arrays will be one of the most common types in Carbon code, as described earlier. Privileging them with a shorthand that avoids `Core.` will help make Carbon code significantly more concise, due to the frequency, without hurting understandability. This makes it worth the tradeoff of putting a name into the file scope (by way of a builtin type).

### Implicitly importing `Core.Array(T, N)` to the file scope

We considered a direction where some subset of names in the `Core` package's `prelude` library are imported automatically to the file scope. This would be similar to the [Rust `std::prelude` module](https://doc.rust-lang.org/stable/std/prelude/index.html), which names aliases that are pulled into the global scope.

Importing names poses challenges for migrating code to Carbon. Names imported from a library may allow shadowing in other scopes, which would create ambiguity about the meaning of these names. And names imported from a library would not be avoidable using the raw identifier syntax: `Array` and `r#Array` would both refer to the same imported `Core.Array` type. This would present a challenge for code migrated to Carbon which uses the name `Array` in its own types or variables.

With a builtin keyword, by contrast, `array` and `r#array` refer to different things: the first is the keyword, and the second is a name that the developer can use freely for other purposes.

The [No predeclared identifiers, Core is a keyword](https://docs.carbon-lang.dev/proposals/p4864.html) proposal discusses in more detail why this approach was not taken.
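
Rust offers a rough analogy for both behaviors described above, sketched below. This is an illustration in Rust, not Carbon code, and the names `math`, `origin`, `raw_prelude_name`, and `raw_keyword_name` are hypothetical. It shows a prelude name being shadowed by a local type, a raw identifier that still refers to the imported prelude name, and a raw identifier that is distinct from a keyword.

```rust
// A user-defined type named `Vec` shadows the prelude's `Vec` inside this
// module, so the meaning of the name depends on the enclosing scope.
mod math {
    #[derive(Debug, PartialEq)]
    pub struct Vec {
        pub x: f32,
        pub y: f32,
    }

    pub fn origin() -> Vec {
        Vec { x: 0.0, y: 0.0 }
    }
}

/// `r#Vec` is the same identifier as `Vec`, so raw identifier syntax does
/// not avoid the imported name: this is still the prelude's `Vec`.
fn raw_prelude_name() -> r#Vec<i32> {
    vec![1, 2, 3]
}

/// `r#match` is an ordinary identifier, distinct from the `match` keyword.
fn raw_keyword_name() -> i32 {
    let r#match = 7;
    match r#match {
        7 => 1,
        _ => 0,
    }
}

fn main() {
    println!("{:?}", math::origin());
    println!("{:?}", raw_prelude_name());
    println!("{}", raw_keyword_name());
}
```

The last function mirrors the keyword approach: because `match` is a keyword rather than an imported name, `r#match` remains free for the developer to use.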