We propose to add Core.Array(T, N) as a library type in the prelude library
of the Core package. Since arrays are a very frequent type, we propose to
privilege use of this type by providing a builtin keyword array(T, N) that
resolves to the Core.Array(T, N) type.
Carbon's current syntax for a fixed-size, direct storage array (hereafter called
"array") is the provisional [T; N] and there is no syntax yet for a
mutably-sized indirect storage buffer (hereafter called "heap-buffer").
Arrays and heap-buffers are some of the most commonly used types, after fundamental types. The syntax, whatever it is, will be incredibly frequent in Carbon source code.
We explore and propose a new syntax for arrays that addresses design issues with the provisional syntax and allows each of the following to be written clearly: slice, compile-time sized slice, array, and pointer to array. It also leaves clear room for a sibling indirect-storage type.
We have developed a matrix for enumerating and describing the vocabulary of owning array and buffer types. Direct refers to an in-place storage buffer, as with arrays. Indirect refers to heap allocation, where the type itself holds storage of a pointer to the buffer, as with heap-buffers.
Here we are discussing the location of storage (direct vs indirect) as a way to categorize types. Indirect-storage types may, for small payloads, store state directly in their fields (such as with the Small String Optimization), but this is an optimization for specific payloads and the category of the type remains an indirect-storage type.
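The direct/indirect distinction can be observed in C++ by checking whether an element's address falls inside the owning object itself. The following is a minimal sketch under that assumption; the helper names `StoredInline`, `ArrayIsDirect`, and `VectorIsIndirect` are hypothetical and not part of any library.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <vector>

// Returns true when the byte at `elem` lies inside the object representation
// of `obj`, which is what "direct storage" means for that element.
template <typename Object>
bool StoredInline(const Object& obj, const void* elem) {
  assert(elem != nullptr);
  const char* begin = reinterpret_cast<const char*>(&obj);
  const char* end = begin + sizeof(Object);
  const char* p = static_cast<const char*>(elem);
  std::less<const char*> before;  // total order, even for unrelated pointers
  return !before(p, begin) && before(p, end);
}

// Direct storage: the elements of std::array are part of the object.
inline bool ArrayIsDirect() {
  std::array<int, 8> a{};
  return StoredInline(a, a.data());
}

// Indirect storage: std::vector holds a pointer to a separate heap buffer.
inline bool VectorIsIndirect() {
  std::vector<int> v(8);
  return !StoredInline(v, v.data());
}
```

A type with a small-buffer optimization could satisfy `StoredInline` for some payloads and not others, which is why the categorization above follows the type's general design rather than a single measurement.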
To provide familiarity, here is the table for the C++ language as a baseline:
| Owning type | Runtime Sized | Compile-time Sized |
|---|---|---|
| Direct, Immutable Size | - | T[N] / std::array<T, N> |
| Indirect, Immutable Size | std::unique_ptr<T[]> | std::unique_ptr<T[N]> |
| Indirect, Mutable Size | std::vector<T> | - |
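To make the rows of the C++ table concrete, here is a small sketch that exercises one type from each populated row. The function names (`DirectLength`, `SumFixedBuffer`, `GrowVector`) are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Direct, immutable size: the length is part of the type and is known at
// compile time.
constexpr std::size_t DirectLength() {
  return std::array<int, 4>{}.size();
}
static_assert(DirectLength() == 4);

// Indirect, immutable size: the length is chosen at run time and then fixed.
// std::unique_ptr<T[]> does not record it, so the caller carries it alongside.
inline int SumFixedBuffer(std::size_t length) {
  std::unique_ptr<int[]> buffer = std::make_unique<int[]>(length);
  int total = 0;
  for (std::size_t i = 0; i < length; ++i) {
    buffer[i] = static_cast<int>(i);
    total += buffer[i];
  }
  return total;
}

// Indirect, mutable size: the buffer can grow after construction.
inline std::size_t GrowVector() {
  std::vector<int> values = {1, 2, 3};
  values.push_back(4);
  assert(values.size() == 4);
  return values.size();
}
```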
The Rust vocabulary is as follows:
| Owning type | Runtime Sized | Compile-time Sized |
|---|---|---|
| Direct, Immutable Size | - | [T; N] |
| Indirect, Immutable Size | Box<[T]> | Box<[T; N]> |
| Indirect, Mutable Size | Vec<T> | - |
There are a few things of note when comparing to C++:
- The Box and Vec types are part of std but are imported into the
  current scope automatically, so they do not need any prefix.
- The [T] type represents a fixed-runtime-size buffer. The type itself is
  not instantiable since its size is not known at compile time. Box is
  specialized for the type to store a runtime size in its own type.
- vector type, but it is
  privileged with a shorter name. The Vec type name is at most the same
  length as an array type name (for the same T).

The Swift vocabulary is significantly smaller, to support automatic refcounting:
| Owning type | Runtime Sized | Compile-time Sized |
|---|---|---|
| Direct, Immutable Size | InlineArray<T> | - |
| Indirect, Immutable Size | - | - |
| Indirect, Mutable Size | Array<T> / [T] | - |
Because there was historically no direct storage option, only one name was needed, and "Array" was used to refer to a heap-buffer.
On Feb 5 2025, a proposal was accepted to introduce InlineArray<T> for the
direct storage immutably sized array type. Because "Array" is already taken,
the original proposal called this new type "Vector" in reference to
mathematical vectors. The choice of name was heavily discussed, however, due to
the confusion with C++'s std::vector and Rust's std::vec::Vec. It was
provisionally renamed to Slab but settled on InlineArray.
The Safe C++ proposal introduces array syntax very similar to Rust:
| Owning type | Runtime Sized | Compile-time Sized |
|---|---|---|
| Direct, Immutable Size | - | [T; N] |
| Indirect, Immutable Size | std2::box<[T; dyn]> | std2::box<[T; N]> |
| Indirect, Mutable Size | std2::vector<T> | - |
There are a few things of note:
- dyn keyword indicate the same.
- vector.

It will help to establish some goals against which to weigh alternatives. These goals are based on the open discussion from 2024-12-05, where we discussed the Pointers, Arrays, Slices document.

The goals here are largely informed by, and try to achieve, the top-level goal of "Code that is easy to read, understand, and write". We define some more specific targets here as they relate to the specifics of the array syntax.
The more common it will be for a type to be used, the shorter we would like the name to be. This follows from the presumption that we weigh conciseness as increasingly valuable for types that will appear more frequently in Carbon code.
We expect the ordering of frequency in Carbon code to be:
Where fundamental types are: machine-sized integers (8 bit, 16 bit, etc.), machine-sized floating points, and pointers including slices[^2]. Function parameters/arguments are an example of tuples.
From this, we derive that we want:
[^1]:
"[chandlerc] Prioritize: slices first, then [resizable storage], then compile-time
sized storage, then everything else is vastly less common. Between those three,
the difference in frequency between the first two is the biggest." from [open discussion on 2024-12-05](https://docs.google.com/document/d/1Iut5f2TQBrtBNIduF4vJYOKfw7MbS8xH_J01_Q4e6Rk/edit?resourcekey=0-mc_vh5UzrzXfU4kO-3tOjA&tab=t.0)
[^2]:
Slices are included with fundamental types for simplicity, since they will
take the place of many pointers in C++, giving them similar frequency to
pointers, and can be logically thought of as a bounded pointer.
One way to write arrays and compile-time-sized slices is like we see in Rust:
[T; N] and &[T; N]. This suggests a relationship where array is like slice,
and the default form. But they are very different types, rather than a
modification of a single type, and this can be confusing[^3] for developers
learning the language.
We want to avoid the situation where absence of syntax, such as a missing pointer indicator, changes the entire meaning of the remaining syntax or is otherwise confusing.
The most general meaning of "array" is a range of consecutive values in memory.
However, in many languages it is used, either formally or informally, to refer to a direct-storage, immutably-sized memory range:

- std::array

In particular, this is the usage in the languages with which Carbon will most frequently interoperate, and/or from which code will be migrated to Carbon, and thus comments and variable names would use these terms in this way.
[^4]:
Maybe this is more formal than colloquial, but the name is not part of the
typename/syntax.
Languages which require shared ownership don't have direct-storage arrays, so the same term gets used for indirect storage:
- Array
- InlineArray.
  Backwards compatibility prevents the use of Array for this.
- Array
- ArrayList

And some languages use array to refer to both direct and indirect storage types.

- Array type.
- Array uses direct
  (immutably-sized) or indirect (mutably-sized) storage.

In sum, languages which have direct-storage immutably-sized arrays use the term "array" to refer to those, and most then use a separate name for the indirect-storage type.
The term "vector" in mathematics refers to a fixed-size set of numbers. This
leads to confusion with the C++ type std::vector since it holds a
mutably-sized set of values. Developers coming from other domains must learn a
new and contradictory term of art. The Rust language chose naming that derives
from C++, with std::vec::Vec.
These type names conflict with names in mathematical and graphics libraries,
which want to use vector in its mathematical sense. In Rust this leads to Vec
for a mutably-sized array, and
Vec3,
Vec4, and so on
for fixed-size mathematical vectors. While not fatal, this does create ambiguity
that must be overcome by developers.
Recently the proposal
p4864: No predeclared identifiers, Core is a keyword
clarified a direction for the Carbon language, wherein there will not be
implicit imports from the Core library. Anything accessible directly in the
language, rather than through a package name, is made available through a builtin
keyword. This ensures that raw identifier syntax is always available for those
same names in Carbon code.
The All APIs are library APIs principle states:
In Carbon, every public function is declared in some Carbon API file.
As such, we propose a Core library type for a direct-storage immutably-sized
array, and then a builtin shorthand for referring to that library type.
In line with other languages surveyed above, given the presence of a
direct-storage immutably-sized array in Carbon, we will reserve the unqualified
name "array" for this type. In full, its name is Core.Array(T, N), where T
is the type of elements in the array, and N is the number of elements. Notably
this leaves room for supporting multi-dimensional arrays by adding further
optional size parameters, either in the Array type or in a similar sibling
type.
Here is a provisional vocabulary table to compare with other languages:
| Owning type | Runtime Sized | Compile-time Sized |
|---|---|---|
| Direct, Immutable Size | - | array(T, N) / Core.Array(T, N) |
| Indirect, Immutable Size | ? | Core.Box(Array(T, N)) |
| Indirect, Mutable Size | Core.Buf(T) | - |
Carbon does not have proposed names for heap-allocated storage, so we use some
placeholders here, in order to show where Array fits into the picture:

- Box(T) for a heap-allocated T value.
- Buf(T) for a heap-buffer of T values.

An indirect, immutably-sized buffer does not have a clearly expressible syntax
at the moment. Box([T]) is the closest fit with the current provisional syntax
for slices. But [T] is a sized pointer, which would make this type a
heap-allocated sized pointer, rather than a heap-allocated fixed-size array.
This is in contrast with Rust where &[T] is a slice, and [T] is a fixed-size
buffer; so it then follows that Box<[T]> is a heap-allocated fixed-size
buffer.
Because arrays will be very common in Carbon code, we want to privilege their
usage. There are at least two ways in which we can do so. The first is to
include them in the prelude library of the Core package. This ensures they
are available in every Carbon file as Core.Array(T, N). The second is by
making the type available through a shorthand without being qualified by the
Core package name. We propose the array(T, N) builtin keyword as that
shorthand.
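As a loose C++ analogy (this is not Carbon code), the relationship between a library type and a shorthand that resolves to it can be sketched with an alias template. Everything here is an assumption made for illustration: the `sketch` and `core` namespaces, the `Array` struct, and the `SumAll` helper are hypothetical stand-ins, and a C++ alias is only an approximation of a builtin keyword.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {

namespace core {  // hypothetical stand-in for Carbon's Core package
template <typename T, std::size_t N>
struct Array {
  T elements[N];  // direct storage: N elements held in place

  constexpr T& operator[](std::size_t i) {
    assert(i < N);
    return elements[i];
  }
  static constexpr std::size_t size() { return N; }
};
}  // namespace core

// Stand-in for the shorthand: a lowercase spelling that names the same type,
// not a new one.
template <typename T, std::size_t N>
using array = core::Array<T, N>;

static_assert(std::is_same_v<array<int, 3>, core::Array<int, 3>>);

// Code written against the shorthand uses the library type's full interface.
constexpr int SumAll() {
  array<int, 3> a = {{1, 2, 3}};
  int total = 0;
  for (std::size_t i = 0; i < a.size(); ++i) total += a[i];
  return total;
}
static_assert(SumAll() == 6);

}  // namespace sketch
```

The point of the sketch is that the shorthand adds no second type to the vocabulary: documentation, error messages, and overloads can all be expressed in terms of the one library type.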
As this proposal is addressing the question of introducing a new prelude
library type in Core, it is mostly focused on the goal
Code that is easy to read, understand, and write
This proposal aims to make code easy to understand by using a name that is
consistent across systems programming languages, and avoiding names that have
conflicting meaning. It also uses a standard type syntax, with a type in the
Core package, making the type and its documentation maximally discoverable
without requiring special-casing.
We introduced some more specific sub-goals above:
This proposal privileges Core.Array as it will appear frequently in code, by
placing it in the prelude library. This avoids the need for developers to
import another Core library in order to access the type.
Since array types are expected to be very frequent, we also propose an array
builtin keyword as a shorthand. This is in line with the
No predeclared identifiers
proposal, and uses a lowercase spelling to mark the word as a builtin. The
spelling of array(T, N) will resolve to the library type Core.Array(T, N).
In this proposal, we avoid introducing additional syntax (such as with [T; N]
or (1, 2)) because the frequency of use of arrays will be lower than that of
fundamental types and tuples.
We introduce a type name, with a keyword that has a clear relationship to the generic type name, rather than making arrays look more like slices but without being a pointer. This is meant to avoid the confusion raised when removing syntax changes the meaning significantly, and especially in ways that differ from defaults/options for a single language concept.
We propose using the Array type name, and array shorthand, in line with how
other languages use the same term. When a direct-storage array type is part of
the language, it's consistently referred to as an "array" without
qualifications.
Most importantly, the name is consistent with the meaning in C++ and its
standard library (std::array<T, N>) as well as with Rust, the languages which
we expect Carbon code to interact with the most.
The name Vector is a possible choice for a fixed-length set of values, due to
its mathematical meaning, as was originally proposed for the direct-storage
immutably-sized array type in Swift. However any use of the name Vector in a
core systems programming language construct is fraught. The name is liable to be
confused either with a mathematical vector or with a C++ std::vector. We
avoid the confusion by avoiding this name.
Core package

At this time, the Core package remains small, but there will come a time when
the names within need to be split into smaller namespaces. Then the name
Core.Array, among others, will become longer and the act of privileging the
name through the array keyword will become more pronounced and helpful. At
this time, we don't propose to put Array into a namespace in Core as there's
no such existing structure to point to yet.
[T; N] builtin syntax

This is the current syntax used by the toolchain; however, the following problems were raised:

- [T], but very
  different in nature, being storage instead of a reference to storage.
- If [T] is a slice, [T; N] would better suit a compile-time-sized
  slice.

The syntax for a slice may also be changed; we discussed
adding a pointer annotation
to it, such as [T]* and [T; N]*. Some downsides remained:

- The [T; N]* syntax would be a fixed-size slice, rather than a pointer to
  an array. This leaves no room for writing a pointer to an array, which can
  indicate a different intent: that it always includes the full memory range
  of the array. Without this distinction, we can't model both
  std::span<T, N> and std::array<T, N>* in code migrated from C++ to
  Carbon and would need to collapse these to a single type.

array [T; N] builtin syntax

This introduces a keyword as a modifier of a fixed-size slice, rather than a
builtin forwarding type. While arrays will be very common, it's not clear that
they rise to the level of requiring breaking the language's naming rules (using a
lowercase name) in order to provide a shorthand. And the shorthand is longer in
the end than the Array(T, N) being proposed here. So this uses a larger
weirdness budget for privileging the type while achieving less conciseness.
This has a similar issue as with [T; N] but in the reverse. Removing the
array modifier keyword changes the meaning of the type expression in ways that
are larger than a default/modifier relationship. Fixed-size slices are not the
more-default array.
The use of a lowercase keyword also costs us by preventing users from using the
word array in variables, a name which is quite common.
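The distinction raised earlier for the [T; N]* spelling, between a fixed-size slice (std::span<T, N> in C++) and a pointer to an array (std::array<T, N>*), can be illustrated in C++ itself. This is a sketch only: `FixedSlice` is a hypothetical, simplified stand-in for std::span<T, N> so that the example does not depend on C++20, and the two function names are likewise invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for std::span<T, N>: a pointer to N consecutive
// elements, with no claim about where the underlying storage begins or ends.
template <typename T, std::size_t N>
struct FixedSlice {
  T* first;

  T& operator[](std::size_t i) const {
    assert(i < N);
    return first[i];
  }
};

// A fixed-size slice can view any N-element window of a larger array.
inline int MiddleViaSlice(std::array<int, 4>& storage) {
  FixedSlice<int, 2> middle{storage.data() + 1};
  return middle[0] + middle[1];
}

// A pointer to an array always designates one whole array object, so the
// callee knows it is seeing the array's full memory range.
inline int WholeViaPointer(std::array<int, 4>* whole) {
  int total = 0;
  for (int value : *whole) total += value;
  return total;
}
```

Collapsing both spellings into a single type would lose the guarantee carried by the second function's parameter type.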
Core.Array(T, N) library type

Providing just the library type is possible, but arrays will be one of the most
common types in Carbon code, as described earlier. Privileging them with a
shorthand that avoids Core. will help make Carbon code significantly more
concise, due to the frequency, without hurting understandability. This makes it
worth the tradeoff of putting a name into the file scope (by way of a builtin
type).
Core.Array(T, N) to the file scope

We considered a direction where some subset of names in the Core package's
prelude library are imported automatically to the file scope. This would be
similar to the
Rust std::prelude module,
which names aliases that are pulled into the global scope.
Importing names poses challenges for migrating code to Carbon. Names imported
from a library may allow shadowing in other scopes, which would create ambiguity
about the meaning of these names. And names imported from a library would not be
avoidable using the raw identifier syntax: Array and r#Array would both
refer to the same imported Core.Array type. This would present a challenge for
code migrated to Carbon which uses the name Array in its own types or
variables. Whereas with a builtin keyword, array and r#array refer to
different things: The first is the keyword, and the second is a name that the
developer can use freely for other purposes.
The No predeclared identifiers, Core is a keyword proposal discusses in more detail why this approach was not taken.