A Carbon class is a user-defined
record type. This
is the primary mechanism for users to define new types in Carbon. A class has
members that are referenced by their names, in contrast to a
Carbon tuple which defines a
product type whose members are
referenced positionally.
Carbon supports both named, or "nominal", and unnamed, anonymous, or "structural", class types. Nominal class types are all distinct, but structural types are equal if they have the same sequence of member types and names. Structural class literals may be used to initialize or assign values to nominal class variables.
The use cases for classes include both cases motivated by C++ interop, and cases that we expect to be included in idiomatic Carbon-only code.
This design currently only attempts to address the "data classes" use case. Addressing the other use cases is future work.
Data classes are types that consist of data fields that are publicly accessible and directly read and manipulated by client code. They have few if any methods, and generally are not involved in inheritance at all.
Examples include:
- SortedMap or HashMap
Properties:
Expected in idiomatic Carbon-only code.
Background: Kotlin has a dedicated concise syntax for defining data classes that avoids boilerplate. Python has a data class library, proposed in PEP 557, that fills a similar role.
There are several categories of types that support encapsulation. This is done by making their data fields private so access and modification of values are all done through methods defined on the type.
The common case is an encapsulated type that does not participate in inheritance. These types neither support being inherited from (they are "final") nor extend other types.
Examples of this use case include:
- Date
- std::unique_ptr or a file handle
- Mutex
We expect two kinds of methods on these types: public methods defining the API for accessing and manipulating values of the type, and private helper methods used as an implementation detail of the public methods.
These types are expected in idiomatic Carbon-only code. Extending this design to support these types is future work.
The subtyping you get with inheritance is that you may assign the address of an object of a derived type to a pointer to its base type. For this to work, the compiler needs implementation strategies that allow operations performed through the pointer to the base type to work independently of which derived type it actually points to. These strategies include:
Note that these subtyping implementation strategies generally rely on encapsulation, but encapsulation is not a strict requirement in all cases.
This subtyping relationship also creates safety concerns, which Carbon should protect against. Slicing problems can arise when the source or target of an assignment is a dereferenced pointer to the base type. It is also incorrect to delete an object with a non-virtual destructor through a pointer to a base type.
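The slicing concern can be illustrated with a small C++ sketch. C++ is used here because the Carbon syntax for inheritance is not yet designed; the types `Shape` and `Circle` and the helper functions are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration; not part of the Carbon design.
struct Shape {
  std::string name = "shape";
};

struct Circle : Shape {
  Circle() { name = "circle"; }
  double radius = 1.0;
};

// Assigning through dereferenced base pointers copies only the Shape
// subobject. Any derived fields of the target are left untouched.
inline void AssignThroughBase(Shape* target, const Shape* source) {
  *target = *source;
}

// Returns true if the assignment left the target in a mixed state:
// the base field was copied but the derived field was not.
inline bool SlicingLeavesMixedState() {
  Circle a;
  a.radius = 2.0;
  Circle b;
  b.name = "big circle";
  b.radius = 10.0;
  AssignThroughBase(&a, &b);
  return a.name == "big circle" && a.radius == 2.0;
}
```

After the call, `a` has the name of `b` but keeps its own radius, which is the kind of inconsistent state a Carbon design would presumably want to rule out.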
Carbon will fully support single-inheritance type hierarchies with polymorphic types.
Polymorphic types support dynamic dispatch using a vtable, and data members, but only single inheritance. Individual methods opt in to using dynamic dispatch, so types will have a mix of "virtual" and non-virtual methods. Polymorphic types support traditional object-oriented single inheritance, a mix of subtyping and implementation and code reuse.
We exclude complex multiple inheritance schemes, virtual inheritance, and so on from this use case. This is to avoid the complexity and overhead they bring, particularly since the use of these features in C++ is generally discouraged. The rule is that every type has at most one base type with data members for subtyping purposes. Carbon will support additional base types as long as they don't have data members or don't support subtyping.
Background: The "Nothing is Something" talk by Sandi Metz and the Composition Over Inheritance Principle describe design patterns to use instead of multiple inheritance to support types that vary over multiple axes.
In rare cases where the complex multiple inheritance schemes of C++ are truly needed, they can be effectively approximated using a combination of these simpler building blocks.
Polymorphic types support a number of different kinds of methods:
Note that there are two uses for protected methods: those implemented in the base and called in the descendant, and the other way around. "The End Of Object Inheritance & The Beginning Of A New Modularity" talk by Augie Fackler and Nathaniel Manista discusses design patterns that split up types to reduce the number of kinds of calls between base and derived types, and make sure calls only go in one direction.
We expect polymorphic types in idiomatic Carbon-only code, at least for the medium term. Extending this design to support polymorphic types is future work.
We distinguish the specific case of polymorphic base classes that have no data members:
Removing support for data fields greatly simplifies supporting multiple inheritance. For example, it removes the need for a mechanism to figure out the offset of those data fields in the object. Similarly we don't need C++'s virtual inheritance to avoid duplicating those fields. Some complexities still remain, such as pointers changing values when casting to a secondary parent type, but these seem manageable given the benefits of supporting this useful case of multiple inheritance.
While an interface base class is generally for providing an API that allows decoupling two pieces of code, a polymorphic type is a collaboration between a base and derived type to provide some functionality. This is a bit like the difference between a library and a framework, where you might use many of the former but only one of the latter.
Interface base classes are primarily used for subtyping. The extent of implementation reuse is generally limited by the lack of data members, and the decoupling role they play is usually about defining an API as a set of public pure-virtual methods. Compared to other polymorphic types, they more rarely have methods with implementations (virtual or not), or have methods with restricted access. The main use case is when there is a method that is implemented in terms of pure-virtual methods. Those pure-virtual methods may be marked as protected to ensure they are only called through the non-abstract API, but can still be implemented in descendants.
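A C++ sketch of that main use case follows: a public, non-virtual method implemented in terms of protected pure-virtual methods. The names `Greeter`, `FormalGreeter`, and `GreetThroughBase` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface base class: no data members, only methods.
class Greeter {
 public:
  virtual ~Greeter() = default;  // Allows deletion through a Greeter*.

  // Non-virtual public API, implemented in terms of pure-virtual methods.
  std::string Greet(const std::string& who) const {
    return Prefix() + ", " + who + Suffix();
  }

 protected:
  // Pure-virtual hooks: implemented by descendants, but clients can
  // only reach them through Greet().
  virtual std::string Prefix() const = 0;
  virtual std::string Suffix() const = 0;
};

class FormalGreeter : public Greeter {
 protected:
  std::string Prefix() const override { return "Good day"; }
  std::string Suffix() const override { return "."; }
};

// Client code depends only on the interface base class.
inline std::string GreetThroughBase(const Greeter& g) {
  return g.Greet("Ada");
}
```

Client code can call `Greet` but not `Prefix` or `Suffix` directly, so the hooks stay an implementation detail shared between the base and its descendants.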
While it is typical for this case to be associated with single-level inheritance hierarchies, there are some cases where there is an interface at the root of a type hierarchy and polymorphic types as interior branches of the tree. The case of generic interfaces extending or requiring other interfaces would also be modeled by deeper inheritance hierarchies.
An interface as base class needs to either have a virtual destructor or forbid deallocation.
There is significant overlap between interface base classes and Carbon interfaces. Both represent APIs as a collection of method names and signatures to implement. The subset of interfaces that support dynamic dispatch are called object-safe, following Rust:
- Self in the signature of a method in a contravariant position like a parameter.
The restrictions on object-safe interfaces match the restrictions on base class
methods. The main difference is the representation in memory. A type extending a
base class with virtual methods includes a pointer to the table of methods in
the object value itself, while a type implementing an interface would store the
pointer alongside the pointer to the value in a DynPtr(MyInterface). Of
course, the interface option also allows the method table to be passed at
compile time.
Note: This presumes that we include some concept of final methods in
interfaces to match non-virtual functions in base classes.
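The two memory representations can be contrasted in a C++ sketch. This is only an approximation under stated assumptions: `DynSpeakerPtr` is a hypothetical, hand-written stand-in for what a `DynPtr(MyInterface)` might look like, and `SpeakerWitness`, `Radio`, and `Bell` are names invented for the example.

```cpp
#include <cassert>

// Strategy 1: a base class with a virtual method. Every object carries
// a hidden pointer to its method table.
struct Speaker {
  virtual ~Speaker() = default;
  virtual int Volume() const = 0;
};
struct Radio : Speaker {
  int level = 7;
  int Volume() const override { return level; }
};

// Strategy 2: the object stays plain data. The method table (the
// "witness table") travels next to the object pointer instead.
struct Bell {
  int level = 3;
};
struct SpeakerWitness {
  int (*volume)(const void* self);
};
struct DynSpeakerPtr {  // Hypothetical stand-in for DynPtr(MyInterface).
  const void* object;
  const SpeakerWitness* witness;
  int Volume() const { return witness->volume(object); }
};

inline int BellVolume(const void* self) {
  return static_cast<const Bell*>(self)->level;
}
// Bell's implementation of the hypothetical interface.
static const SpeakerWitness kBellWitness = {&BellVolume};

inline DynSpeakerPtr MakeDyn(const Bell* bell) {
  return DynSpeakerPtr{bell, &kBellWitness};
}
```

In this sketch `Bell` is no larger than its single field, while the cost of dynamic dispatch moves into the two-pointer `DynSpeakerPtr`.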
We expect idiomatic Carbon-only code to generally use Carbon interfaces instead of interface base classes. We may still support interface base classes long term if we determine that the ability to put the pointer to the method implementations in the object value is important for users, particularly with a single parent as in the polymorphic type case. Extending this design to support interface base classes is future work.
Background: C++ abstract base classes that don't have data members and Java interfaces model this case.
While it is not common, there are cases where C++ code uses inheritance without dynamic dispatch or a vtable. Instead, methods are never overridden, and derived types only add data and methods. There are some cases where this is done in C++ but would be done differently in Carbon:
However, there are still some cases where non-virtual inheritance makes sense. One is a parameterized type where a prefix of the data is the same independent of the parameter. An example of this is containers with a small-buffer optimization, as described in the talk CppCon 2016: Chandler Carruth "High Performance Code 201: Hybrid Data Structures". By moving the data and methods that don't depend on the buffer size to a base class, we reduce the instantiation overhead for monomorphization. The base type is also useful for reducing instantiation for consumers of the container, as long as they only need to access methods defined in the base.
Another case for non-virtual inheritance is for different node types within a data structure that have some data members in common. This is done in LLVM's map, red-black tree, and list data structure types. In a linked list, the base type might have the next and previous pointers, which is enough for a sentinel node, and there would also be a derived type with the actual data member. The base type can define operations like "splice" that only operate on the pointers not the data, and this is in fact enforced by the type system. Only the derived node type needs to be parameterized by the element type, saving on instantiation costs as before.
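The linked-list case might look like the following C++ sketch. It is not LLVM's implementation; `ListNodeBase`, `ListNode`, `SpliceBefore`, and `SumList` are hypothetical names, and the sketch assumes a circular list with a sentinel.

```cpp
#include <cassert>

// Base node: only the links. This is all a sentinel node needs.
// A fresh node is linked to itself, forming a one-element circle.
struct ListNodeBase {
  ListNodeBase* prev = this;
  ListNodeBase* next = this;

  // Unlinks this node and reinserts it immediately before `pos`.
  // Touches only pointers, never element data. Requires pos != this.
  void SpliceBefore(ListNodeBase* pos) {
    prev->next = next;
    next->prev = prev;
    prev = pos->prev;
    next = pos;
    pos->prev->next = this;
    pos->prev = this;
  }
};

// Only the derived node is parameterized by the element type.
template <typename T>
struct ListNode : ListNodeBase {
  explicit ListNode(T v) : value(v) {}
  T value;
};

// Sums an int list starting from its sentinel. The downcast is safe
// only because every non-sentinel node is known to be a ListNode<int>.
inline int SumList(const ListNodeBase& sentinel) {
  int total = 0;
  for (const ListNodeBase* n = sentinel.next; n != &sentinel; n = n->next) {
    total += static_cast<const ListNode<int>*>(n)->value;
  }
  return total;
}
```

`SpliceBefore` is compiled once regardless of how many element types are used, and the downcast in `SumList` shows the extrinsic knowledge needed before touching derived data.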
Many of the concerns around non-polymorphic inheritance are the same as for the non-virtual methods of polymorphic types. Assignment and destruction are examples of operations that need particular care to be sure they are only done on values of the correct type, rather than through a subtyping relationship. This means having some extrinsic way of knowing when it is safe to downcast before performing one of those operations, or performing them on pointers that were never upcast to the base type.
While Carbon won't support all the C++ forms of multiple inheritance, Carbon
code will still need to interoperate with C++ code that does. Of particular
concern are the std::iostream family of types. Most uses of those types are
the input and output variations or could be migrated to use those variations,
not the harder bidirectional cases.
Much of the complexity of this interoperation could be alleviated by adopting the restriction that Carbon code can't directly access the fields of a virtual base class. In the cases where such access is needed, the workaround is to access them through C++ functions.
We do not expect idiomatic Carbon-only code to use multiple inheritance. Extending this design to support interoperating with C++ types using multiple inheritance is future work.
A mixin is a declaration of data, methods, and interface implementations that can be added to another type, called the "main type". The methods of a mixin may also use data, methods, and interface implementations provided by the main type. Mixins are designed around implementation reuse rather than subtyping, and so don't need to use a vtable.
A mixin might be an implementation detail of a data class, object type, or derived type of a polymorphic type. A mixin might partially implement an interface as base class.
Examples: intrusive linked list, intrusive reference count
In both of these examples, the mixin needs the ability to convert between a pointer to the mixin's data (like a "next" pointer or reference count) and a pointer to the containing object with the main type.
Mixins are expected in idiomatic Carbon-only code. Extending this design to support mixins is future work.
Background: Mixins are typically implemented using the curiously recurring template pattern in C++, but other languages support them directly.
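As a rough illustration of the C++ approach, here is a sketch of an intrusive reference count written with the curiously recurring template pattern. `RefCounted`, `AsMain`, and `Texture` are hypothetical names, and the sketch ignores thread safety and deallocation.

```cpp
#include <cassert>

// Hypothetical mixin adding an intrusive reference count to `Main`.
template <typename Main>
class RefCounted {
 public:
  void AddRef() { ++count_; }
  // Returns true when the last reference was dropped.
  bool Release() { return --count_ == 0; }
  int ref_count() const { return count_; }

  // Converts a pointer to the mixin into a pointer to the containing
  // object of the main type.
  Main* AsMain() { return static_cast<Main*>(this); }

 private:
  int count_ = 0;
};

// The main type opts in by naming itself as the template argument.
struct Texture : RefCounted<Texture> {
  int width = 64;
};
```

The `static_cast` in `AsMain` is the pointer conversion between mixin and main type that a Carbon mixin feature would need to provide in some form.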
See how other languages tackle this problem:
Beyond tuples, Carbon allows defining record types. This is the primary mechanism for users to extend the Carbon type system and is deeply rooted in C++ and its history (C and Simula). We call them classes rather than other terms as that is both familiar to existing programmers and accurately captures their essence: they define the types of objects with (optional) support for methods, encapsulation, and so on.
A class type defines the interpretation of the bytes of a value of that type, including the size, data members, and layout. It defines the operations that may be performed on those values, including what methods may be called. A class type may directly have constant members. The type itself is a compile-time immutable constant value.
The members of a class are named, and are accessed with the . notation. For
example:
var p: Point2D = ...;
// Data member access
p.x = 1;
p.y = 2;
// Method call
Print(p.DistanceFromOrigin());
Tuples are used for cases where accessing the members positionally is more appropriate.
The data members of a class, or fields, have an order that matches the order they are declared in. This determines the order of those fields in memory, and the order that the fields are destroyed when a value goes out of scope or is deallocated.
Structural data classes, or struct types, are convenient for defining data classes in an ad-hoc manner. They would commonly be used:
- class variables or values
Note that struct types are examples of data class types and are still classes,
but we expect later to support more ways to define data class types. Also note
that there is no struct keyword; "struct" is just convenient shorthand
terminology for a structural data class.
Future work: We intend to support nominal data classes as well.
Structural data class literals, or struct literals, are written using this syntax:
var kvpair: auto = {.key = "the", .value = 27};
This produces a struct value with two fields:
- The first field is named "key" and has the value "the". The type of the field is set to the type of the value, and so is String.
- The second field is named "value" and has the value 27. The type of the field is set to the type of the value, and so is Int.
Note: A comma , may optionally be included after the last field:
var kvpair: auto = {.key = "the", .value = 27,};
Open question: To keep the literal syntax from being ambiguous with compound statements, Carbon will adopt some combination of:
- { to see if it is followed by .name;
- { to introduce a compound statement in contexts introduced by a keyword where they are required, like requiring { ... } around the cases of an if...else statement.
The type of kvpair in the last example would be represented by this
expression:
{.key: String, .value: Int}
This syntax is intended to parallel the literal syntax, and so uses commas (,)
to separate fields instead of a semicolon (;) terminator. This choice also
reflects the expected use inline in function signature declarations.
Struct types may only have data members, so the type declaration is just a list of field names and types. The result of a struct type expression is an immutable compile-time type value.
Note: Like with struct literal expressions, a comma , may optionally be
included after the last field:
{.key: String, .value: Int,}
Also note that {} represents both the empty struct literal and its type.
When initializing or assigning a variable with a data class type, such as a struct type, from a struct value on the right-hand side, the order of the fields does not have to match, just the names.
var different_order: {.x: Int, .y: Int} = {.y = 2, .x = 3};
Assert(different_order.x == 3);
Assert(different_order.y == 2);
Open question: What operations and in what order happen for assignment and initialization?
When initializing or assigning, the order of fields is determined from the
target on the left side of the =. This rule matches what we expect for classes
with encapsulation more generally.
Generally speaking, the operations that are available on a data class value, such as a value with a struct type, are dependent on those operations being available for all the types of the fields.
For example, two values of the same data class type may be compared for equality if equality is supported for every member of the type:
var p: auto = {.x = 2, .y = 3};
Assert(p == {.x = 2, .y = 3});
Assert(p != {.x = 2, .y = 4});
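For comparison, a C++ sketch of fieldwise equality is shown here. In C++ the operators have to be written out (or defaulted in newer standards), whereas the design above makes them available automatically. `Point` is a hypothetical type for this example.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical data class with two fields.
struct Point {
  int x;
  int y;
};

// Equality is defined fieldwise, so it only compiles when every field
// type supports ==.
inline bool operator==(const Point& a, const Point& b) {
  return std::tie(a.x, a.y) == std::tie(b.x, b.y);
}
inline bool operator!=(const Point& a, const Point& b) {
  return !(a == b);
}
```

Adding a field whose type lacks `==` would make these operators fail to compile, mirroring the rule that the operation exists only if all fields support it.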
Similarly, a data class has an unformed state if all its members do. Treatment of unformed state follows #257.
== and != are defined on a data class type if all its field types support
them:
Assert({.x = 2, .y = 4} != {.x = 5, .y = 3});
Open question: Which other comparisons are supported is the subject of question-for-leads issue #710.
// Illegal: the two operands list their fields in different orders.
Assert({.x = 2, .y = 3} != {.y = 4, .x = 5});
Destruction is performed field-wise in reverse order.
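C++ follows the same rule for members of a class, which makes it a convenient way to observe the behavior. The sketch below records destruction order in a log; `Tracer`, `Pair`, and `DestructionOrder` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Appends its label to a shared log when destroyed.
struct Tracer {
  Tracer(std::string l, std::vector<std::string>* lg)
      : label(std::move(l)), log(lg) {}
  ~Tracer() { log->push_back(label); }
  std::string label;
  std::vector<std::string>* log;
};

// Fields are declared, and therefore constructed, in the order
// `first`, then `second`.
struct Pair {
  explicit Pair(std::vector<std::string>* log)
      : first("first", log), second("second", log) {}
  Tracer first;
  Tracer second;
};

// Returns the labels in the order the fields were destroyed.
inline std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  {
    Pair p(&log);
  }  // p goes out of scope here and its fields are destroyed.
  return log;
}
```

The log ends up holding `second` before `first`, the reverse of the declaration order.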
Extending user-defined operations on the fields to an operation on an entire data class is future work.
This includes features that need to be designed, questions to answer, and a description of the provisional syntax in use until these decisions have been made.
The declarations for nominal class types will have a different format. Provisionally we have been using something like this:
class TextLabel {
  var x: Int;
  var y: Int;
  var text: String;
}
It is an open question, though, how we will address the
different use cases. For example, we might mark
data classes with an impl as Data {} line.
There are a variety of options for constructing class values that we might choose to support, including initializing from struct values:
var p1: Point2D = {.x = 1, .y = 2};
var p2: auto = {.x = 1, .y = 2} as Point2D;
var p3: auto = Point2D{.x = 1, .y = 2};
var p4: auto = Point2D(1, 2);
Additional types may be defined in the scope of a class definition.
class StringCounts {
  class Node {
    var key: String;
    var count: Int;
  }
  var counts: Vector(Node);
}
The inner type is a member of the type, and is given the name
StringCounts.Node.
A class definition may provisionally include references to its own name in
limited ways, similar to an incomplete type. What is allowed and forbidden is an
open question.
class IntListNode {
  var data: Int;
  var next: IntListNode*;
}
An equivalent definition of IntListNode, since Self is an alias for the
current type, is:
class IntListNode {
  var data: Int;
  var next: Self*;
}
Self refers to the innermost type declaration:
class IntList {
  class IntListNode {
    var data: Int;
    var next: Self*;
  }
  var first: IntListNode*;
}
Other type constants can provisionally be defined using a let declaration:
class MyClass {
  let Pi: Float32 = 3.141592653589793;
  let IndexType: Type = Int;
}
There are definite questions about this syntax:
- :! generic syntax decided in issue #565?
- alias declarations? They would only be used for names, not other constant values.
A future proposal will incorporate method declaration, definition, and calling into classes. The syntax for declaring methods has been decided in question-for-leads issue #494. Summarizing that issue:
- fn Diameter[me: Self]() -> Float { ... }
- fn Expand[addr me: Self*](distance: Float) { ... }
- fn Create() -> Self { ... }
We do not expect to have implicit member access in methods, so inside the method
body members will be accessed through the me parameter.
Structs are being considered as a possible mechanism for implementing optional named parameters. We have three main candidate approaches: allowing struct types to have field defaults, having dedicated support for destructuring struct values in pattern contexts, or having a dedicated optional named parameter syntax.
If struct types could have field defaults, you could write a function declaration with all of the optional parameters in an option struct:
fn SortIntVector(
    v: Vector(Int)*,
    options: {.stable: Bool = false,
              .descending: Bool = false} = {}) {
  // Code using `options.stable` and `options.descending`.
}
// Uses defaults of `.stable` and `.descending` equal to `false`.
SortIntVector(&v);
SortIntVector(&v, {});
// Sets `.stable` option to `true`.
SortIntVector(&v, {.stable = true});
// Sets `.descending` option to `true`.
SortIntVector(&v, {.descending = true});
// Sets both `.stable` and `.descending` options to `true`.
SortIntVector(&v, {.stable = true, .descending = true});
// Order can be different for arguments as well.
SortIntVector(&v, {.descending = true, .stable = true});
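A close C++ analog of this first approach uses a struct with default member initializers as the options parameter. This is only a sketch for comparison: `SortOptions` is a hypothetical name, and unlike the struct literals above, the caller here sets fields on a named variable rather than writing them inline.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical options struct; every field has a default.
struct SortOptions {
  bool stable = false;
  bool descending = false;
};

// Sorts `v` in place. Omitting `options` uses all defaults.
inline void SortIntVector(std::vector<int>* v,
                          SortOptions options = SortOptions()) {
  // Strict ordering in the requested direction.
  auto cmp = [&options](int a, int b) {
    return options.descending ? a > b : a < b;
  };
  if (options.stable) {
    std::stable_sort(v->begin(), v->end(), cmp);
  } else {
    std::sort(v->begin(), v->end(), cmp);
  }
}
```

A caller that wants descending order creates a `SortOptions`, sets `descending = true`, and passes it as the second argument; all other fields keep their defaults.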
We might instead support destructuring struct patterns with defaults:
fn SortIntVector(
    v: Vector(Int)*,
    {stable: Bool = false, descending: Bool = false}) {
  // Code using `stable` and `descending`.
}
This would allow the same syntax at the call site, but avoids some concerns with field defaults and allows some other use cases such as destructuring return values.
We might support destructuring directly:
var {key: String, value: Int} = ReturnKeyValue();
or by way of a mechanism that converts a struct into a tuple:
var (key: String, value: Int) =
    ReturnKeyValue().extract(.key, .value);
// or maybe:
var (key: String, value: Int) =
    ReturnKeyValue()[(.key, .value)];
Similarly we might support optional named parameters directly instead of by way of struct types.
Some discussion on this topic has occurred in:
We will need some way of controlling access to the members of classes. By default, all members are fully publicly accessible, as decided in issue #665.
The set of access control options Carbon will support is an open question. Swift and C++ (especially with modules) provide many options, leaving a wide design space to explore here.
This includes destructors, copy and move operations, as well as other Carbon
operators such as + and /. We expect types to implement these operations by
implementing corresponding interfaces, see
the generics overview.
Carbon will need ways of saying:
- class type has a virtual method table
- class type extends a base type
- class type is "final" and may not be extended further
Multiple inheritance will be limited in at least a couple of ways:
There is a document considering the options for constructing objects with inheritance.
We want four things so that Carbon's object-safe interfaces may interoperate with C++ abstract base classes without data members, matching the interface as base class use case:
- AsBaseClass(MyInterface).
- AsInterface(MyIBC).
- DynPtr of the corresponding interface.
- DynPtr(MyInterface) values to a proxy type that extends the corresponding base class AsBaseType(MyInterface).
Note that the proxy type extending AsBaseType(MyInterface) would be a
different type than DynPtr(MyInterface) since the receiver input to the
function members of the vtable for the former does not match those in the
witness table for the latter.
We will need some way to declare mixins. This syntax will need a way to distinguish defining versus requiring member variables. Methods may additionally be given a default definition but may be overridden. Interface implementations may only be partially provided by a mixin. Mixin methods will need to be able to convert between pointers to the mixin type and the main type.
Open questions include whether a mixin is its own type that is a member of the containing type, and whether mixins are templated on the containing type. Mixins also complicate how constructors work.
We need some way of addressing two safety concerns created by non-virtual inheritance:
These concerns would be resolved by distinguishing between pointers that point to a specified type only and those that point to a type or any subtype. The latter case would have restrictions to prevent misuse. This distinction may be more complexity than is justified for a relatively rare use case. An alternative approach would be to forbid destruction of non-final types without virtual destructors, and forbid assignment of non-final types entirely.
This open question is being considered in question-for-leads issue #652.
Carbon will need some way for users to specify the memory layout of class types beyond simple ordering of fields, such as controlling the packing and alignment for the whole type or individual members.
At the moment, there is no proposal to support
static member variables,
in line with avoiding global variables more generally. Carbon may need some
support in this area, though, for parity with and migration from C++.
Carbon might want to support members of a type that are accessed like a data member but return a computed value like a function. This has a number of implications:
However, there are likely to be differences between computed properties and
other data members, such as the ability to take the address of them. We might
want to support "read only" data members that can be read through the public
API but only modified with private access, for data members which may need to
evolve into a computed property. There are also questions regarding how to
support assigning or modifying computed properties, such as using +=.
We should define a way for defining implementations of interfaces for struct types. To satisfy coherence, these implementations would have to be defined in the library with the interface definition. The syntax might look like:
interface ConstructWidgetFrom {
  fn Construct(Self) -> Widget;
}
external impl {.kind: WidgetKind, .size: Int}
    as ConstructWidgetFrom { ... }
In addition, we should define a way for interfaces to define templated blanket
implementations for data classes more generally. These
implementations will typically be subject to the criterion that all the data fields
of the type must implement the interface. An example use case would be to say
that a data class is serializable if all of its fields are. For this we will
need a type-of-type for capturing that criterion, maybe something like
DataFieldsImplement(MyInterface). The templated implementation will need some
way of iterating through the fields so it can perform operations fieldwise. This
feature should also implement the interfaces for any tuples whose fields satisfy
the criterion.
It is an open question how to define implementations for binary operators. For
example, if Int is comparable to Float32, then {.x = 3, .y = 2.72} should
be comparable to {.x = 3.14, .y = 2}. The trick is how to declare the criteria
that "T is comparable to U if they have the same field names in the same
order, and for every field x, the type of T.x implements ComparableTo for
the type of U.x."