A Carbon class is a user-defined record type. A class has members that are referenced by their names, in contrast to a Carbon tuple which defines a product type whose members are referenced positionally.
Classes are the primary mechanism for users to extend the Carbon type system and are deeply rooted in C++ and its history (C and Simula). We call them classes rather than other terms as that is both familiar to existing programmers and accurately captures their essence: they define the types of objects with (optional) support for methods, encapsulation, and so on.
Carbon supports both named, or "nominal", and unnamed, anonymous, or "structural", class types. Nominal class types are all distinct, but structural types are equal if they have the same sequence of member types and names. Structural class literals may be used to initialize or assign values to nominal class variables.
A class type defines the interpretation of the bytes of a value of that type, including the size, data members, and layout. It defines the operations that may be performed on those values, including what methods may be called. A class type may directly have constant members. The type itself is a compile-time immutable constant value.
The use cases for classes include both cases motivated by C++ interop, and cases that we expect to be included in idiomatic Carbon-only code.
This design currently only attempts to address the "data classes" and "encapsulated types" use cases. Addressing the "interface as base class", "interop with C++ multiple inheritance" and "mixin" use cases is future work.
Data classes are types that consist of data fields that are publicly accessible and directly read and manipulated by client code. They have few if any methods, and generally are not involved in inheritance at all.
Examples include:
- SortedMap or HashMap
Properties:
Expected in idiomatic Carbon-only code.
Background: Kotlin has a dedicated concise syntax for defining data classes that avoids boilerplate. Python has a data class library, proposed in PEP 557, that fills a similar role.
There are several categories of types that support encapsulation. This is done by making their data fields private so access and modification of values are all done through methods defined on the type.
The common case for encapsulated types is types that do not participate in inheritance. These types neither support being inherited from (they are "final") nor extend other types.
Examples of this use case include:
- Date
- std::unique_ptr or a file handle
- Mutex
We expect two kinds of methods on these types: public methods defining the API for accessing and manipulating values of the type, and private helper methods used as an implementation detail of the public methods.
These types are expected in idiomatic Carbon-only code.
The subtyping you get with inheritance is that you may assign the address of an object of a derived type to a pointer to its base type. For this to work, the compiler needs implementation strategies that allow operations performed through the pointer to the base type to work independently of which derived type it actually points to. These strategies include:
Note that these subtyping implementation strategies generally rely on encapsulation, but encapsulation is not a strict requirement in all cases.
This subtyping relationship also creates safety concerns, which Carbon should protect against. Slicing problems can arise when the source or target of an assignment is a dereferenced pointer to the base type. It is also incorrect to delete an object with a non-virtual destructor through a pointer to a base type.
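The slicing concern can be seen in a small C++ sketch. This is an analogy in C++, not Carbon syntax, and the names `Base`, `Derived`, and `AssignThroughBase` are hypothetical. Assigning through a dereferenced pointer to the base type copies only the base fields and leaves the derived fields untouched.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration; not part of the Carbon design.
struct Base {
  int id = 0;
  virtual ~Base() = default;  // Virtual, so deleting through Base* is safe.
  virtual std::string Name() const { return "Base"; }
};

struct Derived : Base {
  int extra = 0;
  std::string Name() const override { return "Derived"; }
};

// Assigning through a dereferenced base pointer copies only the Base part.
inline void AssignThroughBase(Base* target, const Base* source) {
  *target = *source;
}

inline void SlicingDemo() {
  Derived a;
  a.id = 1;
  a.extra = 10;
  Derived b;
  b.id = 2;
  b.extra = 20;
  AssignThroughBase(&a, &b);
  assert(a.id == 2);      // The Base field was copied...
  assert(a.extra == 10);  // ...but the Derived field was silently left alone.
  assert(a.Name() == "Derived");
}
```

After the call, `a` holds a mix of state from two different objects, which is the kind of outcome the text says Carbon should protect against.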
Carbon will fully support single-inheritance type hierarchies with polymorphic types.
Polymorphic types support dynamic dispatch using a vtable, and data members, but only single inheritance. Individual methods opt in to using dynamic dispatch, so types will have a mix of "virtual" and non-virtual methods. Polymorphic types support traditional object-oriented single inheritance, a mix of subtyping and implementation and code reuse.
We exclude complex multiple inheritance schemes, virtual inheritance, and so on from this use case. This is to avoid the complexity and overhead they bring, particularly since the use of these features in C++ is generally discouraged. The rule is that every type has at most one base type with data members for subtyping purposes. Carbon will support additional base types as long as they don't have data members or don't support subtyping.
Background: The "Nothing is Something" talk by Sandi Metz and the Composition Over Inheritance Principle describe design patterns to use instead of multiple inheritance to support types that vary over multiple axes.
In rare cases where the complex multiple inheritance schemes of C++ are truly needed, they can be effectively approximated using a combination of these simpler building blocks.
Polymorphic types support a number of different kinds of methods:
Note that there are two uses for protected methods: those implemented in the base and called in the descendant, and the other way around. "The End Of Object Inheritance & The Beginning Of A New Modularity" talk by Augie Fackler and Nathaniel Manista discusses design patterns that split up types to reduce the number of kinds of calls between base and derived types, and make sure calls only go in one direction.
We expect polymorphic types in idiomatic Carbon-only code, at least for the medium term. Extending this design to support polymorphic types is future work.
We distinguish the specific case of polymorphic base classes that have no data members:
Removing support for data fields greatly simplifies supporting multiple inheritance. For example, it removes the need for a mechanism to figure out the offset of those data fields in the object. Similarly we don't need C++'s virtual inheritance to avoid duplicating those fields. Some complexities still remain, such as pointers changing values when casting to a secondary parent type, but these seem manageable given the benefits of supporting this useful case of multiple inheritance.
While an interface base class is generally for providing an API that allows decoupling two pieces of code, a polymorphic type is a collaboration between a base and derived type to provide some functionality. This is a bit like the difference between a library and a framework, where you might use many of the former but only one of the latter.
Interface base classes are primarily used for subtyping. The extent of implementation reuse is generally limited by the lack of data members, and the decoupling role they play is usually about defining an API as a set of public pure-virtual methods. Compared to other polymorphic types, they more rarely have methods with implementations (virtual or not), or have methods with restricted access. The main use case is when there is a method that is implemented in terms of pure-virtual methods. Those pure-virtual methods may be marked as protected to ensure they are only called through the non-abstract API, but can still be implemented in descendants.
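The main use case described above, a method implemented in terms of protected pure-virtual methods, looks roughly like this in C++. The sketch is only an analogy, and `Shape`, `UnitSquare`, and their methods are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface base class: no data members, only methods.
class Shape {
 public:
  virtual ~Shape() = default;
  // Non-virtual public method implemented in terms of pure-virtual methods.
  std::string Describe() const {
    return Name() + " with area " + std::to_string(Area());
  }

 protected:
  // Protected pure-virtual methods: implemented by descendants, but only
  // reachable by clients through the public API above.
  virtual std::string Name() const = 0;
  virtual int Area() const = 0;
};

class UnitSquare final : public Shape {
 protected:
  std::string Name() const override { return "square"; }
  int Area() const override { return 1; }
};

inline void InterfaceBaseDemo() {
  UnitSquare s;
  const Shape& shape = s;
  assert(shape.Describe() == "square with area 1");
}
```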
While it is typical for this case to be associated with single-level inheritance hierarchies, there are some cases where there is an interface at the root of a type hierarchy and polymorphic types as interior branches of the tree. The case of interfaces extending or requiring other interfaces would also be modeled by deeper inheritance hierarchies.
An interface as base class needs to either have a virtual destructor or forbid deallocation.
There is significant overlap between interface base classes and Carbon interfaces. Both represent APIs as a collection of method names and signatures to implement. The subset of interfaces that support dynamic dispatch are called object-safe, following Rust:
- Self in the signature of a method in a contravariant position like a parameter.
The restrictions on object-safe interfaces match the restrictions on base class
methods. The main difference is the representation in memory. A type extending a
base class with virtual methods includes a pointer to the table of methods in
the object value itself, while a type implementing an interface would store the
pointer alongside the pointer to the value in a DynPtr(MyInterface). Of
course, the interface option also allows the method table to be passed at
compile time.
Note: This presumes that we include some concept of final methods in
interfaces to match non-virtual functions in base classes.
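The difference in memory representation can be sketched with a hand-written method table in C++. This is an illustration under assumptions, not how Carbon implements either feature: `MethodTable`, `SquareWithVptr`, and `PlainSquare` are hypothetical, and the `DynPtr` struct here is only a stand-in for the `DynPtr(MyInterface)` idea mentioned above.

```cpp
#include <cassert>

// Hypothetical method table with a single entry.
struct MethodTable {
  int (*area)(const void* self);
};

// Strategy 1: base-class style. The table pointer lives in the object itself.
struct SquareWithVptr {
  const MethodTable* vptr;
  int side;
};

// Strategy 2: interface style. The object stores only its data, and the
// table pointer travels next to the object pointer in a "fat" pointer.
struct PlainSquare {
  int side;
};
struct DynPtr {
  const void* object;
  const MethodTable* table;
};

inline int AreaWithVptr(const void* self) {
  const SquareWithVptr* s = static_cast<const SquareWithVptr*>(self);
  return s->side * s->side;
}
inline int AreaPlain(const void* self) {
  const PlainSquare* s = static_cast<const PlainSquare*>(self);
  return s->side * s->side;
}

static const MethodTable kVptrTable = {&AreaWithVptr};
static const MethodTable kPlainTable = {&AreaPlain};

inline void DispatchDemo() {
  SquareWithVptr a = {&kVptrTable, 3};
  assert(a.vptr->area(&a) == 9);  // Table found through the object.

  PlainSquare b = {4};
  DynPtr p = {&b, &kPlainTable};
  assert(p.table->area(p.object) == 16);       // Table found through the pointer.
  assert(sizeof(PlainSquare) == sizeof(int));  // No per-object table pointer.
}
```

The trade-off is where the cost lands: the first strategy makes every object larger, while the second makes every dynamic pointer larger.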
We expect idiomatic Carbon-only code to generally use Carbon interfaces instead of interface base classes. We may still support interface base classes long term if we determine that the ability to put the pointer to the method implementations in the object value is important for users, particularly with a single parent as in the polymorphic type case. Extending this design to support interface base classes is future work.
Background: C++ abstract base classes that don't have data members and Java interfaces model this case.
While it is not common, there are cases where C++ code uses inheritance without dynamic dispatch or a vtable. Instead, methods are never overridden, and derived types only add data and methods. There are some cases where this is done in C++ but would be done differently in Carbon:
However, there are still some cases where non-virtual inheritance makes sense. One is a parameterized type where a prefix of the data is the same independent of the parameter. An example of this is containers with a small-buffer optimization, as described in the talk CppCon 2016: Chandler Carruth "High Performance Code 201: Hybrid Data Structures". By moving the data and methods that don't depend on the buffer size to a base class, we reduce the instantiation overhead for monomorphization. The base type is also useful for reducing instantiation for consumers of the container, as long as they only need to access methods defined in the base.
Another case for non-virtual inheritance is for different node types within a data structure that have some data members in common. This is done in LLVM's map, red-black tree, and list data structure types. In a linked list, the base type might have the next and previous pointers, which is enough for a sentinel node, and there would also be a derived type with the actual data member. The base type can define operations like "splice" that only operate on the pointers not the data, and this is in fact enforced by the type system. Only the derived node type needs to be parameterized by the element type, saving on instantiation costs as before.
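A minimal C++ sketch of the linked-list case follows. The names `ListNodeBase`, `ListNode`, and `InsertBefore` are hypothetical, and a simple insertion stands in for the "splice" operation mentioned above. The base type holds only the links and no virtual methods, and only the derived node depends on the element type.

```cpp
#include <cassert>

// Hypothetical base node: just the links, enough for a sentinel.
struct ListNodeBase {
  ListNodeBase* prev;
  ListNodeBase* next;
  ListNodeBase() : prev(this), next(this) {}

  // Inserts `node` immediately before this node; touches only the pointers.
  void InsertBefore(ListNodeBase* node) {
    node->prev = prev;
    node->next = this;
    prev->next = node;
    prev = node;
  }
};

// Only the derived node is parameterized by the element type.
template <typename T>
struct ListNode : ListNodeBase {
  T value;
  explicit ListNode(T v) : value(v) {}
};

inline void ListDemo() {
  ListNodeBase sentinel;  // No element stored in the sentinel.
  ListNode<int> one(1);
  ListNode<int> two(2);
  sentinel.InsertBefore(&one);  // List: one
  sentinel.InsertBefore(&two);  // List: one, two

  int sum = 0;
  int count = 0;
  for (ListNodeBase* n = sentinel.next; n != &sentinel; n = n->next) {
    // The caller knows every non-sentinel node is a ListNode<int>.
    sum += static_cast<ListNode<int>*>(n)->value;
    ++count;
  }
  assert(count == 2);
  assert(sum == 3);
  assert(sentinel.next == &one && sentinel.prev == &two);
}
```

The downcast in the loop is safe only because of knowledge the type system does not carry, which is the concern raised for non-polymorphic inheritance in general.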
Many of the concerns around non-polymorphic inheritance are the same as for the non-virtual methods of polymorphic types. Assignment and destruction are examples of operations that need particular care to be sure they are only done on values of the correct type, rather than through a subtyping relationship. This means having some extrinsic way of knowing when it is safe to downcast before performing one of those operations, or performing them on pointers that were never upcast to the base type.
While Carbon won't support all the C++ forms of multiple inheritance, Carbon
code will still need to interoperate with C++ code that does. Of particular
concern are the std::iostream family of types. Most uses of those types are
the input and output variations or could be migrated to use those variations,
not the harder bidirectional cases.
Much of the complexity of this interoperation could be alleviated by adopting the restriction that Carbon code can't directly access the fields of a virtual base class. In the cases where such access is needed, the workaround is to access them through C++ functions.
We do not expect idiomatic Carbon-only code to use multiple inheritance. Extending this design to support interoperating with C++ types using multiple inheritance is future work.
A mixin is a declaration of data, methods, and interface implementations that can be added to another type, called the "main type". The methods of a mixin may also use data, methods, and interface implementations provided by the main type. Mixins are designed around implementation reuse rather than subtyping, and so don't need to use a vtable.
A mixin might be an implementation detail of a data class, or encapsulated type. A mixin might partially implement an interface as base class.
Examples: intrusive linked list, intrusive reference count
In both of these examples, the mixin needs the ability to convert between a pointer to the mixin's data (like a "next" pointer or reference count) and a pointer to the containing object with the main type.
Mixins are expected in idiomatic Carbon-only code. Extending this design to support mixins is future work.
Background: Mixins are typically implemented using the curiously recurring template pattern in C++, but other languages support them directly.
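For reference, an intrusive reference count written with the curiously recurring template pattern might look like the following C++ sketch. `RefCounted`, `Document`, and `OnLastRelease` are hypothetical names. The `static_cast` is the conversion from the mixin to the main type that the examples above rely on.

```cpp
#include <cassert>

// Hypothetical mixin adding an intrusive reference count to `Main`.
template <typename Main>
class RefCounted {
 public:
  void AddRef() { ++count_; }
  // Returns true when the last reference was released. Calls a method that
  // the main type provides, by converting `this` to the main type.
  bool Release() {
    if (--count_ == 0) {
      static_cast<Main*>(this)->OnLastRelease();
      return true;
    }
    return false;
  }
  int RefCount() const { return count_; }

 private:
  int count_ = 0;
};

// Main type: gains the mixin's data and methods without any vtable.
class Document : public RefCounted<Document> {
 public:
  void OnLastRelease() { released = true; }
  bool released = false;
};

inline void MixinDemo() {
  Document doc;
  doc.AddRef();
  doc.AddRef();
  assert(doc.RefCount() == 2);
  assert(!doc.Release());
  assert(doc.Release());
  assert(doc.released);
}
```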
See how other languages tackle this problem:
The members of a class are named, and are accessed with the . notation. For
example:
var p: Point2D = ...;
// Data member access
p.x = 1;
p.y = 2;
// Method call
Print(p.DistanceFromOrigin());
Tuples are used for cases where accessing the members positionally is more appropriate.
The data members of a class, or fields, have an order that matches the order they are declared in. This determines the order of those fields in memory, and the order that the fields are destroyed when a value goes out of scope or is deallocated.
Structural data classes, or struct types, are convenient for defining data classes in an ad-hoc manner. They would commonly be used:
- class variables or values
Note that struct types are examples of data class types and are still classes.
The "nominal data classes" section describes another
way to define a data class type. Also note that there is no struct keyword,
"struct" is just convenient shorthand terminology for a structural data class.
Structural data class literals, or struct literals, are written using this syntax:
var kvpair: auto = {.key = "the", .value = 27};
This produces a struct value with two fields:
- The first field is named "key" and has the value "the". The type of the field is set to the type of the value, and so is String.
- The second field is named "value" and has the value 27. The type of the field is set to the type of the value, and so is i32.
Note: A comma , may optionally be included after the last field:
var kvpair: auto = {.key = "the", .value = 27,};
Open question: To keep the literal syntax from being ambiguous with compound statements, Carbon will adopt some combination of:
- { to see if it is followed by .name;
- { to introduce a compound statement in contexts introduced by a keyword where they are required, like requiring { ... } around the cases of an if...else statement.
The type of kvpair in the last example would be represented by this
expression:
{.key: String, .value: i32}
This syntax is intended to parallel the literal syntax, and so uses commas (,)
to separate fields instead of a semicolon (;) terminator. This choice also
reflects the expected use inline in function signature declarations.
Struct types may only have data members, so the type declaration is just a list of field names and types. The result of a struct type expression is an immutable compile-time type value.
Note: Like with struct literal expressions, a comma , may optionally be
included after the last field:
{.key: String, .value: i32,}
Also note that {} represents both the empty struct literal and its type.
When a variable with a data class type, such as a struct type, is initialized or assigned from a struct value on the right-hand side, the order of the fields does not have to match, just the names.
var different_order: {.x: i32, .y: i32} = {.y = 2, .x = 3};
Assert(different_order.x == 3);
Assert(different_order.y == 2);
Initialization and assignment occur field-by-field. The order of fields is
determined from the target on the left side of the =. This rule matches what
we expect for classes with encapsulation more generally.
Open question: What operations and in what order happen for assignment and initialization?
Generally speaking, the operations that are available on a data class value, such as a value with a struct type, are dependent on those operations being available for all the types of the fields.
For example, two values of the same data class type may be compared for equality or inequality if equality is supported for every member of the type:
var p: auto = {.x = 2, .y = 3};
Assert(p == {.x = 2, .y = 3});
Assert(p != {.x = 2, .y = 4});
Assert({.x = 2, .y = 4} != {.x = 5, .y = 3});
Equality and inequality comparisons are also allowed between different data class types when:
For example, since
comparison between i32 and u32 is defined,
equality comparison between values of types {.x: i32, .y: i32} and
{.y: u32, .x: u32} is as well. Equality and inequality comparisons compare
fields using the field order of the left-hand operand and stop once the outcome
of the comparison is determined. However, the comparison order and
short-circuiting are generally expected to affect only the performance
characteristics of the comparison and not its meaning.
Ordering comparisons, such as < and <=, use the order of the fields to do a
lexicographical comparison.
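As a point of comparison, field-wise equality and lexicographical ordering can be written by hand in C++ with `std::tie`. This sketch assumes a hypothetical `Point` type and only illustrates the comparison order; in Carbon these operations would come from the data class rules rather than from user code.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical data class with two fields, declared in the order x, y.
struct Point {
  int x;
  int y;
};

// Field-wise equality: defined because `int` supports `==`.
inline bool operator==(const Point& a, const Point& b) {
  return std::tie(a.x, a.y) == std::tie(b.x, b.y);
}

// Lexicographical ordering in field order: compare x first, then y.
inline bool operator<(const Point& a, const Point& b) {
  return std::tie(a.x, a.y) < std::tie(b.x, b.y);
}

inline void ComparisonDemo() {
  Point p = {2, 3};
  Point q = {2, 4};
  Point r = {1, 9};
  assert(p == p);
  assert(!(p == q));
  assert(p < q);  // x ties, so y decides.
  assert(r < p);  // x decides, regardless of y.
}
```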
The argument types must have a matching order of the field names. Otherwise, the
restrictions on ordering comparisons between different data class types are
analogous to equality comparisons:
Implicit conversion from a struct type to a data class type is allowed when the set of field names is the same and implicit conversion is defined between the pairs of member types with the same field names. So calling a function effectively performs an initialization of each of the function's parameters from the caller's arguments, and will be valid when those initializations are all valid.
A data class has an unformed state if all its members do. Treatment of unformed state follows proposal #257.
Destruction is performed field-wise in reverse order.
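C++ follows the same rule for members, which makes it easy to observe. In this sketch, `Tracer` and `Record` are hypothetical helpers that log the order in which fields are destroyed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper that records when it is destroyed.
struct Tracer {
  std::string* log;
  char tag;
  ~Tracer() { log->push_back(tag); }
};

// Fields are declared in the order a, b, c.
struct Record {
  Tracer a;
  Tracer b;
  Tracer c;
};

inline std::string DestructionOrder() {
  std::string log;
  {
    Record r = {{&log, 'a'}, {&log, 'b'}, {&log, 'c'}};
  }  // `r` goes out of scope here, destroying c, then b, then a.
  return log;
}

inline void DestructionDemo() {
  assert(DestructionOrder() == "cba");
}
```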
Extending user-defined operations on the fields to an operation on an entire data class is future work.
References: The rules for assignment, comparison, and implicit conversion for argument passing were decided in question-for-leads issue #710.
The declarations for nominal class types will have:
- abstract or base prefix
- class introducer
- {, an open curly brace
- }, a close curly brace
Declarations should generally match declarations that can be declared in other
contexts, for example variable declarations with var will define
instance variables:
class TextLabel {
var x: i32;
var y: i32;
var text: String = "default";
}
The main difference here is that "default" is a default instead of an
initializer, and will be ignored if another value is supplied for that field
when constructing a value. Defaults must be constants whose value can be
determined at compile time.
To support circular references between class types, we allow
forward declaration of
types. Forward declarations end with semicolon ; after the name of the class,
instead of the block of declarations in curly braces {...}. A type that is
forward declared is considered incomplete until the end of a definition with the
same name.
// Forward declaration of `GraphNode`.
class GraphNode;
class GraphEdge {
var head: GraphNode*;
var tail: GraphNode*;
}
class GraphNode {
var edges: Vector(GraphEdge*);
}
// `GraphNode` is first complete here.
Open question: What is specifically allowed and forbidden with an incomplete type has not yet been decided.
Self
A class definition may provisionally include references to its own name in
limited ways. These limitations arise from the type not being complete until the
end of its definition is reached.
class IntListNode {
var data: i32;
var next: IntListNode*;
}
An equivalent definition of IntListNode, since the Self keyword is an alias
for the current type, is:
class IntListNode {
var data: i32;
var next: Self*;
}
Self refers to the innermost type declaration:
class IntList {
class IntListNode {
var data: i32;
// `Self` is `IntListNode`, not `IntList`.
var next: Self*;
}
var first: IntListNode*;
}
Any function with access to all the data fields of a class can construct one by converting a struct value to the class type:
var tl1: TextLabel = {.x = 1, .y = 2};
var tl2: auto = {.x = 1, .y = 2} as TextLabel;
Assert(tl1.x == tl2.x);
fn ReturnsATextLabel() -> TextLabel {
return {.x = 1, .y = 2};
}
var tl3: TextLabel = ReturnsATextLabel();
fn AcceptsATextLabel(tl: TextLabel) -> i32 {
return tl.x + tl.y;
}
Assert(AcceptsATextLabel({.x = 2, .y = 4}) == 6);
Note that a nominal class, unlike a struct type, can define default values for fields, and so may be initialized with a struct value that omits some or all of those fields.
Assignment from a struct value is also allowed in a function with access to all the data fields of a class. Assignment always overwrites all of the field members.
var tl: TextLabel = {.x = 1, .y = 2};
Assert(tl.text == "default");
// ✅ Allowed: assigns all fields
tl = {.x = 3, .y = 4, .text = "new"};
// ✅ Allowed: This statement is evaluated in two steps:
// 1. {.x = 5, .y = 6} is converted into a new TextLabel value,
// using default for field `text`.
// 2. tl is assigned to a TextLabel, which has values for all
// fields.
tl = {.x = 5, .y = 6};
Assert(tl.text == "default");
Open question: This behavior might be surprising because there is an ambiguity about whether to use the default value or the previous value for a field. We could require all fields to be specified when assigning, and only use field defaults when initializing a new value.
// ❌ Forbidden: should tl.text == "default" or "new"?
tl = {.x = 5, .y = 6};
Member functions can either be class functions or methods. Class functions are members of the type, while methods can only be called on instances.
A class function is like a C++ static member function, and is declared like a function at file scope. The declaration can include a definition of the function body, or that definition can be provided out of line after the class definition is finished. A common use is for constructor functions.
class Point {
fn Origin() -> Self {
return {.x = 0, .y = 0};
}
fn CreateCentered() -> Self;
var x: i32;
var y: i32;
}
fn Point.CreateCentered() -> Self {
return {.x = ScreenWidth() / 2, .y = ScreenHeight() / 2};
}
Class functions are members of the type, and may be accessed using dot .
member access on either the type or any instance.
var p1: Point = Point.Origin();
var p2: Point = p1.CreateCentered();
Method
declarations are distinguished from class function
declarations by having a self parameter in square brackets [...] before
the explicit parameter list in parens (...). There is no implicit member
access in methods, so inside the method body members are accessed through the
self parameter. Methods may be written lexically inline or after the class
declaration.
class Circle {
fn Diameter[self: Self]() -> f32 {
return self.radius * 2;
}
fn Expand[addr self: Self*](distance: f32);
var center: Point;
var radius: f32;
}
fn Circle.Expand[addr self: Self*](distance: f32) {
self->radius += distance;
}
var c: Circle = {.center = Point.Origin(), .radius = 1.5 };
Assert(Math.Abs(c.Diameter() - 3.0) < 0.001);
c.Expand(0.5);
Assert(Math.Abs(c.Diameter() - 4.0) < 0.001);
- Methods are called using the dot . member syntax, c.Diameter() and c.Expand(...).
- Diameter computes and returns the diameter of the circle without modifying the Circle instance. This is signified using [self: Self] in the method declaration.
- c.Expand(...) does modify the value of c. This is signified using [addr self: Self*] in the method declaration.
The pattern 'addr self: type' means "first take the address of the argument,
which must be an l-value, and
then match pattern 'self: type' against it".
If the method declaration also includes
deduced compile-time parameters,
the self parameter must be in the same list in square brackets [...]. The
self parameter may appear in any position in that list, as long as it appears
after any names needed to describe its type.
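A loose C++ analogy for the two forms of self is a const member function versus a non-const one. The sketch below shows both the usual C++ spelling and free functions with the object parameter written out, which reads closer to the Carbon declarations. It is not an exact correspondence, since a Carbon `self: Self` binds a value rather than a const reference.

```cpp
#include <cassert>
#include <cmath>

struct Circle {
  float radius;

  // Roughly like `[self: Self]`: reads the object, cannot modify it.
  float Diameter() const { return radius * 2; }

  // Roughly like `[addr self: Self*]`: works on the object's address and
  // may modify it.
  void Expand(float distance) { radius += distance; }
};

// The same two methods with the object parameter written out explicitly.
inline float Diameter(const Circle& self) { return self.radius * 2; }
inline void Expand(Circle* self, float distance) { self->radius += distance; }

inline void SelfDemo() {
  Circle c = {1.5f};
  assert(std::fabs(c.Diameter() - 3.0f) < 0.001f);
  c.Expand(0.5f);
  assert(std::fabs(Diameter(c) - 4.0f) < 0.001f);
  Expand(&c, 1.0f);
  assert(std::fabs(c.Diameter() - 6.0f) < 0.001f);
}
```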
When defining a member function lexically inline, the body is deferred and processed as if it appeared immediately after the end of the outermost enclosing class, like in C++.
For example, given a class with inline function definitions:
class Point {
fn Distance[self: Self]() -> f32 {
return Math.Sqrt(self.x * self.x + self.y * self.y);
}
fn Make(x: f32, y: f32) -> Point {
return {.x = x, .y = y};
}
var x: f32;
var y: f32;
}
These are all parsed as if they were defined outside the class scope:
class Point {
fn Distance[self: Self]() -> f32;
fn Make(x: f32, y: f32) -> Point;
var x: f32;
var y: f32;
}
fn Point.Distance[self: Self]() -> f32 {
return Math.Sqrt(self.x * self.x + self.y * self.y);
}
fn Point.Make(x: f32, y: f32) -> Point {
return {.x = x, .y = y};
}
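The C++ behavior that this rule is modeled on can be checked directly. In the sketch below, the inline bodies use `x` and `y` before their declarations appear in the class. The constructor-like `Make` is written as a static member function, the closest C++ counterpart to a class function.

```cpp
#include <cassert>
#include <cmath>

class Point {
 public:
  // The body uses `x` and `y`, which are declared further down. This
  // compiles because C++ processes inline member function bodies as if they
  // appeared after the end of the class.
  float Distance() const { return std::sqrt(x * x + y * y); }

  static Point Make(float x, float y) {
    Point p;
    p.x = x;  // The parameters shadow the members inside this body.
    p.y = y;
    return p;
  }

  float x;
  float y;
};

inline void DeferredBodyDemo() {
  Point p = Point::Make(3.0f, 4.0f);
  assert(std::fabs(p.Distance() - 5.0f) < 0.001f);
}
```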
Member access is an expression; details are covered there. Because function definitions are deferred, name lookup in classes works the same regardless of whether a function is inline. The class body forms a scope for name lookup, and function definitions can perform unqualified name lookup within that scope.
For example:
class Square {
fn GetArea[self: Self]() -> f32 {
// ✅ OK: performs name lookup on `self`.
return self.size * self.size;
// ❌ Error: finds `Square.size`, but an instance is required.
return size * size;
// ❌ Error: an instance is required.
return Square.size * Square.size;
// ✅ OK: performs instance binding with `self`.
return self.(Square.size) * self.(Square.size);
// ✅ OK: uses unqualified name lookup to find `Square.size`, then performs
// instance binding with `self`.
return self.(size) * self.(size);
}
fn GetDoubled[self: Self]() -> Square {
// ✅ OK: performs name lookup on `Square` for `Make`.
return Square.Make(self.size);
// ✅ OK: performs unqualified name lookup within class scope for `Make`.
return Make(self.size);
// ✅ OK: performs name lookup on `self` for `Make`.
return self.Make(self.size);
}
fn Make(size: f32) -> Square;
var size: f32;
}
The example's name lookups refer to Make and size, which are declared after
the member accesses that use them; this is valid because of
deferred member function definitions.
However, function signatures must still complete lookup without deferring. For example:
class List {
// ❌ Error: `Iterator` has not yet been defined.
fn Iterate() -> Iterator;
class Iterator {
...
}
// ✅ OK: The definition of Iterator is now available.
fn Iterate() -> Iterator;
}
An out-of-line function definition's parameters, return type, and body are evaluated as if in-scope. For example:
// ✅ OK: The return type performs unqualified name lookup into `List` for
// `Iterator`.
fn List.Iterate() -> Iterator {
...
}
We will mark data classes with an impl as Data {} line.
class TextLabel {
var x: i32;
var y: i32;
var text: String;
// This line makes `TextLabel` a data class, which defines
// a number of operations field-wise.
impl as Data {}
}
The fields of data classes must all be public. That line will add field-wise implementations and operations of all interfaces that a struct with the same fields would get by default.
The word Data here refers to an empty interface in the Carbon prologue. That
interface would then be part of our
strategy for defining how other interfaces are implemented for data classes.
References: Rationale for this approach is given in proposal #722.
Additional types may be defined in the scope of a class definition.
class StringCounts {
class Node {
var key: String;
var count: i32;
}
var counts: Vector(Node);
}
The inner type is a member of the type, and is given the name
StringCounts.Node. This case is called a member class since the type is a
class, but other kinds of type declarations, like choice types, are allowed.
Other type constants can be defined using a let declaration:
class MyClass {
let Pi:! f32 = 3.141592653589793;
let IndexType:! type = i32;
}
The :! indicates that this is defining a compile-time constant, and so does
not affect the storage of instances of that class.
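The closest C++ counterparts are `static constexpr` members and member type aliases, which likewise add nothing to the size of an instance. The type names in this sketch are hypothetical.

```cpp
#include <cassert>

struct WithConstants {
  // Compile-time constants: members of the type, not of each instance.
  static constexpr float kPi = 3.14159265f;
  using IndexType = int;

  int value;
};

struct WithoutConstants {
  int value;
};

static_assert(sizeof(WithConstants) == sizeof(WithoutConstants),
              "constants add no per-instance storage");

inline void ConstantsDemo() {
  WithConstants::IndexType i = 7;
  assert(i == 7);
  assert(WithConstants::kPi > 3.14f && WithConstants::kPi < 3.15f);
}
```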
You may declare aliases of the names of class members. This is to allow them to be renamed in multiple steps or support alternate names.
class StringPair {
var key: String;
var value: String;
alias first = key;
alias second = value;
}
var sp1: StringPair = {.key = "K", .value = "1"};
var sp2: StringPair = {.first = "K", .second = "2"};
Assert(sp1.first == sp2.key);
Assert(&sp1.first == &sp1.key);
Future work: This needs to be connected to the broader design of aliases, once that lands.
Carbon supports
inheritance
using a
class hierarchy,
on an opt-in basis. Classes by default are
final,
which means they may not be extended. To declare a class as allowing extension,
use either the base class or abstract class introducer:
base class MyBaseClass { ... }
A base class may be extended to get a derived class:
base class MiddleDerived {
extend base: MyBaseClass;
...
}
class FinalDerived {
extend base: MiddleDerived;
...
}
// ❌ Forbidden: class Illegal { extend base: FinalDerived; ... }
// may not extend `FinalDerived` since not declared `base` or `abstract`.
An _abstract class_ or abstract base class is a base class that may not be instantiated.
abstract class MyAbstractClass { ... }
// ❌ Forbidden: var a: MyAbstractClass = ...;
Future work: For now, the Carbon design only supports single inheritance. In the future, Carbon will support multiple inheritance with limitations on all base classes except the one listed first.
Terminology: We say MiddleDerived and FinalDerived are derived
classes, transitively extending or derived from MyBaseClass. Similarly
FinalDerived is derived from or extends MiddleDerived. MiddleDerived is
FinalDerived's immediate base class, and both MiddleDerived and
MyBaseClass are base classes of FinalDerived. Base classes that are not
abstract are called extensible classes.
A derived class has all the members of the class it extends, including data
members and methods, though it may not be able to access them if they were
declared private.
A base class may define virtual methods. These are methods whose implementation may be overridden in a derived class.
Only methods defined in the scope of the class definition may be virtual, not
any defined in
out-of-line interface impl declarations.
Interface methods may be implemented using virtual methods when the
impl is inline, and calls to
those methods by way of the interface will do virtual dispatch just like a
direct call to the method does.
Class functions may not be declared virtual.
A method is declared as virtual by using a virtual override keyword in its
declaration before fn.
base class MyBaseClass {
virtual fn Overridable[self: Self]() -> i32 { return 7; }
}
This matches C++, and makes it relatively easy for authors of derived classes to find the functions that can be overridden.
If no keyword is specified, the default for methods is that they are non-virtual. This means:
There are three virtual override keywords:
- virtual - This marks a method as not present in bases of this class and having an implementation in this class. That implementation may be overridden in derived classes.
- abstract - This marks a method that must be overridden in a derived class since it has no implementation in this class. This is short for "abstract virtual" but is called "pure virtual" in C++. Only abstract classes may have unimplemented abstract methods.
- impl - This marks a method that overrides a method marked virtual or abstract in the base class with an implementation specific to -- and defined within -- this class. The method is still virtual and may be overridden again in subsequent derived classes if this is a base class. See method overriding in Wikipedia. Requiring a keyword when overriding allows the compiler to diagnose when the derived class accidentally uses the wrong signature or spelling and so doesn't match the base class. We intentionally use the same keyword here as for implementing interfaces, to emphasize that they are similar operations.

| Keyword on method in C | Allowed in abstract class C | Allowed in base class C | Allowed in final class C | in B where C extends B | in D where D extends C |
|---|---|---|---|---|---|
| virtual | ✅ | ✅ | ❌ | not present | abstract, impl, not mentioned |
| abstract | ✅ | ❌ | ❌ | not present, virtual, abstract, impl | abstract, impl, may not be mentioned if D is not final |
| impl | ✅ | ✅ | ✅ | virtual, abstract, impl | abstract, impl |
A pointer to a base class, like MyBaseClass* is actually considered to be a
pointer to that type or any derived class, like MiddleDerived or
FinalDerived. This means that a FinalDerived* value may be implicitly cast
to type MiddleDerived* or MyBaseClass*.
This is accomplished by making the data layout of a type extending MyBaseClass
have MyBaseClass as a prefix. In addition, the first class in the inheritance
chain with a virtual method will include a virtual pointer, or vptr, pointing
to a virtual method table,
or vtable. Any calls to virtual methods will perform
dynamic dispatch by calling
the method using the function pointer in the vtable, to get the overridden
implementation from the most derived class that implements the method.
Since a final class may not be extended, the compiler can bypass the vtable and use static dispatch. In general, you can use a combination of an abstract base class and a final class instead of an extensible class if you need to distinguish between exactly a type and possibly a subtype.
base class Extensible { ... }
// Can be replaced by:
abstract class ExtensibleBase { ... }
class ExactlyExtensible {
extend base: ExtensibleBase;
...
}
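The dispatch model described above closely follows C++, so a rough C++ analogue may help make it concrete. This is only a sketch: the class names mirror the ones used in the text, while `CallThroughBase`, `CallOnFinal`, and `RunDispatchExample` are hypothetical helpers introduced for illustration, and the return values are arbitrary.

```cpp
#include <cassert>

// C++ stand-ins for the Carbon classes named in the text.
struct MyBaseClass {
  virtual ~MyBaseClass() = default;
  // First class in the chain with a virtual method: it introduces the vptr.
  virtual int Overridable() const { return 7; }
};

struct MiddleDerived : MyBaseClass {
  int Overridable() const override { return 8; }
};

struct FinalDerived final : MiddleDerived {
  int Overridable() const override { return 9; }
};

// Hypothetical helper: the call is dispatched dynamically, so the result
// depends on the dynamic type of `*p`, not on the static type `MyBaseClass`.
int CallThroughBase(const MyBaseClass* p) { return p->Overridable(); }

// Hypothetical helper: `FinalDerived` cannot be extended, so a compiler is
// free to call `FinalDerived::Overridable` directly (static dispatch).
int CallOnFinal(const FinalDerived& f) { return f.Overridable(); }

void RunDispatchExample() {
  MyBaseClass base;
  MiddleDerived middle;
  FinalDerived last;
  // A `FinalDerived*` converts implicitly to `MyBaseClass*`.
  const MyBaseClass* p = &last;
  assert(CallThroughBase(&base) == 7);
  assert(CallThroughBase(&middle) == 8);
  assert(CallThroughBase(p) == 9);
  assert(CallOnFinal(last) == 9);
}
```

Calling `RunDispatchExample()` exercises each case; the same pointer type yields three different implementations depending on the object it points to.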
Self refers to the current type
Note that Self in a class definition means "the current type being defined"
not "the type implementing this method." To implement a method in a derived
class that uses Self in the declaration in the base class, only the type of
self should change:
base class B1 {
virtual fn F[self: Self](x: Self) -> Self;
// Means exactly the same thing as:
// virtual fn F[self: B1](x: B1) -> B1;
}
class D1 {
extend base: B1;
// ❌ Illegal:
// impl fn F[self: Self](x: Self) -> Self;
// since that would mean the same thing as:
// impl fn F[self: Self](x: D1) -> D1;
// and `D1` is a different type than `B1`.
// ✅ Allowed: Parameter and return types
// of `F` match declaration in `B1`.
impl fn F[self: Self](x: B1) -> B1;
// Or: impl fn F[self: D1](x: B1) -> B1;
}
The exception is when there is a subtyping relationship such that it would be legal for a caller using the base class's signature to actually be calling the derived implementation, as in:
base class B2 {
virtual fn Clone[self: Self]() -> Self*;
// Means exactly the same thing as:
// virtual fn Clone[self: B2]() -> B2*;
}
class D2 {
extend base: B2;
// ✅ Allowed
impl fn Clone[self: Self]() -> Self*;
// Means the same thing as:
// impl fn Clone[self: D2]() -> D2*;
// which is allowed since `D2*` is a
// subtype of `B2*`.
}
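C++ has a comparable rule, usually called covariant return types, so the `Clone` case can be sketched in C++ as well. The member names `base_value` and `derived_value` and the function `RunCloneExample` are hypothetical and exist only to make the example checkable.

```cpp
#include <cassert>

struct B2 {
  virtual ~B2() = default;
  virtual B2* Clone() const { return new B2(*this); }
  int base_value = 1;
};

struct D2 final : B2 {
  // Covariant return type: `D2*` converts to `B2*`, so this still overrides
  // `B2::Clone` even though the declared return types differ.
  D2* Clone() const override { return new D2(*this); }
  int derived_value = 2;
};

void RunCloneExample() {
  D2 d;
  d.derived_value = 42;
  const B2* as_base = &d;
  // A caller using the base signature reaches the derived implementation.
  B2* copy = as_base->Clone();
  D2* typed = dynamic_cast<D2*>(copy);
  assert(typed != nullptr);
  assert(typed->derived_value == 42);
  delete copy;  // Safe: `B2` has a virtual destructor.
}
```

The caller only ever sees `B2*`, which is why the narrower return type in the override cannot break it.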
Like for classes without inheritance, constructors for a derived class are
ordinary functions that return an instance of the derived class. Generally
constructor functions should return the constructed value without copying, as in
proposal
#257: Initialization of memory and variables.
This means either
creating the object in the return statement itself,
or in
a returned var declaration.
As before, instances can be created by casting a struct value into the class
type, this time with a .base member to initialize the members of the immediate
base type.
class MyDerivedType {
extend base: MyBaseType;
fn Make() -> MyDerivedType {
return {.base = MyBaseType.Make(), .derived_field = ...};
}
}
There are two cases that aren't well supported with this pattern:
While these cases are expected to be relatively rare, we will address both concerns with a specialized type used only during construction of base classes, called the partial facet type for the class.
The partial facet for a base class type like MyBaseType is written
partial MyBaseType.
partial MyBaseClass and MyBaseClass have the same fields in the same order with the same data layout. The only difference is that partial MyBaseClass doesn't use (look into) its hidden vptr slot. To reliably catch any bugs where virtual function calls occur in this state, both fast and hardened release builds will initialize the hidden vptr slot to a null pointer. Debug builds will initialize it to an alternate vtable whose functions will abort the program with a clear diagnostic.
Since partial MyBaseClass has the same data layout but only uses a subset, there is a subtyping relationship between these types. A MyBaseClass value is a partial MyBaseClass value, but not the other way around. So you can cast MyBaseClass* to partial MyBaseClass*, but the other direction is not safe.
When MyBaseClass may be instantiated, there is a conversion from partial MyBaseClass to MyBaseClass. It changes the value by filling in the hidden vptr slot. If MyBaseClass is abstract, then attempting that conversion is an error.
partial MyBaseClass is considered final, even if MyBaseClass is not. This is despite the fact that from a data layout perspective, partial MyDerivedClass will have partial MyBaseClass as a prefix if MyDerivedClass extends MyBaseClass. The type partial MyBaseClass specifically means "exactly this and no more." This means we don't need to look at the hidden vptr slot, and we can instantiate it even if it doesn't have a virtual destructor.
partial may only be applied to a base class. For final classes, there is no need for a second type.
The general pattern is that base classes can define constructors returning the partial facet type.
base class MyBaseClass {
fn Make() -> partial Self {
return {.base_field_1 = ..., .base_field_2 = ...};
}
// ...
}
Extensible classes can be instantiated even from a partial facet value:
var mbc: MyBaseClass = MyBaseClass.Make();
The conversion from partial MyBaseClass to MyBaseClass only fills in the
vptr value and can be done in place. After the conversion, all public methods
may be called, including virtual methods.
The partial facet is required for abstract classes, since otherwise they may not be instantiated. Constructor functions for abstract classes should be marked protected so they may only be accessed in derived classes.
abstract class MyAbstractClass {
protected fn Make() -> partial Self {
return {.base_field_1 = ..., .base_field_2 = ...};
}
// ...
}
// ❌ Error: can't instantiate abstract class
var abc: MyAbstractClass = ...;
If a base class wants to store a pointer to itself somewhere in the constructor function, there are two choices:
An extensible class could use the plain type instead of the partial facet.
base class MyBaseClass {
fn Make() -> Self {
returned var result: Self = {...};
StoreMyPointerSomewhere(&result);
return var;
}
}
The other choice is to explicitly cast the type of its address. This pointer should not be used to call any virtual method until the object is finished being constructed, since the vptr will be null.
abstract class MyAbstractClass {
protected fn Make() -> partial Self {
returned var result: partial Self = {...};
// Careful! Pointer to object that isn't fully constructed!
StoreMyPointerSomewhere(&result as Self*);
return var;
}
}
The constructor for a derived class may construct values from a partial facet of the class' immediate base type or the full type:
abstract class MyAbstractClass {
protected fn Make() -> partial Self { ... }
}
// Base class returns a partial type
base class Derived {
extend base: MyAbstractClass;
protected fn Make() -> partial Self {
return {.base = MyAbstractClass.Make(), .derived_field = ...};
}
...
}
base class MyBaseClass {
fn Make() -> Self { ... }
}
// Base class returns a full type
base class ExtensibleDerived {
extend base: MyBaseClass;
fn Make() -> Self {
return {.base = MyBaseClass.Make(), .derived_field = ...};
}
...
}
And final classes will return a type that does not use the partial facet:
class FinalDerived {
extend base: MiddleDerived;
fn Make() -> Self {
return {.base = MiddleDerived.Make(), .derived_field = ...};
}
...
}
Observe that the vptr is only assigned twice in release builds if you use partial facets:
In the case that the base class can be instantiated, tooling could optionally
recommend that functions returning Self that are used to initialize a derived
class be changed to return partial Self instead. However, the consequences of
returning Self instead of partial Self when the value will be used to
initialize a derived class are fairly minor:
Since the assignment operator method should not be virtual, it is only safe to implement it for final types. However, following the maxim that Carbon should "focus on encouraging appropriate usage of features rather than restricting misuse", we allow users to also implement assignment on extensible classes, even though it can lead to slicing.
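The slicing hazard is the same one C++ programmers know, so a short C++ sketch can illustrate it. The types `Extensible` and `Derived`, their fields, and the helpers `AssignBase` and `RunSlicingExample` are hypothetical names chosen for this example.

```cpp
#include <cassert>

struct Extensible {
  int base_field = 0;
  // Copy assignment is implicitly generated and non-virtual.
};

struct Derived : Extensible {
  int derived_field = 0;
};

// Hypothetical helper: assigns through references to the base class, so only
// the `Extensible` part of the objects takes part in the assignment.
void AssignBase(Extensible& dst, const Extensible& src) { dst = src; }

void RunSlicingExample() {
  Derived a;
  a.base_field = 1;
  a.derived_field = 10;
  Derived b;
  b.base_field = 2;
  b.derived_field = 20;

  AssignBase(a, b);

  assert(a.base_field == 2);      // Base part was copied.
  assert(a.derived_field == 10);  // Derived part was not: the value was sliced.
}
```

After the call, `a` is a mix of `b`'s base fields and its own derived fields, which is the inconsistent state that restricting assignment to final types would avoid.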
Every non-abstract type is destructible, meaning it has a defined destructor
function called when the lifetime of a value of that type ends, such as when a
variable goes out of scope. The destructor for a class may be customized using
the destructor keyword:
class MyClass {
destructor [self: Self] { ... }
}
or:
class MyClass {
// Can modify `self` in the body.
destructor [addr self: Self*] { ... }
}
If a class has no destructor declaration, it gets the default destructor,
which is equivalent to destructor [self: Self] { }.
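The destructor for a class is run before the destructors of its data members. The data members are destroyed in reverse order of declaration. Derived classes are destroyed before their base classes, so the order of operations is: the derived class's destructor runs, then the derived class's data members are destroyed in reverse order of declaration, then the immediate base class's destructor runs, then that base class's data members are destroyed in reverse order of declaration, and so on up the inheritance chain.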
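C++ destroys objects in the same order, so a small C++ sketch can make the sequence observable. Everything here (`Log`, `Member`, `Base`, `Derived`, `DestructionOrder`) is a hypothetical name introduced for the illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records the order in which destructors run.
std::vector<std::string>& Log() {
  static std::vector<std::string> log;
  return log;
}

struct Member {
  explicit Member(std::string n) : name(std::move(n)) {}
  ~Member() { Log().push_back("member " + name); }
  std::string name;
};

struct Base {
  ~Base() { Log().push_back("~Base"); }
  Member base_first{"base_first"};
  Member base_second{"base_second"};
};

struct Derived : Base {
  ~Derived() { Log().push_back("~Derived"); }
  Member derived_first{"derived_first"};
  Member derived_second{"derived_second"};
};

// Creates and destroys one `Derived` object and returns the recorded order.
std::vector<std::string> DestructionOrder() {
  Log().clear();
  {
    Derived d;
  }  // `d` is destroyed here.
  return Log();
}
```

`DestructionOrder()` returns the entries `~Derived`, `member derived_second`, `member derived_first`, `~Base`, `member base_second`, `member base_first`, in that order.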
Destructors may be declared in class scope and then defined out-of-line:
class MyClass {
destructor [addr self: Self*];
}
destructor MyClass [addr self: Self*] { ... }
It is illegal to delete an instance of a derived class through a pointer to one
of its base classes unless that base class has a
virtual destructor.
An abstract or base class' destructor may be declared virtual using the
virtual introducer, in which case any derived class destructor declaration
must be impl:
base class MyBaseClass {
virtual destructor [addr self: Self*] { ... }
}
class MyDerivedClass {
extend base: MyBaseClass;
impl destructor [addr self: Self*] { ... }
}
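The equivalent arrangement in C++ looks like the sketch below. It only illustrates why the virtual destructor matters when deleting through a base pointer; the `destroyed` flag and the function `DerivedDestructorRunsThroughBasePointer` are hypothetical additions used to observe the effect.

```cpp
#include <cassert>
#include <memory>

struct MyBaseClass {
  virtual ~MyBaseClass() = default;
};

struct MyDerivedClass final : MyBaseClass {
  explicit MyDerivedClass(bool* flag) : destroyed(flag) {}
  // Runs even when the object is deleted through a `MyBaseClass*`.
  ~MyDerivedClass() override { *destroyed = true; }
  bool* destroyed;
};

bool DerivedDestructorRunsThroughBasePointer() {
  bool destroyed = false;
  {
    // The owning pointer has the base type, so deletion goes through it.
    std::unique_ptr<MyBaseClass> p =
        std::make_unique<MyDerivedClass>(&destroyed);
    assert(!destroyed);
  }  // `p` deletes the object via `MyBaseClass*` here.
  return destroyed;
}
```

Without `virtual` on the base destructor, the same deletion would have undefined behavior in C++, which is the situation the rule above rules out.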
The properties of a type, namely whether the type is abstract, base, or final, and whether its destructor is virtual or non-virtual, determine which facet types it satisfies.
Types that are not abstract are Concrete. This means you can create local and member variables of this type. Concrete types have destructors that are called when the local variable goes out of scope or the containing object of the member variable is destroyed.
Final types and types with a virtual destructor are Deletable. These may be safely deleted through a pointer.
Types that are Concrete, Deletable, or both are Destructible. These are types that may be deleted through a pointer, but it might not be safe. The concerning situation is when you have a pointer to a base class without a virtual destructor. It is unsafe to delete that pointer when it is actually pointing to a derived class.
Note: The names Deletable and Destructible are placeholders since they do not conform to the decision on question-for-leads issue #1058: "How should interfaces for core functionality be named?".
| Class | Destructor | Concrete | Deletable | Destructible |
|---|---|---|---|---|
| abstract | non-virtual | no | no | no |
| abstract | virtual | no | yes | yes |
| base | non-virtual | yes | no | yes |
| base | virtual | yes | yes | yes |
| final | any | yes | yes | yes |
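As a way to cross-check the table, the three properties can be approximated with standard C++ type traits. This is a sketch under assumptions: `IsConcrete`, `IsDeletable`, `IsDestructible`, and the five sample types are hypothetical names for this example, and C++ has no `base` keyword, so every non-final C++ class plays the role of a Carbon base class.

```cpp
#include <type_traits>

// Illustrative stand-ins for the three facet types (not real library names).
template <typename T>
constexpr bool IsConcrete() {
  return !std::is_abstract<T>::value;
}

template <typename T>
constexpr bool IsDeletable() {
  return std::is_final<T>::value || std::has_virtual_destructor<T>::value;
}

template <typename T>
constexpr bool IsDestructible() {
  return IsConcrete<T>() || IsDeletable<T>();
}

// One sample type per table row.
struct AbstractNonVirtual { virtual int F() = 0; };
struct AbstractVirtual {
  virtual ~AbstractVirtual() = default;
  virtual int F() = 0;
};
struct BaseNonVirtual { int x = 0; };
struct BaseVirtual { virtual ~BaseVirtual() = default; };
struct FinalClass final { int x = 0; };

static_assert(!IsConcrete<AbstractNonVirtual>() &&
              !IsDeletable<AbstractNonVirtual>() &&
              !IsDestructible<AbstractNonVirtual>(), "abstract, non-virtual");
static_assert(!IsConcrete<AbstractVirtual>() &&
              IsDeletable<AbstractVirtual>() &&
              IsDestructible<AbstractVirtual>(), "abstract, virtual");
static_assert(IsConcrete<BaseNonVirtual>() &&
              !IsDeletable<BaseNonVirtual>() &&
              IsDestructible<BaseNonVirtual>(), "base, non-virtual");
static_assert(IsConcrete<BaseVirtual>() &&
              IsDeletable<BaseVirtual>() &&
              IsDestructible<BaseVirtual>(), "base, virtual");
static_assert(IsConcrete<FinalClass>() &&
              IsDeletable<FinalClass>() &&
              IsDestructible<FinalClass>(), "final, any");
```

Each `static_assert` corresponds to one row of the table, so the block compiles only if the approximated rules agree with it.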
The compiler automatically determines which of these
facet types a given type
satisfies. It is illegal to directly implement Concrete, Deletable, or
Destructible. For more about these constraints, see
"destructor constraints" in the detailed generics design.
A pointer to Deletable types may be passed to the Delete method of the
Allocator interface. To
deallocate a pointer to a base class without a virtual destructor, which may
only be done when it is not actually pointing to a value with a derived type,
call the UnsafeDelete method instead. Note that you may not call
UnsafeDelete on abstract types without virtual destructors, since it requires
Destructible.
interface Allocator {
// ...
fn Delete[T:! Deletable, addr self: Self*](p: T*);
fn UnsafeDelete[T:! Destructible, addr self: Self*](p: T*);
}
To pass a pointer to a base class without a virtual destructor to a
checked-generic function expecting a Deletable type, use the
UnsafeAllowDelete
type adapter.
class UnsafeAllowDelete(T:! Concrete) {
extend adapt T;
impl as Deletable {}
}
// Example usage:
fn RequiresDeletable[T:! Deletable](p: T*);
var x: MyExtensible;
RequiresDeletable(&x as UnsafeAllowDelete(MyExtensible)*);
If a virtual method is transitively called from inside a destructor, the implementation from the current class is used, not any overrides from derived classes. It will abort the execution of the program if that method is abstract and not implemented in the current class.
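C++ resolves virtual calls made during destruction the same way, so the rule can be sketched in C++. The names `Base`, `Derived`, `Name`, `Report`, and `NameSeenByBaseDestructor` are hypothetical and serve only to record which implementation runs.

```cpp
#include <cassert>
#include <string>

struct Base {
  explicit Base(std::string* out) : out_(out) {}
  // The destructor calls `Name()` transitively, by way of `Report()`.
  virtual ~Base() { Report(); }
  virtual std::string Name() const { return "Base"; }
  void Report() { *out_ = Name(); }
  std::string* out_;
};

struct Derived final : Base {
  using Base::Base;
  std::string Name() const override { return "Derived"; }
};

// Returns the name observed by the call made from inside `~Base`.
std::string NameSeenByBaseDestructor() {
  std::string seen;
  {
    Derived d(&seen);
    d.Report();
    // While the object is alive, the override is used.
    assert(seen == "Derived");
  }  // `~Derived` runs first, then `~Base` calls `Report()`.
  return seen;
}
```

The value returned is `"Base"`: by the time `~Base` runs, the derived part of the object is already gone, so the override is no longer selected.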
Future work: Allow or require destructors to be declared as taking
partial Self in order to prove no use of virtual methods.
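Types satisfy the TrivialDestructor facet type if:
their destructor has an empty body { },
all of their data members satisfy TrivialDestructor, and
all of their base classes satisfy TrivialDestructor.
For example, a struct type implements TrivialDestructor if all its members do.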
TrivialDestructor implies that their destructor does nothing, which may be
used to generate optimized specializations.
There is no provision for handling failure in a destructor. All operations that could potentially fail must be performed before the destructor is called. Unhandled failure during a destructor call will abort the program.
Future work: Allow or require destructors to be declared as taking
[var self: Self].
Alternatives considered:
By default, all members of a class are fully publicly accessible. Access can be restricted by adding a keyword, called an access modifier, prior to the declaration. Access modifiers are how Carbon supports encapsulation.
The access modifier is written before any virtual override keyword.
Rationale: Carbon makes members public by default for a few reasons:
Additionally, there is precedent for this approach in modern object-oriented languages such as Kotlin and Python, both of which are well regarded for their usability.
Keywords controlling visibility are attached to individual declarations instead of C++'s approach of labels controlling the visibility for all following declarations to reduce context sensitivity. This matches Rust, Swift, Java, C#, Kotlin, and D.
References: Proposal #561: Basic classes included the decision that members default to publicly accessible, a question originally asked in issue #665.
As in C++, private means only accessible to members of the class and any
friends.
class Point {
fn Distance[self: Self]() -> f32;
// These are only accessible to members of `Point`.
private var x: f32;
private var y: f32;
}
A private virtual or private abstract method may be implemented in derived
classes, even though it may not be called. This allows derived classes to
customize the behavior of a function called by a method of the base class, while
still preventing the derived class from calling it. This matches the behavior of
C++ and is more orthogonal.
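Since the text notes that this matches C++, the pattern can be sketched there. `Shape`, `Triangle`, `Sides`, and `DoubledSides` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

class Shape {
 public:
  virtual ~Shape() = default;
  // Public, non-virtual method of the base class that calls the private
  // customization point.
  int DoubledSides() const { return 2 * Sides(); }

 private:
  // Derived classes may override this, but cannot call `Shape::Sides`.
  virtual int Sides() const = 0;
};

class Triangle final : public Shape {
 private:
  // Overrides the private virtual method declared in `Shape`.
  int Sides() const override { return 3; }
};

int DoubledSidesOfTriangle() {
  Triangle t;
  const Shape& s = t;
  return s.DoubledSides();
}
```

Client code can only reach `Sides` indirectly through `DoubledSides`, while the derived class still controls what it returns.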
Future work: private will give the member internal linkage unless it needs
to be external because it is used in an inline method or template. We may in the
future
add a way to specify internal linkage explicitly.
Open questions: Using private to mean "restricted to this class" matches
C++. Other languages support restricting to different scopes:
Comparison to other languages: C++ and Rust make class members private by
default, while Swift defaults to internal (module-level) access. C++ offers the
struct keyword that makes members public by default.
Protected members may only be accessed by members of this class, members of derived classes, and any friends.
base class MyBaseClass {
protected fn HelperClassFunction(x: i32) -> i32;
protected fn HelperMethod[self: Self](x: i32) -> i32;
protected var data: i32;
}
class MyDerivedClass {
extend base: MyBaseClass;
fn UsesProtected[addr self: Self*]() {
// Can access protected members in derived class
var x: i32 = HelperClassFunction(3);
self->data = self->HelperMethod(x);
}
}
Classes may have a friend declaration:
class Buddy { ... }
class Pal {
private var x: i32;
friend Buddy;
}
This declares Buddy to be a friend of Pal, which means that Buddy can
access all members of this class, even the ones that are declared private or
protected.
The friend keyword is followed by the name of an existing function, type, or
parameterized family of types. Unlike C++, it won't act as a forward declaration
of that name. The name must be resolvable by the compiler, and so may not be a
member of a template.
Future work: There should be a convenient way of allowing tests in the same library as the class definition to access private members of the class. Ideally this could be done without changing the class definition itself, since it doesn't affect the class' public API.
A function may construct a class, by casting a struct value to the class type, if it has write access to all of its fields.
Future work: There should be a way to limit which code can construct a class even when it only has public fields. This will be resolved in question-for-leads issue #803.
Developers may define how standard Carbon operators, such as + and /, apply
to custom types by implementing the
interface that corresponds to that operator
for the types of the operands. See the
"operator overloading" section of
the generics design. The specific interface used for a
given operator may be found in the
expressions design.
This includes features that need to be designed, questions to answer, and a description of the provisional syntax in use until these decisions have been made.
We could allow you to write {x, y} as shorthand for {.x = x, .y = y}.
Structs are being considered as a possible mechanism for implementing optional named parameters. We have three main candidate approaches: allowing struct types to have field defaults, having dedicated support for destructuring struct values in pattern contexts, or having a dedicated optional named parameter syntax.
If struct types could have field defaults, you could write a function declaration with all of the optional parameters in an option struct:
fn SortIntVector(
v: Vector(i32)*,
options: {.stable: bool = false,
.descending: bool = false} = {}) {
// Code using `options.stable` and `options.descending`.
}
// Uses defaults of `.stable` and `.descending` equal to `false`.
SortIntVector(&v);
SortIntVector(&v, {});
// Sets `.stable` option to `true`.
SortIntVector(&v, {.stable = true});
// Sets `.descending` option to `true`.
SortIntVector(&v, {.descending = true});
// Sets both `.stable` and `.descending` options to `true`.
SortIntVector(&v, {.stable = true, .descending = true});
// Order can be different for arguments as well.
SortIntVector(&v, {.descending = true, .stable = true});
We might instead support destructuring struct patterns with defaults:
fn SortIntVector(
v: Vector(i32)*,
{stable: bool = false, descending: bool = false}) {
// Code using `stable` and `descending`.
}
This would allow the same syntax at the call site, but avoids some concerns with field defaults and allows some other use cases such as destructuring return values.
We might support destructuring directly:
var {key: String, value: i32} = ReturnKeyValue();
or by way of a mechanism that converts a struct into a tuple:
var (key: String, value: i32) =
ReturnKeyValue().extract(.key, .value);
// or maybe:
var (key: String, value: i32) =
ReturnKeyValue()[(.key, .value)];
Similarly we might support optional named parameters directly instead of by way of struct types.
Some discussion on this topic has occurred in:
We want four things so that Carbon's object-safe interfaces may interoperate with C++ abstract base classes without data members, matching the interface as base class use case:
AsBaseClass(MyInterface).
AsInterface(MyIBC).
DynPtr of the corresponding interface.
DynPtr(MyInterface) values to a proxy type that extends the corresponding base class AsBaseType(MyInterface).
Note that the proxy type extending AsBaseType(MyInterface) would be a different type than DynPtr(MyInterface) since the receiver input to the function members of the vtable for the former does not match those in the witness table for the latter.
We allow a derived class to define a class function with the
same name as a class function in the base class. For example, we expect it to be
pretty common to have a constructor function named Create at all levels of the
type hierarchy.
Beyond that, we may want some rules or restrictions about defining methods in a derived class with the same name as a base class method without overriding it. There are some opportunities to improve on and simplify the C++ story:
References: This was discussed in the open discussion on 2021-07-12.
This design directly supports Carbon classes inheriting from a single C++ class.
class CarbonClass {
extend base: Cpp.CPlusPlusClass;
fn Make() -> Self {
return {.base = Cpp.CPlusPlusClass(...), .other_fields = ...};
}
...
}
To allow C++ classes to extend Carbon classes, there needs to be some way for C++ constructors to initialize their base class:
We could explicitly call the Carbon factory function, as in:
// `Base` is a Carbon class which gets converted to a
// C++ class for interop purposes:
class Base {
public:
virtual ~Base() {}
static auto Make() -> Base;
};
// In C++
class Derived : public Base {
public:
virtual ~Derived() override {}
// This isn't currently a case where C++ guarantees no copy,
// and so it currently still requires a notional copy and
// there appear to be implementation challenges with
// removing them. This may require an extension to make work
// reliably without an extraneous copy of the base subobject.
Derived() : Base(Base::Make()) {}
};
However, this doesn't work in the case where Base can't be instantiated,
or Base does not have a copy constructor, even though it shouldn't be
called due to RVO.
TODO: Ask zygoloid to fill this in.
Carbon won't support declaring virtual base classes, and the C++ interop use cases Carbon needs to support are limited. This will allow us to simplify the C++ interop by allowing Carbon to delegate initialization of virtual base classes to the C++ side.
This requires that we enforce two rules:
We will need some way to declare mixins. This syntax will need a way to distinguish defining versus requiring member variables. Methods may additionally be given a default definition but may be overridden. Interface implementations may only be partially provided by a mixin. Mixin methods will need to be able to convert between pointers to the mixin type and the main type.
Open questions include whether a mixin is its own type that is a member of the containing type, and whether mixins are templated on the containing type. Mixins also complicate how constructors work.
Carbon will need some way for users to specify the memory layout of class types beyond simple ordering of fields, such as controlling the packing and alignment for the whole type or individual members.
We may allow a derived class to put its data members in the final padding of its base class prefix. Tail-padding reuse has both advantages and disadvantages, so we may have some way for a class to explicitly mark that its tail padding is available for use by a derived class.
Advantages:
Expr by reusing tail padding).
Disadvantages:
We cannot use memcpy(p, q, sizeof(Base)) to copy around base class subobjects if the destination is an in-lifetime object, because they might overlap other objects' representations.
However, we can still use memcpy and memset to initialize a base class
subobject, even if its tail padding might be reused, so long as we guarantee
that no other object lives in the tail padding and is initialized before the
base class. In C++, that happens only due to virtual base classes getting
initialized early and laid out at the end of the object; if we disallow virtual
base classes then we can guarantee that initialization order is address order,
removing most of the downside of tail-padding reuse.
static variables
At the moment, there is no proposal to support
static member variables,
in line with avoiding global variables more generally. Carbon may need some
support in this area, though, for parity with and migration from C++.
Carbon might want to support members of a type that are accessed like a data member but return a computed value like a function. This has a number of implications:
However, there are likely to be differences between computed properties and
other data members, such as the ability to take the address of them. We might
want to support "read only" data members, that can be read through the public
API but only modified with private access, for data members which may need to
evolve into a computed property. There are also questions regarding how to
support assigning or modifying computed properties, such as using +=.
We should provide a way to define implementations of interfaces for struct types. To satisfy coherence, these implementations would have to be defined in the library with the interface definition. The syntax might look like:
interface ConstructWidgetFrom {
fn Construct(Self) -> Widget;
}
impl {.kind: WidgetKind, .size: i32}
as ConstructWidgetFrom { ... }
In addition, we should define a way for interfaces to define templated blanket
implementations for data classes more generally. These
implementations will typically be subject to the criterion that all the data fields
of the type must implement the interface. An example use case would be to say
that a data class is serializable if all of its fields are. For this we will
need a facet type for capturing that criterion, maybe something like
DataFieldsImplement(MyInterface). The templated implementation will need some
way of iterating through the fields so it can perform operations fieldwise. This
feature should also implement the interfaces for any tuples whose fields satisfy
the criteria.
It is an open question how to define implementations for binary operators. For
example, if i32 is comparable to f64, then {.x = 3, .y = 2.72} should be
comparable to {.x = 3.14, .y = 2}. The trick is how to declare the criteria
that "T is comparable to U if they have the same field names in the same
order, and for every field x, the type of T.x implements ComparableTo for
the type of U.x."
#257: Initialization of memory and variables
#561: Basic classes: use cases, struct literals, struct types, and future work
#875: Principle: Information accumulation
Self and .Self
class and interface syntax