This document attempts to clarify our goals for the design of the generics feature for Carbon. While these are not strict requirements, they represent the yardstick by which we evaluate design decisions. We do expect to achieve most of these goals, though some of these goals are somewhat aspirational or forward-looking.
Carbon will support both checked and template generics to support generic programming by way of compile-time parameterization of language constructs.
Carbon's checked generics will feature early type checking and complete definition checking.
Carbon's template generics, in contrast, will more closely follow the compile-time duck typing approach of C++ templates.
Generic functions and generic types will all take some compile-time parameters, which will frequently be types, and in some cases will be deduced from the types of the values of explicit parameters.
If a compile-time parameter is a type, the generic function's signature can
specify constraints that the caller's type must satisfy. For example, a
resizable array type (like C++'s std::vector) might have a compile-time type
parameter with the constraint that the type must be movable and have a static
size. A sort function might apply to any array whose elements are comparable and
movable.
A constraint might involve multiple compile-time parameters. For example, a merge function might apply to two arbitrary containers so long as their elements have the same type.
We need some way to express the constraints on a compile-time type parameter. In Carbon we express these "type constraints" by saying we restrict to types that implement specific interfaces. Interfaces describe an API a type could implement; for example, it might specify a set of functions, including names and signatures. A type implementing an interface may be passed as a compile-time type argument to a function that has that interface as a requirement of its compile-time type parameter. Then, the functions defined in the interface may be called in the body of the function. Further, interfaces have names that allow them to be reused.
Similar compile-time and run-time constructs may be found in other programming languages:
In addition to specifying the methods available on a type, we may in the future expand the role of interfaces to allow other type constraints, such as on size, prefix of the data layout, specified method implementations, tests that must pass, etc. This might be part of making interfaces as expressive as classes, as part of a strategy to migrate to a future version of Carbon that uses interfaces instead of, rather than in addition to, standard inheritance-and-classes object-oriented language support. For the moment, everything beyond specifying the methods available is out of scope.
The entire idea of statically typed languages is that coding against specific types and interfaces is a better model and experience. Unfortunately, templates don't provide many of those benefits to programmers until it's too late, when users are consuming the API. Templates also come with high overhead, such as template error messages.
We want Carbon code to move towards more rigorously type-checked constructs. However, existing C++ code is full of unrestricted usage of compile-time duck-typed templates. They are incredibly convenient to write and so likely will continue to exist for a long time.
Carbon will have direct support for templates in addition to checked generics. Carbon's template system will be similar to C++ templates with some specific changes:
Other aspects of Carbon will reduce the number of situations where errors will only be detected after the template is instantiated:
Our goal for checked generics support in Carbon is to get most of the expressive benefits of C++ templates and open overloading with fewer downsides. Additionally, we want to support some dynamic dispatch use cases; for example, in cases that inheritance struggles with.
To clarify the expressive range we are aiming for, here are some specific use cases we expect Carbon checked generics to cover.
We in particular want to support generic programming, including:
std::complex<T>std::chrono APIsThese would generally involve static, compile-time type arguments, and so would generally be used with static dispatch.
Interfaces in C++ are often represented by abstract base classes. Checked generics should offer an alternative that does not rely on inheritance. This means looser coupling and none of the problems of multiple inheritance. Some people, such as Sean Parent, advocate for runtime polymorphism patterns in C++ that avoid inheritance because it can cause runtime performance, correctness, and code maintenance problems in some situations. Those patterns require a lot of boilerplate and complexity in C++. It would be nice if those patterns were simpler to express with Carbon checked generics. More generally, Carbon checked generics will provide an alternative for those situations inheritance doesn't handle as well. As a specific example, we would like Carbon checked generics to supplant the need to support multiple inheritance in Carbon.
This is a case that would use dynamic dispatch.
Types which only support subclassing for test stubs and mocks, as in "dependency injection", should be able to easily migrate to checked generics. This extends outside the realm of testing, allowing general configuration of how dependencies can be satisfied. For example, checked generics might be used to configure how a library writes logs.
This would allow you to avoid the runtime overhead of virtual functions, using static dispatch without the poor build experience of templates.
One name lookup problem we would like to avoid is caused by open overloading. Overloading is where you provide multiple implementations of a function with the same name, and the implementation used in a specific context is determined by the argument types. Open overloading is overloading where the overload set is not restricted to a single file or library. This works with Argument-dependent lookup, or ADL, a mechanism for enabling open overloading without having to reopen the namespace where the function was originally defined. Together these enable C++ customization points.
This is commonly used to provide a type-specific implementation of some operation, but doesn't provide any enforcement of consistency across the different overloads. It makes the meaning of code dependent on which overloads are imported, and is at odds with being able to type check a function's definition before instantiation.
Our goal is to address this use case, known more generally as the expression problem, with a checked-generics mechanism that does enforce consistency so that type checking is possible without seeing all implementations. This will be Carbon's replacement for open overloading. This is encapsulated in Carbon's "one static open extension mechanism" principle. As a consequence, Carbon checked generics will need to be able to support operator overloading.
A specific example is the absolute value function Abs. We would like to write
Abs(x) for a variety of types. For some types T, such as Int32 or
Float64, the return type will be the same T. For other types, such as
Complex64 or Quaternion, the return type will be different. Checked generic
functions that call Abs will need a way to specify whether they only operate
on T such that Abs has signature T -> T.
This does create an issue when interoperating with C++ code using open overloading, which will need to be addressed.
For any real-world C++ template, there shall be an idiomatic reformulation in Carbon checked generics that has equal or better performance. Performance is the top priority for Carbon, and we expect to use checked generics pervasively, and so they can't compromise that goal in release builds.
Nice to have: There are cases where we should aim to do better than C++ templates. For example, the additional structure of checked generics should make it easier to reduce generated code duplication, reducing code size and cache misses.
Compared to C++ templates, we expect to reduce build times, particularly in development builds. We also expect the compiler to be able to report clearer errors, and report them earlier in the build process.
One source of improvement is that the bodies of checked generic functions and types can be type checked once when they are defined, instead of every time they are used. This is both a reduction in the total work done, and how errors can be reported earlier. On use, the errors can be a lot clearer since they will be of the form "argument did not satisfy function's contract as stated in its signature" instead of "substitution failed at this line of the function's implementation."
Nice to have: In development builds, we will have the option of using dynamic dispatch to reduce build times. We may also be able to reduce the amount of redundant compilation work even with the static strategy by identifying instantiations with the same arguments or identical implementations and only generating code for them once.
With a template, the implementation is part of the interface and types are only checked when the function is called and the template is instantiated.
A checked-generic function is type checked when it is defined, and type checking can't use any information that is only known when the function is instantiated such as the exact argument types. Furthermore, calls to a checked-generic function may be type checked using only its declaration, not its body.
A general property of checked generics is they are more predictable than templates. They make clear when a type satisfies the requirements of a function; they have a documented contract. Further, that contract is enforced by the compiler, not sensitive to implementation details in the function body. This eases evolution by reducing (but not eliminating) the impact of Hyrum's law.
Nice to have: We also want well-defined boundaries between what is legal and not. This is "will my code be accepted by the compiler" predictability. We would prefer to avoid algorithms in the compiler with the form "run for up to N steps and report an error if it isn't resolved by then." For example, C++ compilers will typically have a template recursion limit. With checked generics, these problems arise due to trying to reason whether something is legal in all possible instantiations, rather than with specific, concrete types.
Some of this is likely unavoidable or too costly to avoid, as most existing checked generics systems have undecidable aspects to their type system, including Rust and Swift (1, 2, 3, 4). We fully expect there to be metaprogramming facilities in Carbon that will be able to execute arbitrary Turing machines, with infinite loops and undecidable stopping criteria. We don't see this as a problem though, just like we don't worry about trying to make the compiler reliably prevent you from writing programs that don't terminate.
We would like to distinguish "the executed steps are present in the program's source" from "the compiler has to search for a proof that the code is legal." In the former case, the compiler can surface a problem to the user by pointing to lines of code in a trace of execution. The user could employ traditional debugging techniques to refine their understanding until they can determine a fix. What we want to avoid is the latter case, since it has bad properties:
If we can't find acceptable restrictions to make problems efficiently decidable, the next best solution is to require the proof to be in the source instead of derived by the compiler. If authoring the proof is too painful for the user, the we should invest in putting the proof search into IDEs or other tooling.
Enable simple user control of whether to use dynamic or static dispatch.
Implementation strategy: There are two strategies for generating code for checked-generic functions:
By default, we expect the implementation strategy to be controlled by the compiler, and not semantically visible to the user. For example, the compiler might use the static strategy for release builds and the dynamic strategy for development. Or it might choose between them on a more granular level based on code analysis, specific features used in the code, or profiling -- maybe some specific specializations are needed for performance, but others would just be code bloat.
We require that all checked generic functions can be compiled using the static specialization strategy. For example, the values for checked parameters must be statically known at the callsite. Other limitations are listed below.
Nice to have: It is desirable that the majority of functions with checked parameters also support the dynamic strategy. Specific features may prevent the compiler from using the dynamic strategy, but they should ideally be relatively rare, and easy to identify. Language features should avoid making it observable whether function code generated once or many times. For example, you should not be able to take the address of a function with checked parameters, or determine if a function was instantiated more than once using function-local static variables.
There are a few obstacles to supporting dynamic dispatch efficiently, which may limit the extent it is used automatically by implementations. For example, the following features would benefit substantially from guaranteed monomorphization:
bool into the lower
bits of a pointer, or packing bit-fields with checked-parameter widths.While it is possible to address these with dynamic dispatch, handling some of them might have far-reaching and surprising performance implications. We don't want to compromise our goal for predictable performance.
We will allow the user to explicitly opt-in to using the dynamic strategy in specific cases. This could be just to control binary size in cases the user knows are not performance sensitive, or it could be to get the additional capability of operating on values with dynamic types. We may need to restrict this in various ways to maintain efficiency, like Rust does with object-safe traits.
We also anticipate that the user may want to force the compiler to use the static strategy in specific cases. This might be to keep runtime performance acceptable even when running a development or debug build.
We want there to be a natural, incremental upgrade path from templated code to checked-generic code. The first step of migrating C++ template code would be to first convert it to a Carbon template. The problem is then how to convert templates to checked generics within Carbon. This gives us these sub-goals:
Replacing a regular, non-parameterized function with a checked-generic function should not affect existing callers of the function. There may be some differences, such as when taking the address of the function, but ordinary calls should not see any difference. In particular, the return type of a checked-generic function should match, without any type erasure or additional named members.
We want the generics system to have the coherence property, so that the implementation of an interface for a type is well defined. Since a checked-generic function only depends on interface implementations, they will always behave consistently on a given type, independent of context. For more on this, see this description of what coherence is and why Rust enforces it.
Coherence greatly simplifies the language design, since it reduces the need for complicated rules to picking an implementation when there are many candidates. It also has a number of benefits for users:
The main downside of coherence is that there are some capabilities we would like for interfaces that are in tension with having an orphan rule limiting where implementations may be defined. For example, we would like to address the expression problem. We can get some of the way there by allowing the implementation of an interface for a type to be defined with either the interface or the type. But some use cases remain:
We should have some mechanism for addressing these use cases. There are multiple approaches that could work:
Alternatives to coherence are discussed in an appendix.
We want to avoid adding rules for name lookup that are specific to generics. This is in contrast to Rust which has different lookup rules inside its traits. Instead, we should structure generics in a way that reuses existing name lookup facilities of the language.
Nice to have: One application of this that would be nice to have is if the
names of a type's members were all determined by a type's definition. So if x
has type T, then if you write x.y you should be able to look up y in the
definition of T. This might need to be somewhat indirect in some cases. For
example, if T inherits from U, the name y might come from U and not be
mentioned in the definition of T directly. We may have similar mechanisms
where T gets methods that have default implementations in interfaces it
implements, as long as the names of those interfaces are explicitly mentioned in
the definition of T.
Many languages have implemented checked-generics systems, and we should learn from those experiences. We should copy what works and makes sense in the context of Carbon, and change decisions that led to undesirable compromises. We are taking the strongest guidance from Rust and Swift, which have similar goals and significant experience with the implementation and usability of checked generics. They both use nominal interfaces, were designed with checked generics from the start, and produce native code. Contrast with Go which uses structural interfaces, or Java which targets a virtual machine that predated its generics feature.
For example, Rust has found that supporting defaults for interface methods is a valuable feature. It is useful for evolution, implementation reuse, and for bridging the gap between the minimal functionality a type wants to implement and the rich API that users want to consume (example).
We still have the flexibility to make simplifications that Rust cannot because
they need to maintain compatibility. We could remove the concept of
fundamental and explicit control over which methods may be specialized. These
are complicated and
impose coherence restrictions.
Interfaces can either be structural, as in Go, or nominal, as in Rust and Swift. Structural interfaces match any type that has the required methods, whereas nominal interfaces only match if there is an explicit declaration stating that the interface is implemented for that specific type. Carbon will support nominal interfaces, allowing them to designate semantics beyond the basic structure of the methods.
This means that interfaces implicitly specify the intended semantics and
invariants of and between those functions. Unlike the function signatures, this
contract is between the implementers and the consumers of interfaces and is not
enforced by Carbon itself. For example, a Draw method would mean different
things when it is part of a GameResult interface versus an Image2D
interface, even if those methods happen to have the same signature.
Evolution is a high priority for Carbon, and so will need mechanisms to support evolution when using checked generics. New additions to an interface might:
Experience with C++ concepts has shown that interfaces are hard to evolve without these kinds of supporting language mechanisms. Otherwise changes to interfaces need to made simultaneously with updates to types that implement the interface or functions that consume it.
Another way of supporting evolution is to allow one interface to be
substitutable for another. For example, a feature that lets you use an
implementation of Interface1 for a type to automatically get an implementation
of Interface2, as well as the other way around, would help transitioning
between those two interfaces.
Evolution in particular means that the set of names in an interface can change, and so two interfaces that don't start with name conflicts can develop them.
To handle name conflicts, interfaces should be separate, isolated namespaces. We should provide mechanisms to allow one type to implement two interfaces that accidentally use the same name for different things, and for functions to use interfaces with name conflicts together on a single type. Contrast this with Swift, where a type can only supply one associated type of a given name even when implementing multiple protocols. Similarly a function in Swift with a given name and signature can only have a single implementation for a type.
Note this is possible since interfaces are nominal. The place where types specify that they implement an interface is also the vehicle for unambiguously designating which function implementation goes with what interface.
There will need to be some bridge for C++ extension points that currently rely
on open overloading or
ADL. For
example, we need some way for C++
customization points
like swap to work on Carbon types. We might define CPlusPlus.ADL.swap as a
Carbon interface to be that bridge. Carbon types could implement that interface
to work from C++, and Carbon functions could use that interface to invoke swap
on C++ types.
Similarly, we will want some way to implement Carbon interfaces for C++ types.
For example, we might have a template implementation of an AddWith interface
for any C++ type that implements operator+.
What are we not doing with checked generics, particularly things that some other languages do?
Checked generics don't need to provide full flexibility of C++ templates:
C++ compilers must defer full type checking of templates until they are instantiated by the user. Carbon will not defer type checking of checked-generic definitions.
We want all generic Carbon code to support static dispatch. This means we won't support unbounded type families. Unbounded type families are when recursion creates an infinite collection of types, such as in this example from Swift or:
fn Sort[T:! Ordered](list: List(T)) -> List(T) {
if (list.size() == 1) return list;
var chunks: List(List(T)) = FormChunks(list, sqrt(list.size()));
chunks = chunks.ApplyToEach(Sort);
chunks = Sort(chunks);
return MergeSortedListOfSortedLists(chunks);
}
This, given an implementation of Ordered for any list with elements that are
themselves Ordered, would recursively call itself to produce a set of types
without bound. That is, calling Sort on a List(Int) would internally call
Sort on a List(List(Int)) and so on recursively without any static limit.
We won't require all checked-generic Carbon code to support dynamic dispatch, but we would like it to be an implementation option for the compiler in the majority of cases.
Lastly, runtime specialization is out of scope as an implementation strategy. That is, some language runtimes JIT a specialization when it is first needed, but it is not a goal for Carbon to support such an implementation strategy.
Proposals: