This proposal replaces the termination algorithm for impl selection. The
previous algorithm relied on a recursion limit, which is counter to
our goal for predictability.
The replacement is to terminate if any impl lookup performed while considering
an impl declaration depends transitively on the same impl declaration with a
"strict superset" of the types in the query.
Consider this impl declaration:
interface I;
impl forall [T:! type where Optional(.Self) is I] T as I;
A type like i32 is a valid value of T, and so implements I, if
Optional(i32) implements I. This impl declaration could also possibly be
used to give an implementation of Optional(i32) for interface I, but only if
there was an implementation of Optional(Optional(i32)) for interface I. The
job of the termination rule is to report an error instead of being caught in an
infinite loop in this situation.
Ideally, a termination rule would identify the loop in a minimal way. This has a few benefits, including reducing compile times and making error messages as short and understandable as possible. One downside of the original recursion limit rule is its tendency to only detect a problem after the loop had been repeated many times. This problem is worse if the recursion limit is large.
Another concern with using a recursion limit is that refactorings that would otherwise be legal can increase the depth of recursion, causing spurious failures. The workaround for this and other spurious failures is to increase the recursion limit. This makes the other problems with using a recursion limit worse.
Note that determining whether a particular set of impl declarations terminates
is
equivalent to the halting problem
(content warning: contains many instances of an obscene word as part of a
programming language name), and so is undecidable in general. So any termination
rule will have some false positives or spurious failures, where it reports an
error even though it would in fact complete if allowed to continue running. We
would like a criteria that correctly classifies the examples that arise in
practice.
The first termination rule was introduced in proposal #920: Generic parameterized impls (details 5), following Rust and C++. The problems with using a recursion limit were recognized at the time that proposal was written, but no alternative was known.
Alternatives termination rules have since been discussed:
PR #2602: Implement the termination algorithm for impl selection described in #2458 implements the termination rule of this proposal in Explorer.
We replace the termination criteria with a rule that the types in the impl
query must never get strictly more complicated when considering the same impl
declaration again. The way we measure the complexity of a set of types is by
counting how many of each base type appears. A base type is the name of a type
without its parameters. For example, the base types in this query
Pair(Optional(i32), bool) impls AddWith(Optional(i32)) are:
PairOptional twicei32 twiceboolAddWithA query is strictly more complicated if at least one count increases, and no
count decreases. So Optional(Optional(i32)) is strictly more complicated than
Optional(i32) but not strictly more complicated than Optional(bool).
This rule, when combined with
the acyclic rule that a query
can't repeat exactly, guarantees termination. This rule
is expected to identify a problematic sequence of impl declaration
instantiations in a way that is easier for the user to understand. Consider the
example from before,
interface I;
impl forall [T:! type where Optional(.Self) is I] T as I;
This impl declaration matches the query i32 impls I as long as
Optional(i32) impls I. That is a strictly more complicated query, though,
since it contains all the base types of the starting query (i32 and I), plus
one more (Optional). As a result, an error can be given after one step, rather
than after hitting a large recursion limit. And that error can state explicitly
what went wrong: we went from a query with no Optional to one with one,
without anything else decreasing.
Note this only triggers a failure when the same impl declaration is considered
with the strictly more complicated query. For example, if the declaration is not
considered since there is a more specialized impl declaration that is
preferred by the
type-structure overlap rule, as
in:
impl forall [T:! type where Optional(.Self) is I] T as I;
impl Optional(bool) as I;
// OK, because we never consider the first `impl`
// declaration when looking for `Optional(bool) impls I`.
let U:! I = bool;
// Error: cycle with `i32 impls I` depending on
// `Optional(i32) impls I`, using the same `impl`
// declaration, as before.
let V:! I = i32;
The rule is also robust in the face of refactoring:
impl declaration is
parameterized, only on the query.For non-type arguments we have to expand beyond base types to consider other kinds of keys. These other keys are in a separate namespace from base types.
2 and -3 are used as arguments to parameters with type i32, then the
i32 key will have count 5.Optional(i32) value of .Some(7) is
recorded as keys .Some (with a count of 1) and i32 (with a count of
7).V
that takes any number of i32 arguments, with an infinite set of distinct
instantiations: V(0), V(0, 0), V(0, 0, 0), ...
tuple key in this namespace is used to track the total number of
components of tuple values. The values of those elements will be tracked
using their own keys.Non-type argument values not covered by these cases are deleted from the query entirely for purposes of the termination algorithm. This requires that two queries that only differ by non-type arguments are considered identical and therefore are rejected by the acyclic rule. Otherwise, we could construct an infinite family of non-type argument values that could be used to avoid termination.
Let's call a (finite or infinite) sequence of type expressions "good" if no later element is strictly more complex than an earlier element, and no type expression is repeated. We would like to prove that any good sequence of type expressions with a finite set of keys is finite.
We can restrict to good sequences that don't repeat any multiset of keys, since there are only a finite number of types with a given multiset of keys. Proof: If none of the types have a variadic parameter list, then there is at most one type for every distinct permutation of base types. If some types are variadic, then we can get a conservative finite upper bound by multiplying the number of distinct permutations by the number of different possible arity combinations. The number of arity combinations is finite since, ignoring non-type arguments, the total arity must equal the number of base types in the type minus 1.
The proof of termination is by induction on the number N of distinct keys.
N == 1, then types map to a multiset of a single key, which can be
represented by the count of the number of times that key appears. That
number must be non-negative and decreasing in the sequence, and so the
length of the sequence is bounded by the value of the first element. So good
sequences with N == 1 must be finite.N distinct keys must be finite, consider
a good sequence with N+1 distinct keys. Its first element will be
represented by a non-negative integer (N+1)-tuple, (i_0, i_1, ..., i_N).
Every element after that will be in at least one of the
i_0 + i_1 + ... + i_N hyperplanes (co-dimension 1) given by these
equations:
x_0 = 0, x_0 = 1, ..., x_0 = i_0 - 1 (i_0 different equations,
each defining a separate hyperplane)x_1 = 0, x_1 = 1, ..., x_1 = i_1 - 1x_N = 0, x_N = 1, ..., x_N = i_N - 1N+1 distinct keys is finite, completing
the induction.This bound given by this construction is not completely tight, since there is overlap between the hyperplanes. It is tight once that overlap is taken into account, though. We can construct sequences that reach the upper bound by visiting the points in the union of the hyperplanes in descending order of their L1-norm (sum of the components).
Note: The text of this argument was derived from comments on issue #2458: Infinite recursion during impl selection.
This proposal advances these Carbon goals:
We
considered
a rule which would ensure termination by forbidding the depth of the type tree
in the query from increasing. This depth could either be measured in the query
or in the values of types used to parameterize impl declaration. Either way,
this raises a concern that otherwise safe refactorings might trigger spurious
termination errors. Specifically, refactorings that replace a type, like
String, with an alias to a parameterized type, like BasicString(Char8),
could change the tree depths in impl declarations in files that were not part
of the refactoring.
impl declaration separatelyInstead of measuring the complexity of the impl query as a whole, we
considered measuring the complexity of the argument values of parameters in an
individual impl declaration. The advantage of this would be fewer spurious
failures due to the termination rule.
We decided against it because it is a more complex rule and it is sensitive to
the specifics of how impl declarations are parameterized. This raises concerns
about refactorings introducing termination rule failures. We did not want to
incorporate this change without evidence that those spurious failures would be a
problem in practice.
Instead of measuring the complexity of the entire impl query together, we
could consider keys in the type and interface parts of the query to be in
distinct namespaces. This would reduce the spurious failures due to the
termination rule, but not as much as the previous alternative. It avoids the
problem of the previous alternative, since it is not sensitive to the specifics
of how impl declarations are parameterized.
We are not choosing this alternative now since it is a more complicated rule to explain. But we would consider this alternative in the future if we find it beneficial in practice to support the additional cases this rule permits.
We considered a rule that would forbid repeating the multiset of types. This
would simplify the termination argument. However, we thought it important to
support impl declarations that effectively shuffled the terms around into some
canonical form, as in this example:
impl forall [T:! I(Optional(.Self))] Optional(T) as I(T);
Here, Optional(bool) impls I(bool) if bool impls I(Optional(bool)). This
rule can only be applied a finite number of times, and is something we imagine
might arise naturally, so it seemed good to support.
We considered different handling for non-type argument values that did not have an integral or choice type. The alternative rule required the value to stay constant. This led to a number of edge cases to consider, like how to identify the same argument when the type constructor may be called a different number of times. The rule we chose to use instead has the advantages of being simpler and also accepting more cases. With the current rule, the value of those arguments may change freely, they just don't create different type expressions for purposes of detecting termination.