For generics, users will define interfaces that describe types. These interfaces may have associated types (see Swift, Rust):
interface Container {
let Elt:! Type;
let Slice:! Container;
let Subseq:! Container;
}
The next step is to express constraints between these associated types.
Container.Slice.Elt == Container.EltContainer.Slice.Slice == Container.SliceContainer.Subseq.Elt == Container.EltContainer.Subseq.Slice == Container.SliceContainer.Subseq.Subseq == Container.SubseqLanguages commonly express these constraints using where clauses, as in
Swift and
Rust, that
directly represent these constraints as equations.
interface Container {
let Elt:! Type;
let Slice:! Container where .Elt == Elt
and .Slice == Slice;
let Subseq:! Container where .Elt == Elt
and .Slice == Slice
and .Subseq == Subseq;
}
These relations give us a semi-group, where in general deciding whether two sequences are equivalent is undecidable, see Swift type checking is undecidable - Discussion - Swift Forums.
If you allow the full undecidable set of where clauses, there are some
unpleasant options:
Furthermore, the algorithms are expensive and perform extensive searches.
The goal is to identify some restrictions such that:
The expected cases are things of these forms:
X == YX == Y.ZX == X.YX == Y.XFor ease of exposition, I'm going to assume that all associated types are
represented by single letters, and omit the dots . between them.
Observe that the equivalence class of words you can rewrite to in these expected cases are described by a regular expression:
X can be rewritten to (X|Y) using the rewrite X == YX can be rewritten to XY* using the rewrite X == XYX can be rewritten to Y*X using the rewrite X == YXThese regular expressions can be used instead of the rewrite rules: anything
matching the regular expression can be rewritten to anything else matched by the
same regular expression. This means we can take any sequence of letters, replace
anything that matches the regular expression by the regular expression itself,
and get a regular expression that matches the original sequence of letters. We
can even get a shortest/smallest representative of the class by dropping all the
things with a * and picking the shortest/smallest between alternatives
separated by a |.
Given a set of rewrite equations, is there a terminating algorithm that either:
We would be willing to reject cases where there are more equivalence classes
than rewrite rules. An example of this would be the rewrite rules A == XY and
B == YZ have equivalence classes A|XY, B|YZ, XYZ|AZ|XB.
In particular, we've identified two operations for combining two finite automata depending on how they overlap:
X and Y, where Y
completely matches a subsequence matched by X, then we need to extend X
to include all the rewrites offered by Y. For example, if X is AB* and
Y is (B|C), then we can extend X to A(B|C)*.X and Y, and there is a string
ABC such that X matches the prefix AB and Y matches the suffix BC,
then we need to make sure there is a rule for matching ABC.Even though it is straightforward to translate a single rewrite rule into a regular expression that matches at least some portion of the equivalence class, to match the entire equivalence classes we need to apply these two operations in ways that can run into difficulties.
One concern is that an individual regular expression might not cover the entire
equivalence for a rewrite rule. We may need to apply the union and containment
operations to combine the automata with itself. For example, we may convert
the rewrite rule AB == BA into the regular expression AB|BA, but this
doesn't completely describe the set of equivalence classes that this rule
creates, since it triggers the union operation twice to add AAB|ABA|BAA and
ABB|BAB|BBA. These would also trigger the union operation, and so on
indefinitely, so the rewrite rule AB == BA does not have a finite set of
regular equivalence classes. Similarly the rule A == BCB has an infinite
family of equivalence classes: BCB|A, BCBCB|ACB|BCA,
BCBCBCB|ACBCB|ACA|BCACB|BCBCA, ...
The other concern is that even in the case that each rule by itself can be faithfully translated into a regular expression that captures the equivalence classes of the rewrite, attempts to combine multiple rules using the two operations might never reach a fixed point.
In both cases, we are looking for criteria that we can use to decide if the application of the operations will terminate. If it would not terminate, then we need to report an error to the user instead of applying those two operations endlessly.
A == B => A|BA == BC => A|BCA == BBC => A|BBCA == BCC => A|BCCA == AB => AB*A == BA => B*AA == AA => AA* or A*A or A+A == ABC => A(BC)*A == ABB => A(BB)*A == AAA => A(AA)*A == BCA => (BC)*AA == BBA => (BB)*AA == ACA => A(CA)* or (AC)*AA==AB + A==AC => AB* + AC* => A(B|C)*A==AB + B==BC => AB* + BC* => A(B|C)*, BC* with relation
A(B|C)* BC* == A(B|C)*; Note: A == AB == ABC == ACA==AB + C==CB => AB*, CB*, no relationsA==AB + A==CA => AB* + C*A => C*AB*A==AB + C==BC => AB* + B*C => AB*, B*C, where if you have AB*C
it doesn't matter how you split the Bs between AB* and B*C.A==AB + B==AB => AB* + A*B => (A|B)+A==AB + B==CB => AB* + C*B => A(C*B)*, C*B with A(C*B)* C*B ->
A(C*B)*A==AB + B==CB + C==CD => AB* + C*B + CD* => A((CD*)*B)*,
(CD*)*B, CD*A==ABC + D==CB => A(BC)* + (CB|D) => A(BD*C)*, A(BD*C)*BD*,
(CB|D)A==ABBBBB + A==ABBBBBBBB (or A==AB^5 + A==AB^8)=> A(BBBBB)* +
A(BBBBBBBB)* => AB*, since A=AB^8=AB^16=AB^11=AB^6=ABAB == BA, has equivalence classes: AB|BA, AAB|ABA|BAA, ABB|BAB|BBA,
...AB==C + C==BAABC == CBA, has equivalence classes: ABC|CBA, ABABC|ABCBA|CBABA,
ABCBC|CBABC|CBCBA, ...A == BAB, involves back references or counting the number of Bs:
A|BAB|BBABB|BBBABBB|...A == BAC, involves matching the number of Bs to Cs:
A|BAC|BBACC|BBBACCC|...A == BCB, has equivalence classes: BCB|A, BCBCB|ACB|BCA,
BCBCBCB|ACBCB|ACA|BCACB|BCBCA, ...A == BB, has equivalence classes: A|BB, BBB|AB|BA, (A|BB)(A|BB)|BAB,
...A == BBB, has equivalence classes: A|BBB, BBBB|AB|BA,
BBBBB|ABB|BAB|BBA, ...AA == BB, has equivalence classes: AA|BB, AAA|ABB|BBA, BAA|BBB|AAB,
...AB == AA, has equivalence classes: A(A|B), A(A|B)(A|B),
A(A|B)(A|B)(A|B), ...AB == BB, similarlyAB == BC, has equivalence classes: (AB|BC), (AAB|ABC|BCC), ...,
A^iBC^(N-i)A == AAB and A == BAASee these notes from discussions: