We need to be able to compare values for equality, and to compare ordered values for relative ordering.
We refer to tests that check whether two values are the same or different as equality comparisons, and to tests that determine the relative ordering of two values as relational comparisons.
There is near-universal convention on the use of the following symbols for relational operators:

- `<`, `<=`, `>`, and `>=` perform ordered comparisons (less than, less than or equal to, greater than, greater than or equal to).

There are rare exceptions in somewhat esoteric languages: some languages use ≤
and ≥, but these are not straightforward to type for many potential Carbon
developers.
For equality operators, there is some divergence but still a very strong trend:
- `==` for equality comparison and `!=` for inequality comparison.
- `=` as equality comparison, with some using a different symbol (such as `:=` or `<-`) for assignment and others distinguishing assignment from equality comparison based on context.
- `==` for "equal to" and `/=` for "not equal to". The latter is intended to resemble a ≠ symbol.
- `<>` for inequality comparison. Python 2 permits this as a synonym for `!=`.
- `eq` and `ne` for string comparisons; some shells and UNIX `test` use `-eq` and `-ne` for integer comparisons.

Some languages support multiple different kinds of equality comparison, such as
both a value comparison (typically ==) and an object identity comparison
(typically === or is). Some languages that freely convert between numbers
and strings have different operators to perform a string comparison versus a
numeric comparison. Fortran has custom .eqv. and .neqv. for equality
comparisons of Boolean values.
Some languages have synonyms for equality operators. For example, Fortran allows
.eq., .ne., .gt., and so on, as synonyms for ==, /=, >, and so on.
This appears to be historical: FORTRAN 77 had only the dotted forms of these
operators.
C++ has three-way comparisons, written using the <=> operator. These provide a
useful mechanism to allow overloading the behavior of relational comparisons
without defining four separate operator overloads for relational comparisons.
Similarly, Python 2 provided a `__cmp__` special method that could be used to
implement all equality and relational comparisons; Python 3 removed it.
Python permits comparisons to be chained: that is, a < b <= c is interpreted
as a < b and b <= c, except that b is evaluated only once. In most C-family
languages, that expression is instead interpreted as (a < b) <= c, which
computes the value of a < b, maps false to 0 and true to 1, then
compares the result to c.
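The C-family interpretation can be checked directly. The following C++ sketch uses two hypothetical helper functions: one spells out how C++ parses `lo < x < hi`, the other expresses the range check that a reader probably intended.

```cpp
// How C++ parses `lo < x < hi`: the first comparison yields a bool, which is
// promoted to 0 or 1 and then compared against `hi`.
constexpr bool CFamilyChain(int lo, int x, int hi) { return (lo < x) < hi; }

// The range check that a chained comparison usually intends.
constexpr bool IntendedRange(int lo, int x, int hi) {
  return lo < x && x < hi;
}
```

With `lo = 3`, `x = 10`, and `hi = 6`, `CFamilyChain` returns true (because `1 < 6`) while `IntendedRange` returns false.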
Carbon will provide the following operators:
- Equality operators: `==` and `!=`.
- Relational operators: `<`, `<=`, `>`, `>=`.

Each has the obvious mathematical meaning, where `==` means =, `!=` means ≠,
`<=` means ≤, and `>=` means ≥.
There will be no three-way comparison operator symbol. The interface used to support overloading comparison operators will provide a named function to perform three-way comparisons.
Chained comparisons are an error: a comparison expression cannot appear as an unparenthesized operand of another comparison operator.
For built-in types, we will follow these rules:
T and U, then it is valid between built-in types
that implicitly convert to T and U.

The first two rules are expected to also apply for other built-in operators, such as arithmetic. The third rule is specific to comparisons.
One consequence of the first rule is that we do not convert operands in a way that might lose information. This is generally also implied by the third rule.
All six operators are infix binary operators. For standard Carbon types, they
produce a Bool value.
The comparison operators are all at the same precedence level. This level is
lower than operators used to compute (non-Boolean) values, higher than the
logical operators and and or, and incomparable with the precedence of not.
For example, this is OK:
```carbon
if (n + m * 3 < n * n and 3 < m and m < 6) {}
```
... but these are errors:
```carbon
// Error, ambiguous: `(not a) == b` or `not (a == b)`?
if (not a == b) {}
// Error, requires parentheses: `a == (not b)`.
if (a == not b) {}
// Error, requires parentheses: `not (f < 5.0)`.
if (not f < 5.0) {}
```
The comparison operators are non-associative. For example:
```carbon
// Error, need `3 < m and m < 6`.
if (3 < m < 6) {}
// Error, need `a == b and b == c`.
if (a == b == c) {}
// Error, need `(m > 1) == (n > 1)`.
if (m > 1 == n > 1) {}
```
Built-in comparisons are permitted:
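- when both operands are of standard Carbon integer types (`Int(n)` or `Unsigned(n)`), or
- when both operands are of standard Carbon floating-point types (`Float(n)`), or
- when one operand is of floating-point type and the other is of integer type, if all values of the integer type can be exactly represented in the floating-point type.

In each case, the result is the mathematically-correct answer. This applies even
when comparing `Int(n)` with `Unsigned(m)`.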
For example:
```carbon
// The value of `v` is True, because `a` is less than `b`, even though the
// result of either an `i32` comparison or a `u32` comparison would be False.
fn f(a: i32, b: u32) -> Bool { return a < b; }
let v: Bool = f(-1, 4_000_000_000);

// This does not compile, because `i64` values in general (and 10^18 in
// particular) are not exactly representable in the type `f32`.
let f: f32 = 1.0e18;
let n: i64 = 1_000_000_000_000_000_000;
let w: Bool = f == n;
```
Comparisons involving integer and floating-point constants are not covered by these rules and are discussed separately.
As specified in #820, we support the following implicit conversions:
- `Int(n)` to `Int(m)` if `m > n`,
- `Unsigned(n)` to `Int(m)` or `Unsigned(m)` if `m > n`,
- `Float(n)` to `Float(m)` if `m > n`, and
- `Int(n)` to `Float(m)` if `Float(m)` can represent all values of `Int(n)`.

These rules can be summarized as: a type `T` can be converted to `U` if every
value of type `T` is a value of type `U`.
Additionally, #820
permits conversions from certain kinds of integer and floating-point constants
to Int(n) and Float(n) types if the constant can be represented in the type.
All built-in comparisons can be viewed as performing implicit conversions on at
most one of the operands in order to reach a suitable pair of identical or very
similar types, and then performing a comparison on those types. The target types
for these implicit conversions are, for each suitable value n:
- `Int(n)` vs `Int(n)`
- `Unsigned(n)` vs `Unsigned(n)`
- `Int(n)` vs `Unsigned(n)`
- `Unsigned(n)` vs `Int(n)`
- `Float(n)` vs `Float(n)`

There will in general be multiple combinations of implicit conversions that will
lead to one of the above forms, but we will arrive at the same result regardless
of which is selected, because all comparisons are mathematically correct and all
implicit conversions are lossless. Implementations are expected to do whatever
is most efficient: for example, for u16 < i32 it is likely that the best
choice would be to promote the u16 to i32, not u32.
Because we only ever convert at most one operand, we never use an intermediate
type that is larger than both input types. For example, both i32 and f32 can
be implicitly converted to f64, but we do not permit comparisons between i32
and f32 even though we could perform those comparisons in f64. If such
comparisons were permitted, the results could be surprising:
```carbon
// OK, i32 can exactly represent this value.
var n: i32 = 2_000_000_001;
// OK, this value is within the representable range for f32.
var f: f32 = 2_000_000_001.0;
// This comparison could compare unequal, because f32 cannot exactly represent
// the value 2,000,000,001.
if (n == f) { ... }
// OK with explicit cast, but may still compare unequal.
if (n == f as f64) { ... }
if (n as f64 == f) { ... }
```
The two kinds of mixed-type comparison may be less efficient than the other kinds due to the slightly wider domain.
Note that this approach diverges from C++, which would convert both operands to a common type first, sometimes performing a lossy conversion potentially giving an incorrect result, sometimes converting both operands, and sometimes using a wider type than either of the operand types.
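The lossy-conversion case can be illustrated in C++ with the same values as the `i32`/`f32` example above. This is a sketch that assumes IEEE 754 single and double precision with round-to-nearest; the constant and function names are hypothetical. Single precision has a 24-bit significand, so near 2,000,000,001 the representable values are 128 apart and the nearest one is 2,000,000,000.

```cpp
#include <cstdint>

constexpr std::int32_t kN = 2000000001;
// The literal is rounded to the nearest float, which is 2000000000.
constexpr float kF = 2000000001.0f;

// C++ built-in comparison: `n` is converted to float, losing the final 1,
// so both sides hold the same rounded value.
constexpr bool CppBuiltinEqual(std::int32_t n, float f) { return n == f; }

// Comparing in double: both conversions are exact, so the values differ.
constexpr bool EqualInDouble(std::int32_t n, float f) {
  return static_cast<double>(n) == static_cast<double>(f);
}
```

Under those assumptions, `CppBuiltinEqual(kN, kF)` is true even though the integer and the stored floating-point value differ, while `EqualInDouble(kN, kF)` is false.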
As described in #820, integer constants can be implicitly converted to any integer or floating-point type that can represent their value, and floating-point constants can be implicitly converted to any floating-point type that can represent their value. We permit the following comparisons involving constants:
Note that this disallows comparisons between, for example, i32 and an integer
literal that cannot be represented in i32. Such comparisons would always be
tautological. This decision should be revisited if it proves problematic in
practice, for example in templated code where the literal is sometimes in range.
The choice to give correct results for signed/unsigned comparisons has a performance impact in practice, because it exposes operations that some processors do not currently directly support. Sample microbenchmarks for implementations of several operations show the following performance on x86_64:
| Operation | Mathematical comparison time | C++ comparison time | Ratio |
|---|---|---|---|
| `i64 < u64` | 1636 | 798 | 2.0x |
| `u64 < i64` | 1956 | 798 | 2.5x |
The execution times here are computed as operation time minus no-op time.
The mixed-type operations typically have 2-2.5x the execution time of the same-type operations. However, this is a predictable performance change, and can be controlled by the developer by converting the operands to a suitable type prior to the comparison if a faster same-type comparison is preferred over a correct mixed-type comparison.
The above comparison attempts to demonstrate a worst-case difference. In many cases, better code can be generated for the mixed-type comparison. For example, when branching on the result of the comparison, the difference is significantly reduced:
| Operation | Mathematical comparison time | C++ comparison time | Ratio |
|---|---|---|---|
| `i64 < u64` | 996 | 991 | 1.0x |
| `u64 < i64` | 1973 | 997 | 2.0x |
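One plausible way to implement the mixed-type comparisons is sketched below in C++. This is an assumption for illustration, not necessarily what the benchmarked implementations do, and all function names are hypothetical. Each mathematical comparison needs a sign test in addition to the same-type comparison, which is consistent with the extra cost measured above. The last function shows the opt-out: converting to a common type first gives a single same-type comparison, at the price of a wrong answer for negative inputs.

```cpp
#include <cstdint>

// Mathematically correct `i64 < u64`: any negative `a` is less than every
// unsigned value; otherwise `a` converts to u64 without loss.
constexpr bool LessI64U64(std::int64_t a, std::uint64_t b) {
  return a < 0 || static_cast<std::uint64_t>(a) < b;
}

// Mathematically correct `u64 < i64`: no unsigned value is less than a
// negative `b`; otherwise `b` converts to u64 without loss.
constexpr bool LessU64I64(std::uint64_t a, std::int64_t b) {
  return b >= 0 && a < static_cast<std::uint64_t>(b);
}

// Same-type comparison after an explicit conversion: a single comparison,
// but a negative `a` wraps around to a large unsigned value.
constexpr bool LessAsU64(std::int64_t a, std::uint64_t b) {
  return static_cast<std::uint64_t>(a) < b;
}
```

For example, `LessI64U64(-1, 0)` is true, whereas `LessAsU64(-1, 0)` is false because `-1` wraps to the largest `u64` value.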
Separate interfaces will be provided to permit overloading equality and relational comparisons. The exact design of those interfaces is left to a future proposal. As non-binding design guidance for such a proposal:
- `==`. The `!=` operator can optionally also be overridden, with a default implementation that returns `not (a == b)`. Overriding `!=` separately from `==` is expected to be used to support floating-point NaN comparisons and for C++ interoperability.
- `Bool`, for uses such as a vector comparison producing a vector of `Bool` values. We should decide whether we wish to support such uses.

In addition to being defined for standard Carbon numeric types, equality and relational comparisons are also defined for all "data" types:
Relational comparisons for these types provide a lexicographical ordering. This proposal defers to #710 for details on comparison support for classes.
In each case, the comparison is only available if it is supported by all element types.
The Bool type should be treated as a choice type, and so should support
equality comparisons and relational comparisons if and only if choice types do
in general. That decision is left to a future proposal.
Performance-critical software:
Code that is easy to read, understand, and write:
Interoperability with and migration from existing C++ code:
We could use /= instead of != for not-equal comparisons.
Advantages:
- `!` for both "not equals" and template/generic use in `:!` bindings.
- `!` meaning "not" in the language because we use a `not` operator.

Disadvantages:

- `a /= b` would likely be expected to mean an `a = a / b` compound assignment.
- `not` for logical negation and `!=` for inequality comparison.

We could use `=/=` instead of `!=` for not-equal comparisons.
Advantages:
- `=/=` looks like an `==` with a line through the middle.

Disadvantages:

- `=/=` is one character longer, and harder to type on US-ASCII keyboards because the keys are distant but likely to be typed with the same finger.

We could support Python-like chained comparisons.
Advantages:
Disadvantages:
- `and` and `or` to announce the control flow. The non-short-circuiting option will evaluate subexpressions unnecessarily, which creates a tension with our performance goal.
- `a < b == cmp` comparing the result of `a < b` against the Boolean value `cmp`.

See also the ongoing discussion in #451.
We could convert the operands of comparison operators in a way that's equivalent to C++'s behavior.
Advantages:
Disadvantages:
We could provide a symbol for three-way comparisons, such as C++20's <=>.
Advantages:
Disadvantages:
- `f32`.

We could permit comparisons to appear as the immediate operand of `not` without
parentheses.
Advantages:
- `True` rather than `False`: `not f < 5.0`.

Disadvantages:

- `not cond1 == cond2` might intend to compare `not cond1` to `cond2` rather than comparing `cond1 != cond2`.