# Comparison operators [Pull request](https://github.com/carbon-language/carbon-lang/pull/702) ## Table of contents - [Problem](#problem) - [Background](#background) - [Terminology](#terminology) - [Usage in existing languages](#usage-in-existing-languages) - [Three-way comparisons](#three-way-comparisons) - [Chained comparisons](#chained-comparisons) - [Proposal](#proposal) - [Details](#details) - [Precedence](#precedence) - [Associativity](#associativity) - [Built-in comparisons and implicit conversions](#built-in-comparisons-and-implicit-conversions) - [Consistency with implicit conversions](#consistency-with-implicit-conversions) - [Comparisons with constants](#comparisons-with-constants) - [Performance](#performance) - [Overloading](#overloading) - [Default implementations for basic types](#default-implementations-for-basic-types) - [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) - [Alternatives considered](#alternatives-considered) - [Alternative symbols](#alternative-symbols) - [Chained comparisons](#chained-comparisons-1) - [Convert operands like C++](#convert-operands-like-c) - [Provide a three-way comparison operator](#provide-a-three-way-comparison-operator) - [Allow comparisons as the operand of `not`](#allow-comparisons-as-the-operand-of-not) ## Problem We need to be able to compare values for equality, and to compare ordered values for relative ordering. ## Background ### Terminology We refer to tests that check whether two values are the same or different as _equality_ comparisons, and to tests that determine the relative ordering of two values as _relational_ comparisons. ### Usage in existing languages There is near-universal convention on the use of the following symbols for relational operators: - `<`, `<=`, `>`, and `>=` perform ordered comparisons (less than, less than or equal to, greater than, greater than or equal to). There are rare exceptions in somewhat esoteric languages: some languages use `≤` and `≥`, but these are not straightforward to type for many potential Carbon developers. For equality operators, there is some divergence but still a very strong trend: - C-family languages, Rust, Swift, Kotlin, Zig, Nim, Ruby, etc. use `==` for equality comparison and `!=` for inequality comparison. - Some languages, such as ALGOL, APL, BASIC, and PL/I, use `=` as equality comparison, with some using a different symbol (such as `:=` or `<-`) for assignment and others distinguishing assignment from equality comparison based on context. - Haskell and Fortran use `==` for "equal to" and `/=` for "not equal to". The latter is intended to resemble a ≠ symbol. - Some languages, such as Pascal and BASIC, use `<>` for inequality comparison. Python 2 permits this as a synonym for `!=`. - Perl uses `eq` and `ne` for string comparisons; some shells and UNIX `test` use `-eq` and `-ne` for integer comparisons. Some languages support multiple different kinds of equality comparison, such as both a value comparison (typically `==`) and an object identity comparison (typically `===` or `is`). Some languages that freely convert between numbers and strings have different operators to perform a string comparison versus a numeric comparison. Fortran has custom `.eqv.` and `.neqv.` for equality comparisons of Boolean values. Some languages have synonyms for equality operators. For example, Fortran allows `.eq.`, `.ne.`, `.gt.`, and so on, as synonyms for `==`, `/=`, `>`, and so on. This appears to be historical: FORTRAN 77 had only the dotted forms of these operators. ### Three-way comparisons C++ has three-way comparisons, written using the `<=>` operator. These provide a useful mechanism to allow overloading the behavior of relational comparisons without defining four separate operator overloads for relational comparisons. Similarly, Python provides a `__cmp__` special method that can be used to implement all equality and relational comparisons. ### Chained comparisons Python permits comparisons to be chained: that is, `a < b <= c` is interpreted as `a < b and b <= c`, except that `b` is evaluated only once. In most C-family languages, that expression is instead interpreted as `(a < b) <= c`, which computes the value of `a < b`, maps `false` to `0` and `true` to `1`, then compares the result to `c`. ## Proposal Carbon will provide the following operators: - Equality comparison operators: `==` and `!=`. - Relational comparison operators: `<`, `<=`, `>`, `>=`. Each has the obvious mathematical meaning, where `==` means =, `!=` means ≠, `<=` means ≤, and `>=` means ≥. There will be no three-way comparison operator symbol. The interface used to support overloading comparison operators will provide a named function to perform three-way comparisons. Chained comparisons are an error: a comparison expression cannot appear as an unparenthesized operand of another comparison operator. For built-in types, we will follow these rules: - Behave consistently with implicit conversions: if an operation is valid between built-in types `T` and `U`, then it is valid between built-in types that implicitly convert to `T` and `U`. - Never invent an intermediate type that is larger than both operand types. - A comparison produces either a mathematically correct answer or a compilation error. The first two rules are expected to also apply for other built-in operators, such as arithmetic. The third rule is specific to comparisons. One consequence of the first rule is that we do not convert operands in a way that might lose information. This is generally also implied by the third rule. ## Details All six operators are infix binary operators. For standard Carbon types, they produce a `Bool` value. ### Precedence The comparison operators are all at the same precedence level. This level is lower than operators used to compute (non-Boolean) values, higher than the logical operators `and` and `or`, and incomparable with the precedence of `not`. For example, this is OK: ``` if (n + m * 3 < n * n and 3 < m and m < 6) {} ``` ... but these are errors: ``` // Error, ambiguous: `(not a) == b` or `not (a == b)`? if (not a == b) {} // Error, requires parentheses: `a == (not b)`. if (a == not b) {} // Error, requires parentheses: `not (f < 5.0)`. if (not f < 5.0) {} ``` ### Associativity The comparison operators are non-associative. For example: ``` // Error, need `3 < m and m < 6`. if (3 < m < 6) {} // Error, need `a == b and b == c`. if (a == b == c) {} // Error, need `(m > 1) == (n > 1)`. if (m > 1 == n > 1) {} ``` ### Built-in comparisons and implicit conversions Built-in comparisons are permitted: - when both operands are of standard Carbon integer types (`Int(n)` or `Unsigned(n)`), or - when both operands are of standard Carbon floating-point types (`Float(n)`), or - when one operand is of floating-point type and the other is of integer type, if all values of the integer type can be exactly represented in the floating-point type In each case, the result is the mathematically-correct answer. This applies even when comparing `Int(n)` with `Unsigned(m)`. For example: ``` // The value of `v` is True, because `a` is less than `b`, even though the // result of either an `i32` comparison or a `u32` comparison would be False. fn f(a: i32, b: u32) -> Bool { return a < b; } let v: Bool = f(-1, 4_000_000_000); // This does not compile, because `i64` values in general (and 10^18 in // particular) are not exactly representable in the type `f32`. let f: f32 = 1.0e18; let n: i64 = 1_000_000_000_000_000_000; let w: Bool = f == n; ``` Comparisons involving integer and floating-point constants are not covered by these rules and are [discussed separately](#comparisons-with-constants). #### Consistency with implicit conversions As specified in [#820](https://github.com/carbon-language/carbon-lang/pull/820), we support the following implicit conversions: - from `Int(n)` to `Int(m)` if `m > n`, - from `Unsigned(n)` to `Int(m)` or `Unsigned(m)` if `m > n`, - from `Float(n)` to `Float(m)` if `m > n`, and - from `Int(n)` to `Float(m)` if `Float(m)` can represent all values of `Int(n)`. These rules can be summarized as: a type `T` can be converted to `U` if every value of type `T` is a value of type `U`. Additionally [#820](https://github.com/carbon-language/carbon-lang/pull/820) permits conversions from certain kinds of integer and floating-point constants to `Int(n)` and `Float(n)` types if the constant can be represented in the type. All built-in comparisons can be viewed as performing implicit conversions on at most one of the operands in order to reach a suitable pair of identical or very similar types, and then performing a comparison on those types. The target types for these implicit conversions are, for each suitable value `n`: - `Int(n)` vs `Int(n)` - `Unsigned(n)` vs `Unsigned(n)` - `Int(n)` vs `Unsigned(n)` - `Unsigned(n)` vs `Int(n)` - `Float(n)` vs `Float(n)` There will in general be multiple combinations of implicit conversions that will lead to one of the above forms, but we will arrive at the same result regardless of which is selected, because all comparisons are mathematically correct and all implicit conversions are lossless. Implementations are expected to do whatever is most efficient: for example, for `u16 < i32` it is likely that the best choice would be to promote the `u16` to `i32`, not `u32`. Because we only ever convert at most one operand, we never use an intermediate type that is larger than both input types. For example, both `i32` and `f32` can be implicitly converted to `f64`, but we do not permit comparisons between `i32` and `f32` even though we could perform those comparisons in `f64`. If such comparisons were permitted, the results could be surprising: ``` // OK, i32 can exactly represent this value. var n: i32 = 2_000_000_001; // OK, this value is within the representable range for f32. var f: f32 = 2_000_000_001.0; // This comparison could compare unequal, because f32 cannot exactly represent // the value 2,000,000,001. if (n == f) { ... } // OK with explicit cast, but may still compare unequal. if (n == f as f64) { ... } if (n as f64 == f) { ... } ``` The two kinds of mixed-type comparison may be [less efficient](#performance) than the other kinds due to the slightly wider domain. Note that this approach diverges from C++, which would convert both operands to a common type first, sometimes performing a lossy conversion potentially giving an incorrect result, sometimes converting both operands, and sometimes using a wider type than either of the operand types. #### Comparisons with constants As described in [#820](https://github.com/carbon-language/carbon-lang/pull/820), integer constants can be implicitly converted to any integer or floating-point type that can represent their value, and floating-point constants can be implicitly converted to any floating-point type that can represent their value. We permit the following comparisons involving constants: - A constant can be compared with a value of any type to which it can be implicitly converted. - Any two constants can be compared, even if there is no type that can represent both. Note that this disallows comparisons between, for example, `i32` and an integer literal that cannot be represented in `i32`. Such comparisons would always be tautological. This decision should be revisited if it proves problematic in practice, for example in templated code where the literal is sometimes in range. #### Performance The choice to give correct results for signed/unsigned comparisons has a performance impact in practice, because it exposes operations that some processors do not currently directly support. [Sample microbenchmarks](https://quick-bench.com/q/1_xA8G_jXci_yeOKt0WgCc6eGN4) for implementations of several operations show the following performance on x86_64: | Operation | Mathematical comparison time | C++ comparison time | Ratio | | ----------- | ---------------------------- | ------------------- | ----- | | `i64 < u64` | 1636 | 798 | 2.0x | | `u64 < i64` | 1956 | 798 | 2.5x | The execution times here are computed as operation time minus no-op time. The mixed-type operations typically have 2-2.5x the execution time of the same-type operations. However, this is a predictable performance change, and can be controlled by the developer by converting the operands to a suitable type prior to the conversion if a faster same-type comparison is preferred over a correct mixed-type comparison. The above comparison attempts to demonstrate a worst-case difference. In many cases, better code can be generated for the mixed-type comparison. For example, when [branching on the result of the comparison](https://quick-bench.com/q/mXJiHK3_RcCH4fgB88phQscLu88), the difference is significantly reduced: | Operation | Mathematical comparison time | C++ comparison time | Ratio | | ----------- | ---------------------------- | ------------------- | ----- | | `i64 < u64` | 996 | 991 | 1.0x | | `u64 < i64` | 1973 | 997 | 2.0x | ### Overloading Separate interfaces will be provided to permit overloading equality and relational comparisons. The exact design of those interfaces is left to a future proposal. As non-binding design guidance for such a proposal: - The interface for equality comparisons should primarily provide the ability to override the behavior of `==`. The `!=` operator can optionally also be overridden, with a default implementation that returns `not (a == b)`. Overriding `!=` separately from `==` is expected to be used to support floating-point NaN comparisons and for C++ interoperability. - The interface for relational comparisons should primarily provide the ability to specify a three-way comparison operator. The individual relational comparison operators can optionally be overridden separately, with a default implementation in terms of the three-way comparison operator. This facility is expected to be used primarily to support C++ interoperability. - Overloaded comparison operators may wish to produce a type other than `Bool`, for uses such as a vector comparison producing a vector of `Bool` values. We should decide whether we wish to support such uses. ### Default implementations for basic types In addition to being defined for standard Carbon numeric types, equality and relational comparisons are also defined for all "data" types: - Tuples. - Structs (structural data classes). - Classes implementing an interface that identifies them as data classes. Relational comparisons for these types provide a lexicographical ordering. This proposal defers to [#710](https://github.com/carbon-language/carbon-lang/issues/710) for details on comparison support for classes. In each case, the comparison is only available if it is supported by all element types. The `Bool` type should be treated as a choice type, and so should support equality comparisons and relational comparisons if and only if choice types do in general. That decision is left to a future proposal. ## Rationale based on Carbon's goals - _Performance-critical software:_ - The use of a three-way comparison as the central primitive for overloading relational comparisons provides predictable, composable performance for comparing hierarchical data structures. - The performance of mixed comparisons may be slower than in C++, but this is because it's performing a different operation. This performance change is predictable, and can be controlled by the programmer by performing suitable non-value-preserving casts to a common type prior to the comparison. - _Code that is easy to read, understand, and write:_ - The chosen precedence and associativity rules aim to avoid bugs and ensure the code does what it appears to do, requiring parentheses in cases where the intent is unclear. - The choice to not perform lossy conversions on operands of a comparison operator removes a source of bugs caused by unintended lossy conversions. - _Interoperability with and migration from existing C++ code:_ - The use of the chosen operator symbols exactly matches C++, reducing friction for developers and code moving between the two languages, and for interoperability. ## Alternatives considered ### Alternative symbols We could use `/=` instead of `!=` for not-equal comparisons. Advantages: - Avoids overloading `!` for both "not equals" and template/generic use in `:!` bindings. - There is no other usage of `!` meaning "not" in the language because we use a `not` operator. Disadvantages: - Unfamiliar to C++ programmers. - `a /= b` would likely be expected to mean an `a = a / b` compound assignment. - Breaks consistency with Python, which uses `not` for logical negation and `!=` for inequality comparison. We could use `=/=` instead of `!=` for not-equal comparisons. Advantages: - As above; also `=/=` looks like an `==` with a line through the middle. Disadvantages: - This would be inventive and unlike all other languages. As above, breaks consistency with Python. - This would make `=/=` one character longer, and harder to type on US-ASCII keyboards because the keys are distant but likely to be typed with the same finger. ### Chained comparisons We could support Python-like chained comparisons. Advantages: - Small ergonomic improvement for range comparisons. - Middle operand is evaluated only once. Disadvantages: - Using the middle expression as an argument to two different functions may create problems, as the value will need to be stored somewhere, potentially changing the semantics of the operator expression as we can no longer move from the operand. - Both short-circuiting behavior and non-short-circuiting behavior will be surprising and unintuitive to some. The short-circuiting option will introduce control flow without a keyword to announce it, which goes against our design decision to use a keyword for `and` and `or` to announce the control flow. The non-short-circuiting option will evaluate subexpressions unnecessarily, which creates a tension with our performance goal. - Experienced C++ developers may expect a different behavior, such as `a < b == cmp` comparing the result of `a < b` against the Boolean value `cmp`. See also the ongoing discussion in [#451](https://github.com/carbon-language/carbon-lang/issues/451). ### Convert operands like C++ We could convert the operands of comparison operators in a way that's equivalent to C++'s behavior. Advantages: - May ease migration from C++. - May allow programmers to reuse some intuition, for example when comparing floating-point values against integer values. - Allows more efficient machine code to be generated for source code that takes no special care about the types of comparison operands. - Improves performance predictability for C++ developers unfamiliar with Carbon's rules. Disadvantages: - Produces incorrect results. - Does not provide a simple syntax for correct mixed-type comparisons. ### Provide a three-way comparison operator We could provide a symbol for three-way comparisons, such as C++20's `<=>`. Advantages: - The use of a symbol rather than a named member of an interface for this functionality may ease migration from C++20. Disadvantages: - Reserves a symbol for an operation that should not be used directly except in special circumstances, and that will produce a nuanced type even when comparing standard Carbon types such as `f32`. ### Allow comparisons as the operand of `not` We could permit comparisons to appear as the immediate operand of `not` without parentheses. Advantages: - Provides an easier syntax for floating-point comparisons where the desired result for a NaN operand is `True` rather than `False`: `not f < 5.0`. Disadvantages: - Introduces ambiguity when comparing Boolean values: `not cond1 == cond2` might intend to compare `not cond1` to `cond2` rather than comparing `cond1 != cond2`.