Просмотр исходного кода

Comparison operators (#702)

This proposal introduces the operators `==`, `!=`, `<`, `<=`, `>`, and
`>=` to Carbon.

Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>
Co-authored-by: Geoff Romer <gromer@google.com>
Co-authored-by: josh11b <josh11b@users.noreply.github.com>
Richard Smith 4 лет назад
Родитель
Сommit
3b11b8d2cf
1 измененных файлов с 512 добавлено и 0 удалено
  1. 512 0
      proposals/p0702.md

+ 512 - 0
proposals/p0702.md

@@ -0,0 +1,512 @@
+# Comparison operators
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/702)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Problem](#problem)
+-   [Background](#background)
+    -   [Terminology](#terminology)
+    -   [Usage in existing languages](#usage-in-existing-languages)
+    -   [Three-way comparisons](#three-way-comparisons)
+    -   [Chained comparisons](#chained-comparisons)
+-   [Proposal](#proposal)
+-   [Details](#details)
+    -   [Precedence](#precedence)
+    -   [Associativity](#associativity)
+    -   [Built-in comparisons and implicit conversions](#built-in-comparisons-and-implicit-conversions)
+        -   [Consistency with implicit conversions](#consistency-with-implicit-conversions)
+        -   [Comparisons with constants](#comparisons-with-constants)
+        -   [Performance](#performance)
+    -   [Overloading](#overloading)
+    -   [Default implementations for basic types](#default-implementations-for-basic-types)
+-   [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Alternative symbols](#alternative-symbols)
+    -   [Chained comparisons](#chained-comparisons-1)
+    -   [Convert operands like C++](#convert-operands-like-c)
+    -   [Provide a three-way comparison operator](#provide-a-three-way-comparison-operator)
+    -   [Allow comparisons as the operand of `not`](#allow-comparisons-as-the-operand-of-not)
+
+<!-- tocstop -->
+
+## Problem
+
+We need to be able to compare values for equality, and to compare ordered values
+for relative ordering.
+
+## Background
+
+### Terminology
+
+We refer to tests that check whether two values are the same or different as
+_equality_ comparisons, and to tests that determine the relative ordering of two
+values as _relational_ comparisons.
+
+### Usage in existing languages
+
+There is near-universal convention on the use of the following symbols for
+relational operators:
+
+-   `<`, `<=`, `>`, and `>=` perform ordered comparisons (less than, less than
+    or equal to, greater than, greater than or equal to).
+
+There are rare exceptions in somewhat esoteric languages: some languages use `≤`
+and `≥`, but these are not straightforward to type for many potential Carbon
+developers.
+
+For equality operators, there is some divergence but still a very strong trend:
+
+-   C-family languages, Rust, Swift, Kotlin, Zig, Nim, Ruby, etc. use `==` for
+    equality comparison and `!=` for inequality comparison.
+-   Some languages, such as ALGOL, APL, BASIC, and PL/I, use `=` as equality
+    comparison, with some using a different symbol (such as `:=` or `<-`) for
+    assignment and others distinguishing assignment from equality comparison
+    based on context.
+-   Haskell and Fortran use `==` for "equal to" and `/=` for "not equal to". The
+    latter is intended to resemble a ≠ symbol.
+-   Some languages, such as Pascal and BASIC, use `<>` for inequality
+    comparison. Python 2 permits this as a synonym for `!=`.
+-   Perl uses `eq` and `ne` for string comparisons; some shells and UNIX `test`
+    use `-eq` and `-ne` for for integer comparisons.
+
+Some languages support multiple different kinds of equality comparison, such as
+both a value comparison (typically `==`) and an object identity comparison
+(typically `===` or `is`). Some languages that freely convert between numbers
+and strings have different operators to perform a string comparison versus a
+numeric comparison. Fortran has custom `.eqv.` and `.neqv.` for equality
+comparisons of Boolean values.
+
+Some languages have synonyms for equality operators. For example, Fortran allows
+`.eq.`, `.ne.`, `.gt.`, and so on, as synonyms for `==`, `/=`, `>`, and so on.
+This appears to be historical: FORTRAN 77 had only the dotted forms of these
+operators.
+
+### Three-way comparisons
+
+C++ has three-way comparisons, written using the `<=>` operator. These provide a
+useful mechanism to allow overloading the behavior of relational comparisons
+without defining four separate operator overloads for relational comparisons.
+
+Similarly, Python provides a `__cmp__` special method that can be used to
+implement all equality and relational comparisons.
+
+### Chained comparisons
+
+Python permits comparisons to be chained: that is, `a < b <= c` is interpreted
+as `a < b and b <= c`, except that `b` is evaluated only once. In most C-family
+languages, that expression is instead interpreted as `(a < b) <= c`, which
+computes the value of `a < b`, maps `false` to `0` and `true` to `1`, then
+compares the result to `c`.
+
+## Proposal
+
+Carbon will provide the following operators:
+
+-   Equality comparison operators: `==` and `!=`.
+-   Relational comparison operators: `<`, `<=`, `>`, `>=`.
+
+Each has the obvious mathematical meaning, where `==` means =, `!=` means ≠,
+`<=` means ≤, and `>=` means ≥.
+
+There will be no three-way comparison operator symbol. The interface used to
+support overloading comparison operators will provide a named function to
+perform three-way comparisons.
+
+Chained comparisons are an error: a comparison expression cannot appear as an
+unparenthesized operand of another comparison operator.
+
+For built-in types, we will follow these rules:
+
+-   Behave consistently with implicit conversions: if an operation is valid
+    between built-in types `T` and `U`, then it is valid between built-in types
+    that implicitly convert to `T` and `U`.
+-   Never invent an intermediate type that is larger than both operand types.
+-   A comparison produces either a mathematically correct answer or a
+    compilation error.
+
+The first two rules are expected to also apply for other built-in operators,
+such as arithmetic. The third rule is specific to comparisons.
+
+One consequence of the first rule is that we do not convert operands in a way
+that might lose information. This is generally also implied by the third rule.
+
+## Details
+
+All six operators are infix binary operators. For standard Carbon types, they
+produce a `Bool` value.
+
+### Precedence
+
+The comparison operators are all at the same precedence level. This level is
+lower than operators used to compute (non-Boolean) values, higher than the
+logical operators `and` and `or`, and incomparable with the precedence of `not`.
+
+For example, this is OK:
+
+```
+if (n + m * 3 < n * n and 3 < m and m < 6) {}
+```
+
+... but these are errors:
+
+```
+// Error, ambiguous: `(not a) == b` or `not (a == b)`?
+if (not a == b) {}
+// Error, requires parentheses: `a == (not b)`.
+if (a == not b) {}
+// Error, requires parentheses: `not (f < 5.0)`.
+if (not f < 5.0) {}
+```
+
+### Associativity
+
+The comparison operators are non-associative. For example:
+
+```
+// Error, need `3 < m and m < 6`.
+if (3 < m < 6) {}
+// Error, need `a == b and b == c`.
+if (a == b == c) {}
+// Error, need `(m > 1) == (n > 1)`.
+if (m > 1 == n > 1) {}
+```
+
+### Built-in comparisons and implicit conversions
+
+Built-in comparisons are permitted:
+
+-   when both operands are of standard Carbon integer types (`Int(n)` or
+    `Unsigned(n)`), or
+-   when both operands are of standard Carbon floating-point types (`Float(n)`),
+    or
+-   when one operand is of floating-point type and the other is of integer type,
+    if all values of the integer type can be exactly represented in the
+    floating-point type
+
+In each case, the result is the mathematically-correct answer. This applies even
+when comparing `Int(n)` with `Unsigned(m)`.
+
+For example:
+
+```
+// The value of `v` is True, because `a` is less than `b`, even though the
+// result of either an `i32` comparison or a `u32` comparison would be False.
+fn f(a: i32, b: u32) -> Bool { return a < b; }
+let v: Bool = f(-1, 4_000_000_000);
+
+// This does not compile, because `i64` values in general (and 10^18 in
+// particular) are not exactly representable in the type `f32`.
+let f: f32 = 1.0e18;
+let n: i64 = 1_000_000_000_000_000_000;
+let w: Bool = f == n;
+```
+
+Comparisons involving integer and floating-point constants are not covered by
+these rules and are [discussed separately](#comparisons-with-constants).
+
+#### Consistency with implicit conversions
+
+As specified in [#820](https://github.com/carbon-language/carbon-lang/pull/820),
+we support the following implicit conversions:
+
+-   from `Int(n)` to `Int(m)` if `m > n`,
+-   from `Unsigned(n)` to `Int(m)` or `Unsigned(m)` if `m > n`,
+-   from `Float(n)` to `Float(m)` if `m > n`, and
+-   from `Int(n)` to `Float(m)` if `Float(m)` can represent all values of
+    `Int(n)`.
+
+These rules can be summarized as: a type `T` can be converted to `U` if every
+value of type `T` is a value of type `U`.
+
+Additionally [#820](https://github.com/carbon-language/carbon-lang/pull/820)
+permits conversions from certain kinds of integer and floating-point constants
+to `Int(n)` and `Float(n)` types if the constant can be represented in the type.
+
+All built-in comparisons can be viewed as performing implicit conversions on at
+most one of the operands in order to reach a suitable pair of identical or very
+similar types, and then performing a comparison on those types. The target types
+for these implicit conversions are, for each suitable value `n`:
+
+-   `Int(n)` vs `Int(n)`
+-   `Unsigned(n)` vs `Unsigned(n)`
+-   `Int(n)` vs `Unsigned(n)`
+-   `Unsigned(n)` vs `Int(n)`
+-   `Float(n)` vs `Float(n)`
+
+There will in general be multiple combinations of implicit conversions that will
+lead to one of the above forms, but we will arrive at the same result regardless
+of which is selected, because all comparisons are mathematically correct and all
+implicit conversions are lossless. Implementations are expected to do whatever
+is most efficient: for example, for `u16 < i32` it is likely that the best
+choice would be to promote the `u16` to `i32`, not `u32`.
+
+Because we only ever convert at most one operand, we never use an intermediate
+type that is larger than both input types. For example, both `i32` and `f32` can
+be implicitly converted to `f64`, but we do not permit comparisons between `i32`
+and `f32` even though we could perform those comparisons in `f64`. If such
+comparisons were permitted, the results could be surprising:
+
+```
+// OK, i32 can exactly represent this value.
+var n: i32 = 2_000_000_001;
+// OK, this value is within the representable range for f32.
+var f: f32 = 2_000_000_001.0;
+// This comparison could compare unequal, because f32 cannot exactly represent
+// the value 2,000,000,001.
+if (n == f) { ... }
+// OK with explicit cast, but may still compare unequal.
+if (n == f as f64) { ... }
+if (n as f64 == f) { ... }
+```
+
+The two kinds of mixed-type comparison may be [less efficient](#performance)
+than the other kinds due to the slightly wider domain.
+
+Note that this approach diverges from C++, which would convert both operands to
+a common type first, sometimes performing a lossy conversion potentially giving
+an incorrect result, sometimes converting both operands, and sometimes using a
+wider type than either of the operand types.
+
+#### Comparisons with constants
+
+As described in [#820](https://github.com/carbon-language/carbon-lang/pull/820),
+integer constants can be implicitly converted to any integer or floating-point
+type that can represent their value, and floating-point constants can be
+implicitly converted to any floating-point type that can represent their value.
+We permit the following comparisons involving constants:
+
+-   A constant can be compared with a value of any type to which it can be
+    implicitly converted.
+-   Any two constants can be compared, even if there is no type that can
+    represent both.
+
+Note that this disallows comparisons between, for example, `i32` and an integer
+literal that cannot be represented in `i32`. Such comparisons would always be
+tautological. This decision should be revisited if it proves problematic in
+practice, for example in templated code where the literal is sometimes in range.
+
+#### Performance
+
+The choice to give correct results for signed/unsigned comparisons has a
+performance impact in practice, because it exposes operations that some
+processors do not currently directly support.
+[Sample microbenchmarks](https://quick-bench.com/q/1_xA8G_jXci_yeOKt0WgCc6eGN4)
+for implementations of several operations show the following performance on
+x86_64:
+
+| Operation   | Mathematical comparison time | C++ comparison time | Ratio |
+| ----------- | ---------------------------- | ------------------- | ----- |
+| `i64 < u64` | 1636                         | 798                 | 2.0x  |
+| `u64 < i64` | 1956                         | 798                 | 2.5x  |
+
+The execution times here are computed as operation time minus no-op time.
+
+The mixed-type operations typically have 2-2.5x the execution time of the
+same-type operations. However, this is a predictable performance change, and can
+be controlled by the developer by converting the operands to a suitable type
+prior to the conversion if a faster same-type comparison is preferred over a
+correct mixed-type comparison.
+
+The above comparison attempts to demonstrate a worst-case difference. In many
+cases, better code can be generated for the mixed-type comparison. For example,
+when
+[branching on the result of the comparison](https://quick-bench.com/q/mXJiHK3_RcCH4fgB88phQscLu88),
+the difference is significantly reduced:
+
+| Operation   | Mathematical comparison time | C++ comparison time | Ratio |
+| ----------- | ---------------------------- | ------------------- | ----- |
+| `i64 < u64` | 996                          | 991                 | 1.0x  |
+| `u64 < i64` | 1973                         | 997                 | 2.0x  |
+
+### Overloading
+
+Separate interfaces will be provided to permit overloading equality and
+relational comparisons. The exact design of those interfaces is left to a future
+proposal. As non-binding design guidance for such a proposal:
+
+-   The interface for equality comparisons should primarily provide the ability
+    to override the behavior of `==`. The `!=` operator can optionally also be
+    overridden, with a default implementation that returns `not (a == b)`.
+    Overriding `!=` separately from `==` is expected to be used to support
+    floating-point NaN comparisons and for C++ interoperability.
+-   The interface for relational comparisons should primarily provide the
+    ability to specify a three-way comparison operator. The individual
+    relational comparison operators can optionally be overridden separately,
+    with a default implementation in terms of the three-way comparison operator.
+    This facility is expected to be used primarily to support C++
+    interoperability.
+-   Overloaded comparison operators may wish to produce a type other than
+    `Bool`, for uses such as a vector comparison producing a vector of `Bool`
+    values. We should decide whether we wish to support such uses.
+
+### Default implementations for basic types
+
+In addition to being defined for standard Carbon numeric types, equality and
+relational comparisons are also defined for all "data" types:
+
+-   Tuples.
+-   Structs (structural data classes).
+-   Classes implementing an interface that identifies them as data classes.
+
+Relational comparisons for these types provide a lexicographical ordering. This
+proposal defers to
+[#710](https://github.com/carbon-language/carbon-lang/issues/710) for details on
+comparison support for classes.
+
+In each case, the comparison is only available if it is supported by all element
+types.
+
+The `Bool` type should be treated as a choice type, and so should support
+equality comparisons and relational comparisons if and only if choice types do
+in general. That decision is left to a future proposal.
+
+## Rationale based on Carbon's goals
+
+-   _Performance-critical software:_
+
+    -   The use of a three-way comparison as the central primitive for
+        overloading relational comparisons provides predictable, composable
+        performance for comparing hierarchical data structures.
+    -   The performance of mixed comparisons may be slower than in C++, but this
+        is because it's performing a different operation. This performance
+        change is predictable, and can be controlled by the programmer by
+        performing suitable non-value-preserving casts to a common type prior to
+        the comparison.
+
+-   _Code that is easy to read, understand, and write:_
+
+    -   The chosen precedence and associativity rules aim to avoid bugs and
+        ensure the code does what it appears to do, requiring parentheses in
+        cases where the intent is unclear.
+    -   The choice to not perform lossy conversions on operands of a comparison
+        operator removes a source of bugs caused by unintended lossy
+        conversions.
+
+-   _Interoperability with and migration from existing C++ code:_
+
+    -   The use of the chosen operator symbols exactly matches C++, reducing
+        friction for developers and code moving between the two languages, and
+        for interoperability.
+
+## Alternatives considered
+
+### Alternative symbols
+
+We could use `/=` instead of `!=` for not-equal comparisons.
+
+Advantages:
+
+-   Avoids overloading `!` for both "not equals" and template/generic use in
+    `:!` bindings.
+-   There is no other usage of `!` meaning "not" in the language because we use
+    a `not` operator.
+
+Disadvantages:
+
+-   Unfamiliar to C++ programmers.
+-   `a /= b` would likely be expected to mean an `a = a / b` compound
+    assignment.
+-   Breaks consistency with Python, which uses `not` for logical negation and
+    `!=` for inequality comparison.
+
+We could use `=/=` instead of `!=` for not-equal comparisons.
+
+Advantages:
+
+-   As above; also `=/=` looks like an `==` with a line through the middle.
+
+Disadvantages:
+
+-   This would be inventive and unlike all other languages. As above, breaks
+    consistency with Python.
+-   This would make `=/=` one character longer, and harder to type on US-ASCII
+    keyboards because the keys are distant but likely to be typed with the same
+    finger.
+
+### Chained comparisons
+
+We could support Python-like chained comparisons.
+
+Advantages:
+
+-   Small ergonomic improvement for range comparisons.
+-   Middle operand is evaluated only once.
+
+Disadvantages:
+
+-   Using the middle expression as an argument to two different functions may
+    create problems, as the value will need to be stored somewhere, potentially
+    changing the semantics of the operator expression as we can no longer move
+    from the operand.
+-   Both short-circuiting behavior and non-short-circuiting behavior will be
+    surprising and unintuitive to some. The short-circiuting option will
+    introduce control flow without a keyword to announce it, which goes against
+    our design decision to use a keyword for `and` and `or` to announce the
+    control flow. The non-short-circuiting option will evaluate subexpressions
+    unnecessarily, which creates a tension with our performance goal.
+-   Experienced C++ developers may expect a different behavior, such as
+    `a < b == cmp` comparing the result of `a < b` against the Boolean value
+    `cmp`.
+
+See also the ongoing discussion in
+[#451](https://github.com/carbon-language/carbon-lang/issues/451).
+
+### Convert operands like C++
+
+We could convert the operands of comparison operators in a way that's equivalent
+to C++'s behavior.
+
+Advantages:
+
+-   May ease migration from C++.
+-   May allow programmers to reuse some intuition, for example when comparing
+    floating-point values against integer values.
+-   Allows more efficient machine code to be generated for source code that
+    takes no special care about the types of comparison operands.
+-   Improves performance predictability for C++ developers unfamiliar with
+    Carbon's rules.
+
+Disadvantages:
+
+-   Produces incorrect results.
+-   Does not provide a simple syntax for correct mixed-type comparisons.
+
+### Provide a three-way comparison operator
+
+We could provide a symbol for three-way comparisons, such as C++20's `<=>`.
+
+Advantages:
+
+-   The use of a symbol rather than a named member of an interface for this
+    functionality may ease migration from C++20.
+
+Disadvantages:
+
+-   Reserves a symbol for an operation that should not be used directly except
+    in special circumstances, and that will produce a nuanced type even when
+    comparing standard Carbon types such as `f32`.
+
+### Allow comparisons as the operand of `not`
+
+We could permit comparisons to appear as the immediate operand of `not` without
+parentheses.
+
+Advantages:
+
+-   Provides an easier syntax for floating-point comparisons where the desired
+    result for a NaN operand is `True` rather than `False`: `not f < 5.0`.
+
+Disadvantages:
+
+-   Introduces ambiguity when comparing Boolean values: `not cond1 == cond2`
+    might intend to compare `not cond1` to `cond2` rather than comparing
+    `cond1 != cond2`.