|
|
@@ -0,0 +1,705 @@
|
|
|
+# Arithmetic expressions
|
|
|
+
|
|
|
+<!--
|
|
|
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
|
|
|
+Exceptions. See /LICENSE for license information.
|
|
|
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
+-->
|
|
|
+
|
|
|
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/1083)
|
|
|
+
|
|
|
+<!-- toc -->
|
|
|
+
|
|
|
+## Table of contents
|
|
|
+
|
|
|
+- [Problem](#problem)
|
|
|
+- [Background](#background)
|
|
|
+ - [Symbols](#symbols)
|
|
|
+ - [Semantics](#semantics)
|
|
|
+- [Proposal](#proposal)
|
|
|
+- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
|
|
|
+ - [Choice of operator symbols](#choice-of-operator-symbols)
|
|
|
+ - [Signed integer semantics](#signed-integer-semantics)
|
|
|
+- [Alternatives considered](#alternatives-considered)
|
|
|
+ - [Use a sufficiently wide result type to avoid overflow](#use-a-sufficiently-wide-result-type-to-avoid-overflow)
|
|
|
+ - [Guarantee that the program never proceeds with an incorrect value after overflow](#guarantee-that-the-program-never-proceeds-with-an-incorrect-value-after-overflow)
|
|
|
+ - [Guarantee that all integer arithmetic is two's complement](#guarantee-that-all-integer-arithmetic-is-twos-complement)
|
|
|
+ - [Treat overflow as an error but don't optimize on it](#treat-overflow-as-an-error-but-dont-optimize-on-it)
|
|
|
+ - [Don't let `Unsigned` arithmetic wrap](#dont-let-unsigned-arithmetic-wrap)
|
|
|
+ - [Provide separate wrapping types](#provide-separate-wrapping-types)
|
|
|
+ - [Do not provide an ordering or division for `uN`](#do-not-provide-an-ordering-or-division-for-un)
|
|
|
+ - [Give unary `-` lower precedence](#give-unary---lower-precedence)
|
|
|
+ - [Include a unary plus operator](#include-a-unary-plus-operator)
|
|
|
+ - [Floating-point modulo operator](#floating-point-modulo-operator)
|
|
|
+ - [Provide different division operators](#provide-different-division-operators)
|
|
|
+ - [Use different division and modulo semantics](#use-different-division-and-modulo-semantics)
|
|
|
+ - [Use different precedence groups for division and multiplication](#use-different-precedence-groups-for-division-and-multiplication)
|
|
|
+ - [Use the same precedence group for modulo and multiplication](#use-the-same-precedence-group-for-modulo-and-multiplication)
|
|
|
+ - [Use a different spelling for modulo](#use-a-different-spelling-for-modulo)
|
|
|
+- [Future work](#future-work)
|
|
|
+ - [Provide separate wrapping operators](#provide-separate-wrapping-operators)
|
|
|
+ - [Provide separate operations to detect overflow](#provide-separate-operations-to-detect-overflow)
|
|
|
+
|
|
|
+<!-- tocstop -->
|
|
|
+
|
|
|
+## Problem
|
|
|
+
|
|
|
+Carbon needs a set of arithmetic operators in order to perform basic
|
|
|
+calculations on various kinds of built-in and user-defined numbers and
|
|
|
+number-like values: integers, real numbers, complex numbers, vectors, matrices,
|
|
|
+and so on.
|
|
|
+
|
|
|
+## Background
|
|
|
+
|
|
|
+### Symbols
|
|
|
+
|
|
|
+Conventions for arithmetic operators have a very long tradition. The following
|
|
|
+symbols have common meaning across C, C++, Java, JavaScript, Rust, Swift, and
|
|
|
+many other languages:
|
|
|
+
|
|
|
+- `+` and `-` mean addition and subtraction, as in mathematics. Unary `-`
|
|
|
+ forms a negated value -- an additive inverse. Sometimes, unary `+` is
|
|
|
+ permitted, as a no-op.
|
|
|
+- `*` means multiplication, diverging from the use of '×', '⋅', or simply
|
|
|
+ juxtaposition in normal mathematical notation. However, this symbol does
|
|
|
+ have some visual similarity to '×'.
|
|
|
+- `/` means division, diverging from the use of '÷', fraction notation, or an
|
|
|
+ exponent of -1 in mathematics. However, this symbol does somewhat visually
|
|
|
+ resemble fractional notation that has "fallen over" to the left.
|
|
|
+- `%` means remainder, diverging from the use of the word "mod" in
|
|
|
+ mathematics, and, perhaps confusingly, resembling the '÷' division operator
|
|
|
+ from mathematics.
|
|
|
+
|
|
|
+### Semantics
|
|
|
+
|
|
|
+Efficient integer types often have finite bounds on the numbers they can
|
|
|
+represent, and we expect Carbon's to be no different. This presents a problem
|
|
|
+for operations that might produce values outside those bounds: what should the
|
|
|
+behavior of a primitive arithmetic operation be if the result cannot be
|
|
|
+represented? Moreover, some operations, such as division by zero, have no
|
|
|
+mathematically-defined result.
|
|
|
+
|
|
|
+Different languages take different approaches to this problem.
|
|
|
+
|
|
|
+- In C and C++, signed integer overflow and division by zero -- including
|
|
|
+ floating-point division by zero -- have undefined behavior, meaning that
|
|
|
+ there are no constraints on the behavior of a program that performs such a
|
|
|
+ calculation.
|
|
|
+- In [Rust](https://rust-lang.github.io/rfcs/0560-integer-overflow.html),
|
|
|
+ these conditions are classified as being errors, which, depending on various
|
|
|
+ factors, will either panic or give a two's complement result. Rust also
|
|
|
+ provides a `Wrapping<T>` type, where `T` is a signed or unsigned integer
|
|
|
+ type, that provides arithmetic with guaranteed two's complement wrapping
|
|
|
+ semantics.
|
|
|
+- In Swift, overflow triggers a runtime fatal error. Arithmetic operators can
|
|
|
+ be prefixed with `&`, such as `big_num &* other_big_num`, to request two's
|
|
|
+ complement wrapping behavior instead.
|
|
|
+- In Java, integer arithmetic has two's complement wrapping behavior. Division
|
|
|
+ by zero throws an exception.
|
|
|
+- In JavaScript, all arithmetic is (at least notionally) performed in IEEE 754
|
|
|
+ double-precision floating-point, so all operations have defined results,
|
|
|
+ although some integer operations may produce non-integer results, such as
|
|
|
+ infinities or NaNs, and some produce incorrect integer results. For example,
|
|
|
+ in JavaScript, `100000001 * 100000001` evaluates to `10000000200000000` not
|
|
|
+ to `10000000200000001`.
|
|
|
+- In LLVM, the result of overflow is a poison value, which results in
|
|
|
+ undefined behavior only when it is observed, for example by a branch, and
|
|
|
+ otherwise propagates to dependent values. This permits speculative execution
|
|
|
+ of arithmetic that might overflow.
|
|
|
+
|
|
|
+## Proposal
|
|
|
+
|
|
|
+Carbon will provide the five usual binary arithmetic operators -- `+`, `-`, `*`,
|
|
|
+`/`, and `%` -- with their usual semantics. A unary minus `-` operator will also
|
|
|
+be provided, but no unary plus `+` operator.
|
|
|
+
|
|
|
+| Notation | Meaning |
|
|
|
+| -------- | ------------------ |
|
|
|
+| `-a` | Negation |
|
|
|
+| `a + b` | Addition |
|
|
|
+| `a - b` | Subtraction |
|
|
|
+| `a * b` | Multiplication |
|
|
|
+| `a / b` | Division |
|
|
|
+| `a % b` | Modulo / remainder |
|
|
|
+
|
|
|
+These operators follow the same precedence rules as in C-family languages --
|
|
|
+multiplicative operators bind tighter than additive operators, and negation
|
|
|
+binds tighter than multiplicative operators. Unlike in other C-family languages,
|
|
|
+no precedence is defined between `%` and other binary operators, so
|
|
|
+`a + b % c * d` is an error, and does not mean `a + (b % c) * d` as it would in
|
|
|
+C++.
|
|
|
+
|
|
|
+Unlike in C++, lossy conversions are never performed. Instead, built-in
|
|
|
+arithmetic is not permitted unless one of the operand types can represent all
|
|
|
+the values of the other operand. The calculation is performed in that operand
|
|
|
+type; there is no implicit widening to Carbon's equivalent of `int`.
|
|
|
+
|
|
|
+Unsigned integer types `uN` model arithmetic modulo 2<sup>`N`</sup>, and we
|
|
|
+strongly advise that they are not used except when those semantics are desired,
|
|
|
+such as in random number generation, cryptography, hashing, and similar domains.
|
|
|
+
|
|
|
+For signed integer types `iN`, it is a programming error if overflow occurs. We
|
|
|
+guarantee that in development build modes such errors will result in a runtime
|
|
|
+trap, and that in hardened build modes the result will either be a trap or the
|
|
|
+two's complement value.
|
|
|
+
|
|
|
+Initially, no support will be provided for wrapping signed integer arithmetic,
|
|
|
+nor for non-wrapping unsigned integer arithmetic. This can be added by a future
|
|
|
+proposal if we find there is sufficient demand for either.
|
|
|
+
|
|
|
+Floating-point types use IEEE 754 semantics, with round-to-nearest rounding
|
|
|
+mode, no signaling NaNs, and no floating-point exceptions.
|
|
|
+
|
|
|
+This proposal takes no position on whether the `+` operator is supported on
|
|
|
+strings to perform concatenation.
|
|
|
+
|
|
|
+See the changes to the design for more details; the rest of this proposal will
|
|
|
+focus on the rationale and alternatives.
|
|
|
+
|
|
|
+## Rationale based on Carbon's goals
|
|
|
+
|
|
|
+- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
|
|
|
+ - Treating overflow as a programming error empowers static and dynamic
|
|
|
+ analysis tools to distinguish arithmetic bugs from intentional
|
|
|
+ wraparound.
|
|
|
+- [Performance-critical software](/docs/project/goals.md#performance-critical-software)
|
|
|
+ - Allowing the optimizer to assume that signed integer overflow does not
|
|
|
+ occur in performance build modes allows certain important loop
|
|
|
+ optimizations to fire that would otherwise be incorrect.
|
|
|
+ - Avoiding extending into larger types when performing arithmetic on small
|
|
|
+ operands aids in the ability to vectorize.
|
|
|
+- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
|
|
|
+ - Each subexpression of an arithmetic expression is given a specific type,
|
|
|
+ rather than being computed in a type of sufficient width for any
|
|
|
+ possible value, in order to make it simple to factor out subexpressions.
|
|
|
+- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
|
|
|
+ - Using the same operators as other languages, with largely the same
|
|
|
+ semantics as C++, should aid readability and writeability especially for
|
|
|
+ those with less Carbon-specific knowledge.
|
|
|
+ - Performing built-in arithmetic in the larger operand type and refusing
|
|
|
+ cases where a lossy conversion would be performed in C++ reduces the
|
|
|
+ scope for confusion and surprises.
|
|
|
+- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms)
|
|
|
+ - Treating signed integer overflow as an error condition, and having it
|
|
|
+ produce a runtime error in some build modes, will assist with certain
|
|
|
+ forms of testing and with analysis of program behavior. For example,
|
|
|
+ fuzz testing can be used to locate overflow bugs, with confidence that
|
|
|
+ any overflow detected is unintentional.
|
|
|
+- [Modern OS platforms, hardware architectures, and environments](/docs/project/goals.md#modern-os-platforms-hardware-architectures-and-environments)
|
|
|
+ - The choice of division and modulo semantics aims to provide fast
|
|
|
+ execution and small code size on modern hardware architectures.
|
|
|
+- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
|
|
|
+ - Using the same operator set as C++ may aid migration of C++ developers
|
|
|
+ and make code using interop between Carbon and C++ easier to follow.
|
|
|
+ - Making similar choices for the behavior of signed and unsigned types
|
|
|
+ will make a correct but unidiomatic conversion of C++ into Carbon
|
|
|
+ easier; however, in idiomatic Carbon, signed types are expected to be
|
|
|
+ used more frequently.
|
|
|
+
|
|
|
+### Choice of operator symbols
|
|
|
+
|
|
|
+We want Carbon to provide a close analogue of standard mathematical expression
|
|
|
+notation, with some unsurprising set of operators. While accessibility to
|
|
|
+beginners is important, the most important audience for which the operators set
|
|
|
+should be unsurprising is programmers with C++ or Rust experience. The choice of
|
|
|
+`+`, `-`, `*`, `/`, and `%` is ubiquitous among programming languages, and any
|
|
|
+significant change to this symbol set for the benefit of programming novices
|
|
|
+with a mathematical background would be a detriment to those with a background
|
|
|
+in other programming languages.
|
|
|
+
|
|
|
+Therefore we choose the conventional symbol set.
|
|
|
+
|
|
|
+### Signed integer semantics
|
|
|
+
|
|
|
+Carbon prioritizes fast and predictable performance above all other factors.
|
|
|
+Allowing an optimizer to assume that certain forms of arithmetic do not result
|
|
|
+in overflow can improve program performance and allow the generation of smaller
|
|
|
+code. Moreover, while in general we only want to provide the means for accessing
|
|
|
+the fastest possible implementation of an operation, we expect integer
|
|
|
+arithmetic to be so ubiquitous that it's important that the most efficient
|
|
|
+option be the default option.
|
|
|
+
|
|
|
+At the same time, it's important that Carbon code can be used in safety-critical
|
|
|
+scenarios where unbounded undefined behavior on integer arithmetic is
|
|
|
+problematic, so in a hardened build mode, we provide stronger guarantees.
|
|
|
+
|
|
|
+## Alternatives considered
|
|
|
+
|
|
|
+### Use a sufficiently wide result type to avoid overflow
|
|
|
+
|
|
|
+We could define that integer operators always produce a sufficiently wide result
|
|
|
+type such that overflow never occurs, and defer all overflow handling --
|
|
|
+checking for out-of-bounds values, wrapping around, or a programmer assertion
|
|
|
+that overflow simply doesn't happen -- until the end of the computation or some
|
|
|
+explicit step.
|
|
|
+
|
|
|
+For example:
|
|
|
+
|
|
|
+```
|
|
|
+fn F(a: i32, b: i32, c: i32) {
|
|
|
+ // a * b is computed in i63,
|
|
|
+ // ... + c is computed in i64.
|
|
|
+ // =! is an assertion that the value is in-range.
|
|
|
+ let d: i32 =! a * b + c;
|
|
|
+
|
|
|
+ // Same, but =% says to wrap around.
|
|
|
+ let e: i32 =% a * b + c;
|
|
|
+
|
|
|
+ // If the value doesn't fit in the specified type, pattern-matching fails.
|
|
|
+ let f: i32 = a * b + c else { return; }
|
|
|
+}
|
|
|
+```
|
|
|
+
|
|
|
+We could put a limit on how large an intermediate type can be, and reject if it
|
|
|
+would require a larger intermediate operand than the largest we can efficiently
|
|
|
+support.
|
|
|
+
|
|
|
+This approach seems quite promising, but close inspection finds a number of
|
|
|
+non-trivial issues.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- No implicit undefined behavior or incorrect results on overflow.
|
|
|
+- Forces developer to think about the possibility of overflow and write down
|
|
|
+ what they intend to happen.
|
|
|
+- If the final assignment is `=%` or `=!`, all intermediate arithmetic other
|
|
|
+ than division and remainder can be done in the result type, avoiding the
|
|
|
+ need for wide computations.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- This approach is novel and it's unknown whether developers would accept its
|
|
|
+ ergonomic burden.
|
|
|
+- Decreases the uniformity of the model. While we can generalize `=!` to mean
|
|
|
+ "assign assuming the RHS fits into the LHS", there doesn't seem to be any
|
|
|
+ good generalization of `=%` to other situations and types.
|
|
|
+- The largest type for which we can truly efficiently perform arithmetic is
|
|
|
+ `i64` / `u64`. While 128-bit arithmetic is typically available on our target
|
|
|
+ architectures, use of it will often be less efficient and may increase
|
|
|
+ register pressure. Calculations as simple as `(a + b) / c` may be rejected
|
|
|
+ if the operands are already in the largest efficient type.
|
|
|
+- Operations such as negation and division can increase the width of the
|
|
|
+ operand: `-i32.Min` and `i32.Min / -1` don't fit in `i32`. The following
|
|
|
+ approaches to this problem were considered:
|
|
|
+ - Remove the `Min` value from `iN` types, so the negative and positive
|
|
|
+ range are identical. However, this would severely violate programmer
|
|
|
+ expectations, for example in some important bit-manipulation cases.
|
|
|
+ - Give integer types a range of values rather than simply a bit-width. For
|
|
|
+ example, we can say that negation on `i32` produces a type that can
|
|
|
+ represent [-2<sup>31</sup>+1, 2<sup>31</sup>], which still fits in 32
|
|
|
+ bits. However, this would add significant complexity to the type system,
|
|
|
+ and with this approach, division would still increase the bit width: for
|
|
|
+ example, `a / b`, where `a` and `b` are `iN`s, has 2<sup>`N`</sup>+1
|
|
|
+ distinct possible values. This is especially surprising because integer
|
|
|
+ division is usually expected to make a number smaller!
|
|
|
+- Refactoring code becomes more challenging, as the appropriate intermediate
|
|
|
+ type must be determined. Mitigating this, the type system would inform the
|
|
|
+ programmer when they make a mistake.
|
|
|
+- The `!` and `%` annotations become quite viral: they would be needed not
|
|
|
+ only in initialization and assignment, but likely also in `case`, `return`,
|
|
|
+ and other initialization contexts not using `=`. As an alternative, the
|
|
|
+ annotation could be put on the outermost arithmetic operator, but that is
|
|
|
+ likely to be syntactically awkward, especially in unparenthesized
|
|
|
+ expressions such as `a + b +% c`.
|
|
|
+- It will likely become idiomatic in many communities to either always use `%`
|
|
|
+ or always use `!`.
|
|
|
+ - Always using `!` means that the developers see no benefit compared to
|
|
|
+ the behavior as proposed here, and nonetheless pay an ergonomic cost.
|
|
|
+ - Always using `%` means that we lose any ability to distinguish between
|
|
|
+ intentional wraparound and overflow, making a class of bugs that would
|
|
|
+ otherwise be easily detectible be undetectable without making the
|
|
|
+ program behavior correct.
|
|
|
+- Division by zero is still not handled.
|
|
|
+
|
|
|
+### Guarantee that the program never proceeds with an incorrect value after overflow
|
|
|
+
|
|
|
+We could say that overflow errors always result in program termination, possibly
|
|
|
+with the permission for the compiler to give the mathematically correct result
|
|
|
+instead if it so chooses.
|
|
|
+
|
|
|
+This would result in slower and larger code being generated in a lot of cases.
|
|
|
+Some optimizations would still be possible if the optimizer could ensure that it
|
|
|
+only increased the set of cases for which the correct result is given, but
|
|
|
+current optimizers are not well-suited to perform that task.
|
|
|
+
|
|
|
+Given Carbon's emphasis on performance, this approach is rejected without
|
|
|
+prejudice until we have a demonstration that no important performance metric is
|
|
|
+harmed by it.
|
|
|
+
|
|
|
+### Guarantee that all integer arithmetic is two's complement
|
|
|
+
|
|
|
+Instead of making overflow a programming error, we could define it as two's
|
|
|
+complement. This is, for example, the approach taken by Java.
|
|
|
+
|
|
|
+The major problem with this approach is that it makes erroneous wrapping and
|
|
|
+intentional wrapping indistinguishable. This would make finding such bugs much
|
|
|
+harder for readers of the code, and all but impossible for static and dynamic
|
|
|
+analysis tools. Making overflow issues programming errors allows problems to be
|
|
|
+caught earlier and more reliably.
|
|
|
+
|
|
|
+### Treat overflow as an error but don't optimize on it
|
|
|
+
|
|
|
+We could follow Rust and say that overflow is an error, but that we promise
|
|
|
+we'll either catch it or give two's complement semantics. This approach is
|
|
|
+currently rejected for the same reason we reject
|
|
|
+[guaranteeing we catch all cases where we can't give a correct result](#guarantee-that-the-program-never-proceeds-with-an-incorrect-value-after-overflow).
|
|
|
+
|
|
|
+### Don't let `Unsigned` arithmetic wrap
|
|
|
+
|
|
|
+We could treat `Unsigned(N)` like `Integer(N)`, and make it an error by default
|
|
|
+if it overflows. However, this doesn't seem like a great fit for the problem
|
|
|
+domain.
|
|
|
+
|
|
|
+There are, broadly speaking, two different classes of use cases we want to
|
|
|
+support:
|
|
|
+
|
|
|
+- Cases where the developer wants a number. They might have some expectation
|
|
|
+ of the range of the number -- eg, non-negative, or strictly positive, or
|
|
|
+ between -1 and 100 -- or they might just want a number and not have explicit
|
|
|
+ bounds. We expect `iN` to be used for all such cases, and do not want to
|
|
|
+ treat the non-negative cases as a distinct and special type.
|
|
|
+- Cases where the developer wants the ring ℤ/2<sup>n</sup>ℤ of integers modulo
|
|
|
+ 2<sup>n</sup>, for example in cryptography, hashing, or random number
|
|
|
+ generation.
|
|
|
+
|
|
|
+The second class is certainly rarer than the first, but contains many important
|
|
|
+use cases.
|
|
|
+
|
|
|
+The typical arguments for using an unsigned type for the first class of use
|
|
|
+cases are:
|
|
|
+
|
|
|
+- Better expression of developer intent. It is generally preferable to make
|
|
|
+ invalid states unrepresentable, and if negative numbers are invalid, then
|
|
|
+ unsigned types are better suited than signed types. However, supporting this
|
|
|
+ use case with the same types used to support the wrapping use cases results
|
|
|
+ in a situation where one side or the other has to make compromises. The
|
|
|
+ alternative would be to have three different kinds of type: signed,
|
|
|
+ unsigned, and modulo. But in that setup, it's not clear that the value added
|
|
|
+ by unsigned types is worthwhile. Also, it's common to want to take the
|
|
|
+ difference of such unsigned quantities, and it's generally preferable for
|
|
|
+ such subtractions to produce a negative result rather than a subtle bug.
|
|
|
+ Moreover, while a restriction to non-negative values is common, supporting
|
|
|
+ only the case of a range restriction to [0, 2<sup>N</sup>-1], but not any
|
|
|
+ other range, does not do a good job of addressing the general desire to
|
|
|
+ capture intent and to make invalid states unrepresentable.
|
|
|
+- Ability to reduce storage size. Spending a sign bit every time a number is
|
|
|
+ stored, even when it's known to be non-negative is wasteful. This is an
|
|
|
+ important concern, and one we should address, but it's thought to be better
|
|
|
+ to address this by annotating a field with a description of how it should be
|
|
|
+ packed -- such as in a bit-field -- rather than by changing the type of the
|
|
|
+ data and possibly the behavior of operations on it.
|
|
|
+
|
|
|
+Additionally, providing unsigned types for which overflow is an error introduces
|
|
|
+a much larger risk of that error state being inadvertently reached than for
|
|
|
+signed types, because calculations typically involve numbers that are close to
|
|
|
+zero, which is far from overflowing in signed types but near to overflowing in
|
|
|
+unsigned types. For example, a calculation such as `a - b + c` may be known to
|
|
|
+always produce a non-negative result, and the developer may be tempted to use
|
|
|
+non-negative non-wrapping types for `a`, `b`, and `c`, but doing so introduces
|
|
|
+the risk that the intermediate `a - b` calculation is negative. By contrast,
|
|
|
+overflow would only occur in a signed calculation if the numbers involved were
|
|
|
+very large.
|
|
|
+
|
|
|
+### Provide separate wrapping types
|
|
|
+
|
|
|
+We could provide distinct types for wrapping versus non-wrapping use cases,
|
|
|
+independent of the choice of signedness. This approach is being followed by
|
|
|
+Rust.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- Avoids coupling two decisions that are logically independent.
|
|
|
+- Allows developer intent to be expressed more explicitly, in the case where
|
|
|
+ the intent is a non-negative but non-wrapping integer.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Twice as many kinds of integer types, likely meaning twice as many `impl`s
|
|
|
+ need to be defined for each operation, twice as many overloads in each
|
|
|
+ overload set, and so on.
|
|
|
+- The presence of unsigned types for which overflow is an error has a greater
|
|
|
+ risk of inadvertent overflow, as described in the previous section.
|
|
|
+- Less familiar to those coming from C++.
|
|
|
+
|
|
|
+### Do not provide an ordering or division for `uN`
|
|
|
+
|
|
|
+For arithmetic purposes, we treat `uN` as the integers modulo 2<sup>N</sup>.
|
|
|
+There is no single canonical total order for that ring, and because
|
|
|
+multiplication by even numbers loses information by discarding the high-order
|
|
|
+bit, not all non-zero numbers have
|
|
|
+[multiplicative inverses](https://en.wikipedia.org/wiki/Modular_multiplicative_inverse),
|
|
|
+and so division is not well-defined. We could be more mathematically pure by
|
|
|
+refusing to implement `<` and friends for `uN`, and similarly refusing to
|
|
|
+support `/`.
|
|
|
+
|
|
|
+However, the `uN` types exist as much for pragmatic purposes as for mathematical
|
|
|
+ones. Making the choice to treat all `uN` values as non-negative is somewhat
|
|
|
+arbitrary, but is both unsurprising to those coming from C++ and useful for the
|
|
|
+cases where some ordering is desired.
|
|
|
+
|
|
|
+### Give unary `-` lower precedence
|
|
|
+
|
|
|
+In this proposal, `- a * b` is interpreted as `(-a) * b`. We could instead
|
|
|
+interpret it as `- (a * b)`.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- Can be argued as better following mathematical convention.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Unfamiliar to people coming from most other programming languages, including
|
|
|
+ C++.
|
|
|
+- If we followed our normal precedence partial ordering rule, this would mean
|
|
|
+ that `a * -b` is invalid, because `*` has higher precedence than `-`.
|
|
|
+
|
|
|
+Comparison to other languages:
|
|
|
+
|
|
|
+- C and C++ give all unary operators (including unary `-` and unary `+`)
|
|
|
+ higher precedence than any binary operator.
|
|
|
+- Go, Rust, and Swift give unary `-` higher precedence than multiplication.
|
|
|
+- Python binds unary `-` more tightly than multiplication but less tightly
|
|
|
+ than exponentiation, so `- a * b` is `(- a) * b` but `- a ** b` is
|
|
|
+ `- (a ** b)`.
|
|
|
+- Haskell gives unary `-` the same precedence as binary `-` and rejects
|
|
|
+ `a * - b`.
|
|
|
+
|
|
|
+### Include a unary plus operator
|
|
|
+
|
|
|
+C and C++ include a unary `+` operator. In principle, this operator permits
|
|
|
+lists of numbers to be written with an operator attached to each:
|
|
|
+
|
|
|
+```cpp
|
|
|
+int arr[] = {
|
|
|
+ -100,
|
|
|
+ -50,
|
|
|
+ +20,
|
|
|
+ +400
|
|
|
+};
|
|
|
+```
|
|
|
+
|
|
|
+... but in practice this use case is rare at best, and unary `+` in C++ is
|
|
|
+instead mainly used to coerce operands to prvalues of built-in types. For
|
|
|
+example, `auto *p = +[]{ /*...*/ };` might be used to create a function pointer
|
|
|
+(`auto` deduction would fail without the unary `+` operator), and
|
|
|
+`min(+Class::static_member, x)` might be used to force `Class::static_member` to
|
|
|
+be loaded, to avoid requiring the static member to be defined.
|
|
|
+
|
|
|
+Such usage of `+` is a design wart that we need not replicate. If we want a
|
|
|
+mechanism to decay an operand, we can pick a better name for it than `+`.
|
|
|
+
|
|
|
+### Floating-point modulo operator
|
|
|
+
|
|
|
+It would be possible and sometimes useful to support the `%` operator for
|
|
|
+floating-point types.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- When desired, `f1 % f2` would be a more concise notation than a call to an
|
|
|
+ `fmod` function (however it is named).
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Uses of this operation would likely be rare and unfamiliar enough that
|
|
|
+ people would assume an integer operation is being performed when they see
|
|
|
+ the operator.
|
|
|
+- This operation is not available in hardware in many modern architectures,
|
|
|
+ and providing it as a built-in operator may create a false impression of its
|
|
|
+ implementation as a non-trivial library function.
|
|
|
+- We lack sufficient evidence of utility to propose it at this time.
|
|
|
+
|
|
|
+### Provide different division operators
|
|
|
+
|
|
|
+We could follow Python3 and provide an `/` operator for integers that produces a
|
|
|
+floating-point (or perhaps rational) type. This would be mathematically clean:
|
|
|
+ignoring overflow and precision loss, all standard properties for division would
|
|
|
+be maintained. Or we could follow Haskell and refuse to provide an `/` operator
|
|
|
+between integers on the basis that such division is not mathematically defined
|
|
|
+for that type. Or we could provide a division operator that produces a pair of
|
|
|
+dividend and modulo, or an `Optional(Int)` that is absent whenever the division
|
|
|
+is inexact. In each case, functionality not available through operators could be
|
|
|
+provided with named functions instead.
|
|
|
+
|
|
|
+All of these options are likely to be surprising to programmers coming from C++,
|
|
|
+to a level that outweighs the perceived benefit.
|
|
|
+
|
|
|
+### Use different division and modulo semantics
|
|
|
+
|
|
|
+There are multiple somewhat-reasonable ways to define division operations for
|
|
|
+signed integers (whether we provide those operations as operators or library
|
|
|
+functions). Assuming operator notation for now, and that we define modulo as
|
|
|
+`a % b == a - a / b * b`, the following options all have merit:
|
|
|
+
|
|
|
+<!-- prettier-ignore -->
|
|
|
+| Property | Round towards zero (truncating division) | Round towards negative infinity (floor division) | Round based on sign of divisor\[1] (Euclidean division) |
|
|
|
+| ------------------------------------ | -------- | -------- | --------- |
|
|
|
+| `(-a) / b ==`<br>` a / (-b)` | :+1: Yes | :+1: Yes | No |
|
|
|
+| `(-a) / b ==`<br>` -(a / b)` | :+1: Yes | No | :+1: Yes |
|
|
|
+| `a / (-b) ==`<br>` -(a / b)` | :+1: Yes | No | No |
|
|
|
+| `(a + k * b) / b`<br>` == a / b + k` | No | :+1: Yes | :+1: Yes |
|
|
|
+| Sign of `a % b` | Same as `a` | :+1: Same as `b` | :+1: Never negative |
|
|
|
+| x86 instruction? | :+1: Yes: `cqo` (or similar) + `idiv` | First option + fixup:<br> `s * (a / (s * b))` where `s` is `sign(a) * sign(b)`[2] | First option + fixup:<br>`a / b - (a % b < 0)` |
|
|
|
+| LLVM IR + optimization support | :+1: Yes | No | No |
|
|
|
+| Use in existing languages | C, C++, Rust, Swift <br> `quotRem` in Haskell | `//` and `%` in Python <br> `/` in Python 2 only <br> `divMod` in Haskell | None? |
|
|
|
+
|
|
|
+The cells marked :+1: suggest generally desirable properties. For further
|
|
|
+reading, see
|
|
|
+[this Microsoft Research paper](http://research.microsoft.com/pubs/151917/divmodnote-letter.pdf).
|
|
|
+
|
|
|
+Our options here are as follows:
|
|
|
+
|
|
|
+- Pick one of the two interpretations as the meaning of `/`, and (optionally)
|
|
|
+ pick one of the above interpretations as the meaning of `%`); perhaps
|
|
|
+ provide the others as library functions or as additional operators.
|
|
|
+- Do not provide any of these operators for integers and provide only named
|
|
|
+ functions.
|
|
|
+
|
|
|
+Note that we are not required to provide a `%` that is consistent with our
|
|
|
+chosen `/` operator. (We could pick truncate-towards-zero for `/` and
|
|
|
+truncate-towards-negative-infinity for `%`, for example.) There is
|
|
|
+long-established tradition here, but it's unclear to what extent practicing
|
|
|
+programmers really care about the relationship between `/` and `%`.
|
|
|
+
|
|
|
+It is likely that most Carbon code that performs division and modulo between
|
|
|
+signed integer types does not actually care about what happens when either
|
|
|
+operand is negative. Therefore, following our goals of supporting
|
|
|
+high-performance code and current CPU architectures, we will choose to implement
|
|
|
+`/` and `%` as division with truncation towards zero. The other variants can be
|
|
|
+provided by a library function, if at all.
|
|
|
+
|
|
|
+[1]: That is: for positive divisors, round towards negative infinity; for
|
|
|
+negative divisors, round towards positive infinity.
|
|
|
+
|
|
|
+[2]: Here, `sign(a)` is 1 if `a >= 0` and is -1 if `a < 0`.
|
|
|
+
|
|
|
+### Use different precedence groups for division and multiplication
|
|
|
+
|
|
|
+Under this proposal, division and multiplication are in the same precedence
|
|
|
+group, and are left-associative: `a * b / c * d` is `((a * b) / c) / d`, and not
|
|
|
+`(a * b) / (c * d)` or some other grouping.
|
|
|
+
|
|
|
+It's not feasible to provide a different interpretation here due to the risk of
|
|
|
+confusing developers migrating from other languages. However, we could reject
|
|
|
+such expressions and require explicit parentheses.
|
|
|
+
|
|
|
+While the value of accepting code such as this is relatively low, the fact that
|
|
|
+both existing programming languages and common mathematical education treat
|
|
|
+multiplication and division as the same, left-associative, precedence group
|
|
|
+means that it's unlikely to be a significant burden to expect Carbon developers
|
|
|
+to remember this precedence rule.
|
|
|
+
|
|
|
+### Use the same precedence group for modulo and multiplication
|
|
|
+
|
|
|
+In most languages with the set of arithmetic operators discussed in this
|
|
|
+proposal, `%` is considered a multiplicative operator, and so a sequence of `*`,
|
|
|
+`/`, and `%` operators is processed left-to-right. In some sense this is
|
|
|
+reasonable: `%` is notionally performing a division, after all. Moreover, in a
|
|
|
+code search, I was unable to find evidence that it's common for precedence
|
|
|
+errors with `%` to be checked in to source control, and C++ compilers don't have
|
|
|
+warnings for mixing `%` with `+` without parentheses.
|
|
|
+
|
|
|
+Giving `%` and `/` the same precedence also allows some kinds of code to be
|
|
|
+written symmetrically:
|
|
|
+
|
|
|
+```c++
|
|
|
+char two_digit_number[] = {
|
|
|
+ '0' + m / 10,
|
|
|
+ '0' + m % 10,
|
|
|
+ 0
|
|
|
+};
|
|
|
+char four_digit_number[] = {
|
|
|
+ '0' + n / 1000,
|
|
|
+ '0' + n / 100 % 10,
|
|
|
+ '0' + n / 10 % 10,
|
|
|
+ '0' + n % 10,
|
|
|
+ 0
|
|
|
+};
|
|
|
+```
|
|
|
+
|
|
|
+With minimal parentheses, that example would be written as follows under this
|
|
|
+proposal:
|
|
|
+
|
|
|
+```carbon
|
|
|
+var two_digit_number: Array(Char, 3) = (
|
|
|
+ '0' + m / 10,
|
|
|
+ '0' + (m % 10),
|
|
|
+ 0
|
|
|
+);
|
|
|
+var four_digit_number: Array(Char, 5) = (
|
|
|
+ '0' + n / 1000,
|
|
|
+ '0' + ((n / 100) % 10),
|
|
|
+ '0' + ((n / 10) % 10),
|
|
|
+ '0' + (n % 10),
|
|
|
+ 0
|
|
|
+);
|
|
|
+```
|
|
|
+
|
|
|
+We could use the same precedence rule as other languages, and permit examples
|
|
|
+similar to the above to be written without any parentheses.
|
|
|
+
|
|
|
+However, our [rule for precedence](p0555.md#when-to-add-precedence-edges) is:
|
|
|
+
|
|
|
+> For every combination of operators, either it should be reasonable to expect
|
|
|
+> most or all developers who regularly use Carbon to reliably remember the
|
|
|
+> precedence, or there should not be a precedence rule.
|
|
|
+
|
|
|
+It is not clear that a precedence rule that gives a defined meaning to, for
|
|
|
+example, `a + b % c + d` would satisfy this rule. Moreover, in mathematics, the
|
|
|
+"mod" operator generally binds very loosely: in 'n = m + 1 (mod 5)', the '(mod
|
|
|
+5)' applies to the entire equality, certainly not to the '1'.
|
|
|
+
|
|
|
+Therefore, we do not give modulo higher precedence than addition, and require
|
|
|
+parentheses when mixing the two. This decision should be revisited if it is a
|
|
|
+significant source of friction in practice.
|
|
|
+
|
|
|
+### Use a different spelling for modulo
|
|
|
+
|
|
|
+We could use a different spelling for the modulo operation. For example, we
|
|
|
+could spell it as `mod`.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- The percent sign has no relation to modulo or remainder. The only known
|
|
|
+ justification for this particular choice of symbol is that it resembles the
|
|
|
+ '÷' symbol; however, that symbol means division, not remainder.
|
|
|
+- Would free up the `%` symbol for other uses that may be more prevalent than
|
|
|
+ modulo. However, we would need to be cautious when adding any alternative
|
|
|
+ uses to avoid confusion for people who expect `%` to mean modulo.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- This would be unfamiliar to developers coming from C++ and other languages
|
|
|
+ with similar operator sets.
|
|
|
+- There is no other common established symbol for this operation. Using a
|
|
|
+ keyword such as `mod` would break our loose convention of using keywords for
|
|
|
+ non-overloadable operators and operator symbols for overloadable operators.
|
|
|
+ Using a function would substantially increase the verbosity of certain kinds
|
|
|
+ of code.
|
|
|
+
|
|
|
+## Future work
|
|
|
+
|
|
|
+### Provide separate wrapping operators
|
|
|
+
|
|
|
+We could provide distinct operators with wrapping semantics for types where
|
|
|
+overflow is normally a programming error. For example, Swift provides
|
|
|
+[overflow operators](https://docs.swift.org/swift-book/LanguageGuide/AdvancedOperators.html#ID37)
|
|
|
+`&+`, `&-`, and `&*` for this purpose.
|
|
|
+
|
|
|
+This proposal takes no position on whether this would be useful. However, given
|
|
|
+that in this proposal, unsigned types already have wrapping semantics, the
|
|
|
+pressure to provide operators for signed types with those semantics is somewhat
|
|
|
+reduced.
|
|
|
+
|
|
|
+### Provide separate operations to detect overflow
|
|
|
+
|
|
|
+It would be useful to provide a way to perform an arithmetic operation if
|
|
|
+possible, and to provide a separate codepath to handle the case where the
|
|
|
+arithmetic would have overflowed. For example, we could imagine the Carbon
|
|
|
+standard library providing a facility such as:
|
|
|
+
|
|
|
+```
|
|
|
+fn AddWithOverflow[N:! BigInt](a: Integer(N), b: Integer(N)) -> Optional(Integer(N));
|
|
|
+```
|
|
|
+
|
|
|
+While this would undoubtedly be useful, this proposal provides no facility for
|
|
|
+this operation.
|