|
|
@@ -0,0 +1,318 @@
|
|
|
+# Numeric literal semantics
|
|
|
+
|
|
|
+<!--
|
|
|
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
|
|
|
+Exceptions. See /LICENSE for license information.
|
|
|
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
+-->
|
|
|
+
|
|
|
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/144)
|
|
|
+
|
|
|
+## Table of contents
|
|
|
+
|
|
|
+<!-- toc -->
|
|
|
+
|
|
|
+## Table of contents
|
|
|
+
|
|
|
+- [Problem](#problem)
|
|
|
+- [Background](#background)
|
|
|
+- [Proposal](#proposal)
|
|
|
+- [Details](#details)
|
|
|
+ - [Prelude support](#prelude-support)
|
|
|
+ - [Implicit conversions](#implicit-conversions)
|
|
|
+ - [Examples](#examples)
|
|
|
+- [Alternatives considered](#alternatives-considered)
|
|
|
+ - [Use an ordinary integer or floating-point type for literals](#use-an-ordinary-integer-or-floating-point-type-for-literals)
|
|
|
+ - [Use same type for all literals](#use-same-type-for-all-literals)
|
|
|
+ - [Allow leading `-` in literal tokens](#allow-leading---in-literal-tokens)
|
|
|
+
|
|
|
+<!-- tocstop -->
|
|
|
+
|
|
|
+## Problem
|
|
|
+
|
|
|
+When a numeric literal appears in a program, we need to understand its
|
|
|
+semantics:
|
|
|
+
|
|
|
+- What type does it have?
|
|
|
+- What value is produced by operations on it?
|
|
|
+- When can it validly be used to initialize an object?
|
|
|
+
|
|
|
+## Background
|
|
|
+
|
|
|
+In C++, numeric literals have either an integral type or a floating-point type.
|
|
|
+C++ provides permission for implementations to add extended integral types, but
|
|
|
+in practice (for bad reasons relating to `intmax_t`) implementations do not do
|
|
|
+so, so there are a small finite set of types that any given numeric literal
|
|
|
+might have:
|
|
|
+
|
|
|
+- `int`, `long`, `long long`, or `unsigned` versions of these
|
|
|
+- `float`, `double`, or `long double`
|
|
|
+
|
|
|
+The choice of type is determined solely by the literal.
|
|
|
+
|
|
|
+The C++ approach is error-prone and problematic:
|
|
|
+
|
|
|
+- Lossy conversions from literals in initializers are permitted.
|
|
|
+- Lossy operations on literals are permitted; for example, on a typical
|
|
|
+ implementation, `1 << 60` has value `0` because `1` is a 32-bit type.
|
|
|
+- Attempting to naturally express some values has undefined behavior; for
|
|
|
+ example, `int x = -2147483648;` typically results in undefined behavior even
|
|
|
+ when -2147483648 is a valid `int` value.
|
|
|
+- Integer literals with value 0 have special semantics that are lost when the
|
|
|
+ integer is passed to a function: "perfect" forwarding doesn't work for such
|
|
|
+ literals.
|
|
|
+- The built-in types are privileged: only the types listed above have
|
|
|
+ literals. There is no syntax for a 64-bit integer literal, only for (for
|
|
|
+ example) a `long int` literal, which may or may not 64 bits wide.
|
|
|
+- The type of a literal can be unpredictable in portable code, as it can
|
|
|
+ depend on which type a particular value happens to fit into.
|
|
|
+
|
|
|
+## Proposal
|
|
|
+
|
|
|
+Numeric literals have a type derived from their value, and can be converted to
|
|
|
+any type that can represent that value.
|
|
|
+
|
|
|
+Simple operations such as arithmetic that involve only literals also produce
|
|
|
+values of literal types.
|
|
|
+
|
|
|
+## Details
|
|
|
+
|
|
|
+Numeric literals have a type derived from their value. Two integer literals have
|
|
|
+the same type if and only if they represent the same integer. Two real number
|
|
|
+literals have the same type if and only if they represent the same real number.
|
|
|
+
|
|
|
+That is:
|
|
|
+
|
|
|
+- For every integer, there is a type representing literals with that integer
|
|
|
+ value.
|
|
|
+- For every rational number, there is a type representing literals with that
|
|
|
+ real value.
|
|
|
+- The types for real numbers are distinct from the types for integers, even
|
|
|
+ for real numbers that represent integers. `var x: i32 = 1.0;` is invalid.
|
|
|
+
|
|
|
+Primitive operators are available between numeric literals, and produce values
|
|
|
+with numeric literal types. For example, the type of `1 + 2` is the same as the
|
|
|
+type of `3`.
|
|
|
+
|
|
|
+Numeric types can provide conversions to support initialization from numeric
|
|
|
+literals. Because the value of the literal is carried in the type, a type-level
|
|
|
+decision can be made as to whether the conversion is valid.
|
|
|
+
|
|
|
+The integer types defined in the standard library permit conversion from integer
|
|
|
+literal types whose values are representable in the integer type. The
|
|
|
+floating-point types defined in the Carbon library permit conversion from
|
|
|
+integer and rational literal types whose values are between the minimum and
|
|
|
+maximum finite value representable in the floating-point type.
|
|
|
+
|
|
|
+### Prelude support
|
|
|
+
|
|
|
+The following types are defined in the Carbon prelude:
|
|
|
+
|
|
|
+- An arbitrary-precision integer type.
|
|
|
+
|
|
|
+ ```
|
|
|
+ class BigInt;
|
|
|
+ ```
|
|
|
+
|
|
|
+- A rational type, parameterized by a type used for its numerator and
|
|
|
+ denominator.
|
|
|
+
|
|
|
+ ```
|
|
|
+ class Rational(T:! Type);
|
|
|
+ ```
|
|
|
+
|
|
|
+ The exact constraints on `T` are not yet decided.
|
|
|
+
|
|
|
+- A type representing integer literals.
|
|
|
+
|
|
|
+ ```
|
|
|
+ class IntLiteral(N:! BigInt);
|
|
|
+ ```
|
|
|
+
|
|
|
+- A type representing floating-point literals.
|
|
|
+
|
|
|
+ ```
|
|
|
+ class FloatLiteral(X:! Rational(BigInt));
|
|
|
+ ```
|
|
|
+
|
|
|
+All of these types are usable during compilation. `BigInt` supports the same
|
|
|
+operations as `Int(n)`. `Rational(T)` supports the same operations as
|
|
|
+`Float(n)`.
|
|
|
+
|
|
|
+The types `IntLiteral(n)` and `FloatLiteral(x)` also support primitive integer
|
|
|
+and floating-point operations such as arithmetic and comparison, but these
|
|
|
+operations are typically heterogeneous: for example, an addition between
|
|
|
+`IntLiteral(n)` and `IntLiteral(m)` produces a value of type
|
|
|
+`IntLiteral(n + m)`.
|
|
|
+
|
|
|
+### Implicit conversions
|
|
|
+
|
|
|
+`IntLiteral(n)` converts to any sufficiently large integer type, as if by:
|
|
|
+
|
|
|
+```
|
|
|
+impl [template N:! BigInt, template M:! BigInt]
|
|
|
+ IntLiteral(N) as ImplicitAs(Int(M))
|
|
|
+ if N >= Int(M).MinValue as BigInt and N <= Int(M).MaxValue as BigInt {
|
|
|
+ ...
|
|
|
+}
|
|
|
+impl [template N:! BigInt, template M:! BigInt]
|
|
|
+ IntLiteral(N) as ImplicitAs(Unsigned(M))
|
|
|
+ if N >= Int(M).MinValue as BigInt and N <= Int(M).MaxValue as BigInt {
|
|
|
+ ...
|
|
|
+}
|
|
|
+```
|
|
|
+
|
|
|
+The above is for exposition purposes only; various parts of this syntax are not
|
|
|
+yet decided.
|
|
|
+
|
|
|
+Similarly, `IntLiteral(x)` and `FloatLiteral(x)` convert to any sufficiently
|
|
|
+large floating-point type, and produce the nearest representable floating-point
|
|
|
+value. Conversions in which `x` lies exactly half-way between two values are
|
|
|
+rejected, as
|
|
|
+[previously decided](/docs/design/lexical_conventions/numeric_literals.md#ties).
|
|
|
+Conversions in which `x` is outside the range of finite values of the
|
|
|
+floating-point type are also represented, rather than saturating to the finite
|
|
|
+range or producing an infinity.
|
|
|
+
|
|
|
+### Examples
|
|
|
+
|
|
|
+```carbon
|
|
|
+// This is OK: the initializer is of the integer literal type with value
|
|
|
+// -2147483648 despite being written as a unary `-` applied to a literal.
|
|
|
+var x: i32 = -2147483648;
|
|
|
+
|
|
|
+// This initializes y to 2^60.
|
|
|
+var y: i64 = 1 << 60;
|
|
|
+
|
|
|
+// This forms a rational literal whose value is one third, and converts it to
|
|
|
+// the nearest representable value of type `f64`.
|
|
|
+var z: f64 = 1.0 / 3.0;
|
|
|
+
|
|
|
+// This is an error: 300 cannot be represented in type `i8`.
|
|
|
+var c: i8 = 300;
|
|
|
+
|
|
|
+fn f[template T:! Type](v: T) {
|
|
|
+ var x: i32 = v * 2;
|
|
|
+}
|
|
|
+
|
|
|
+// OK: x = 2_000_000_000.
|
|
|
+f(1_000_000_000);
|
|
|
+
|
|
|
+// Error: 4_000_000_000 can't be represented in type `i32`.
|
|
|
+f(2_000_000_000);
|
|
|
+
|
|
|
+// No storage required for the bound when it's of integer literal type.
|
|
|
+struct Span(template T:! Type, template BoundT:! Type) {
|
|
|
+ var begin: T*;
|
|
|
+ var bound: BoundT;
|
|
|
+}
|
|
|
+
|
|
|
+// Returns 1, because 1.3 can implicitly convert to f32, even though conversion
|
|
|
+// to f64 might be a more exact match.
|
|
|
+fn G() -> i32 {
|
|
|
+ match (1.3) {
|
|
|
+ case _: f32 => { return 1; }
|
|
|
+ case _: f64 => { return 2; }
|
|
|
+ }
|
|
|
+}
|
|
|
+
|
|
|
+// Can only be called with a literal 0.
|
|
|
+fn PassMeZero(_: IntLiteral(0));
|
|
|
+
|
|
|
+// Can only be called with integer literals in the given range.
|
|
|
+fn ConvertToByte[template N:! BigInt](_: IntLiteral(N)) -> i8
|
|
|
+ if N >= -128 and N <= 127 {
|
|
|
+ return N as i8;
|
|
|
+}
|
|
|
+
|
|
|
+// Given any int literal, produces a literal whose value is one higher.
|
|
|
+fn OneHigher(L: IntLiteral(template _:! BigInt)) -> auto {
|
|
|
+ return L + 1;
|
|
|
+}
|
|
|
+// Error: 256 can't be represented in type `i8`.
|
|
|
+var v: i8 = OneHigher(255);
|
|
|
+```
|
|
|
+
|
|
|
+## Alternatives considered
|
|
|
+
|
|
|
+### Use an ordinary integer or floating-point type for literals
|
|
|
+
|
|
|
+We could decide on a fixed-width type based on the form of the literal, for
|
|
|
+example using a type suffix with some rules to determine what type to pick for
|
|
|
+unsuffixed literals.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- This follows what C++ does.
|
|
|
+- Can determine the type of a floating-point number without requiring
|
|
|
+ contextual information.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Surprising behavior when applying an operator to a literal would result in
|
|
|
+ overflow. Even if we diagnose this, a diagnostic that `-2147483648` is
|
|
|
+ invalid because it overflows is surprising.
|
|
|
+- Creates additional literal syntax that users will need to understand.
|
|
|
+- May select types that don't match the programmer's expectations.
|
|
|
+- Whatever types we pick are privileged.
|
|
|
+
|
|
|
+### Use same type for all literals
|
|
|
+
|
|
|
+We could give literals a single, arbitrary-precision type (say, `Integer` for
|
|
|
+integer literals and `Rational` for real literals).
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- Only introduces two new types, not an unbounded parameterized family of
|
|
|
+ types.
|
|
|
+- Writing a function that takes any integer literal can be done with more
|
|
|
+ obvious syntax and less syntactic overhead. Instead of:
|
|
|
+ ```
|
|
|
+ fn OneHigher(L: IntLiteral(template _:! BigInt));
|
|
|
+ ```
|
|
|
+ we could write
|
|
|
+ ```
|
|
|
+ fn OneHigher(template L:! Integer);
|
|
|
+ ```
|
|
|
+ However, with this proposal, a function taking any integer expression that
|
|
|
+ can be evaluated to a constant can be written as
|
|
|
+ ```
|
|
|
+ fn F(template N:! BigInt);
|
|
|
+ ```
|
|
|
+ and such a function would accept all integer literals, as well as
|
|
|
+ non-literal constants.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Our mechanism for specifying the behavior of operations such as arithmetic
|
|
|
+ is based on interface implementations, which are looked up by type.
|
|
|
+ Supporting `impl` selection based on values would introduce substantial
|
|
|
+ complexity.
|
|
|
+- If we introduce an arbitrary-precision integer type, it would be
|
|
|
+ inconsistent to support it only during compilation. However, if we allow its
|
|
|
+ use at runtime, programs may use it accidentally, with an invisible
|
|
|
+ performance cost. For example, `var x: auto = 123;` would result in `x`
|
|
|
+ having an infinite-precision type, possibly involving invisible dynamic
|
|
|
+ allocation.
|
|
|
+ - Under this proposal, the type of `x` is a type that can only represent
|
|
|
+ the value `123`; as such, `x` is effectively immutable. The
|
|
|
+ arbitrary-precision integer type introduced in this proposal can only be
|
|
|
+ used explicitly by programs naming it.
|
|
|
+
|
|
|
+### Allow leading `-` in literal tokens
|
|
|
+
|
|
|
+We could treat a leading `-` character as part of a numeric literal token, so
|
|
|
+that -- for example -- `-123` would be a single `-123` token rather than a unary
|
|
|
+negation applied to a literal `123`.
|
|
|
+
|
|
|
+Advantages:
|
|
|
+
|
|
|
+- This would narrowly solve the problem that `INT_MIN` cannot be written
|
|
|
+ directly, without any of the other implications of this proposal.
|
|
|
+
|
|
|
+Disadvantages:
|
|
|
+
|
|
|
+- Makes the behavior of unary `-` less uniform.
|
|
|
+- Prevents the introduction of infix or postfix operators that bind more
|
|
|
+ tightly than unary `-`, such as an infix exponentiation operator: `-2**2`
|
|
|
+ may be expected to evaluate to -4, not to +4.
|