Carbon needs operations for working with bit representations of values.
C++ provides four bitwise operations for Boolean algebra: complement (~), and
(&), or (|), and xor (^). These are all useful in bit-manipulation code
(although ^ is used substantially less than the others). In addition, C++
provides two bit-shift operators << and >> that can perform three different
operations: left shift, arithmetic right shift, and logical right shift. The
meaning of >> is determined by the signedness of the first operand.
C and Swift use the same set of operators as C++. Rust uses most of the same
operators, but uses ! instead of ~ for complement, unifying it with the
logical not operator, which is spelled not in Carbon and as ! in Rust and
C++. Go uses most of the same operators as C++, but uses unary prefix ^
instead of ~ for complement, mirroring binary ^ for xor.
In addition to the operators provided by C and C++, bit-rotate operators are present in some languages, and a short notation for them may be convenient. Finally, there are other non-degenerate binary Boolean operations with no common operator symbol:
~a | b).a | ~b).The behavior of shift operators in C++ had a turbulent past. The behavior of shift operators has always been undefined if the right operand is out of range -- not between zero inclusive and the bit-width of the left operator exclusive -- but other restrictions have varied:
There is a clear trend towards defining more cases, following two's complement rules.
Use the same operator set as C++, but replace ~ with unary prefix ^.
Define the behavior for all cases of << and >> except where the right
operand is either negative or is greater than or equal to the bit-width of the
left operand.
See changes to the design, and in particular the new section on bitwise and shift operators.
<< and >> when the right-hand operand
is out of range, we can directly use hardware instructions for these
operations whose behavior in these cases vary between architectures.<< and >> as programming errors when
the right hand operand is out of range, but Carbon guarantees that such
errors will not directly result in unbounded misbehavior in hardened
builds.~
operator is mechanically replaced with ^. This change is expected to
be easy for programmers to accommodate, especially given that Rust's
choice to replace ~ with ! does not seem to be a source of sustained
complaints.The operator syntax for bitwise operators was decided in #545. Some of the specific alternatives considered are discussed below.
We considered using various multi-character spellings for the bitwise and, or, xor, and complement operators:
&:, |:, ^:, ~:.&., .|., .^., .~..*., .+., .!=., .!./\, \/, (+), -|bitand, bitor, bitxor, complThe advantage of switching to such a set of operators is that this would free up
the single-character &, |, ^, and ~ tokens for other uses that may occur
more frequently in Carbon programs. We have some candidate uses for these
operators already:
& is used for combining interfaces and as a unary address-of operator.| may be useful for sum types, for alternatives in patterns, or as another
kind of bracket as in
Ruby's lambda notation.~ may be useful as a destructive move notation.^ may be useful as a postfix pointer dereference operator.Other motivations for switching to a different set of spellings include:
< and << are visually similar, analogous to
& and &&.&& and other punctuation based logical
operators and towards keywords like and. Bitwise operators could similarly
switch to keywords like bitand.However, moving substantially away from the C++ operator set comes with a set of concerns:
< and <<, the
contexts in which these operators are used are sufficiently different to
avoid any serious concerns.and instead of && doesn't apply for
bitwise operators: the logical operator represents control flow.
Separating logical and bitwise "and" operations more visibly seems
especially important because of this control flow semantic difference.
Without any control flow and with the keywords being significantly longer
for bitwise operations, the above considerations were the dominant ones that
led us to stick with familiar & and | spellings.We considered omitting the ^ operator, providing this functionality in some
other way, such as a named function or an xor keyword, while keeping the &
and | symbols for bitwise operations. We could take a similar approach for the
complement operation, such as by using a compl keyword. The primary motivation
was to avoid spending two precious operator characters on relatively uncommon
operations. However, we did not want to apply the same change to & and |,
and it seemed important to maintain consistency between the three binary bitwise
operators from C++.
Using ^ for both operations provides some of the benefits here, allowing us to
reclaim ~, without introducing the inconsistency that would result from using
keywords.
~, or some other symbol, for complementWe could follow C++ and use ~ as the complement operator. However, using ~
for this purpose spends a precious operator character on a relatively uncommon
operation, and ~ is often visually confusible with -, creating the potential
for readability problems. Also, in C++, ~ is overloaded to also refer to
destruction, and we may want to use the same notation for destruction or
destructive move semantics in Carbon.
We found ^ to be a satisfying alternative with a good rationale and mnemonic:
^ is a bit-flipping operator -- a ^ b flips the bits in a that are
specified in b -- and complement is an operator that flips all the bits.
^a is equivalent to a ^ n, where n is the all-one-bits value in the type
of a.
We also considered using ! for complement, like Rust does. Unlike in Rust,
this would not be a generalization of ! on bool, because we use not for
bool negation, and repurposing ! in this way compared to C++ seemed
confusing.
We could provide different operators for arithmetic right shift and logical right shift. This might allow programmers to better express their intent. However, it seems unnecessary, as using the type of the left operand is a strategy that doesn't appear to have caused significant problems in practice in the languages that have followed it.
Basing the kind of shift on the signedness of the left operand also follows from viewing a negative number as having an infinite number of leading 1 bits, which is the underlying mathematical model behind the two's complement representation.
We could provide bitwise rotation operators. However, there doesn't seem to be a sufficient need to justify adding another operator symbol for this purpose.
Logically, the behavior of shifts is meaningful for all values of the second operand:
Put another way, we can view the bits of the first operand as an N-bit window into an infinite sequence of bits, with infinitely many leading sign bits (all zeroes for an unsigned value) and infinitely many trailing zero bits after a notional binary point, and a shift moves that window around. Or equivalently, a shift is always a multiplication by 2N followed by rounding and wrapping.
We could provide the correct result for all shifts, regardless of the magnitude of the second operand. This is the approach taken by Python, except that Python rejects negative shift counts. The primary reason we do not do this is lack of hardware support. For example, x86 does not have an instruction to perform this operation. Rather, x86 provides shift instructions that mask off all bits of the second operand except for the bottom 5 or 6, meaning that a left shift of a 64-bit operand by 64 will return the operand unchanged rather than producing zero.
We could instead provide x86-like behavior, guaranteeing to consider only the
lowest N bits of the second operand when the first operand is an iN or uN.
This is the approach taken by Java for its 32-bit int type and 64-bit long
type, where the second operand is taken modulo 32 or 64, respectively, and in
JavaScript, where operands of bitwise and bit-shift operators are treated as
32-bit integers and the second operand of a shift is taken modulo 32. This
approach would provide an operation that can be implemented by a single
instruction on x86 platforms when N is 32 or 64, and for all smaller types and
for all other platforms the operation can be implemented with two instructions:
a mask and a shift. For larger types, single-instruction support may not be
available, but nonetheless the performance will be close to optimal, requiring
at most one additional mask. There is still some performance cost in some cases,
but the primary reason we do not do this is the same reason we choose to not
define signed integer overflow: this masked result is unlikely to be the value
that the developer actually wanted.
Instead of the above options, Carbon treats a second operand that is not in the interval [0, N) as a programming error, just like signed integer overflow:
i32 shift to be implemented by an x86 64-bit
shift that will produce 0 if the second operand is in [32, 63) but that will
treat a second operand of, say, 64 or -64 the same as 0.We considered various ways to support
var a: i32 = ...;
var b: i32 = 1 << a;
var c: i32 = 1234 >> a;
with no explicit type specified for the first operand of a bit-shift operator. We considered the following options:
We could find a way to defer picking the type in which the operation is
performed until later. For example, we could treat 1 << a as a value of a
new type that carries its left-hand operand as a type parameter and its
right-hand operand as runtime state, and allow that type to be converted in
the same way as its integer constant. However, this would introduce
substantial complexity: reasonable and expected uses such as
var mask: u32 = (1 << a) - 1;
would require a second new type for a shifted value plus an offset, and
general support would require a facility analogous to
expression templates.
Further, this facility would allow implicit conversions that notionally
overflow, such as would happen in the above example when a is greater
than 32.
In the absence of a good approach, we disallow such conversions for now. The above example can be written as:
var a: i32 = ...;
var b: i32 = (1 as i32) << a;
var c: i32 = (1234 as i32) >> a;
We view an integer constant has having infinitely many high-order sign bits followed by some number of lower-order value bits. As a consequence, the complement of a positive integer constant is negative. As a result, some important forms of initialization use a negative integer constant initializer for an unsigned type:
// Initializer here is the integer value -8.
var mask: u32 = ^7;
We considered some options for handling this:
On balance, our preferred option was to permit implicit conversions from negative literals to unsigned types so long as we only discard sign bits.