Selaa lähdekoodia

Carbon/C++ Interop: Primitive Types proposal (#5448)

A proposal for Primitive Types mapping between Carbon and C++.

Part of #5263
Ivana Ivanovska 10 kuukautta sitten
vanhempi
sitoutus
44b2f60c90
1 muutettua tiedostoa jossa 452 lisäystä ja 0 poistoa
  1. 452 0
      proposals/p5448.md

+ 452 - 0
proposals/p5448.md

@@ -0,0 +1,452 @@
+# Carbon <-> C++ Interop: Primitive Types
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/5448)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+    -   [Data models](#data-models)
+    -   [Carbon Primitive Types](#carbon-primitive-types)
+    -   [C++ Fundamental Types](#c-fundamental-types)
+        -   [void](#void)
+        -   [std::nullptr_t](#stdnullptr_t)
+        -   [std::byte](#stdbyte)
+        -   [Character types](#character-types)
+        -   [Signed integer types](#signed-integer-types)
+        -   [Unsigned integer types](#unsigned-integer-types)
+        -   [Floating-point types](#floating-point-types)
+-   [Proposal](#proposal)
+-   [Details](#details)
+    -   [C++ -> Carbon mapping details](#c---carbon-mapping-details)
+    -   [Carbon -> C++ mapping details](#carbon---c-mapping-details)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+-   [Open questions](#open-questions)
+
+<!-- tocstop -->
+
+## Abstract
+
+Define the type mapping of the primitive types between Carbon and C++.
+
+## Problem
+
+Interoperability of Carbon with C++ is one of the Carbon language goals (see
+[Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)).
+Providing
+[unsurprising mappings between C++ and Carbon types](/docs/design/interoperability/philosophy_and_goals.md#unsurprising-mappings-between-c-and-carbon-types)
+is one of it's sub goals.
+
+This proposal addresses the type mapping between the two languages to support
+achieving this goal.
+
+## Background
+
+### Data models
+
+The following data models are widely accepted:
+
+-   32-bit systems:
+    -   `LP32` (Win16 API): `int` 16-bit; `long` 32-bit; `pointer` 32-bit.
+    -   `ILP32` (Win32 API; Unix and Unix-like systems): `int` 32-bit; `long`
+        32-bit; `pointer` 32-bit.
+-   64-bit systems:
+    -   `LLP64` (Win32 API: 64-bit ARM or x86-64): `int` 32-bit; `long` 32-bit;
+        `pointer` 64-bit.
+    -   `LP64` (Unix and Unix-like systems (Linux, macOS)): `int` 32-bit; `long`
+        64-bit; `pointer` 64-bit.
+
+[Carbon supported platforms](/docs/project/principles/success_criteria.md#modern-os-platforms-hardware-architectures-and-environments)
+
+Carbon will prioritize supporting modern OS, 64-bit little endian platforms (for
+example [LLP64](/proposals/p5448.md#data-models),
+[LP64](/proposals/p5448.md#data-models)). Historic platforms like
+[LP32](/proposals/p5448.md#data-models) won't be supported.
+
+For clarity, the text below omits [LP32](/proposals/p5448.md#data-models)
+relevant information and focuses only on the Carbon supported platforms.
+
+### Carbon Primitive Types
+
+Carbon has the following
+[primitive types](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/README.md#primitive-types):
+
+-   `bool`: boolean type taking `true` or `false`
+-   integer types:
+    -   signed integer types: `iN` (`N` - bit width, a positive multiple of 8)
+        -   `i8`, `i16`, `i32`, `i64`, `i128`, `i256`
+    -   unsigned integer types: `uN` (`N` - bit width, a positive multiple of 8)
+        -   `u8`, `u16`, `u32`, `u64`, `u128`, `u256`
+-   floating-point types: `fN` (`N` - bit width, a positive multiple of 8),
+    IEEE-754 format
+    -   `f16`, `f32`, and `f64` - always available
+    -   `f80`, `f128`, or `f256` may be available, depending on the platform
+
+### C++ Fundamental Types
+
+C++ calls the primitive types
+[fundamental types](https://en.cppreference.com/w/cpp/language/types). The
+following fundamental types exist in C++:
+
+-   `void`
+-   `std::nullptr_t`
+-   `std::byte`
+-   integral types (also integer types):
+    -   `bool`
+    -   character types:
+        -   narrow character types: `signed char`, `unsigned char`, `char`,
+            `char8_t` (c++20)
+        -   wide character types: `char16_t`, `char32_t`, `wchar_t`
+    -   signed integer types:
+        -   standard signed integer types: `signed char`, `short`, `int`,
+            `long`, `long long`
+        -   extended signed integer types (implementation-defined)
+    -   unsigned integer types:
+        -   standard unsigned integer types: `unsigned char`, `unsigned short`,
+            `unsigned int`, `unsigned long`, `unsigned long long`
+        -   extended unsigned integer types
+-   floating-point types:
+    -   standard floating-point types: `float`, `double`, `long double`
+    -   extended floating-point types:
+        -   fixed width floating-point types (since C++23): `float16_t`,
+            `float32_t`, `float64_t`, `float128_t`, `bfloat16_t`
+        -   other implementation-defined extended floating-point types
+
+#### void
+
+Objects of type `void` are not allowed, neither are arrays of `void`, nor
+references to `void`. Pointers to `void` and functions returning `void` are
+allowed.
+
+#### std::nullptr_t
+
+The type of `nullptr` (the null pointer literal). It's a distinct type that is
+not itself a pointer type.
+
+#### std::byte
+
+| Type        | Width in bits | Notes                                                                                                                  |
+| ----------- | ------------- | ---------------------------------------------------------------------------------------------------------------------- |
+| `std::byte` | 8-bit         | can be used to access raw memory, same as `unsigned char`, but it's not a character type and is not an arithmetic type |
+
+#### Character types
+
+| Type            | Width in bits                      | Notes                                                                                                                                                         |
+| --------------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `char`          | 8-bit                              | multibyte characters; same representation, alignment and signedness as either `signed char` or `unsigned char` (platform-dependent), but it's a distinct type |
+| `signed char`   | 8-bit                              | signed character representation                                                                                                                               |
+| `unsigned char` | 8-bit                              | unsigned character representation; raw memory access                                                                                                          |
+| `char8_t`       | 8-bit                              | UTF-8 character representation; same size, alignment and signedness as `unsigned char`, but a distinct type                                                   |
+| `char16_t`      | 16-bit                             | UTF-16 character representation; same size, alignment and signedness as `std::uint_least16_t`, but a distinct type                                            |
+| `char32_t`      | 32-bit                             | UTF-32 character representation; same size, alignment and signedness as `std::uint_least32_t`, but a distinct type                                            |
+| `wchar_t`       | 32-bit on Linux, 16-bit on Windows | wide character representation, holds UTF-32 on Linux and other non-Windows platforms, UTF-16 on Windows.                                                      |
+
+#### Signed integer types
+
+**Standard signed integer types**
+
+| Type          | Width in bits                                                                                     |
+| ------------- | ------------------------------------------------------------------------------------------------- |
+| `signed char` | 8-bit                                                                                             |
+| `short`       | 16-bit                                                                                            |
+| `int`         | 32-bit                                                                                            |
+| `long`        | [LLP64](/proposals/p5448.md#data-models): 32-bit; [LP64](/proposals/p5448.md#data-models): 64-bit |
+| `long long`   | 64-bit                                                                                            |
+
+**Exact-width integer types**
+
+Typically aliases of the standard integer types.
+
+| Type           | Width in bits | Defined as                                                                   |
+| -------------- | ------------- | ---------------------------------------------------------------------------- |
+| `std::int8_t`  | 8-bit         | `typedef signed char int8_t`                                                 |
+| `std::int16_t` | 16-bit        | `typedef signed short int16_t`                                               |
+| `std::int32_t` | 32-bit        | `typedef signed int int32_t`                                                 |
+| `std::int64_t` | 64-bit        | [LLP64](/proposals/p5448.md#data-models): `typedef signed long long int64_t` |
+|                |               | [LP64](/proposals/p5448.md#data-models): `typedef signed long int64_t`       |
+
+**Fastest minimum-width integer types**
+
+Integer types that are usually fastest to operate with among all integer types
+that have the minimum specified width.
+
+| Type                | Width in bits | Defined as                                                                        |
+| ------------------- | ------------- | --------------------------------------------------------------------------------- |
+| `std::int_fast8_t`  | >=8-bit       | `typedef signed char int_fast8_t`                                                 |
+| `std::int_fast16_t` | >=16-bit      | implementation dependent                                                          |
+| `std::int_fast32_t` | >=32-bit      | implementation dependent                                                          |
+| `std::int_fast64_t` | >=64-bit      | [LLP64](/proposals/p5448.md#data-models): `typedef signed long long int_fast64_t` |
+|                     |               | [LP64](/proposals/p5448.md#data-models): `typedef signed long int_fast64_t`       |
+
+**Minimum-width integer types**
+
+Smallest signed integer type with width of at least N-bits.
+
+| Type                 | Width in bits | Defined as                                                                         |
+| -------------------- | ------------- | ---------------------------------------------------------------------------------- |
+| `std::int_least8_t`  | >=8-bit       | `typedef signed char int_least8_t`                                                 |
+| `std::int_least16_t` | >=16-bit      | `typedef short int_least16_t`                                                      |
+| `std::int_least32_t` | >=32-bit      | `typedef int int_least32_t`                                                        |
+| `std::int_least64_t` | >=64-bit      | [LLP64](/proposals/p5448.md#data-models): `typedef signed long long int_least64_t` |
+|                      |               | [LP64](/proposals/p5448.md#data-models): `typedef signed long int_least64_t`       |
+
+**Greatest-width integer types**
+
+Maximum-width signed integer type.
+
+| Type            | Width in bits | Defined as                                                                    |
+| --------------- | ------------- | ----------------------------------------------------------------------------- |
+| `std::intmax_t` | >=32-bit      | [LLP64](/proposals/p5448.md#data-models): `typedef signed long long intmax_t` |
+|                 |               | [LP64](/proposals/p5448.md#data-models): `typedef signed long intmax_t`       |
+
+**Integer types capable of holding object pointers**
+
+Signed integer type, capable of holding any pointer.
+
+| Type            | Width in bits | Defined as                                                           |
+| --------------- | ------------- | -------------------------------------------------------------------- |
+| `std::intptr_t` | >=16-bit      | most platforms: `typedef long intptr_t`                              |
+|                 |               | some [ILP32](/proposals/p5448.md#data-models):`typedef int intptr_t` |
+
+**Other signed integer types**
+
+| Type        | Width in bits | Defined as                                        |
+| ----------- | ------------- | ------------------------------------------------- |
+| `ptrdiff_t` | >=16-bit      | most platforms: `typedef std::intptr_t ptrdiff_t` |
+|             |               | Holds the result of subtracting two pointers.     |
+
+#### Unsigned integer types
+
+The unsigned integer types have the same sizes as their
+[signed counterparts](/proposals/p5448.md#signed-integer-types).
+
+| Type     | Width in bits | Defined as                                 |
+| -------- | ------------- | ------------------------------------------ |
+| `size_t` | >=16-bit      | most platforms: `typedef uintptr_t size_t` |
+|          |               | Holds the result of the `sizeof` operator. |
+
+#### Floating-point types
+
+**Standard floating-point types**
+
+| Type          | Format                                                                                                              | Width in bits    | Note                                                                        |
+| ------------- | ------------------------------------------------------------------------------------------------------------------- | ---------------- | --------------------------------------------------------------------------- |
+| `float`       | usually [IEEE-754 binary32](https://en.wikipedia.org/wiki/Single-precision_floating-point_format)                   | 32-bits          | The format or the size can vary depending on the compiler and the platform. |
+| `double`      | usually [IEEE-754 binary64](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)                   | 64-bits          | The format or the size can vary depending on the compiler and the platform. |
+| `long double` | [IEEE-754 binary128](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format)                       | 128-bit          | used by some SPARC, MIPS, ARM64 implementations.                            |
+|               | [IEEE-754 binary64-extended format](https://en.wikipedia.org/wiki/Extended_precision)                               | 80-bit or 64-bit | 80-bit (most x86 and x86-64 implementations); 64-bit used by MSVC.          |
+|               | [`double-double`](https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic) | 128-bit          | used on PowerPC.                                                            |
+
+**Fixed-width floating-point types (C++23)**
+
+They aren’t aliases to the standard floating-point types (`float`, `double`,
+`long double`), but to an extended floating-point type.
+
+| Type              | Width in bits | Defined as                     |
+| ----------------- | ------------- | ------------------------------ |
+| `std::float16_t`  | 16-bit        | `using float16_t = _Float16`   |
+| `std::float32_t`  | 32-bit        | `using float32_t = _Float32`   |
+| `std::float64_t`  | 64-bit        | `using float64_t = _Float64`   |
+| `std::float128_t` | 128-bit       | `using float128_t = _Float128` |
+| `std::bfloat16_t` | 16-bit        |                                |
+
+## Proposal
+
+-   The C++ fixed-width integer types `intN_t` will be the same type as Carbon
+    integer types `iN`. Likewise for `uintN_t` <-> `uN`.
+-   A C++ `builtin type` will be available in Carbon as `Cpp.builtin_type`, for
+    the standard C++ signed/unsigned integer and floating-point types.
+-   A C++ integer `builtin type` that is not the same as `intN_t` or `uintN_t`
+    for any N, will be nameable in Carbon only as `Cpp.builtin_type`.
+-   Different C++ types will be considered different in Carbon, so C++ overload
+    resolution can be handled without issues.
+
+## Details
+
+The table of Carbon <-> C++ mappings is as follows:
+
+| Carbon                   | C++                                    |
+| ------------------------ | -------------------------------------- |
+| `()` as a return type    | `void`                                 |
+| `bool`                   | `bool`                                 |
+| `i8`                     | `int8_t`                               |
+| `i16`                    | `int16_t`                              |
+| `i32`                    | `int32_t`                              |
+| `i64`                    | `int64_t`                              |
+| `i128`                   | `int128_t`                             |
+| `u8`                     | `uint8_t`                              |
+| `u16`                    | `uint16_t`                             |
+| `u32`                    | `uint32_t`                             |
+| `u64`                    | `uint64_t`                             |
+| `u128`                   | `uint128_t`                            |
+| `Cpp.signed_char`        | `signed char`                          |
+| `Cpp.short`              | `short`                                |
+| `Cpp.int`                | `int`                                  |
+| `Cpp.long`               | `long`                                 |
+| `Cpp.long_long`          | `long long`                            |
+| `Cpp.unsigned_char`      | `unsigned char`                        |
+| `Cpp.unsigned_short`     | `unsigned short`                       |
+| `Cpp.unsigned_int`       | `unsigned int`                         |
+| `Cpp.unsigned_long`      | `unsigned long`                        |
+| `Cpp.unsigned_long_long` | `unsigned long long`                   |
+| `Cpp.float`              | `float`                                |
+| `Cpp.double`             | `double`                               |
+| `Cpp.long_double`        | `long double`                          |
+| `f16`                    | `std::float16_t (_Float16)`            |
+| `f128`                   | `std::float128_t (_Float128)`          |
+| TBD                      | `float32_t`, `float64_t`, `bfloat16_t` |
+| TBD                      | `char`, `charN_t`, `wchar_t`           |
+| TBD                      | `std::byte`                            |
+| TBD                      | `std::nullptr_t`                       |
+
+In addition to the exact mappings above, the following are expected to be the
+same type due to the different spellings of the types in C++ being the same:
+
+| Carbon type | C++ type                                |
+| ----------- | --------------------------------------- |
+| `i8`        | `signed char`                           |
+| `u8`        | `unsigned char`                         |
+| `i16`       | `short`                                 |
+| `u16`       | `unsigned short`                        |
+| `i32`       | `int`                                   |
+| `u32`       | `unsigned int`                          |
+| `i64`       | `long` or `long long`                   |
+| `u64`       | `unsigned long` or `unsigned long long` |
+
+### C++ -> Carbon mapping details
+
+-   C++ `intN_t` type will be considered the same type as Carbon's `iN` type.
+    Likewise for `uintN_t` <-> `uN`.
+-   C++ `builtin type` will be available in Carbon inside the `Cpp` namespace
+    under the name `Cpp.builtin_type`, for the standard signed/unsigned integer
+    and floating-point types.
+
+    -   The names will follow the pattern:
+
+        -   `Cpp.[unsigned_](long_long|long|int|short|double|float)`
+
+        that is signedness, then size keyword(s), then a type keyword only if
+        there are no size keywords. For example `Cpp.unsigned_int` not
+        `Cpp.unsigned`, `Cpp.long` not `Cpp.long_int`.
+
+    -   They will be available when an `import Cpp` declaration is present.
+
+    -   Name collision: This naming may cause name collisions if such a name
+        already exist in the unnamed C++ namespace. We consider this not to be a
+        common case and would not support such cases, for the benefit of having
+        the C++-specific stuff in the package `Cpp`.
+
+    -   `Cpp.builtin_type` will be the same type as `iN`/`uN`, if the
+        corresponding C++ `builtin type` is the same as `intN_t`/`uintN_t` on
+        that platform. Otherwise it will be available in Carbon as a new,
+        distinct type that is compatible with some of the `iN`/`uN` types. For
+        example:
+
+        -   If `int32_t` is the same type as `int`, then `Cpp.int` will be the
+            same type as `i32`.
+
+        -   If `int64_t` is the same type as `long`, then `Cpp.long` will be the
+            same type as `i64`. `Cpp.long_long` will be a different type,
+            compatible with `i64`.
+
+    -   `Cpp.float` and `Cpp.double` will be the same type as `f32` and `f64`
+        correspondingly.
+
+-   The type aliases `[u]int_fastN_t`, `[u]int_leastN_t`, `[u]intmax_t`,
+    `[u]intptr_t`, `ptrdiff_t` and `size_t` will be available in Carbon in the
+    `Cpp` namespace if the C++ header declaring them is imported (for example
+    `<stdint.h>`, `<cstdint>` etc), with names like `Cpp.[u]int_fastN_t`,
+    `Cpp.[u]int_leastN_t`, `Cpp.size_t` etc. No special support will be
+    provided.
+
+### Carbon -> C++ mapping details
+
+-   Same as above, Carbon `iN`/`uN` types will map to the C++ `intN_t`/`uintN_t`
+    types.
+-   `f32`/`f64` will map to `float`/`double` correspondingly.
+-   `f16`/`f128` will map to
+    `std::float16_t (_Float16)`/`std::float128_t (_Float128)` correspondingly.
+-   Some Carbon types may not have direct mappings in C++: `i256`, `u256` ,
+    `f80`, `f256`.
+
+## Rationale
+
+One of Carbon's goals is seamless interoperability with C++ (see
+[Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)),
+calling for clarity of the calls and high performance.
+
+The proposal maps the Carbon types to their direct equivalents in C++, with zero
+overhead, supporting the request for unsurprising mappings between C++ and
+Carbon types with high performance.
+
+## Alternatives considered
+
+Naming of new types:
+
+-   Allow all keyword permutations.
+    -   Reason not to do this: unnecessary and complicated.
+-   Only include the keywords, and provide some syntax for combining them (eg,
+    `Cpp.unsigned` & `Cpp.long` or `Cpp.unsigned(Cpp.long)`).
+    -   Reason to do this: avoids taking any identifiers from Cpp that are not
+        C++ keywords.
+    -   Reason not to do this: overly complicated.
+-   Use `Core.Cpp.T` instead of `Cpp.T`.
+    -   Reason to do this: avoid name collisions with C++ code.
+    -   Reason not to do this: The name collisions should not be a problem in
+        practice, and would prefer to keep C++-specific stuff in package Cpp.
+
+`long`
+
+-   `Cpp.long` and `Cpp.long_long` both map to Carbon types that are distinct
+    from `iN` for any `N`, but are compatible with either `i32` or `i64` as
+    appropriate.
+    -   Reason to not do this: unnecessary conversions and handling `long` and
+        `long long` differently than the other C++ types.
+-   Provide platform-dependent conversion functions for `long`.
+    -   Reason to do this: the conversions will be clearly outlined.
+    -   Reason not to do this: performance overhead for certain platforms.
+-   Map `long` always to a fixed-sized Carbon type depending on the platform
+    (for example to either `i32` or `i64`)
+    -   Reason to do this: all the code will be using fixed-sized types.
+    -   Reason not to do this: the same C++ function may map differently on
+        different platforms and the Carbon code should compensate for that to
+        make the code compile.
+
+`float32_t`, `float64_t`
+
+-   Map `f32` <-> `float32_t` and `f64` <-> `float64_t`
+    -   Reason to do this: follow the same analogy as for the integer types
+        (`iN` <-> `intN_t`)
+    -   Reason not to do this:
+        -   `float32_t`, `float64_t` are new types since C++23, so this won't be
+            directly achievable, but the corresponding `_FloatN` types will need
+            to be used for the older C++ versions.
+        -   they are not aliases for the standard floating-point types (`float`,
+            `double`, `long double`), but for extended floating-point types, so
+            type conversions will be needed for the standard types.
+
+## Open questions
+
+The mapping of the following types remains open and will be discussed at a later
+point:
+
+-   `char`, `char8_t`, `char16_t`, `char32_t`, `wchar_t`
+    -   Carbon still doesn't have character types, so the mapping of these types
+        will be discussed once they are available.
+    -   These are all distinct types in C++, which should be taken into account
+        to prevent any issues for overloading.
+-   `std::byte`
+-   `std::nullptr_t`
+-   `void*`
+-   `Cpp.long_double` - details of this new type is still to be discussed.
+-   `float32_t`, `float64_t`, `bfloat16_t`.