|
|
@@ -0,0 +1,187 @@
|
|
|
+# C++ Interop: Mapping `std::string_view` to `Core.Str`
|
|
|
+
|
|
|
+<!--
|
|
|
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
|
|
|
+Exceptions. See /LICENSE for license information.
|
|
|
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
|
|
+-->
|
|
|
+
|
|
|
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/6177)
|
|
|
+
|
|
|
+<!-- toc -->
|
|
|
+
|
|
|
+## Table of contents
|
|
|
+
|
|
|
+- [Abstract](#abstract)
|
|
|
+- [Problem](#problem)
|
|
|
+- [Background](#background)
|
|
|
+- [Proposal](#proposal)
|
|
|
+- [Details](#details)
|
|
|
+- [Rationale](#rationale)
|
|
|
+- [Alternatives considered](#alternatives-considered)
|
|
|
+ - [1. Provide a Wrapper Type](#1-provide-a-wrapper-type)
|
|
|
+ - [2. Do Nothing](#2-do-nothing)
|
|
|
+
|
|
|
+<!-- tocstop -->
|
|
|
+
|
|
|
+## Abstract
|
|
|
+
|
|
|
+This proposal defines a direct, zero-cost mapping between C++'s
|
|
|
+`std::string_view` and Carbon's `Core.Str` for C++ interoperability. The goal is
|
|
|
+to make C++ APIs that use `std::string_view` feel native and seamless when used
|
|
|
+from Carbon. This mapping relies on the two types having an identical memory
|
|
|
+representation, a condition that we will work to ensure across all supported
|
|
|
+platforms.
|
|
|
+
|
|
|
+## Problem
|
|
|
+
|
|
|
+Seamless interoperability with C++ is a core goal for Carbon. `std::string_view`
|
|
|
+has become a ubiquitous, fundamental type in modern C++ for passing non-owning
|
|
|
+string data. Without a direct mapping, Carbon developers would be forced to work
|
|
|
+with ABI-incompatible wrapper types or manually unpack `std::string_view`
|
|
|
+instances into pointers and sizes.
|
|
|
+
|
|
|
+This creates significant friction:
|
|
|
+
|
|
|
+1. **Ergonomics:** C++ APIs would not feel idiomatic. Developers would need to
|
|
|
+ perform constant, boilerplate conversions.
|
|
|
+2. **Performance:** Any wrapper-based solution would break the zero-cost
|
|
|
+ abstraction principle, potentially introducing overhead at the boundary.
|
|
|
+3. **Adoption:** The lack of seamless integration for such a basic type would
|
|
|
+ be a significant barrier to migrating or integrating C++ codebases.
|
|
|
+
|
|
|
+To provide a truly smooth migration path and interoperability story, Carbon must
|
|
|
+treat `std::string_view` as a first-class citizen, ideally as the same type as
|
|
|
+its native string view.
|
|
|
+
|
|
|
+## Background
|
|
|
+
|
|
|
+Carbon's `Core.Str` is, per
|
|
|
+[#5969](https://github.com/carbon-language/carbon-lang/issues/5969), the
|
|
|
+language's fundamental non-owning view of a sequence of bytes. It is
|
|
|
+[currently implemented as a pair of a pointer to the data and a 64-bit integer representing the size in bytes](https://github.com/carbon-language/carbon-lang/blob/7c13bddc92be8ceac758189df76ebbb048e1a9d5/core/prelude/types/string.carbon#L19-L22).
|
|
|
+The assumed memory layout is the pointer followed by the size.
|
|
|
+
|
|
|
+C++'s `std::string_view` serves the same purpose. However, its memory layout is
|
|
|
+not standardized and varies between standard library implementations. This is
|
|
|
+critical for ABI compatibility.
|
|
|
+
|
|
|
+The layouts for major C++ standard library implementations are as follows:
|
|
|
+
|
|
|
+| Standard Library | Platform/Compiler | Member Order | Size Type (`size_t`) | Notes | Source |
|
|
|
+| ---------------- | -------------------- | -------------------- | -------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
|
+| **libc++** | Clang (macOS, etc.) | `__data_`, `__size_` | 64-bit (on 64-bit) | Pointer first, then size. | [`string_view`](https://github.com/llvm/llvm-project/blob/fd5bc6033e521b946f04cb9c473d9cca3da2da9b/libcxx/include/string_view#L711-L712) |
|
|
|
+| **MSVC STL** | Microsoft Visual C++ | `_Mydata`, `_Mysize` | 64-bit (on 64-bit) | Pointer first, then size. | [`__msvc_string_view.hpp`](https://github.com/microsoft/STL/blob/ba64eaaa8592c700949f3c09a0d8570b932828f5/stl/inc/__msvc_string_view.hpp#L1924-L1925) |
|
|
|
+| **libstdc++** | GCC (Linux, etc.) | `_M_len`, `_M_str` | 64-bit (on 64-bit) | **Size first, then pointer.** | [`string_view`](https://github.com/gcc-mirror/gcc/blob/6b999bf40090f356c5bb5ff8a82e7e0dc4c4ae05/libstdc%2B%2B-v3/include/std/string_view#L590-L591) |
|
|
|
+
|
|
|
+As the table shows, there is a key difference in member ordering, with
|
|
|
+`libstdc++` being the outlier compared to the assumed layout of `Core.Str`.
|
|
|
+
|
|
|
+## Proposal
|
|
|
+
|
|
|
+We propose to map C++ `std::string_view` directly to Carbon's `Core.Str` when
|
|
|
+importing C++ headers. This means the Carbon compiler will treat
|
|
|
+`std::string_view` as a type alias for `Core.Str` at the ABI level.
|
|
|
+
|
|
|
+To achieve this, the following conditions must be met:
|
|
|
+
|
|
|
+1. **Identical Representation:** The memory layout (sequence of fields, size,
|
|
|
+ and alignment) of `std::string_view` and `Core.Str` must be identical for
|
|
|
+ the target platform and C++ standard library.
|
|
|
+2. **Platform-Wide Compatibility:** The ultimate goal is for this mapping to
|
|
|
+ work seamlessly across all Carbon-supported architectures.
|
|
|
+
|
|
|
+Initially, this direct mapping will be enabled only for targets where the
|
|
|
+representation is known to match. For other platforms, we will pursue one of two
|
|
|
+strategies:
|
|
|
+
|
|
|
+- Work with standard library vendors (for example, `libstdc++`) to align on a
|
|
|
+ consistent representation for `std::string_view` across architectures,
|
|
|
+ falling back to providing a patched version of these libraries if necessary.
|
|
|
+- Adapt the memory layout of Carbon's `Core.Str` on a per-platform basis to
|
|
|
+ match the target's native `std::string_view` ABI.
|
|
|
+
|
|
|
+This ensures that while the initial implementation may be constrained, the
|
|
|
+long-term design is for universal, zero-cost compatibility.
|
|
|
+
|
|
|
+## Details
|
|
|
+
|
|
|
+The initial implementation will assume `Core.Str` has a `(pointer, size)`
|
|
|
+layout. This means the direct mapping will work out-of-the-box on platforms
|
|
|
+using `libc++` (Clang) and MSVC STL.
|
|
|
+
|
|
|
+For platforms using `libstdc++`, the current `(size, pointer)` layout is
|
|
|
+incompatible. The direct mapping will be disabled on these platforms by default
|
|
|
+until compatibility is achieved. Our strategy is to first align `Core.Str`'s
|
|
|
+layout with the dominant `(pointer, size)` convention used by Clang and MSVC. We
|
|
|
+will then engage with the `libstdc++` community to explore standardizing this
|
|
|
+layout for better cross-compiler compatibility.
|
|
|
+
|
|
|
+Furthermore, `Core.Str` is defined with a 64-bit size field. C++
|
|
|
+`std::string_view` uses `size_t`, which is 32-bit on 32-bit targets. Therefore,
|
|
|
+this direct mapping will initially be restricted to 64-bit targets, which are
|
|
|
+Carbon's primary focus.
|
|
|
+
|
|
|
+It is essential to recognize that both `Core.Str` and `std::string_view` are
|
|
|
+fundamentally views over bytes, not Unicode characters. This proposal maintains
|
|
|
+that semantic alignment. If Carbon requires a string type that understands
|
|
|
+character boundaries, it should be a separate, distinct type in the standard
|
|
|
+library and not interfere with this fundamental C++ interoperability mechanism.
|
|
|
+
|
|
|
+## Rationale
|
|
|
+
|
|
|
+This proposal directly supports the following Carbon goals:
|
|
|
+
|
|
|
+- **Interoperability with and migration from existing C++ code:**
|
|
|
+ `std::string_view` is one of the most common types in modern C++ interface
|
|
|
+ design. A seamless mapping is not a luxury but a requirement for effective
|
|
|
+ interoperability.
|
|
|
+- **Performance-critical software:** By ensuring a direct, zero-cost mapping,
|
|
|
+ we avoid any performance penalties at the C++/Carbon boundary for string
|
|
|
+ data. This is critical for systems programming where such overhead is
|
|
|
+ unacceptable.
|
|
|
+- **Code that is easy to read, understand, and write:** A direct mapping
|
|
|
+ allows developers to think of `std::string_view` and `Core.Str` as the same
|
|
|
+ concept, reducing cognitive load and eliminating the need for manual
|
|
|
+ conversions.
|
|
|
+- **Naming of `Core.Str`:** The choice of `Str` over `String` for the
|
|
|
+ non-owning view type is intentional. It avoids confusion with owning types
|
|
|
+ like C++'s `std::string`. `Str` is introduced as a new term of art for
|
|
|
+ Carbon, providing a concise and readable name for this fundamental type. The
|
|
|
+ shorter name is preferred for its clarity and reduced verbosity, especially
|
|
|
+ for a type that will be used frequently. This is based on the decision in
|
|
|
+ leads question
|
|
|
+ [#5969](https://github.com/carbon-language/carbon-lang/issues/5969).
|
|
|
+
|
|
|
+By defining a clear path toward universal ABI compatibility for this type, we
|
|
|
+are building a solid foundation for deep and performant integration with the
|
|
|
+existing C++ ecosystem.
|
|
|
+
|
|
|
+## Alternatives considered
|
|
|
+
|
|
|
+### 1. Provide a Wrapper Type
|
|
|
+
|
|
|
+We could choose to always import `std::string_view` as an opaque Carbon struct,
|
|
|
+for example, `Core.Cpp.string_view`.
|
|
|
+
|
|
|
+- **Advantages:** This would be ABI-safe on all platforms immediately.
|
|
|
+- **Disadvantages:** This approach is not seamless. It would require explicit
|
|
|
+ conversions between `Core.Str` and `Core.Cpp.string_view`, adding
|
|
|
+ boilerplate and potential performance overhead. It violates the goal of
|
|
|
+ making C++ APIs feel native to Carbon.
|
|
|
+
|
|
|
+### 2. Do Nothing
|
|
|
+
|
|
|
+We could leave it to the developer to manually handle `std::string_view` by
|
|
|
+accepting it as an opaque type and using C++ helper functions to extract the
|
|
|
+pointer and size.
|
|
|
+
|
|
|
+- **Advantages:** Simplest to implement in the compiler.
|
|
|
+- **Disadvantages:** This provides a terrible developer experience and runs
|
|
|
+ directly counter to Carbon's core goal of excellent C++ interoperability. It
|
|
|
+ would make using a vast number of modern C++ libraries prohibitively
|
|
|
+ difficult.
|
|
|
+
|
|
|
+The proposed approach of a direct mapping is superior as it prioritizes the
|
|
|
+long-term goals of performance and ergonomics, even if it requires a phased
|
|
|
+implementation to achieve full platform support.
|