std::string_view to Core.Str

This proposal defines a direct, zero-cost mapping between C++'s
std::string_view and Carbon's Core.Str for C++ interoperability. The goal is
to make C++ APIs that use std::string_view feel native and seamless when used
from Carbon. This mapping relies on the two types having an identical memory
representation, a condition that we will work to ensure across all supported
platforms.
Seamless interoperability with C++ is a core goal for Carbon. std::string_view
has become a ubiquitous, fundamental type in modern C++ for passing non-owning
string data. Without a direct mapping, Carbon developers would be forced to work
with ABI-incompatible wrapper types or manually unpack std::string_view
instances into pointers and sizes.
This creates significant friction:
To provide a truly smooth migration path and interoperability story, Carbon must
treat std::string_view as a first-class citizen, ideally as the same type as
its native string view.
Carbon's Core.Str is, per #5969, the language's fundamental non-owning view of
a sequence of bytes. It is currently implemented as a pair of a pointer to the
data and a 64-bit integer representing the size in bytes. The assumed memory
layout is the pointer followed by the size.
C++'s std::string_view serves the same purpose. However, its memory layout is
not standardized and varies between standard library implementations, and that
variation determines whether the two types are ABI-compatible.
The layouts for major C++ standard library implementations are as follows:

| Standard Library | Platform/Compiler | Member Order | Size Type (size_t) | Notes | Source |
|---|---|---|---|---|---|
| libc++ | Clang (macOS, etc.) | __data_, __size_ | 64-bit (on 64-bit) | Pointer first, then size. | string_view |
| MSVC STL | Microsoft Visual C++ | _Mydata, _Mysize | 64-bit (on 64-bit) | Pointer first, then size. | __msvc_string_view.hpp |
| libstdc++ | GCC (Linux, etc.) | _M_len, _M_str | 64-bit (on 64-bit) | Size first, then pointer. | string_view |

As the table shows, there is a key difference in member ordering, with
libstdc++ being the outlier compared to the assumed layout of Core.Str.
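The member-order difference can be observed from ordinary C++ without relying on private member names. The following is a minimal sketch, not part of the proposal: StrRepr is a hypothetical mirror of the (pointer, size) layout assumed for Core.Str, and ProbeStringViewLayout, MatchesStrRepr, and ReinterpretAsStrRepr are illustrative names. It compares the object bytes of a std::string_view against the two candidate layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical mirror of the (pointer, size) layout assumed for Core.Str.
struct StrRepr {
  const char* ptr;
  std::uint64_t size;
};

enum class Layout { PointerThenSize, SizeThenPointer, Unknown };

// Size of one data pointer followed by one size_t, with no padding.
constexpr std::size_t kTwoWords = sizeof(const char*) + sizeof(std::size_t);

// Compares the object bytes of a std::string_view with both candidate layouts.
inline Layout ProbeStringViewLayout() {
  static const char kData[] = "probe";
  const std::string_view sv(kData, 5);
  if (sizeof(std::string_view) != kTwoWords) {
    return Layout::Unknown;  // Not simply a pointer plus a size.
  }

  unsigned char actual[kTwoWords];
  std::memcpy(actual, &sv, kTwoWords);

  const char* ptr = kData;
  const std::size_t size = sv.size();

  // Expected bytes if the pointer is stored first.
  unsigned char ptr_first[kTwoWords];
  std::memcpy(ptr_first, &ptr, sizeof(ptr));
  std::memcpy(ptr_first + sizeof(ptr), &size, sizeof(size));

  // Expected bytes if the size is stored first.
  unsigned char size_first[kTwoWords];
  std::memcpy(size_first, &size, sizeof(size));
  std::memcpy(size_first + sizeof(size), &ptr, sizeof(ptr));

  if (std::memcmp(actual, ptr_first, kTwoWords) == 0) {
    return Layout::PointerThenSize;
  }
  if (std::memcmp(actual, size_first, kTwoWords) == 0) {
    return Layout::SizeThenPointer;
  }
  return Layout::Unknown;
}

// True when a std::string_view has the same size and member order as StrRepr.
inline bool MatchesStrRepr() {
  return sizeof(std::string_view) == sizeof(StrRepr) &&
         ProbeStringViewLayout() == Layout::PointerThenSize;
}

// Reinterprets the bytes of `sv` as a StrRepr. Only meaningful when
// MatchesStrRepr() is true, which the assert checks in debug builds.
inline StrRepr ReinterpretAsStrRepr(const std::string_view& sv) {
  assert(MatchesStrRepr());
  StrRepr out{nullptr, 0};
  // Copy no more than either object holds, in case asserts are disabled.
  std::memcpy(&out, &sv, sizeof(sv) < sizeof(out) ? sizeof(sv) : sizeof(out));
  return out;
}
```

Given the table above, one would expect the probe to report PointerThenSize with libc++ and MSVC STL and SizeThenPointer with libstdc++. A runtime probe like this is only a diagnostic; an importer would need to know the layout at compile time.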
We propose to map C++ std::string_view directly to Carbon's Core.Str when
importing C++ headers. This means the Carbon compiler will treat
std::string_view as a type alias for Core.Str at the ABI level.
To achieve this, the following conditions must be met:

- The memory representation of std::string_view and Core.Str must be
  identical for the target platform and C++ standard library.

Initially, this direct mapping will be enabled only for targets where the
representation is known to match. For other platforms, we will pursue one of
two strategies:

1. Work with standard library maintainers (for example, libstdc++) to align on
   a consistent representation for std::string_view across architectures,
   falling back to providing a patched version of these libraries if necessary.
2. Adapt the representation of Core.Str on a per-platform basis to match the
   target's native std::string_view ABI.

This ensures that while the initial implementation may be constrained, the
long-term design is for universal, zero-cost compatibility.

The initial implementation will assume Core.Str has a (pointer, size)
layout. This means the direct mapping will work out-of-the-box on platforms
using libc++ (Clang) and MSVC STL.
For platforms using libstdc++, the current (size, pointer) layout is
incompatible. The direct mapping will be disabled on these platforms by default
until compatibility is achieved. Our strategy is to first align Core.Str's
layout with the dominant (pointer, size) convention used by Clang and MSVC. We
will then engage with the libstdc++ community to explore standardizing this
layout for better cross-compiler compatibility.
Furthermore, Core.Str is defined with a 64-bit size field. C++
std::string_view uses size_t, which is 32-bit on 32-bit targets. Therefore,
this direct mapping will initially be restricted to 64-bit targets, which are
Carbon's primary focus.
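The gating conditions described in this section can be expressed as compile-time checks. This is a sketch under stated assumptions rather than the planned implementation: DirectMappingEnabled and the k-prefixed constants are hypothetical names, and the library is identified through the _LIBCPP_VERSION and _MSVC_STL_VERSION macros that libc++ and MSVC STL define once any standard header is included.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// True when size_t is as wide as the 64-bit size field of Core.Str.
constexpr bool kSizeIs64Bit = sizeof(std::size_t) == sizeof(std::uint64_t);

// True when the view is exactly one pointer plus one 64-bit size, no padding.
constexpr bool kIsTwoWords =
    sizeof(std::string_view) == sizeof(const char*) + sizeof(std::uint64_t);

// libc++ and MSVC STL store the pointer first; libstdc++ stores the size first.
#if defined(_LIBCPP_VERSION) || defined(_MSVC_STL_VERSION)
constexpr bool kPointerFirst = true;
#else
constexpr bool kPointerFirst = false;  // libstdc++ or an unrecognized library
#endif

// Hypothetical predicate: all conditions for the direct mapping hold.
constexpr bool DirectMappingEnabled() {
  return kSizeIs64Bit && kIsTwoWords && kPointerFirst;
}
```

A build that requires the direct mapping could then fail early with `static_assert(DirectMappingEnabled(), "...")`, while other builds fall back to an explicit conversion.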
It is essential to recognize that both Core.Str and std::string_view are
fundamentally views over bytes, not Unicode characters. This proposal maintains
that semantic alignment. If Carbon requires a string type that understands
character boundaries, it should be a separate, distinct type in the standard
library and not interfere with this fundamental C++ interoperability mechanism.
This proposal directly supports the following Carbon goals:

- std::string_view is one of the most common types in modern C++ interface
  design. A seamless mapping is not a luxury but a requirement for effective
  interoperability.
- Developers can treat std::string_view and Core.Str as the same concept,
  reducing cognitive load and eliminating the need for manual conversions.
- Core.Str: The choice of Str over String for the non-owning view type is
  intentional. It avoids confusion with owning types like C++'s std::string.
  Str is introduced as a new term of art for Carbon, providing a concise and
  readable name for this fundamental type. The shorter name is preferred for
  its clarity and reduced verbosity, especially for a type that will be used
  frequently. This is based on the decision in leads question #5969.

By defining a clear path toward universal ABI compatibility for this type, we
are building a solid foundation for deep and performant integration with the
existing C++ ecosystem.

We could choose to always import std::string_view as an opaque Carbon struct,
for example, Core.Cpp.string_view.

- This would require conversions between Core.Str and Core.Cpp.string_view,
  adding boilerplate and potential performance overhead. It violates the goal
  of making C++ APIs feel native to Carbon.

We could leave it to the developer to manually handle std::string_view by
accepting it as an opaque type and using C++ helper functions to extract the
pointer and size.

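To make the cost of this manual alternative concrete, the sketch below shows the kind of C++ helper functions a developer would have to write and maintain. The names CarbonInterop_StringViewData, CarbonInterop_StringViewSize, and FirstWord are hypothetical and exist only for illustration; nothing here is an existing Carbon or C++ library API.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical C-linkage helper: returns the data pointer of an opaque view.
extern "C" const char* CarbonInterop_StringViewData(
    const std::string_view* sv) {
  assert(sv != nullptr);
  return sv->data();
}

// Hypothetical C-linkage helper: returns the size in bytes of an opaque view.
extern "C" std::size_t CarbonInterop_StringViewSize(
    const std::string_view* sv) {
  assert(sv != nullptr);
  return sv->size();
}

// Example C++ API returning a std::string_view that a caller wants to unpack.
inline std::string_view FirstWord(std::string_view text) {
  // find returns npos when there is no space, so substr keeps the whole text.
  return text.substr(0, text.find(' '));
}
```

Every call that crosses the boundary then needs two extra helper calls to rebuild a (pointer, size) pair, which is the boilerplate this alternative imposes on each API that uses std::string_view.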
The proposed approach of a direct mapping is superior as it prioritizes the long-term goals of performance and ergonomics, even if it requires a phased implementation to achieve full platform support.