Jelajahi Sumber

C++ Interop: Toolchain Implementation for Function Calls (#6254)

This proposal details the toolchain implementation for calling imported
C++
functions from Carbon. It covers how C++ overload sets are handled, the
process
of overload resolution leveraging Clang, and the generation of "thunks"
(intermediate functions) when necessary to bridge Application Binary
Interface
(ABI) differences between Carbon and C++.
Boaz Brickner 3 bulan lalu
induk
melakukan
3c70a9f59b
1 mengubah file dengan 252 tambahan dan 0 penghapusan
  1. 252 0
      proposals/p6254.md

+ 252 - 0
proposals/p6254.md

@@ -0,0 +1,252 @@
+# C++ Interop: Toolchain Implementation for Function Calls
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/6254)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Abstract](#abstract)
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+    -   [Importing C++ functions](#importing-c-functions)
+    -   [Overload resolution](#overload-resolution)
+    -   [Direct calls versus thunks](#direct-calls-versus-thunks)
+    -   [Thunk generation](#thunk-generation)
+    -   [Parameter and return value handling](#parameter-and-return-value-handling)
+    -   [Member function calls](#member-function-calls)
+    -   [Operator calls](#operator-calls)
+-   [Rationale](#rationale)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Require manual C++ wrappers](#require-manual-c-wrappers)
+    -   [Mandate Carbon ABI compatibility with C++](#mandate-carbon-abi-compatibility-with-c)
+
+<!-- tocstop -->
+
+## Abstract
+
+This proposal details the toolchain implementation for calling imported C++
+functions from Carbon. It covers how C++ overload sets are handled, the process
+of overload resolution leveraging Clang, and the generation of "thunks"
+(intermediate functions) when necessary to bridge Application Binary Interface
+(ABI) differences between Carbon and C++.
+
+## Problem
+
+Seamless, high-performance interoperability with C++
+[is a fundamental goal of Carbon](https://github.com/carbon-language/carbon-lang/blob/f9bd01536b97961039257cc10fb20b495f7a9b33/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code).
+The Carbon language design for C++ interoperability, particularly for function
+calls, is described in the
+[Carbon calling convention design](https://docs.google.com/document/d/1KUxumZtNe3mY3TsjW2s_ZADOlAaFlrtsLKHVILtqIaM).
+To implement that design, the Carbon toolchain must be able to translate
+Carbon-side calls to C++ functions into instructions the C++ side can
+understand. Several challenges arise at the toolchain level:
+
+-   C++ supports function overloading, requiring the toolchain to resolve calls
+    to the correct C++ function within an overload set.
+-   C++ types do not always have identical representations or ABIs to their
+    Carbon counterparts (see
+    [Carbon <-> C++ Interop: Primitive Types](https://github.com/carbon-language/carbon-lang/blob/44b2f60c90df5c1b0ce86f97bb0ece2a94eb50ea/proposals/p5448.md)).
+    For example, parameter passing conventions (by value, by pointer) or return
+    value handling (direct return versus return slot) might differ. This may
+    require the toolchain to synthesize adapter code.
+-   C++ member functions require special handling of the `this` pointer.
+-   C++ supports features like default arguments which need a defined mapping.
+
+A clear, robust implementation strategy is needed to handle these complexities,
+ensuring both correctness and performance.
+
+## Background
+
+[Carbon's C++ interoperability philosophy](https://github.com/carbon-language/carbon-lang/blob/01e12111a8a685694ccd2c9deb2779f907917543/docs/design/interoperability/philosophy_and_goals.md)
+aims to minimize bridge code and provide unsurprising mappings. When Carbon code
+imports a C++ header, the functions declared within become potentially callable
+entities. C++ overload resolution rules are complex, and replicating them
+perfectly within Carbon would be difficult and likely divergent over time.
+Furthermore, direct calls are only possible when the ABI conventions of the
+Carbon call site precisely match the expectations of the C++ callee.
+
+## Proposal
+
+1.  **Import:** C++ functions and methods, including overload sets, are imported
+    into Carbon and represented internally (conceptually, as specific overload
+    set instructions in SemIR).
+2.  **Overload Resolution:** When a call to an imported C++ function or overload
+    set occurs in Carbon, Carbon leverages Clang's overload resolution
+    mechanism. Carbon argument types are mapped to hypothetical C++ types /
+    expressions, and Clang's `Sema` determines the best viable function.
+3.  **ABI Bridging (Thunks):**
+    -   If the selected C++ function's ABI (parameter types, return type
+        handling, calling convention) matches the Carbon call site's ABI based
+        on defined type mappings, a direct call is generated.
+    -   If the ABIs mismatch, Carbon generates an intermediate function, called
+        a **C++ thunk**. This thunk has a "simple" ABI callable directly from
+        Carbon (typically using only pointers and basic integer types like
+        `i32`/`i64`). The thunk internally calls the actual C++ function,
+        performing necessary argument conversions (for example, loading a value
+        from a pointer) and handling return value conventions (for example,
+        managing a return slot).
+4.  **Call Execution:** The Carbon code either calls the C++ function directly
+    or calls the generated C++ thunk.
+
+## Details
+
+### Importing C++ functions
+
+When a C++ header is imported using `import Cpp`, declarations within that
+header are made available. Function declarations, including member functions and
+overloaded functions, are represented internally within Carbon's SemIR. An
+overload set from C++ is represented as a single callable entity in Carbon,
+associated with the set of C++ candidate functions.
+
+### Overload resolution
+
+To resolve a call like `Cpp.MyNamespace.MyFunc(arg1, arg2)` where `MyFunc` might
+be an overload set imported from C++:
+
+1.  **Map Arguments:** Carbon argument instructions (`arg1`, `arg2`) are mapped
+    to placeholder C++ expressions (conceptually similar to
+    [`clang::OpaqueValueExpr`](https://github.com/llvm/llvm-project/blob/1e99026b45b048a52f8372399ab83d488132842e/clang/include/clang/AST/Expr.h#L1178)).
+    The types of these expressions are determined by mapping the Carbon argument
+    types to corresponding C++ types
+    ([Carbon <-> C++ Interop: Primitive Types](https://github.com/carbon-language/carbon-lang/blob/44b2f60c90df5c1b0ce86f97bb0ece2a94eb50ea/proposals/p5448.md)).
+2.  **Invoke Clang Sema:** Carbon invokes Clang's overload resolution logic
+    ([`clang::OverloadCandidateSet::BestViableFunction()`](https://github.com/llvm/llvm-project/blob/1e99026b45b048a52f8372399ab83d488132842e/clang/include/clang/Sema/Overload.h#L1456))
+    with the mapped C++ name, the candidate functions from the imported overload
+    set, and the placeholder argument expressions.
+3.  **Select Candidate:** Clang determines the best viable C++ function based on
+    C++ rules (implicit conversions, template argument deduction if applicable
+    later, etc.). If resolution fails (no viable function, ambiguity), Clang's
+    diagnostics are surfaced as Carbon diagnostics.
+4.  **Access Check:** After selecting a function, Carbon checks if the function
+    is accessible based on C++ access specifiers (`public`, `protected`,
+    `private`) in the context of the call.
+
+### Direct calls versus thunks
+
+A direct call from Carbon to C++ is possible only if the ABI matches exactly. A
+**C++ thunk** is required if:
+
+-   **Type Representation Mismatch:** A parameter or the return type has a
+    different representation in Carbon than expected by the C++ ABI, requiring
+    conversion. For example, a Carbon `bool` (`i1`) passed to a C++ `bool`
+    (often `i8`), or complex struct types.
+-   **Return Convention Mismatch:** The C++ function returns a non-trivial type
+    by value, which typically requires a hidden return slot parameter in the
+    ABI, whereas Carbon might expect a direct return value.
+-   **Parameter Convention Mismatch:** C++ expects a parameter by way of
+    pointer/reference where Carbon provides a value, or vice-versa.
+-   **Default Arguments:** The Carbon call omits arguments that have default
+    values in C++. The thunk provides the default values.
+-   **Variadic arguments:** (Future work) Calling
+    [C++ variadic arguments](https://en.cppreference.com/w/cpp/language/variadic_arguments.html)
+    functions.
+
+If a thunk is _not_ required, Carbon emits a direct call instruction targeting
+the mangled name of the C++ function.
+
+### Thunk generation
+
+If a thunk is required for a C++ function `CppOriginalFunc()`, Carbon generates
+a new internal function, conceptually `CppOriginalFunc__carbon_thunk()`:
+
+1.  **Signature:** The thunk has an ABI that is simple and directly callable
+    from Carbon.
+    -   Parameters corresponding to C++ parameters with complex ABIs are passed
+        by pointer (`T*`).
+    -   Parameters with simple ABIs (like `i32`, `i64`, raw pointers) are passed
+        directly.
+    -   If `CppOriginalFunc` uses a return slot, the thunk takes a pointer
+        parameter for the return slot. Its LLVM return type becomes `void`.
+    -   If `CppOriginalFunc` returns a simple type directly, the thunk returns
+        the same simple type directly.
+2.  **Body:** The thunk body performs the following:
+    -   Loads values from pointer arguments passed by Carbon where necessary.
+    -   Performs necessary type conversions between Carbon simple ABI types and
+        C++ expected types (for example, `i1` to `i8` for `bool`).
+    -   Calls `CppOriginalFunc` with the converted arguments, potentially
+        passing the return slot address.
+    -   If `CppOriginalFunc` returned directly, the thunk returns that value. If
+        it used a return slot, the thunk returns `void`.
+3.  **Attributes:** The thunk is typically marked `always_inline` to encourage
+    the optimizer to remove the indirection. It is given a predictable mangled
+    name based on the original function's mangled name plus a suffix.
+
+The Carbon call site then calls the thunk instead of the original C++ function.
+
+### Parameter and return value handling
+
+-   **Arguments:** When calling a C++ function (directly or by way of a thunk),
+    Carbon arguments undergo implicit conversions as needed to match the
+    parameter types determined by overload resolution. For calls requiring a
+    thunk, additional conversions might occur at the call site (for example,
+    taking the address of an object to pass by pointer to the thunk) and within
+    the thunk (for example, loading the object from the pointer).
+-   **Return Values:** If the C++ function returns `void`, the Carbon call
+    expression has type `()`. If it returns a simple type directly, the Carbon
+    call has the corresponding mapped Carbon type. If the C++ function uses a
+    return slot, the Carbon call is modeled as initializing the storage
+    designated by the return slot argument (often a temporary created at the
+    call site), and the overall call expression typically results in the
+    initialized value.
+
+### Member function calls
+
+-   **Instance Methods:** When `object.CppMethod()` is called, `object` becomes
+    the implicit `this` argument. Clang's overload resolution handles the
+    qualification (for example, `const`). The `this` pointer is passed as the
+    first argument, either directly or to the thunk.
+-   **Static Methods:** Calls like `CppClass::StaticMethod()` are treated like
+    free function calls; no `this` pointer is involved.
+
+### Operator calls
+
+Calls to overloaded C++ operators are handled similarly to function calls.
+Carbon identifies the operator call, looks up potential C++ operator functions
+(both member and non-member), and uses Clang's overload resolution to select the
+best candidate. Thunks may be generated if required by the selected operator
+function's ABI.
+
+## Rationale
+
+-   **Leverages Clang:** Reusing Clang's overload resolution avoids
+    reimplementing complex C++ rules and ensures consistency.
+-   **Performance:** Direct calls are used when possible. Thunks are designed to
+    be minimal and aggressively inlined, minimizing overhead.
+-   **Correctness:** Thunks handle ABI mismatches systematically, ensuring
+    correct data marshalling between Carbon and C++.
+-   **Developer Experience:** Aims for C++ calls to feel natural in Carbon,
+    hiding much of the complexity of ABI bridging.
+-   **Interop Goal:** Directly supports the core goal of seamless C++
+    interoperability.
+
+## Alternatives considered
+
+### Require manual C++ wrappers
+
+Instead of generating thunks automatically, Carbon could require developers to
+write C++ wrapper functions with simple C-like ABIs for any C++ function whose
+ABI doesn't directly match Carbon's expectations.
+
+-   **Rejected because:** This places a significant burden on the developer,
+    increases boilerplate, hinders rapid iteration, and makes C++ libraries feel
+    less integrated. It violates the goal of minimizing bridge code.
+
+### Mandate Carbon ABI compatibility with C++
+
+Carbon could define its types and calling conventions to always match a specific
+C++ ABI (for example, Itanium).
+
+-   **Rejected because:** This would heavily constrain Carbon's own evolution
+    and design choices. It wouldn't solve the problem entirely, as C++ ABIs
+    themselves vary (for example, between platforms, compilers, or even
+    libraries like libc++ vs libstdc++ for `string_view`). It conflicts with the
+    goal of software and language evolution.