Clang performs the equivalent of Carbon's lower progressively, interleaved
with parsing/semantic analysis. This conflicts with Carbon's phase-based
approach and leads to bugs and missing functionality in Clang's generated IR
during Carbon/C++ interop.
We analyze different uses of Clang's APIs to better understand the tradeoffs between them, and propose a direction for Carbon.
Several users of Clang's APIs take different approaches to needs that are more or less similar to Carbon's. Below is a survey of those approaches:
clang::CodeGeneratorImpl
is registered during Clang's parsing/semantic analysis and receives callbacks
during that process rather than in a batch phase afterwards. Specifically,
CodeGeneratorImpl has several virtual function callbacks that handle various
features. I tested Clang by disabling or asserting in each callback to
identify tests/examples that rely on it. I was then able to verify
that Carbon had missing functionality due to not implementing the callback,
using a test like this:
test.cpp
// some code
interop.carbon
import Cpp inline '''
#include "test.cpp"
''';
$ diff <(clang++ test.cpp -c && nm test.o) \
<(carbon compile interop.carbon --optimize=none && nm interop.o)
Any difference should be a bug in the Carbon compiler's interop support. Here
are some (non-exhaustive) examples I found, based on different callbacks in the
ASTConsumer API:
HandleCXXStaticMemberVarInstantiation handles instantiating C++ static
member variables in template contexts like this:
template<typename T>struct t3 {
static int i;
};
template<typename T>int t3<T>::i;
void f1() {
// Without the callback, t3<int>::i is not emitted.
t3<int>::i = 42;
}
HandleTopLevelDecl is the main callback. It handles each top-level
declaration (nested only within namespaces, not within another class or
function) for code generation.
EmitDeferredDecls and HandleInlineFunctionDefinition emit inline
function definitions in certain situations, like this:
struct t2 {
// Without the callback, `func`'s definition is not emitted.
__attribute__((used)) void func() {}
};
HandleTagDeclDefinition updates types in the IR when a definition is
provided later (not relevant to Carbon or Swift, since they only generate
IR once the AST is complete), e.g.:
struct S;
extern S a[10];
S(*b)[10] = &a;
struct S {
int x;
};
// Without the callback, this code still compiles,
// but uses a gep over a raw byte array, whereas
// with the callback it uses a gep over the `struct S` type.
int f() { return a[3].x; }
HandleTagDeclRequiredDefinition seems to be just for Microsoft debug info.
HandleTranslationUnit finishes code generation at the end of the translation
unit. Carbon can call this and get the same behavior.
AssignInheritanceModel is related to the Microsoft inheritance attribute.
CompleteTentativeDefinition seems to be only relevant to C code, not C++.
CompleteExternalDeclaration seems to be only relevant to the BPF target.
HandleVTable emits vtables as needed, e.g.:
struct t1 {
virtual void f1();
};
// Without the callback, the vtable is not emitted despite
// the appearance of this key function definition.
void t1::f1() { }
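The callback survey above can be modeled with a small self-contained sketch. The types below are hypothetical stand-ins that only borrow the callback names from the list; they are not Clang's classes, and the emitted "symbols" are plain strings. Under those assumptions, the sketch illustrates why a consumer that implements only HandleTopLevelDecl ends up with fewer symbols than one that implements every callback, which is the kind of gap the nm comparison surfaces.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy interface: it mirrors the *names* of three of the
// callbacks discussed above, but is not Clang's ASTConsumer.
struct ToyConsumer {
  virtual ~ToyConsumer() = default;
  virtual void HandleTopLevelDecl(const std::string&) {}
  virtual void HandleVTable(const std::string&) {}
  virtual void HandleCXXStaticMemberVarInstantiation(const std::string&) {}
};

// Records a symbol for every callback it receives.
struct FullConsumer : ToyConsumer {
  std::set<std::string> emitted;
  void HandleTopLevelDecl(const std::string& name) override {
    emitted.insert(name);
  }
  void HandleVTable(const std::string& class_name) override {
    emitted.insert("vtable for " + class_name);
  }
  void HandleCXXStaticMemberVarInstantiation(const std::string& var) override {
    emitted.insert(var);
  }
};

// Implements only the top-level callback; the other events are dropped.
struct TopLevelOnlyConsumer : ToyConsumer {
  std::set<std::string> emitted;
  void HandleTopLevelDecl(const std::string& name) override {
    emitted.insert(name);
  }
};

// Simulates a frontend delivering events for the t3 and t1 examples above,
// interleaved in the order they would arise during parsing.
void RunToyFrontend(ToyConsumer& consumer) {
  consumer.HandleTopLevelDecl("f1()");
  consumer.HandleCXXStaticMemberVarInstantiation("t3<int>::i");
  consumer.HandleTopLevelDecl("t1::f1()");
  consumer.HandleVTable("t1");
}

// Returns the symbols in `expected` that are absent from `actual`,
// in sorted order (a stand-in for diffing two nm outputs).
std::vector<std::string> MissingSymbols(const std::set<std::string>& expected,
                                        const std::set<std::string>& actual) {
  std::vector<std::string> missing;
  for (const std::string& symbol : expected) {
    if (actual.count(symbol) == 0) missing.push_back(symbol);
  }
  return missing;
}
```

In this toy model, running both consumers through RunToyFrontend and calling MissingSymbols(full.emitted, partial.emitted) yields t3<int>::i and the vtable for t1, matching two of the examples above.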
MyLib-Swift.h header file,
unmodified Clang can then parse that header for calling into the Swift
library
clang::CodeGenerator itself (rather than by way of
clang-the-compiler-like use) in swift::IRGenModule
IRGenModule::emitClangDecl
whenever it needs a decl from Clang for a call from Swift.
IRGenModule::emitClangDecl
clang::CodeGenerator::HandleTopLevelDecl
clang::ParseAST with the clang::CodeGenerator already registered
check used a Clang Tooling-based API,
clang::tooling::buildASTFromCodeWithArgs
check does things to the AST, triggers template instantiation, etc.
lower uses clang::CreateLLVMCodeGen to create a code generator for the
AST
clang::CodeGenerator
HandleTopLevelDecl callback.
Since PR5543, several changes (especially PR6237) have been made to Clang for related but incremental reasons. As a result, the previously indivisible work that motivated PR5543's multithreading has been inlined into Carbon.
With that code inlined, we're now able to address the underlying desire: have a
clang::CodeGenerator attached to Clang's Sema throughout
Carbon's check phase, allowing Clang to lower as it does in a native clang
compilation. This avoids the divergence without the multithreading complexity.
The main cost is the inherent difference between Clang's continuous lowering and Carbon's phase-based lowering, though that seems an acceptable cost compared to the friction of wedging Clang into Carbon's phase-based approach.
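The difference between the two directions can be sketched with a toy model. Everything below is hypothetical: ToySema and ToyCodeGen are illustrative stand-ins, not Clang's Sema or clang::CodeGenerator, and the events are plain strings. Assuming that some events (such as a vtable becoming needed) are only reported while semantic analysis runs, the sketch shows that a generator created after check can replay top-level declarations but never sees those events, while a generator attached throughout check sees all of them.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a code generator: it just logs what it is told.
struct ToyCodeGen {
  std::vector<std::string> log;
  void OnEvent(const std::string& event) { log.push_back(event); }
  void HandleTranslationUnit() { log.push_back("finish"); }
};

// Hypothetical stand-in for semantic analysis. It remembers top-level
// declarations, but other events are only forwarded to an attached generator.
class ToySema {
 public:
  void Attach(ToyCodeGen* codegen) { codegen_ = codegen; }
  void ActOnTopLevelDecl(const std::string& name) {
    top_level_.push_back(name);
    Notify("decl " + name);
  }
  void MarkVTableUsed(const std::string& class_name) {
    Notify("vtable " + class_name);
  }
  const std::vector<std::string>& top_level() const { return top_level_; }

 private:
  void Notify(const std::string& event) {
    if (codegen_ != nullptr) codegen_->OnEvent(event);
  }
  ToyCodeGen* codegen_ = nullptr;
  std::vector<std::string> top_level_;
};

// Simulates the check phase for the t1 example.
void RunCheck(ToySema& sema) {
  sema.ActOnTopLevelDecl("t1::f1");
  sema.MarkVTableUsed("t1");
}

// Phase-based: check runs with no generator attached, then lower replays
// only the recorded top-level declarations.
std::vector<std::string> LowerAfterCheck() {
  ToySema sema;
  RunCheck(sema);
  ToyCodeGen codegen;
  for (const std::string& decl : sema.top_level()) {
    codegen.OnEvent("decl " + decl);
  }
  codegen.HandleTranslationUnit();
  return codegen.log;
}

// Proposed direction: the generator is attached for the whole of check.
std::vector<std::string> LowerDuringCheck() {
  ToySema sema;
  ToyCodeGen codegen;
  sema.Attach(&codegen);
  RunCheck(sema);
  codegen.HandleTranslationUnit();
  return codegen.log;
}
```

In this model LowerAfterCheck logs only the declaration and the finish step, while LowerDuringCheck also logs the vtable event in between.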