Commit 9e7ce77

[Clang]: Support opt-in speculative devirtualization (#159685)
This patch adds Clang support for speculative devirtualization and integrates the related pass into the pass pipeline. It builds on the LLVM backend implementation from PR #159048.

Speculative devirtualization transforms an indirect call (the virtual function call) into a guarded direct call. The guard compares the virtual function pointer with the expected target. The optimization is still safe without LTO because the direct call is never unconditional: it is taken only when the loaded function pointer matches the expected target.

This optimization:
- Is opt-in: disabled by default, enabled via `-fdevirtualize-speculatively`.
- Works in non-LTO mode.
- Handles publicly-visible objects.
- Uses guarded devirtualization with fallback to indirect calls when the speculation is incorrect.

For this C++ example:
```
class Base {
public:
  __attribute__((noinline)) virtual void virtual_function1() { asm volatile("NOP"); }
  virtual void virtual_function2() { asm volatile("NOP"); }
};

class Derived : public Base {
public:
  void virtual_function2() override { asm volatile("NOP"); }
};

__attribute__((noinline)) void foo(Base *BV) {
  BV->virtual_function1();
}

void bar() {
  Base *b = new Derived();
  foo(b);
}
```

Here is the IR without enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !6
  %0 = load ptr, ptr %vtable, align 8
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  ret void
}
```

IR after enabling speculative devirtualization:
```
define dso_local void @_Z3fooP4Base(ptr noundef %BV) local_unnamed_addr #0 {
entry:
  %vtable = load ptr, ptr %BV, align 8, !tbaa !12
  %0 = load ptr, ptr %vtable, align 8
  %1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
  br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15

if.true.direct_targ:                              ; preds = %entry
  tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.false.orig_indirect:                           ; preds = %entry
  tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
  br label %if.end.icp

if.end.icp:                                       ; preds = %if.false.orig_indirect, %if.true.direct_targ
  ret void
}
```
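The guarded call described in the commit message can be pictured at source level. The following is only a sketch under stated assumptions: standard C++ cannot compare a vtable slot directly, so it models the object with a hand-rolled one-slot vtable. All names (`Object`, `VTable`, `base_method`, `other_method`, `call_guarded`, `direct_hits`) are hypothetical and do not appear in the patch.

```cpp
#include <cassert>

// Hand-rolled stand-in for a C++ object with a one-slot vtable.
// Hypothetical names; real vtables are laid out by the compiler.
struct Object;
using Method = int (*)(Object *);

struct VTable {
  Method slot0;
};

struct Object {
  const VTable *vtable;
  int value;
};

// The single implementation visible in this "module": the speculated target.
int base_method(Object *self) { return self->value + 1; }

// An implementation the speculation did not know about.
int other_method(Object *self) { return self->value * 10; }

const VTable base_vtable = {&base_method};
const VTable other_vtable = {&other_method};

// Counts how often the guard chose the direct path (demonstration only).
int direct_hits = 0;

// Source-level picture of the guarded call in the "after" IR.
int call_guarded(Object *obj) {
  Method fn = obj->vtable->slot0; // load the function pointer from the vtable
  if (fn == &base_method) {       // guard: compare with the expected target
    ++direct_hits;
    return base_method(obj);      // direct call, a candidate for inlining
  }
  return fn(obj);                 // fallback: the original indirect call
}
```

Calling `call_guarded` on an object whose vtable points at `base_method` takes the direct branch. An object using `other_vtable` fails the guard and goes through the unchanged indirect call, which is why a wrong speculation costs only the comparison.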
1 parent 1226a6d commit 9e7ce77

17 files changed: +379 -38 lines changed

clang/docs/ReleaseNotes.rst

Lines changed: 1 addition & 0 deletions
@@ -343,6 +343,7 @@ Modified Compiler Flags
 -----------------------
 - The `-gkey-instructions` compiler flag is now enabled by default when DWARF is emitted for plain C/C++ and optimizations are enabled. (#GH149509)
 - The `-fconstexpr-steps` compiler flag now accepts value `0` to opt out of this limit. (#GH160440)
+- The `-fdevirtualize-speculatively` compiler flag is now supported to enable speculative devirtualization of virtual function calls, it's disabled by default. (#GH159685)

 Removed Compiler Flags
 -------------------------

clang/docs/UsersManual.rst

Lines changed: 52 additions & 0 deletions
@@ -2352,6 +2352,56 @@ are listed below.
    pure ThinLTO, as all split regular LTO modules are merged and LTO linked
    with regular LTO.

+.. option:: -fdevirtualize-speculatively
+
+   Enable speculative devirtualization optimization where a virtual call
+   can be transformed into a direct call under the assumption that its
+   object is of a particular type. A runtime check is inserted to validate
+   the assumption before making the direct call, and if the check fails,
+   the original virtual call is made instead. This optimization can enable
+   more inlining opportunities and better optimization of the direct call.
+   This is different from whole program devirtualization optimization
+   that rely on global analysis and hidden visibility of the objects to prove
+   that the object is always of a particular type at a virtual call site.
+   This optimization doesn't require global analysis or hidden visibility.
+   This optimization doesn't devirtualize all virtual calls, but only
+   when there's a single implementation of the virtual function in the module.
+   There could be a single implementation of the virtual function
+   either because the function is not overridden in any derived class,
+   or because all objects are instances of the same class/type.
+
+   Ex of IR before the optimization:
+
+   .. code-block:: llvm
+
+      %vtable = load ptr, ptr %BV, align 8, !tbaa !6
+      %0 = tail call i1 @llvm.public.type.test(ptr %vtable, metadata !"_ZTS4Base")
+      tail call void @llvm.assume(i1 %0)
+      %0 = load ptr, ptr %vtable, align 8
+      tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
+      ret void
+
+   IR after the optimization:
+
+   .. code-block:: llvm
+
+      %vtable = load ptr, ptr %BV, align 8, !tbaa !12
+      %0 = load ptr, ptr %vtable, align 8
+      %1 = icmp eq ptr %0, @_ZN4Base17virtual_function1Ev
+      br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect, !prof !15
+    if.true.direct_targ: ; preds = %entry
+      tail call void @_ZN4Base17virtual_function1Ev(ptr noundef nonnull align 8 dereferenceable(8) %BV)
+      br label %if.end.icp
+    if.false.orig_indirect: ; preds = %entry
+      tail call void %0(ptr noundef nonnull align 8 dereferenceable(8) %BV)
+      br label %if.end.icp
+    if.end.icp: ; preds = %if.false.orig_indirect, %if.true.direct_targ
+      ret void
+
+   This feature is temporarily ignored at the LLVM side when LTO is enabled.
+   TODO: Update the comment when the LLVM side supports this feature for LTO.
+   This feature is turned off by default.
+
 .. option:: -f[no-]unique-source-file-names

    When enabled, allows the compiler to assume that each object file
@@ -5216,6 +5266,8 @@ Execute ``clang-cl /?`` to see a list of supported options:
       -fstandalone-debug      Emit full debug info for all types used by the program
       -fstrict-aliasing       Enable optimizations based on strict aliasing rules
       -fsyntax-only           Run the preprocessor, parser and semantic analysis stages
+      -fdevirtualize-speculatively
+                              Enables speculative devirtualization optimization.
       -fwhole-program-vtables Enables whole-program vtable optimization. Requires -flto
       -gcodeview-ghash        Emit type record hashes in a .debug$H section
       -gcodeview              Generate CodeView debug information

clang/include/clang/Basic/CodeGenOptions.def

Lines changed: 2 additions & 0 deletions
@@ -364,6 +364,8 @@ VALUE_CODEGENOPT(WarnStackSize , 32, UINT_MAX, Benign) ///< Set via -fwarn-s
 CODEGENOPT(NoStackArgProbe, 1, 0, Benign) ///< Set when -mno-stack-arg-probe is used
 CODEGENOPT(EmitLLVMUseLists, 1, 0, Benign) ///< Control whether to serialize use-lists.

+CODEGENOPT(DevirtualizeSpeculatively, 1, 0, Benign) ///< Whether to apply the speculative
+                                                    /// devirtualization optimization.
 CODEGENOPT(WholeProgramVTables, 1, 0, Benign) ///< Whether to apply whole-program
                                               /// vtable optimization.

clang/include/clang/Options/Options.td

Lines changed: 9 additions & 3 deletions
@@ -4522,6 +4522,13 @@ defm new_infallible : BoolFOption<"new-infallible",
   BothFlags<[], [ClangOption, CC1Option],
   " treating throwing global C++ operator new as always returning valid memory "
   "(annotates with __attribute__((returns_nonnull)) and throw()). This is detectable in source.">>;
+defm devirtualize_speculatively
+    : BoolFOption<"devirtualize-speculatively",
+                  CodeGenOpts<"DevirtualizeSpeculatively">, DefaultFalse,
+                  PosFlag<SetTrue, [], [],
+                          "Enables speculative devirtualization optimization.">,
+                  NegFlag<SetFalse>,
+                  BothFlags<[], [ClangOption, CLOption, CC1Option]>>;
 defm whole_program_vtables : BoolFOption<"whole-program-vtables",
   CodeGenOpts<"WholeProgramVTables">, DefaultFalse,
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
@@ -7132,9 +7139,8 @@ defm variable_expansion_in_unroller : BooleanFFlag<"variable-expansion-in-unroll
     Group<clang_ignored_gcc_optimization_f_Group>;
 defm web : BooleanFFlag<"web">, Group<clang_ignored_gcc_optimization_f_Group>;
 defm whole_program : BooleanFFlag<"whole-program">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize : BooleanFFlag<"devirtualize">, Group<clang_ignored_gcc_optimization_f_Group>;
-defm devirtualize_speculatively : BooleanFFlag<"devirtualize-speculatively">,
-    Group<clang_ignored_gcc_optimization_f_Group>;
+defm devirtualize : BooleanFFlag<"devirtualize">,
+                    Group<clang_ignored_gcc_optimization_f_Group>;

 // Generic gfortran options.
 def A_DASH : Joined<["-"], "A-">, Group<gfortran_Group>;

clang/lib/CodeGen/BackendUtil.cpp

Lines changed: 1 addition & 0 deletions
@@ -946,6 +946,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
   // non-integrated assemblers don't recognize .cgprofile section.
   PTO.CallGraphProfile = !CodeGenOpts.DisableIntegratedAS;
   PTO.UnifiedLTO = CodeGenOpts.UnifiedLTO;
+  PTO.DevirtualizeSpeculatively = CodeGenOpts.DevirtualizeSpeculatively;

   LoopAnalysisManager LAM;
   FunctionAnalysisManager FAM;

clang/lib/CodeGen/CGClass.cpp

Lines changed: 12 additions & 6 deletions
@@ -2827,10 +2827,15 @@ void CodeGenFunction::EmitTypeMetadataCodeForVCall(const CXXRecordDecl *RD,
                                                    SourceLocation Loc) {
   if (SanOpts.has(SanitizerKind::CFIVCall))
     EmitVTablePtrCheckForCall(RD, VTable, CodeGenFunction::CFITCK_VCall, Loc);
-  else if (CGM.getCodeGenOpts().WholeProgramVTables &&
-           // Don't insert type test assumes if we are forcing public
-           // visibility.
-           !CGM.AlwaysHasLTOVisibilityPublic(RD)) {
+  // Emit the intrinsics of (type_test and assume) for the features of WPD and
+  // speculative devirtualization. For WPD, emit the intrinsics only for the
+  // case of non_public LTO visibility.
+  // TODO: refactor this condition and similar ones into a function (e.g.,
+  // ShouldEmitDevirtualizationMD) to encapsulate the details of the different
+  // types of devirtualization.
+  else if ((CGM.getCodeGenOpts().WholeProgramVTables &&
+            !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+           CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
     CanQualType Ty = CGM.getContext().getCanonicalTagType(RD);
     llvm::Metadata *MD = CGM.CreateMetadataIdentifierForType(Ty);
     llvm::Value *TypeId =
@@ -2988,8 +2993,9 @@ void CodeGenFunction::EmitVTablePtrCheck(const CXXRecordDecl *RD,
 }

 bool CodeGenFunction::ShouldEmitVTableTypeCheckedLoad(const CXXRecordDecl *RD) {
-  if (!CGM.getCodeGenOpts().WholeProgramVTables ||
-      !CGM.HasHiddenLTOVisibility(RD))
+  if ((!CGM.getCodeGenOpts().WholeProgramVTables ||
+       !CGM.HasHiddenLTOVisibility(RD)) &&
+      !CGM.getCodeGenOpts().DevirtualizeSpeculatively)
     return false;

   if (CGM.getCodeGenOpts().VirtualFunctionElimination)
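The rewritten early return in `ShouldEmitVTableTypeCheckedLoad` is easier to read in its positive form. By De Morgan's laws, `(!WPV || !Hidden) && !Spec` is the negation of `(WPV && Hidden) || Spec`. The sketch below checks that equivalence over all eight input combinations. It models only the first check of the function, not the later ones, and the `Inputs` struct and all function names are hypothetical stand-ins for the `CodeGenOpts` and visibility queries in the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the three inputs consulted by the first check.
struct Inputs {
  bool WholeProgramVTables;
  bool HiddenLTOVisibility;
  bool DevirtualizeSpeculatively;
};

// Early-return condition as written in the patch: true means "return false".
bool bailsOut(const Inputs &I) {
  return (!I.WholeProgramVTables || !I.HiddenLTOVisibility) &&
         !I.DevirtualizeSpeculatively;
}

// Equivalent positive form: true means the first check lets the call through.
bool passesFirstCheck(const Inputs &I) {
  return (I.WholeProgramVTables && I.HiddenLTOVisibility) ||
         I.DevirtualizeSpeculatively;
}

// Enumerates all 8 input combinations and counts those where the two forms
// fail to be exact negations of each other.
int countDisagreements() {
  int n = 0;
  for (int bits = 0; bits < 8; ++bits) {
    Inputs I{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0};
    if (bailsOut(I) == passesFirstCheck(I))
      ++n;
  }
  return n;
}
```

Read this way, the flag acts as an independent second route past the check: with `DevirtualizeSpeculatively` set, neither `WholeProgramVTables` nor hidden LTO visibility is required.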

clang/lib/CodeGen/CGVTables.cpp

Lines changed: 4 additions & 2 deletions
@@ -1363,10 +1363,12 @@ llvm::GlobalObject::VCallVisibility CodeGenModule::GetVCallVisibilityLevel(
 void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
                                            llvm::GlobalVariable *VTable,
                                            const VTableLayout &VTLayout) {
-  // Emit type metadata on vtables with LTO or IR instrumentation.
+  // Emit type metadata on vtables with LTO or IR instrumentation or
+  // speculative devirtualization.
   // In IR instrumentation, the type metadata is used to find out vtable
   // definitions (for type profiling) among all global variables.
-  if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr())
+  if (!getCodeGenOpts().LTOUnit && !getCodeGenOpts().hasProfileIRInstr() &&
+      !getCodeGenOpts().DevirtualizeSpeculatively)
     return;

   CharUnits ComponentWidth = GetTargetTypeStoreSize(getVTableComponentType());

clang/lib/CodeGen/ItaniumCXXABI.cpp

Lines changed: 15 additions & 8 deletions
@@ -716,10 +716,14 @@ CGCallee ItaniumCXXABI::EmitLoadOfMemberFunctionPointer(

   bool ShouldEmitVFEInfo = CGM.getCodeGenOpts().VirtualFunctionElimination &&
                            CGM.HasHiddenLTOVisibility(RD);
+  // TODO: Update this name not to be restricted to WPD only
+  // as we now emit the vtable info info for speculative devirtualization as
+  // well.
   bool ShouldEmitWPDInfo =
-      CGM.getCodeGenOpts().WholeProgramVTables &&
-      // Don't insert type tests if we are forcing public visibility.
-      !CGM.AlwaysHasLTOVisibilityPublic(RD);
+      (CGM.getCodeGenOpts().WholeProgramVTables &&
+       // Don't insert type tests if we are forcing public visibility.
+       !CGM.AlwaysHasLTOVisibilityPublic(RD)) ||
+      CGM.getCodeGenOpts().DevirtualizeSpeculatively;
   llvm::Value *VirtualFn = nullptr;

   {
@@ -2110,17 +2114,20 @@ void ItaniumCXXABI::emitVTableDefinitions(CodeGenVTables &CGVT,

   // Always emit type metadata on non-available_externally definitions, and on
   // available_externally definitions if we are performing whole program
-  // devirtualization. For WPD we need the type metadata on all vtable
-  // definitions to ensure we associate derived classes with base classes
-  // defined in headers but with a strong definition only in a shared library.
+  // devirtualization or speculative devirtualization. We need the type metadata
+  // on all vtable definitions to ensure we associate derived classes with base
+  // classes defined in headers but with a strong definition only in a shared
+  // library.
   if (!VTable->isDeclarationForLinker() ||
-      CGM.getCodeGenOpts().WholeProgramVTables) {
+      CGM.getCodeGenOpts().WholeProgramVTables ||
+      CGM.getCodeGenOpts().DevirtualizeSpeculatively) {
     CGM.EmitVTableTypeMetadata(RD, VTable, VTLayout);
     // For available_externally definitions, add the vtable to
     // @llvm.compiler.used so that it isn't deleted before whole program
     // analysis.
     if (VTable->isDeclarationForLinker()) {
-      assert(CGM.getCodeGenOpts().WholeProgramVTables);
+      assert(CGM.getCodeGenOpts().WholeProgramVTables ||
+             CGM.getCodeGenOpts().DevirtualizeSpeculatively);
       CGM.addCompilerUsedGlobal(VTable);
     }
   }

clang/lib/Driver/ToolChains/Clang.cpp

Lines changed: 5 additions & 0 deletions
@@ -7745,6 +7745,11 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,

   addOpenMPHostOffloadingArgs(C, JA, Args, CmdArgs);

+  if (Args.hasFlag(options::OPT_fdevirtualize_speculatively,
+                   options::OPT_fno_devirtualize_speculatively,
+                   /*Default value*/ false))
+    CmdArgs.push_back("-fdevirtualize-speculatively");
+
   bool VirtualFunctionElimination =
       Args.hasFlag(options::OPT_fvirtual_function_elimination,
                    options::OPT_fno_virtual_function_elimination, false);
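The driver hunk uses `Args.hasFlag` with a positive option, its negation, and a default. The last of the two options on the command line wins, and the default applies when neither is present. The following sketch models that rule over plain strings. It is an illustration, not the LLVM `ArgList` implementation, and `lastFlagWins` and `speculativeDevirtRequested` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of "positive/negative flag, last one wins, else default".
bool lastFlagWins(const std::vector<std::string> &args,
                  const std::string &pos, const std::string &neg,
                  bool defaultValue) {
  bool result = defaultValue;
  for (const std::string &arg : args) {
    if (arg == pos)
      result = true;  // a later positive flag overrides earlier ones
    else if (arg == neg)
      result = false; // a later negative flag overrides earlier ones
  }
  return result;
}

// Mirrors the driver check in the hunk: the option defaults to off.
bool speculativeDevirtRequested(const std::vector<std::string> &args) {
  return lastFlagWins(args, "-fdevirtualize-speculatively",
                      "-fno-devirtualize-speculatively",
                      /*defaultValue=*/false);
}
```

Under this rule, a build system can append `-fno-devirtualize-speculatively` after an earlier `-fdevirtualize-speculatively` to switch the optimization back off. The driver then forwards nothing to the frontend.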
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+// Test that Clang emits vtable metadata when speculative devirtualization is enabled.
+// RUN: %clang_cc1 -triple x86_64-unknown-linux -fdevirtualize-speculatively -emit-llvm -o - %s | FileCheck --check-prefix=CHECK %s
+
+struct A {
+  A();
+  virtual void f();
+};
+
+struct B : virtual A {
+  B();
+  virtual void g();
+  virtual void h();
+};
+
+namespace {
+
+struct D : B {
+  D();
+  virtual void f();
+  virtual void h();
+};
+
+}
+
+A::A() {}
+B::B() {}
+D::D() {}
+
+void A::f() {
+}
+
+void B::g() {
+}
+
+void D::f() {
+}
+
+void D::h() {
+}
+
+void af(A *a) {
+  // CHECK: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1A")
+  // CHECK-NEXT: call void @llvm.assume(i1 [[P]])
+  a->f();
+}
+
+void dg1(D *d) {
+  // CHECK: [[P:%[^ ]*]] = call i1 @llvm.public.type.test(ptr [[VT:%[^ ]*]], metadata !"_ZTS1B")
+  // CHECK-NEXT: call void @llvm.assume(i1 [[P]])
+  d->g();
+}
+
+void df1(D *d) {
+  // CHECK: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !11)
+  // CHECK-NEXT: call void @llvm.assume(i1 [[P]])
+  d->f();
+}
+
+void dh1(D *d) {
+  // CHECK: [[P:%[^ ]*]] = call i1 @llvm.type.test(ptr [[VT:%[^ ]*]], metadata !11)
+  // CHECK-NEXT: call void @llvm.assume(i1 [[P]])
+  d->h();
+}
+
+
+D d;
+
+void foo() {
+  dg1(&d);
+  df1(&d);
+  dh1(&d);
+
+
+  struct FA : A {
+    void f() {}
+  } fa;
+  af(&fa);
+}
