Skip to content

[MemProf] Allow promotion if target is a declaration #115555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 9, 2024

Conversation

teresajohnson
Copy link
Contributor

@teresajohnson teresajohnson commented Nov 8, 2024

Fixes an oversight in the MemProf ICP handling, that was blocking
promotion/cloning of indirect calls when the profiled target is a
declaration (i.e wasn't imported). There is no issue promoting in
that case, and in fact the comment mentions we should attempt to at
least import as declarations to enable more promotion.

Note that normal ICP currently requires that the target be a definition,
which is how this check ended up here. The comment there says that it
must be a definition because ThinLTO could remove declarations for
symbols found to be globally dead in the binary. However, here we are
always performing MemProf ICP in the ThinLTO backends, which is after
the globally dead symbols are removed (via dropDeadSymbols before
starting the optimization pipeline) [1].

For now, guard this with an option (flag is off which means the new
promotion is enabled by default) to simplify debugging or disabling it if
this proves problematic.

[1] In fact we could also be more aggressive in regular ICP when invoked
in the ThinLTO backend

Fixes an oversight in the MemProf ICP handling, that was blocking
promotion/cloning of indirect calls when the profiled target is a
declaration (i.e wasn't imported). There is no issue promoting in
that case, and in fact the comment mentions we should attempt to at
least import as declarations to enable more promotion.

Note that normal ICP currently requires that the target be a definition,
which is how this check ended up here. The comment there says that it
must be a definition because ThinLTO could remove declarations for
symbols found to be globally dead in the binary. However, here we are
always performing MemProf ICP in the ThinLTO backends, which is after
the globally dead symbols are removed (via dropDeadSymbols before
starting the optimization pipeline) [1].

For now, guard this with an option (enabled by default) to simplify
debugging or disabling it if this proves problematic.

[1] In fact we could also be more aggressive in regular ICP when invoked
in the ThinLTO backend
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Nov 8, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 8, 2024

@llvm/pr-subscribers-lto

Author: Teresa Johnson (teresajohnson)

Changes

Fixes an oversight in the MemProf ICP handling, that was blocking
promotion/cloning of indirect calls when the profiled target is a
declaration (i.e wasn't imported). There is no issue promoting in
that case, and in fact the comment mentions we should attempt to at
least import as declarations to enable more promotion.

Note that normal ICP currently requires that the target be a definition,
which is how this check ended up here. The comment there says that it
must be a definition because ThinLTO could remove declarations for
symbols found to be globally dead in the binary. However, here we are
always performing MemProf ICP in the ThinLTO backends, which is after
the globally dead symbols are removed (via dropDeadSymbols before
starting the optimization pipeline) [1].

For now, guard this with an option (enabled by default) to simplify
debugging or disabling it if this proves problematic.

[1] In fact we could also be more aggressive in regular ICP when invoked
in the ThinLTO backend


Full diff: https://github.com/llvm/llvm-project/pull/115555.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+12-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp.ll (+110-24)
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index f176a76164698a..353dc00c9928e1 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -132,6 +132,11 @@ cl::opt<bool> EnableMemProfContextDisambiguation(
 cl::opt<bool> SupportsHotColdNew(
     "supports-hot-cold-new", cl::init(false), cl::Hidden,
     cl::desc("Linking with hot/cold operator new interfaces"));
+
+cl::opt<bool> MemProfRequireDefinitionForPromotion(
+    "memprof-require-definition-for-promotion", cl::init(false), cl::Hidden,
+    cl::desc(
+        "Require target function definition when promoting indirect calls"));
 } // namespace llvm
 
 extern cl::opt<bool> MemProfReportHintedSizes;
@@ -4602,7 +4607,13 @@ void MemProfContextDisambiguation::performICP(
       // target (or version of the code), and we need to be conservative
       // (similar to what is done in the ICP pass).
       Function *TargetFunction = Symtab->getFunction(Candidate.Value);
-      if (TargetFunction == nullptr || TargetFunction->isDeclaration()) {
+      if (TargetFunction == nullptr ||
+          // Any ThinLTO global dead symbol removal should have already
+          // occurred, so it should be safe to promote when the target is a
+          // declaration.
+          // TODO: Remove internal option once more fully tested.
+          (MemProfRequireDefinitionForPromotion &&
+           TargetFunction->isDeclaration())) {
         ORE.emit([&]() {
           return OptimizationRemarkMissed(DEBUG_TYPE, "UnableToFindTarget", CB)
                  << "Memprof cannot promote indirect call: target with md5sum "
diff --git a/llvm/test/ThinLTO/X86/memprof-icp.ll b/llvm/test/ThinLTO/X86/memprof-icp.ll
index d4d0e21b4bd7db..3e2912da576f46 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp.ll
@@ -93,6 +93,8 @@
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
 ; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
 ; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
 ; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
 ; RUN:  -r=%t/main.o,_Znwm, \
 ; RUN:  -r=%t/main.o,_ZdlPvm, \
@@ -113,9 +115,9 @@
 ; RUN:  -pass-remarks=. -save-temps \
 ; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
 ; RUN:  --check-prefix=STATS-BE --check-prefix=REMARKS-MAIN \
-; RUN:  --check-prefix=REMARKS-FOO
+; RUN:  --check-prefix=REMARKS-FOO --check-prefix=REMARKS-FOO-IMPORT
 
-; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR --check-prefix=IR-IMPORT
 
 ;; Try again but with distributed ThinLTO
 ; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
@@ -124,6 +126,8 @@
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
 ; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
 ; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
 ; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
 ; RUN:  -r=%t/main.o,_Znwm, \
 ; RUN:  -r=%t/main.o,_ZdlPvm, \
@@ -147,8 +151,9 @@
 ; RUN:  -enable-memprof-indirect-call-support=true \
 ; RUN:  -summary-file=%t/foo.o.thinlto.bc -memprof-import-summary=%t/foo.o.thinlto.bc \
 ; RUN:  -enable-import-metadata -stats -pass-remarks=. \
-; RUN:  %t/foo.o -S 2>&1 | FileCheck %s --check-prefix=IR \
-; RUN:  --check-prefix=STATS-BE-DISTRIB --check-prefix=REMARKS-FOO
+; RUN:  %t/foo.o -S 2>&1 | FileCheck %s --check-prefix=IR --check-prefix=IR-IMPORT \
+; RUN:  --check-prefix=STATS-BE-DISTRIB --check-prefix=REMARKS-FOO \
+; RUN:	--check-prefix=REMARKS-FOO-IMPORT
 
 ;; Retry with the ICP-disabled object file, and make sure we disable it again
 ;; so we don't look for the synthesized callsite records when applying imports.
@@ -159,6 +164,8 @@
 ; RUN:  -r=%t/foo.noicp.o,_Z3fooR2B0j,plx \
 ; RUN:  -r=%t/foo.noicp.o,_ZN2B03barEj.abc,plx \
 ; RUN:  -r=%t/foo.noicp.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.noicp.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.noicp.o,_ZN1B3barEj, \
 ; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
 ; RUN:  -r=%t/main.o,_Znwm, \
 ; RUN:  -r=%t/main.o,_ZdlPvm, \
@@ -184,6 +191,74 @@
 ;; metadata.
 ; RUN: llvm-dis %t.noicp.out.2.4.opt.bc -o - | FileCheck %s --implicit-check-not "_Z3fooR2B0j.memprof" --implicit-check-not "!callsite"
 
+;; Run in-process ThinLTO again, but with importing disabled by setting the
+;; instruction limit to 0. Ensure that the existing declarations of B::bar
+;; and B0::bar are sufficient to allow for the promotion and cloning.
+; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
+; RUN:	-import-instr-limit=0 \
+; RUN:	-enable-memprof-indirect-call-support=true \
+; RUN:  -supports-hot-cold-new \
+; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
+; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
+; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
+; RUN:  -r=%t/main.o,_Znwm, \
+; RUN:  -r=%t/main.o,_ZdlPvm, \
+; RUN:  -r=%t/main.o,_Z8externalPi, \
+; RUN:  -r=%t/main.o,main,plx \
+; RUN:  -r=%t/main.o,_ZN2B03barEj,plx \
+; RUN:  -r=%t/main.o,_ZN1B3barEj,plx \
+; RUN:  -r=%t/main.o,_ZTV1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv120__si_class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv117__class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI1B,plx \
+; RUN:  -r=%t/main.o,_ZTV2B0,plx \
+; RUN:	-thinlto-threads=1 \
+; RUN:  -memprof-verify-ccg -memprof-verify-nodes -stats \
+; RUN:  -pass-remarks=. -save-temps \
+; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
+; RUN:  --check-prefix=STATS-BE-NOIMPORT --check-prefix=REMARKS-MAIN \
+; RUN:  --check-prefix=REMARKS-FOO
+
+; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR --check-prefix=IR-NOIMPORT
+
+;; Run it gain but with -memprof-require-definition-for-promotion, and confirm
+;; that no promotions occur.
+; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
+; RUN:	-import-instr-limit=0 \
+; RUN:	-memprof-require-definition-for-promotion \
+; RUN:	-enable-memprof-indirect-call-support=true \
+; RUN:  -supports-hot-cold-new \
+; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
+; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
+; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
+; RUN:  -r=%t/main.o,_Znwm, \
+; RUN:  -r=%t/main.o,_ZdlPvm, \
+; RUN:  -r=%t/main.o,_Z8externalPi, \
+; RUN:  -r=%t/main.o,main,plx \
+; RUN:  -r=%t/main.o,_ZN2B03barEj,plx \
+; RUN:  -r=%t/main.o,_ZN1B3barEj,plx \
+; RUN:  -r=%t/main.o,_ZTV1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv120__si_class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv117__class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI1B,plx \
+; RUN:  -r=%t/main.o,_ZTV2B0,plx \
+; RUN:	-thinlto-threads=1 \
+; RUN:  -memprof-verify-ccg -memprof-verify-nodes \
+; RUN:  -pass-remarks=. \
+; RUN:  -o %t.out 2>&1 | FileCheck %s --implicit-check-not Promote
+
 ; REMARKS-MAIN: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
 ; REMARKS-MAIN: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
 ; REMARKS-MAIN: created clone _ZN2B03barEj.memprof.1
@@ -208,30 +283,36 @@
 ; REMARKS-FOO: call in clone _Z3fooR2B0j promoted and assigned to call function clone _ZN2B03barEj
 ; REMARKS-FOO: Promote indirect call to _ZN2B03barEj with count 2 out of 2
 ; REMARKS-FOO: call in clone _Z3fooR2B0j.memprof.1 promoted and assigned to call function clone _ZN2B03barEj.memprof.1
-; REMARKS-FOO: created clone _ZN2B03barEj.memprof.1
-; REMARKS-FOO: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
-; REMARKS-FOO: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
-; REMARKS-FOO: created clone _ZN1B3barEj.memprof.1
-; REMARKS-FOO: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
-; REMARKS-FOO: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
+; REMARKS-FOO-IMPORT: created clone _ZN2B03barEj.memprof.1
+; REMARKS-FOO-IMPORT: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
+; REMARKS-FOO-IMPORT: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
+; REMARKS-FOO-IMPORT: created clone _ZN1B3barEj.memprof.1
+; REMARKS-FOO-IMPORT: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
+; REMARKS-FOO-IMPORT: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
 
 ; STATS: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during whole program analysis
 ; STATS-BE: 8 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
+; STATS-BE-NOIMPORT: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during whole program analysis
 ; STATS-BE: 8 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
+; STATS-BE-NOIMPORT: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS: 3 memprof-context-disambiguation - Number of function clones created during whole program analysis
 ; STATS-BE: 5 memprof-context-disambiguation - Number of function clones created during ThinLTO backend
+; STATS-BE-NOIMPORT: 3 memprof-context-disambiguation - Number of function clones created during ThinLTO backend
 
+; IR-NOIMPORT: foo
 ; IR: define {{.*}} @_Z3fooR2B0j(
-; IR:   %1 = icmp eq ptr %0, @_ZN1B3barEj
-; IR:   br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect
+; IR:   %[[R1:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
+; IR:   br i1 %[[R1]], label %if.true.direct_targ, label %if.false.orig_indirect
 ; IR: if.true.direct_targ:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
+; IR-NOIMPORT: call {{.*}} @_ZN1B3barEj(
 ; IR: if.false.orig_indirect:
-; IR:   %2 = icmp eq ptr %0, @_ZN2B03barEj
-; IR:   br i1 %2, label %if.true.direct_targ1, label %if.false.orig_indirect2
+; IR:   %[[R2:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
+; IR:   br i1 %[[R2]], label %if.true.direct_targ1, label %if.false.orig_indirect2
 ; IR: if.true.direct_targ1:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
+; IR-NOIMPORT: call {{.*}} @_ZN2B03barEj(
 ; IR: if.false.orig_indirect2:
 ; IR:   call {{.*}} %0
 
@@ -239,20 +320,22 @@
 ;; We should still compare against the original versions of bar since that is
 ;; what is in the vtable. However, we should have called the cloned versions
 ;; that perform cold allocations, which were subsequently inlined.
-; IR:   %1 = icmp eq ptr %0, @_ZN1B3barEj
-; IR:   br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect
+; IR:   %[[R3:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
+; IR:   br i1 %[[R3]], label %if.true.direct_targ, label %if.false.orig_indirect
 ; IR: if.true.direct_targ:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
+; IR-NOIMPORT: call {{.*}} @_ZN1B3barEj.memprof.1(
 ; IR: if.false.orig_indirect:
-; IR:   %2 = icmp eq ptr %0, @_ZN2B03barEj
-; IR:   br i1 %2, label %if.true.direct_targ1, label %if.false.orig_indirect2
+; IR:   %[[R4:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
+; IR:   br i1 %[[R4]], label %if.true.direct_targ1, label %if.false.orig_indirect2
 ; IR: if.true.direct_targ1:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
+; IR-NOIMPORT: call {{.*}} @_ZN2B03barEj.memprof.1(
 ; IR: if.false.orig_indirect2:
 ; IR:   call {{.*}} %0
 
-; IR: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
-; IR: attributes #[[COLD]] = {{.*}} "memprof"="cold"
+; IR-IMPORT: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
+; IR-IMPORT: attributes #[[COLD]] = {{.*}} "memprof"="cold"
 
 ; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
@@ -272,6 +355,9 @@ define i32 @_ZN2B03barEj.abc(ptr %this, i32 %s) {
   ret i32 0
 }
 
+declare i32 @_ZN2B03barEj(ptr %this, i32 %s)
+declare i32 @_ZN1B3barEj(ptr %this, i32 %s)
+
 define i32 @_Z3fooR2B0j(ptr %b) {
 entry:
   %0 = load ptr, ptr %b, align 8

@llvmbot
Copy link
Member

llvmbot commented Nov 8, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Teresa Johnson (teresajohnson)

Changes

Fixes an oversight in the MemProf ICP handling, that was blocking
promotion/cloning of indirect calls when the profiled target is a
declaration (i.e wasn't imported). There is no issue promoting in
that case, and in fact the comment mentions we should attempt to at
least import as declarations to enable more promotion.

Note that normal ICP currently requires that the target be a definition,
which is how this check ended up here. The comment there says that it
must be a definition because ThinLTO could remove declarations for
symbols found to be globally dead in the binary. However, here we are
always performing MemProf ICP in the ThinLTO backends, which is after
the globally dead symbols are removed (via dropDeadSymbols before
starting the optimization pipeline) [1].

For now, guard this with an option (enabled by default) to simplify
debugging or disabling it if this proves problematic.

[1] In fact we could also be more aggressive in regular ICP when invoked
in the ThinLTO backend


Full diff: https://github.com/llvm/llvm-project/pull/115555.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp (+12-1)
  • (modified) llvm/test/ThinLTO/X86/memprof-icp.ll (+110-24)
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index f176a76164698a..353dc00c9928e1 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -132,6 +132,11 @@ cl::opt<bool> EnableMemProfContextDisambiguation(
 cl::opt<bool> SupportsHotColdNew(
     "supports-hot-cold-new", cl::init(false), cl::Hidden,
     cl::desc("Linking with hot/cold operator new interfaces"));
+
+cl::opt<bool> MemProfRequireDefinitionForPromotion(
+    "memprof-require-definition-for-promotion", cl::init(false), cl::Hidden,
+    cl::desc(
+        "Require target function definition when promoting indirect calls"));
 } // namespace llvm
 
 extern cl::opt<bool> MemProfReportHintedSizes;
@@ -4602,7 +4607,13 @@ void MemProfContextDisambiguation::performICP(
       // target (or version of the code), and we need to be conservative
       // (similar to what is done in the ICP pass).
       Function *TargetFunction = Symtab->getFunction(Candidate.Value);
-      if (TargetFunction == nullptr || TargetFunction->isDeclaration()) {
+      if (TargetFunction == nullptr ||
+          // Any ThinLTO global dead symbol removal should have already
+          // occurred, so it should be safe to promote when the target is a
+          // declaration.
+          // TODO: Remove internal option once more fully tested.
+          (MemProfRequireDefinitionForPromotion &&
+           TargetFunction->isDeclaration())) {
         ORE.emit([&]() {
           return OptimizationRemarkMissed(DEBUG_TYPE, "UnableToFindTarget", CB)
                  << "Memprof cannot promote indirect call: target with md5sum "
diff --git a/llvm/test/ThinLTO/X86/memprof-icp.ll b/llvm/test/ThinLTO/X86/memprof-icp.ll
index d4d0e21b4bd7db..3e2912da576f46 100644
--- a/llvm/test/ThinLTO/X86/memprof-icp.ll
+++ b/llvm/test/ThinLTO/X86/memprof-icp.ll
@@ -93,6 +93,8 @@
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
 ; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
 ; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
 ; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
 ; RUN:  -r=%t/main.o,_Znwm, \
 ; RUN:  -r=%t/main.o,_ZdlPvm, \
@@ -113,9 +115,9 @@
 ; RUN:  -pass-remarks=. -save-temps \
 ; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
 ; RUN:  --check-prefix=STATS-BE --check-prefix=REMARKS-MAIN \
-; RUN:  --check-prefix=REMARKS-FOO
+; RUN:  --check-prefix=REMARKS-FOO --check-prefix=REMARKS-FOO-IMPORT
 
-; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR --check-prefix=IR-IMPORT
 
 ;; Try again but with distributed ThinLTO
 ; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
@@ -124,6 +126,8 @@
 ; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
 ; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
 ; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
 ; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
 ; RUN:  -r=%t/main.o,_Znwm, \
 ; RUN:  -r=%t/main.o,_ZdlPvm, \
@@ -147,8 +151,9 @@
 ; RUN:  -enable-memprof-indirect-call-support=true \
 ; RUN:  -summary-file=%t/foo.o.thinlto.bc -memprof-import-summary=%t/foo.o.thinlto.bc \
 ; RUN:  -enable-import-metadata -stats -pass-remarks=. \
-; RUN:  %t/foo.o -S 2>&1 | FileCheck %s --check-prefix=IR \
-; RUN:  --check-prefix=STATS-BE-DISTRIB --check-prefix=REMARKS-FOO
+; RUN:  %t/foo.o -S 2>&1 | FileCheck %s --check-prefix=IR --check-prefix=IR-IMPORT \
+; RUN:  --check-prefix=STATS-BE-DISTRIB --check-prefix=REMARKS-FOO \
+; RUN:	--check-prefix=REMARKS-FOO-IMPORT
 
 ;; Retry with the ICP-disabled object file, and make sure we disable it again
 ;; so we don't look for the synthesized callsite records when applying imports.
@@ -159,6 +164,8 @@
 ; RUN:  -r=%t/foo.noicp.o,_Z3fooR2B0j,plx \
 ; RUN:  -r=%t/foo.noicp.o,_ZN2B03barEj.abc,plx \
 ; RUN:  -r=%t/foo.noicp.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.noicp.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.noicp.o,_ZN1B3barEj, \
 ; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
 ; RUN:  -r=%t/main.o,_Znwm, \
 ; RUN:  -r=%t/main.o,_ZdlPvm, \
@@ -184,6 +191,74 @@
 ;; metadata.
 ; RUN: llvm-dis %t.noicp.out.2.4.opt.bc -o - | FileCheck %s --implicit-check-not "_Z3fooR2B0j.memprof" --implicit-check-not "!callsite"
 
+;; Run in-process ThinLTO again, but with importing disabled by setting the
+;; instruction limit to 0. Ensure that the existing declarations of B::bar
+;; and B0::bar are sufficient to allow for the promotion and cloning.
+; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
+; RUN:	-import-instr-limit=0 \
+; RUN:	-enable-memprof-indirect-call-support=true \
+; RUN:  -supports-hot-cold-new \
+; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
+; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
+; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
+; RUN:  -r=%t/main.o,_Znwm, \
+; RUN:  -r=%t/main.o,_ZdlPvm, \
+; RUN:  -r=%t/main.o,_Z8externalPi, \
+; RUN:  -r=%t/main.o,main,plx \
+; RUN:  -r=%t/main.o,_ZN2B03barEj,plx \
+; RUN:  -r=%t/main.o,_ZN1B3barEj,plx \
+; RUN:  -r=%t/main.o,_ZTV1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv120__si_class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv117__class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI1B,plx \
+; RUN:  -r=%t/main.o,_ZTV2B0,plx \
+; RUN:	-thinlto-threads=1 \
+; RUN:  -memprof-verify-ccg -memprof-verify-nodes -stats \
+; RUN:  -pass-remarks=. -save-temps \
+; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=STATS \
+; RUN:  --check-prefix=STATS-BE-NOIMPORT --check-prefix=REMARKS-MAIN \
+; RUN:  --check-prefix=REMARKS-FOO
+
+; RUN: llvm-dis %t.out.2.4.opt.bc -o - | FileCheck %s --check-prefix=IR --check-prefix=IR-NOIMPORT
+
+;; Run it gain but with -memprof-require-definition-for-promotion, and confirm
+;; that no promotions occur.
+; RUN: llvm-lto2 run %t/main.o %t/foo.o -enable-memprof-context-disambiguation \
+; RUN:	-import-instr-limit=0 \
+; RUN:	-memprof-require-definition-for-promotion \
+; RUN:	-enable-memprof-indirect-call-support=true \
+; RUN:  -supports-hot-cold-new \
+; RUN:  -r=%t/foo.o,_Z3fooR2B0j,plx \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj.abc,plx \
+; RUN:  -r=%t/foo.o,_Z3xyzR2B0j, \
+; RUN:  -r=%t/foo.o,_ZN2B03barEj, \
+; RUN:  -r=%t/foo.o,_ZN1B3barEj, \
+; RUN:  -r=%t/main.o,_Z3fooR2B0j, \
+; RUN:  -r=%t/main.o,_Znwm, \
+; RUN:  -r=%t/main.o,_ZdlPvm, \
+; RUN:  -r=%t/main.o,_Z8externalPi, \
+; RUN:  -r=%t/main.o,main,plx \
+; RUN:  -r=%t/main.o,_ZN2B03barEj,plx \
+; RUN:  -r=%t/main.o,_ZN1B3barEj,plx \
+; RUN:  -r=%t/main.o,_ZTV1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv120__si_class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS1B,plx \
+; RUN:  -r=%t/main.o,_ZTVN10__cxxabiv117__class_type_infoE,plx \
+; RUN:  -r=%t/main.o,_ZTS2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI2B0,plx \
+; RUN:  -r=%t/main.o,_ZTI1B,plx \
+; RUN:  -r=%t/main.o,_ZTV2B0,plx \
+; RUN:	-thinlto-threads=1 \
+; RUN:  -memprof-verify-ccg -memprof-verify-nodes \
+; RUN:  -pass-remarks=. \
+; RUN:  -o %t.out 2>&1 | FileCheck %s --implicit-check-not Promote
+
 ; REMARKS-MAIN: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
 ; REMARKS-MAIN: call in clone main assigned to call function clone _Z3fooR2B0j.memprof.1
 ; REMARKS-MAIN: created clone _ZN2B03barEj.memprof.1
@@ -208,30 +283,36 @@
 ; REMARKS-FOO: call in clone _Z3fooR2B0j promoted and assigned to call function clone _ZN2B03barEj
 ; REMARKS-FOO: Promote indirect call to _ZN2B03barEj with count 2 out of 2
 ; REMARKS-FOO: call in clone _Z3fooR2B0j.memprof.1 promoted and assigned to call function clone _ZN2B03barEj.memprof.1
-; REMARKS-FOO: created clone _ZN2B03barEj.memprof.1
-; REMARKS-FOO: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
-; REMARKS-FOO: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
-; REMARKS-FOO: created clone _ZN1B3barEj.memprof.1
-; REMARKS-FOO: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
-; REMARKS-FOO: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
+; REMARKS-FOO-IMPORT: created clone _ZN2B03barEj.memprof.1
+; REMARKS-FOO-IMPORT: call in clone _ZN2B03barEj marked with memprof allocation attribute notcold
+; REMARKS-FOO-IMPORT: call in clone _ZN2B03barEj.memprof.1 marked with memprof allocation attribute cold
+; REMARKS-FOO-IMPORT: created clone _ZN1B3barEj.memprof.1
+; REMARKS-FOO-IMPORT: call in clone _ZN1B3barEj marked with memprof allocation attribute notcold
+; REMARKS-FOO-IMPORT: call in clone _ZN1B3barEj.memprof.1 marked with memprof allocation attribute cold
 
 ; STATS: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during whole program analysis
 ; STATS-BE: 8 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
+; STATS-BE-NOIMPORT: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during whole program analysis
 ; STATS-BE: 8 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
+; STATS-BE-NOIMPORT: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS: 3 memprof-context-disambiguation - Number of function clones created during whole program analysis
 ; STATS-BE: 5 memprof-context-disambiguation - Number of function clones created during ThinLTO backend
+; STATS-BE-NOIMPORT: 3 memprof-context-disambiguation - Number of function clones created during ThinLTO backend
 
+; IR-NOIMPORT: foo
 ; IR: define {{.*}} @_Z3fooR2B0j(
-; IR:   %1 = icmp eq ptr %0, @_ZN1B3barEj
-; IR:   br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect
+; IR:   %[[R1:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
+; IR:   br i1 %[[R1]], label %if.true.direct_targ, label %if.false.orig_indirect
 ; IR: if.true.direct_targ:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD:[0-9]+]]
+; IR-NOIMPORT: call {{.*}} @_ZN1B3barEj(
 ; IR: if.false.orig_indirect:
-; IR:   %2 = icmp eq ptr %0, @_ZN2B03barEj
-; IR:   br i1 %2, label %if.true.direct_targ1, label %if.false.orig_indirect2
+; IR:   %[[R2:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
+; IR:   br i1 %[[R2]], label %if.true.direct_targ1, label %if.false.orig_indirect2
 ; IR: if.true.direct_targ1:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[NOTCOLD]]
+; IR-NOIMPORT: call {{.*}} @_ZN2B03barEj(
 ; IR: if.false.orig_indirect2:
 ; IR:   call {{.*}} %0
 
@@ -239,20 +320,22 @@
 ;; We should still compare against the original versions of bar since that is
 ;; what is in the vtable. However, we should have called the cloned versions
 ;; that perform cold allocations, which were subsequently inlined.
-; IR:   %1 = icmp eq ptr %0, @_ZN1B3barEj
-; IR:   br i1 %1, label %if.true.direct_targ, label %if.false.orig_indirect
+; IR:   %[[R3:[0-9]+]] = icmp eq ptr %0, @_ZN1B3barEj
+; IR:   br i1 %[[R3]], label %if.true.direct_targ, label %if.false.orig_indirect
 ; IR: if.true.direct_targ:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD:[0-9]+]]
+; IR-NOIMPORT: call {{.*}} @_ZN1B3barEj.memprof.1(
 ; IR: if.false.orig_indirect:
-; IR:   %2 = icmp eq ptr %0, @_ZN2B03barEj
-; IR:   br i1 %2, label %if.true.direct_targ1, label %if.false.orig_indirect2
+; IR:   %[[R4:[0-9]+]] = icmp eq ptr %0, @_ZN2B03barEj
+; IR:   br i1 %[[R4]], label %if.true.direct_targ1, label %if.false.orig_indirect2
 ; IR: if.true.direct_targ1:
-; IR:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
+; IR-IMPORT:   call {{.*}} @_Znwm(i64 noundef 4) #[[COLD]]
+; IR-NOIMPORT: call {{.*}} @_ZN2B03barEj.memprof.1(
 ; IR: if.false.orig_indirect2:
 ; IR:   call {{.*}} %0
 
-; IR: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
-; IR: attributes #[[COLD]] = {{.*}} "memprof"="cold"
+; IR-IMPORT: attributes #[[NOTCOLD]] = {{.*}} "memprof"="notcold"
+; IR-IMPORT: attributes #[[COLD]] = {{.*}} "memprof"="cold"
 
 ; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
 ; STATS-BE-DISTRIB: 4 memprof-context-disambiguation - Number of not cold static allocations (possibly cloned) during ThinLTO backend
@@ -272,6 +355,9 @@ define i32 @_ZN2B03barEj.abc(ptr %this, i32 %s) {
   ret i32 0
 }
 
+declare i32 @_ZN2B03barEj(ptr %this, i32 %s)
+declare i32 @_ZN1B3barEj(ptr %this, i32 %s)
+
 define i32 @_Z3fooR2B0j(ptr %b) {
 entry:
   %0 = load ptr, ptr %b, align 8

; RUN: -thinlto-threads=1 \
; RUN: -memprof-verify-ccg -memprof-verify-nodes \
; RUN: -pass-remarks=. \
; RUN: -o %t.out 2>&1 | FileCheck %s --implicit-check-not Promote
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused why this one has --implicit-check-not Promote where this is the one with flag=true while the previous one does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the flag enforces the more conservative behavior, requiring a definition.

@@ -132,6 +132,11 @@ cl::opt<bool> EnableMemProfContextDisambiguation(
cl::opt<bool> SupportsHotColdNew(
"supports-hot-cold-new", cl::init(false), cl::Hidden,
cl::desc("Linking with hot/cold operator new interfaces"));

cl::opt<bool> MemProfRequireDefinitionForPromotion(
"memprof-require-definition-for-promotion", cl::init(false), cl::Hidden,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this flag default to true to preserve the current behaviour (when the flag is not set on the cmd line)?

The commit message also mentioned enabled by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to change it to be more aggressive by default and enable promotion with the target is a declaration (that's what I meant by "enabled" - promotion enabled, not the flag).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the description to be clearer about what I mean here, reread and it is definitely confusing

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@teresajohnson teresajohnson merged commit 3654183 into llvm:main Nov 9, 2024
9 of 11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 9, 2024

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-asan running on sanitizer-buildbot1 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/3597

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 87033 of 87034 tests, 88 workers --
Testing:  0.. 10
FAIL: Clang :: Interpreter/inline-virtual.cpp (12612 of 87033)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 6: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
RUN: at line 8: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      -Xcc -O2 | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation -Xcc -O2
JIT session error: In graph incr_module_25-jitted-objectbuffer, section .text.startup: relocation target "_ZTV1A" at address 0x79ec8702e000 is out of range of Delta32 fixup at 0x75ec8670f013 (<anonymous block> @ 0x75ec8670f010 + 0x3)
error: Failed to materialize symbols: { (main, { a2, $.incr_module_25.__inits.0, __orc_init_func.incr_module_25 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_25 }) }
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

Step 10 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 87033 of 87034 tests, 88 workers --
Testing:  0.. 10
FAIL: Clang :: Interpreter/inline-virtual.cpp (12612 of 87033)
******************** TEST 'Clang :: Interpreter/inline-virtual.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 6: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
RUN: at line 8: cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation      -Xcc -O2 | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ cat /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/clang-repl -Xcc -fno-rtti -Xcc -fno-sized-deallocation -Xcc -O2
JIT session error: In graph incr_module_25-jitted-objectbuffer, section .text.startup: relocation target "_ZTV1A" at address 0x79ec8702e000 is out of range of Delta32 fixup at 0x75ec8670f013 (<anonymous block> @ 0x75ec8670f010 + 0x3)
error: Failed to materialize symbols: { (main, { a2, $.incr_module_25.__inits.0, __orc_init_func.incr_module_25 }) }
error: Failed to materialize symbols: { (main, { __orc_init_func.incr_module_25 }) }
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp:26:11: error: CHECK: expected string not found in input
// CHECK: ~A(2)
          ^
<stdin>:1:262: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1)
                                                                                                                                                                                                                                                                     ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/clang/test/Interpreter/inline-virtual.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl... clang-repl> clang-repl... clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> ~A(1) 
check:26                                                                                                                                                                                                                                                                          X error: no match found
          2: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:26     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--


Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
Fixes an oversight in the MemProf ICP handling, that was blocking
promotion/cloning of indirect calls when the profiled target is a
declaration (i.e wasn't imported). There is no issue promoting in
that case, and in fact the comment mentions we should attempt to at
least import as declarations to enable more promotion.

Note that normal ICP currently requires that the target be a definition,
which is how this check ended up here. The comment there says that it
must be a definition because ThinLTO could remove declarations for
symbols found to be globally dead in the binary. However, here we are
always performing MemProf ICP in the ThinLTO backends, which is after
the globally dead symbols are removed (via dropDeadSymbols before
starting the optimization pipeline) [1].

For now, guard this with an option (flag is off which means the new
promotion is enabled by default) to simplify debugging or disabling it
if
this proves problematic.

[1] In fact we could also be more aggressive in regular ICP when invoked
in the ThinLTO backend
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants