
[LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions #67075


Merged 1 commit into llvm:main on Sep 24, 2023

Conversation

shiltian
Contributor

@shiltian shiltian commented Sep 21, 2023

The current OpenMPOpt assumes all kernels are OpenMP kernels (i.e., those with the "kernel"
attribute). This doesn't hold if we mix OpenMP code and CUDA code by linking
them together, because CUDA kernels are not annotated with that attribute. This
patch removes the assumption and adds a new counter for those non-OpenMP
kernels.

Fix #66687.
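
For concreteness, here is a minimal sketch of how such a mixed module arises (not from this PR; file names and flags are illustrative): a CUDA translation unit and an OpenMP translation unit compiled separately, then linked with offloading LTO so both device images end up in the same IR module.

```cpp
// Two separate translation units, shown in one listing for brevity.
// Hypothetical build (flags are an assumption; adjust for your toolchain):
//   clang++ -x cuda kernel.cu --offload-new-driver -foffload-lto -c -o kernel.o
//   clang++ -fopenmp --offload-arch=sm_70 -foffload-lto main.cpp kernel.o

// kernel.cu: a plain CUDA kernel. In device IR it is registered only via
// !nvvm.annotations and does not carry the "kernel" function attribute.
__global__ void cuda_kernel() {}

// main.cpp: an OpenMP target region. Its generated kernel does carry the
// "kernel" attribute that OpenMPOpt expects.
int main() {
#pragma omp target
  {
  }
  return 0;
}
```

At LTO time the merged device module then contains a kernel without the attribute, which previously tripped the "Inconsistent kernel function annotation" assertion.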

@shiltian shiltian requested review from jdoerfert and a team September 21, 2023 23:57
@llvmbot
Member

llvmbot commented Sep 21, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

The current OpenMPOpt assumes all kernels are OpenMP kernels (i.e., those with the "kernel"
attribute). This doesn't hold if we mix OpenMP code and CUDA code by linking
them together, because CUDA kernels are not annotated with that attribute. This
patch removes the assumption and adds a new counter for those non-OpenMP
kernels.


Full diff: https://github.com/llvm/llvm-project/pull/67075.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/OpenMPOpt.h (+3-2)
  • (modified) llvm/lib/Transforms/IPO/OpenMPOpt.cpp (+18-11)
  • (added) llvm/test/Transforms/OpenMP/bug66687.ll (+29)
diff --git a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
index 4ab0035f3b42001..2499c2bbccf4554 100644
--- a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
+++ b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
@@ -29,8 +29,9 @@ bool containsOpenMP(Module &M);
 /// Helper to determine if \p M is a OpenMP target offloading device module.
 bool isOpenMPDevice(Module &M);
 
-/// Return true iff \p Fn is a GPU kernel; \p Fn has the "kernel" attribute.
-bool isKernel(Function &Fn);
+/// Return true iff \p Fn is an OpenMP GPU kernel; \p Fn has the "kernel"
+/// attribute.
+bool isOpenMPKernel(Function &Fn);
 
 /// Get OpenMP device kernels in \p M.
 KernelSet getDeviceKernels(Module &M);
diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
index f945de52920ccfe..14f9c925c2fd33c 100644
--- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
+++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
@@ -158,6 +158,8 @@ STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified,
           "Number of OpenMP runtime function uses identified");
 STATISTIC(NumOpenMPTargetRegionKernels,
           "Number of OpenMP target region entry points (=kernels) identified");
+STATISTIC(NumNonOpenMPTargetRegionKernels,
+          "Number of non-OpenMP target region kernels identified");
 STATISTIC(NumOpenMPTargetRegionKernelsSPMD,
           "Number of OpenMP target region entry points (=kernels) executed in "
           "SPMD-mode instead of generic-mode");
@@ -989,7 +991,7 @@ struct OpenMPOpt {
   /// Print OpenMP GPU kernels for testing.
   void printKernels() const {
     for (Function *F : SCC) {
-      if (!omp::isKernel(*F))
+      if (!omp::isOpenMPKernel(*F))
         continue;
 
       auto Remark = [&](OptimizationRemarkAnalysis ORA) {
@@ -2030,7 +2032,7 @@ Kernel OpenMPOpt::getUniqueKernelFor(Function &F) {
     // TODO: We should use an AA to create an (optimistic and callback
     //       call-aware) call graph. For now we stick to simple patterns that
     //       are less powerful, basically the worst fixpoint.
-    if (isKernel(F)) {
+    if (isOpenMPKernel(F)) {
       CachedKernel = Kernel(&F);
       return *CachedKernel;
     }
@@ -2721,7 +2723,7 @@ struct AAExecutionDomainFunction : public AAExecutionDomain {
       HandleAlignedBarrier(CB);
 
     // Handle the "kernel end barrier" for kernels too.
-    if (omp::isKernel(*getAnchorScope()))
+    if (omp::isOpenMPKernel(*getAnchorScope()))
       HandleAlignedBarrier(nullptr);
 
     return Changed;
@@ -2974,7 +2976,7 @@ bool AAExecutionDomainFunction::handleCallees(Attributor &A,
   } else {
     // We could not find all predecessors, so this is either a kernel or a
     // function with external linkage (or with some other weird uses).
-    if (omp::isKernel(*getAnchorScope())) {
+    if (omp::isOpenMPKernel(*getAnchorScope())) {
       EntryBBED.IsExecutedByInitialThreadOnly = false;
       EntryBBED.IsReachedFromAlignedBarrierOnly = true;
       EntryBBED.EncounteredNonLocalSideEffect = false;
@@ -3028,7 +3030,7 @@ ChangeStatus AAExecutionDomainFunction::updateImpl(Attributor &A) {
 
   Function *F = getAnchorScope();
   BasicBlock &EntryBB = F->getEntryBlock();
-  bool IsKernel = omp::isKernel(*F);
+  bool IsKernel = omp::isOpenMPKernel(*F);
 
   SmallVector<Instruction *> SyncInstWorklist;
   for (auto &RIt : *RPOT) {
@@ -4167,7 +4169,7 @@ struct AAKernelInfoFunction : AAKernelInfo {
       auto *CB = cast<CallBase>(Kernel->user_back());
       Kernel = CB->getCaller();
     }
-    assert(omp::isKernel(*Kernel) && "Expected kernel function!");
+    assert(omp::isOpenMPKernel(*Kernel) && "Expected kernel function!");
 
     // Check if the kernel is already in SPMD mode, if so, return success.
     ConstantStruct *ExistingKernelEnvC =
@@ -5804,7 +5806,9 @@ PreservedAnalyses OpenMPOptCGSCCPass::run(LazyCallGraph::SCC &C,
   return PreservedAnalyses::all();
 }
 
-bool llvm::omp::isKernel(Function &Fn) { return Fn.hasFnAttribute("kernel"); }
+bool llvm::omp::isOpenMPKernel(Function &Fn) {
+  return Fn.hasFnAttribute("kernel");
+}
 
 KernelSet llvm::omp::getDeviceKernels(Module &M) {
   // TODO: Create a more cross-platform way of determining device kernels.
@@ -5826,10 +5830,13 @@ KernelSet llvm::omp::getDeviceKernels(Module &M) {
     if (!KernelFn)
       continue;
 
-    assert(isKernel(*KernelFn) && "Inconsistent kernel function annotation");
-    ++NumOpenMPTargetRegionKernels;
-
-    Kernels.insert(KernelFn);
+    // We are only interested in OpenMP target regions. Others, such as kernels
+    // generated by CUDA but linked together, are not interesting to this pass.
+    if (isOpenMPKernel(*KernelFn)) {
+      ++NumOpenMPTargetRegionKernels;
+      Kernels.insert(KernelFn);
+    } else
+      ++NumNonOpenMPTargetRegionKernels;
   }
 
   return Kernels;
diff --git a/llvm/test/Transforms/OpenMP/bug66687.ll b/llvm/test/Transforms/OpenMP/bug66687.ll
new file mode 100644
index 000000000000000..e0a9b825a880415
--- /dev/null
+++ b/llvm/test/Transforms/OpenMP/bug66687.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 3
+; RUN: opt < %s -S -passes=openmp-opt | FileCheck %s
+
+source_filename = "bug66687.ll"
+target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+define weak void @openmp_kernel() "kernel" {
+; CHECK-LABEL: define weak void @openmp_kernel(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    ret void
+;
+  ret void
+}
+
+define weak_odr void @non_openmp_kernel() {
+; CHECK-LABEL: define weak_odr void @non_openmp_kernel() {
+; CHECK-NEXT:    ret void
+;
+  ret void
+}
+
+!llvm.module.flags = !{!0, !1}
+!nvvm.annotations = !{!2, !3}
+
+!0 = !{i32 7, !"openmp", i32 51}
+!1 = !{i32 7, !"openmp-device", i32 51}
+!2 = !{ptr @openmp_kernel, !"kernel", i32 1}
+!3 = !{ptr @non_openmp_kernel, !"kernel", i32 1}
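
For context the hunks above don't show: getDeviceKernels discovers kernels by walking the module's !nvvm.annotations metadata (hence the TODO about a more cross-platform mechanism), which is why a linked-in CUDA kernel reaches this loop without the "kernel" attribute. A condensed sketch of the function after this patch, paraphrased from the surrounding code in OpenMPOpt.cpp (details may differ slightly from the tree):

```cpp
// Sketch; lives in OpenMPOpt.cpp, which already provides the includes and the
// STATISTIC counters referenced below.
KernelSet llvm::omp::getDeviceKernels(Module &M) {
  // TODO: Create a more cross-platform way of determining device kernels.
  NamedMDNode *MD = M.getNamedMetadata("nvvm.annotations");
  KernelSet Kernels;
  if (!MD)
    return Kernels;

  for (auto *Op : MD->operands()) {
    if (Op->getNumOperands() < 2)
      continue;
    MDString *KindID = dyn_cast<MDString>(Op->getOperand(1));
    if (!KindID || KindID->getString() != "kernel")
      continue;
    Function *KernelFn =
        mdconst::dyn_extract_or_null<Function>(Op->getOperand(0));
    if (!KernelFn)
      continue;

    // The change in this patch: collect only OpenMP kernels; count and skip
    // kernels that came from elsewhere, e.g. linked-in CUDA code.
    if (isOpenMPKernel(*KernelFn)) {
      ++NumOpenMPTargetRegionKernels;
      Kernels.insert(KernelFn);
    } else
      ++NumNonOpenMPTargetRegionKernels;
  }

  return Kernels;
}
```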

@jhuber6
Contributor

jhuber6 commented Sep 22, 2023

Is the desire to not run OpenMPOpt on kernels generated from CUDA? Otherwise, it would probably be much easier to make the CUDA metadata match OpenMP's.

Also, an unrelated complaint: I really wish we could unify the kernel annotations between AMDGPU and NVPTX.
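
For comparison, a sketch of that alternative (illustrative only; this helper is hypothetical and not part of the PR): normalize the metadata up front by stamping the OpenMP-style "kernel" attribute onto every kernel found via !nvvm.annotations, so isOpenMPKernel would accept CUDA kernels too.

```cpp
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Hypothetical normalization step, not proposed code in this PR.
static void matchOpenMPKernelAnnotations(Module &M) {
  NamedMDNode *MD = M.getNamedMetadata("nvvm.annotations");
  if (!MD)
    return;
  for (auto *Op : MD->operands()) {
    if (Op->getNumOperands() < 2)
      continue;
    auto *KindID = dyn_cast<MDString>(Op->getOperand(1));
    if (!KindID || KindID->getString() != "kernel")
      continue;
    if (Function *Fn =
            mdconst::dyn_extract_or_null<Function>(Op->getOperand(0)))
      Fn->addFnAttr("kernel"); // Mirror the attribute OpenMP codegen emits.
  }
}
```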

@shiltian
Contributor Author

Given this pass is called OpenMPOpt, IMO we want to ignore CUDA kernels.

@jhuber6
Contributor

jhuber6 commented Sep 22, 2023

> Given this pass is called OpenMPOpt, IMO we want to ignore CUDA kernels.

We won't run OpenMPOpt if there's no OpenMP in the module, so this only occurs during LTO linking of merged OpenMP and CUDA modules. Wouldn't this cause issues for regular functions? Are we assuming a function shared between an OpenMP kernel and a CUDA kernel cannot be optimized?

@shiltian
Contributor Author

> Given this pass is called OpenMPOpt, IMO we want to ignore CUDA kernels.

> We won't run OpenMPOpt if there's no OpenMP in the module, so this only occurs during LTO linking of merged OpenMP and CUDA modules. Wouldn't this cause issues for regular functions? Are we assuming a function shared between an OpenMP kernel and a CUDA kernel cannot be optimized?

We don't assume that, but kernels are different because they are entry points, so we can ignore them. We can have functions that come from CUDA and are used in OpenMP; that is totally fine, even without this patch.

@jdoerfert
Member

I'm fine with this. @jhuber6?

@shiltian shiltian merged commit 186a4b3 into llvm:main Sep 24, 2023
@shiltian shiltian deleted the shiltian/issue66687 branch September 24, 2023 02:34
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Sep 28, 2023
Local branch amd-gfx 7ad30b5 Merged main:3f0bddb56ac3 into amd-gfx:32727405140d
Remote branch main 186a4b3 [LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions (llvm#67075)

Successfully merging this pull request may close these issues.

[OpenMP] LTO on CUDA target fails with "inconsistent kernel function annotation" assertion