
[LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions #67075


Merged 1 commit into llvm:main on Sep 24, 2023

Conversation

shiltian
Contributor

@shiltian shiltian commented Sep 21, 2023

The current OpenMPOpt assumes all kernels are OpenMP kernels (i.e., those with the "kernel"
attribute). This doesn't hold if we mix OpenMP code and CUDA code by linking
them together, because CUDA kernels are not annotated with that attribute. This
patch removes the assumption and adds a new counter for those non-OpenMP
kernels.

Fix #66687.
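
For concreteness, here is a minimal sketch of how such a mixed module arises (not from this PR; file names and flags are illustrative): a CUDA translation unit and an OpenMP translation unit compiled separately, then linked with offloading LTO so both device images end up in the same IR module.

```cpp
// Two separate translation units, shown in one listing for brevity.
// Hypothetical build (flags are an assumption; adjust for your toolchain):
//   clang++ -x cuda kernel.cu --offload-new-driver -foffload-lto -c -o kernel.o
//   clang++ -fopenmp --offload-arch=sm_70 -foffload-lto main.cpp kernel.o

// kernel.cu: a plain CUDA kernel. In device IR it is registered only via
// !nvvm.annotations and does not carry the "kernel" function attribute.
__global__ void cuda_kernel() {}

// main.cpp: an OpenMP target region. Its generated kernel does carry the
// "kernel" attribute that OpenMPOpt expects.
int main() {
#pragma omp target
  {
  }
  return 0;
}
```

At LTO time the merged device module then contains a kernel without the attribute, which previously tripped the "Inconsistent kernel function annotation" assertion.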

@shiltian shiltian requested review from jdoerfert and a team September 21, 2023 23:57
@llvmbot
Member

llvmbot commented Sep 21, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

The current OpenMPOpt assumes all kernels are OpenMP kernels (i.e., those with the "kernel"
attribute). This doesn't hold if we mix OpenMP code and CUDA code by linking
them together, because CUDA kernels are not annotated with that attribute. This
patch removes the assumption and adds a new counter for those non-OpenMP
kernels.


Full diff: https://github.com/llvm/llvm-project/pull/67075.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Transforms/IPO/OpenMPOpt.h (+3-2)
  • (modified) llvm/lib/Transforms/IPO/OpenMPOpt.cpp (+18-11)
  • (added) llvm/test/Transforms/OpenMP/bug66687.ll (+29)
diff --git a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
index 4ab0035f3b42001..2499c2bbccf4554 100644
--- a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
+++ b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
@@ -29,8 +29,9 @@ bool containsOpenMP(Module &M);
 /// Helper to determine if \p M is a OpenMP target offloading device module.
 bool isOpenMPDevice(Module &M);
 
-/// Return true iff \p Fn is a GPU kernel; \p Fn has the "kernel" attribute.
-bool isKernel(Function &Fn);
+/// Return true iff \p Fn is an OpenMP GPU kernel; \p Fn has the "kernel"
+/// attribute.
+bool isOpenMPKernel(Function &Fn);
 
 /// Get OpenMP device kernels in \p M.
 KernelSet getDeviceKernels(Module &M);
diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
index f945de52920ccfe..14f9c925c2fd33c 100644
--- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
+++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
@@ -158,6 +158,8 @@ STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified,
           "Number of OpenMP runtime function uses identified");
 STATISTIC(NumOpenMPTargetRegionKernels,
           "Number of OpenMP target region entry points (=kernels) identified");
+STATISTIC(NumNonOpenMPTargetRegionKernels,
+          "Number of non-OpenMP target region kernels identified");
 STATISTIC(NumOpenMPTargetRegionKernelsSPMD,
           "Number of OpenMP target region entry points (=kernels) executed in "
           "SPMD-mode instead of generic-mode");
@@ -989,7 +991,7 @@ struct OpenMPOpt {
   /// Print OpenMP GPU kernels for testing.
   void printKernels() const {
     for (Function *F : SCC) {
-      if (!omp::isKernel(*F))
+      if (!omp::isOpenMPKernel(*F))
         continue;
 
       auto Remark = [&](OptimizationRemarkAnalysis ORA) {
@@ -2030,7 +2032,7 @@ Kernel OpenMPOpt::getUniqueKernelFor(Function &F) {
     // TODO: We should use an AA to create an (optimistic and callback
     //       call-aware) call graph. For now we stick to simple patterns that
     //       are less powerful, basically the worst fixpoint.
-    if (isKernel(F)) {
+    if (isOpenMPKernel(F)) {
       CachedKernel = Kernel(&F);
       return *CachedKernel;
     }
@@ -2721,7 +2723,7 @@ struct AAExecutionDomainFunction : public AAExecutionDomain {
       HandleAlignedBarrier(CB);
 
     // Handle the "kernel end barrier" for kernels too.
-    if (omp::isKernel(*getAnchorScope()))
+    if (omp::isOpenMPKernel(*getAnchorScope()))
       HandleAlignedBarrier(nullptr);
 
     return Changed;
@@ -2974,7 +2976,7 @@ bool AAExecutionDomainFunction::handleCallees(Attributor &A,
   } else {
     // We could not find all predecessors, so this is either a kernel or a
     // function with external linkage (or with some other weird uses).
-    if (omp::isKernel(*getAnchorScope())) {
+    if (omp::isOpenMPKernel(*getAnchorScope())) {
       EntryBBED.IsExecutedByInitialThreadOnly = false;
       EntryBBED.IsReachedFromAlignedBarrierOnly = true;
       EntryBBED.EncounteredNonLocalSideEffect = false;
@@ -3028,7 +3030,7 @@ ChangeStatus AAExecutionDomainFunction::updateImpl(Attributor &A) {
 
   Function *F = getAnchorScope();
   BasicBlock &EntryBB = F->getEntryBlock();
-  bool IsKernel = omp::isKernel(*F);
+  bool IsKernel = omp::isOpenMPKernel(*F);
 
   SmallVector<Instruction *> SyncInstWorklist;
   for (auto &RIt : *RPOT) {
@@ -4167,7 +4169,7 @@ struct AAKernelInfoFunction : AAKernelInfo {
       auto *CB = cast<CallBase>(Kernel->user_back());
       Kernel = CB->getCaller();
     }
-    assert(omp::isKernel(*Kernel) && "Expected kernel function!");
+    assert(omp::isOpenMPKernel(*Kernel) && "Expected kernel function!");
 
     // Check if the kernel is already in SPMD mode, if so, return success.
     ConstantStruct *ExistingKernelEnvC =
@@ -5804,7 +5806,9 @@ PreservedAnalyses OpenMPOptCGSCCPass::run(LazyCallGraph::SCC &C,
   return PreservedAnalyses::all();
 }
 
-bool llvm::omp::isKernel(Function &Fn) { return Fn.hasFnAttribute("kernel"); }
+bool llvm::omp::isOpenMPKernel(Function &Fn) {
+  return Fn.hasFnAttribute("kernel");
+}
 
 KernelSet llvm::omp::getDeviceKernels(Module &M) {
   // TODO: Create a more cross-platform way of determining device kernels.
@@ -5826,10 +5830,13 @@ KernelSet llvm::omp::getDeviceKernels(Module &M) {
     if (!KernelFn)
       continue;
 
-    assert(isKernel(*KernelFn) && "Inconsistent kernel function annotation");
-    ++NumOpenMPTargetRegionKernels;
-
-    Kernels.insert(KernelFn);
+    // We are only interested in OpenMP target regions. Others, such as kernels
+    // generated by CUDA but linked together, are not interesting to this pass.
+    if (isOpenMPKernel(*KernelFn)) {
+      ++NumOpenMPTargetRegionKernels;
+      Kernels.insert(KernelFn);
+    } else
+      ++NumNonOpenMPTargetRegionKernels;
   }
 
   return Kernels;
diff --git a/llvm/test/Transforms/OpenMP/bug66687.ll b/llvm/test/Transforms/OpenMP/bug66687.ll
new file mode 100644
index 000000000000000..e0a9b825a880415
--- /dev/null
+++ b/llvm/test/Transforms/OpenMP/bug66687.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 3
+; RUN: opt < %s -S -passes=openmp-opt | FileCheck %s
+
+source_filename = "bug66687.ll"
+target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+define weak void @openmp_kernel() "kernel" {
+; CHECK-LABEL: define weak void @openmp_kernel(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    ret void
+;
+  ret void
+}
+
+define weak_odr void @non_openmp_kernel() {
+; CHECK-LABEL: define weak_odr void @non_openmp_kernel() {
+; CHECK-NEXT:    ret void
+;
+  ret void
+}
+
+!llvm.module.flags = !{!0, !1}
+!nvvm.annotations = !{!2, !3}
+
+!0 = !{i32 7, !"openmp", i32 51}
+!1 = !{i32 7, !"openmp-device", i32 51}
+!2 = !{ptr @openmp_kernel, !"kernel", i32 1}
+!3 = !{ptr @non_openmp_kernel, !"kernel", i32 1}
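
For context the hunks above don't show: getDeviceKernels discovers kernels by walking the module's !nvvm.annotations metadata (hence the TODO about a more cross-platform mechanism), which is why a linked-in CUDA kernel reaches this loop without the "kernel" attribute. A condensed sketch of the function after this patch, paraphrased from the surrounding code in OpenMPOpt.cpp (details may differ slightly from the tree):

```cpp
// Sketch; lives in OpenMPOpt.cpp, which already provides the includes and the
// STATISTIC counters referenced below.
KernelSet llvm::omp::getDeviceKernels(Module &M) {
  // TODO: Create a more cross-platform way of determining device kernels.
  NamedMDNode *MD = M.getNamedMetadata("nvvm.annotations");
  KernelSet Kernels;
  if (!MD)
    return Kernels;

  for (auto *Op : MD->operands()) {
    if (Op->getNumOperands() < 2)
      continue;
    MDString *KindID = dyn_cast<MDString>(Op->getOperand(1));
    if (!KindID || KindID->getString() != "kernel")
      continue;
    Function *KernelFn =
        mdconst::dyn_extract_or_null<Function>(Op->getOperand(0));
    if (!KernelFn)
      continue;

    // The change in this patch: collect only OpenMP kernels; count and skip
    // kernels that came from elsewhere, e.g. linked-in CUDA code.
    if (isOpenMPKernel(*KernelFn)) {
      ++NumOpenMPTargetRegionKernels;
      Kernels.insert(KernelFn);
    } else
      ++NumNonOpenMPTargetRegionKernels;
  }

  return Kernels;
}
```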

@jhuber6
Contributor

jhuber6 commented Sep 22, 2023

Is the desire to not run OpenMPOpt on kernels generated from CUDA? Otherwise, it would probably be much easier to make the CUDA metadata match OpenMP's.

Also, an unrelated complaint: I really wish we could unify the kernel annotations between AMDGPU and NVPTX.
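
For comparison, a sketch of that alternative (illustrative only; this helper is hypothetical and not part of the PR): normalize the metadata up front by stamping the OpenMP-style "kernel" attribute onto every kernel found via !nvvm.annotations, so isOpenMPKernel would accept CUDA kernels too.

```cpp
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Hypothetical normalization step, not proposed code in this PR.
static void matchOpenMPKernelAnnotations(Module &M) {
  NamedMDNode *MD = M.getNamedMetadata("nvvm.annotations");
  if (!MD)
    return;
  for (auto *Op : MD->operands()) {
    if (Op->getNumOperands() < 2)
      continue;
    auto *KindID = dyn_cast<MDString>(Op->getOperand(1));
    if (!KindID || KindID->getString() != "kernel")
      continue;
    if (Function *Fn =
            mdconst::dyn_extract_or_null<Function>(Op->getOperand(0)))
      Fn->addFnAttr("kernel"); // Mirror the attribute OpenMP codegen emits.
  }
}
```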

@shiltian
Contributor Author

Given this pass is called OpenMPOpt, IMO we want to ignore CUDA kernels.

@jhuber6
Contributor

jhuber6 commented Sep 22, 2023

> Given this pass is called OpenMPOpt, IMO we want to ignore CUDA kernels.

We won't run OpenMPOpt if there's no OpenMP in the module, so this only occurs during LTO linking of merged OpenMP and CUDA modules. Wouldn't this cause issues for regular functions? Are we assuming a function shared between an OpenMP kernel and a CUDA kernel cannot be optimized?

@shiltian
Contributor Author

> Given this pass is called OpenMPOpt, IMO we want to ignore CUDA kernels.

> We won't run OpenMPOpt if there's no OpenMP in the module, so this only occurs during LTO linking of merged OpenMP and CUDA modules. Wouldn't this cause issues for regular functions? Are we assuming a function shared between an OpenMP kernel and a CUDA kernel cannot be optimized?

We don't assume that, but kernels are different because they are entry points, so we can ignore them. We can have functions that come from CUDA and are used in OpenMP; that is totally fine, even without this patch.

@jdoerfert
Member

I'm fine with this. @jhuber6?

@shiltian shiltian merged commit 186a4b3 into llvm:main Sep 24, 2023
@shiltian shiltian deleted the shiltian/issue66687 branch September 24, 2023 02:34
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Sep 28, 2023
Local branch amd-gfx 7ad30b5 Merged main:3f0bddb56ac3 into amd-gfx:32727405140d
Remote branch main 186a4b3 [LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions (llvm#67075)

Successfully merging this pull request may close these issues.

[OpenMP] LTO on CUDA target fails with "inconsistent kernel function annotation" assertion