[LLVM][OpenMP] Allow OpenMPOpt to handle non-OpenMP target regions #67075
Current OpenMPOpt assumes all kernels are OpenMP kernels (i.e., those with the "kernel" attribute). This doesn't hold if we mix OpenMP code and CUDA code by linking them together, because CUDA kernels are not annotated with the attribute. This patch removes the assumption and adds a new counter for those non-OpenMP kernels.
@llvm/pr-subscribers-llvm-transforms

Full diff: https://github.com/llvm/llvm-project/pull/67075.diff (3 files affected):
diff --git a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
index 4ab0035f3b42001..2499c2bbccf4554 100644
--- a/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
+++ b/llvm/include/llvm/Transforms/IPO/OpenMPOpt.h
@@ -29,8 +29,9 @@ bool containsOpenMP(Module &M);
/// Helper to determine if \p M is a OpenMP target offloading device module.
bool isOpenMPDevice(Module &M);
-/// Return true iff \p Fn is a GPU kernel; \p Fn has the "kernel" attribute.
-bool isKernel(Function &Fn);
+/// Return true iff \p Fn is an OpenMP GPU kernel; \p Fn has the "kernel"
+/// attribute.
+bool isOpenMPKernel(Function &Fn);
/// Get OpenMP device kernels in \p M.
KernelSet getDeviceKernels(Module &M);
diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
index f945de52920ccfe..14f9c925c2fd33c 100644
--- a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
+++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
@@ -158,6 +158,8 @@ STATISTIC(NumOpenMPRuntimeFunctionUsesIdentified,
"Number of OpenMP runtime function uses identified");
STATISTIC(NumOpenMPTargetRegionKernels,
"Number of OpenMP target region entry points (=kernels) identified");
+STATISTIC(NumNonOpenMPTargetRegionKernels,
+ "Number of non-OpenMP target region kernels identified");
STATISTIC(NumOpenMPTargetRegionKernelsSPMD,
"Number of OpenMP target region entry points (=kernels) executed in "
"SPMD-mode instead of generic-mode");
@@ -989,7 +991,7 @@ struct OpenMPOpt {
/// Print OpenMP GPU kernels for testing.
void printKernels() const {
for (Function *F : SCC) {
- if (!omp::isKernel(*F))
+ if (!omp::isOpenMPKernel(*F))
continue;
auto Remark = [&](OptimizationRemarkAnalysis ORA) {
@@ -2030,7 +2032,7 @@ Kernel OpenMPOpt::getUniqueKernelFor(Function &F) {
// TODO: We should use an AA to create an (optimistic and callback
// call-aware) call graph. For now we stick to simple patterns that
// are less powerful, basically the worst fixpoint.
- if (isKernel(F)) {
+ if (isOpenMPKernel(F)) {
CachedKernel = Kernel(&F);
return *CachedKernel;
}
@@ -2721,7 +2723,7 @@ struct AAExecutionDomainFunction : public AAExecutionDomain {
HandleAlignedBarrier(CB);
// Handle the "kernel end barrier" for kernels too.
- if (omp::isKernel(*getAnchorScope()))
+ if (omp::isOpenMPKernel(*getAnchorScope()))
HandleAlignedBarrier(nullptr);
return Changed;
@@ -2974,7 +2976,7 @@ bool AAExecutionDomainFunction::handleCallees(Attributor &A,
} else {
// We could not find all predecessors, so this is either a kernel or a
// function with external linkage (or with some other weird uses).
- if (omp::isKernel(*getAnchorScope())) {
+ if (omp::isOpenMPKernel(*getAnchorScope())) {
EntryBBED.IsExecutedByInitialThreadOnly = false;
EntryBBED.IsReachedFromAlignedBarrierOnly = true;
EntryBBED.EncounteredNonLocalSideEffect = false;
@@ -3028,7 +3030,7 @@ ChangeStatus AAExecutionDomainFunction::updateImpl(Attributor &A) {
Function *F = getAnchorScope();
BasicBlock &EntryBB = F->getEntryBlock();
- bool IsKernel = omp::isKernel(*F);
+ bool IsKernel = omp::isOpenMPKernel(*F);
SmallVector<Instruction *> SyncInstWorklist;
for (auto &RIt : *RPOT) {
@@ -4167,7 +4169,7 @@ struct AAKernelInfoFunction : AAKernelInfo {
auto *CB = cast<CallBase>(Kernel->user_back());
Kernel = CB->getCaller();
}
- assert(omp::isKernel(*Kernel) && "Expected kernel function!");
+ assert(omp::isOpenMPKernel(*Kernel) && "Expected kernel function!");
// Check if the kernel is already in SPMD mode, if so, return success.
ConstantStruct *ExistingKernelEnvC =
@@ -5804,7 +5806,9 @@ PreservedAnalyses OpenMPOptCGSCCPass::run(LazyCallGraph::SCC &C,
return PreservedAnalyses::all();
}
-bool llvm::omp::isKernel(Function &Fn) { return Fn.hasFnAttribute("kernel"); }
+bool llvm::omp::isOpenMPKernel(Function &Fn) {
+ return Fn.hasFnAttribute("kernel");
+}
KernelSet llvm::omp::getDeviceKernels(Module &M) {
// TODO: Create a more cross-platform way of determining device kernels.
@@ -5826,10 +5830,13 @@ KernelSet llvm::omp::getDeviceKernels(Module &M) {
if (!KernelFn)
continue;
- assert(isKernel(*KernelFn) && "Inconsistent kernel function annotation");
- ++NumOpenMPTargetRegionKernels;
-
- Kernels.insert(KernelFn);
+ // We are only interested in OpenMP target regions. Others, such as kernels
+ // generated by CUDA but linked together, are not interesting to this pass.
+ if (isOpenMPKernel(*KernelFn)) {
+ ++NumOpenMPTargetRegionKernels;
+ Kernels.insert(KernelFn);
+ } else
+ ++NumNonOpenMPTargetRegionKernels;
}
return Kernels;
diff --git a/llvm/test/Transforms/OpenMP/bug66687.ll b/llvm/test/Transforms/OpenMP/bug66687.ll
new file mode 100644
index 000000000000000..e0a9b825a880415
--- /dev/null
+++ b/llvm/test/Transforms/OpenMP/bug66687.ll
@@ -0,0 +1,29 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 3
+; RUN: opt < %s -S -passes=openmp-opt | FileCheck %s
+
+source_filename = "bug66687.ll"
+target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+define weak void @openmp_kernel() "kernel" {
+; CHECK-LABEL: define weak void @openmp_kernel(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: ret void
+;
+ ret void
+}
+
+define weak_odr void @non_openmp_kernel() {
+; CHECK-LABEL: define weak_odr void @non_openmp_kernel() {
+; CHECK-NEXT: ret void
+;
+ ret void
+}
+
+!llvm.module.flags = !{!0, !1}
+!nvvm.annotations = !{!2, !3}
+
+!0 = !{i32 7, !"openmp", i32 51}
+!1 = !{i32 7, !"openmp-device", i32 51}
+!2 = !{ptr @openmp_kernel, !"kernel", i32 1}
+!3 = !{ptr @non_openmp_kernel, !"kernel", i32 1}
Is the desire to not run OpenMPOpt on kernels generated from CUDA? It would probably be much easier to make the CUDA metadata match OpenMP's otherwise. Also, an unrelated complaint: I really wish we could unify the kernel annotations between AMDGPU and NVPTX.
Given this pass is called OpenMPOpt, IMO we want to ignore CUDA kernels.
We won't run OpenMPOpt if there's no […]
We don't assume that, but kernels are different because they are entry points, so we can ignore them. We can have functions from CUDA used in OpenMP; that is totally fine, even without this patch.
I'm fine with this. @jhuber6 ?
Fix #66687.