[LLVM][NVPTX]: Add aligned versions of cluster barriers #77940
Conversation
PTX doc for these intrinsics:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster

This patch adds the '.aligned' variants of the barrier.cluster intrinsics. lit tests are added to verify the generated PTX.

Signed-off-by: Durgadoss R <[email protected]>
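As a quick sketch of how a frontend would use the new intrinsics from LLVM IR (the function name here is illustrative; the intrinsic declarations match those in the patch's test file):

```llvm
; Split-phase cluster barrier using the '.aligned' variants added by this
; patch. 'aligned' asserts that all threads in the CTA execute the same
; barrier instruction. Requires sm_90; arrive/wait need PTX 7.8+, while
; the relaxed form is predicated on PTX 8.0+ in the patch.
declare void @llvm.nvvm.barrier.cluster.arrive.aligned()
declare void @llvm.nvvm.barrier.cluster.wait.aligned()

define void @cluster_sync_example() {
entry:
  ; Signal arrival at the cluster-wide barrier.
  call void @llvm.nvvm.barrier.cluster.arrive.aligned()
  ; ... independent work not ordered by the barrier could go here ...
  ; Block until every CTA in the cluster has arrived.
  call void @llvm.nvvm.barrier.cluster.wait.aligned()
  ret void
}
```

Lowered for sm_90, the two calls should emit `barrier.cluster.arrive.aligned;` and `barrier.cluster.wait.aligned;`, as checked by the lit test below.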
@llvm/pr-subscribers-llvm-ir

Author: Durgadoss R (durga4github)

Changes

PTX doc link for these intrinsics:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster

This patch adds the '.aligned' variants of the barrier.cluster intrinsics.

Full diff: https://github.com/llvm/llvm-project/pull/77940.diff

3 Files Affected:
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index cf50f2a59f602f..4665a1169ef4ee 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -1372,6 +1372,14 @@ let TargetPrefix = "nvvm" in {
def int_nvvm_barrier_cluster_wait :
Intrinsic<[], [], [IntrConvergent, IntrNoCallback]>;
+ // 'aligned' versions of the above barrier.cluster.* intrinsics
+ def int_nvvm_barrier_cluster_arrive_aligned :
+ Intrinsic<[], [], [IntrConvergent, IntrNoCallback]>;
+ def int_nvvm_barrier_cluster_arrive_relaxed_aligned :
+ Intrinsic<[], [], [IntrConvergent, IntrNoCallback]>;
+ def int_nvvm_barrier_cluster_wait_aligned :
+ Intrinsic<[], [], [IntrConvergent, IntrNoCallback]>;
+
// Membar
def int_nvvm_membar_cta : ClangBuiltin<"__nvvm_membar_cta">,
Intrinsic<[], [], [IntrNoCallback]>;
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 6b062a7f39127f..c5dbe350e44472 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -132,6 +132,7 @@ def INT_BARRIER_SYNC_CNT_II : NVPTXInst<(outs), (ins i32imm:$id, i32imm:$cnt),
"barrier.sync \t$id, $cnt;",
[(int_nvvm_barrier_sync_cnt imm:$id, imm:$cnt)]>,
Requires<[hasPTX<60>, hasSM<30>]>;
+
class INT_BARRIER_CLUSTER<string variant, Intrinsic Intr,
list<Predicate> Preds = [hasPTX<78>, hasSM<90>]>:
NVPTXInst<(outs), (ins), "barrier.cluster."# variant #";", [(Intr)]>,
@@ -145,6 +146,15 @@ def barrier_cluster_arrive_relaxed:
def barrier_cluster_wait:
INT_BARRIER_CLUSTER<"wait", int_nvvm_barrier_cluster_wait>;
+// 'aligned' versions of the cluster barrier intrinsics
+def barrier_cluster_arrive_aligned:
+ INT_BARRIER_CLUSTER<"arrive.aligned", int_nvvm_barrier_cluster_arrive_aligned>;
+def barrier_cluster_arrive_relaxed_aligned:
+ INT_BARRIER_CLUSTER<"arrive.relaxed.aligned",
+ int_nvvm_barrier_cluster_arrive_relaxed_aligned, [hasPTX<80>, hasSM<90>]>;
+def barrier_cluster_wait_aligned:
+ INT_BARRIER_CLUSTER<"wait.aligned", int_nvvm_barrier_cluster_wait_aligned>;
+
class SHFL_INSTR<bit sync, string mode, string reg, bit return_pred,
bit offset_imm, bit mask_imm, bit threadmask_imm>
: NVPTXInst<(outs), (ins), "?", []> {
diff --git a/llvm/test/CodeGen/NVPTX/intrinsics-sm90.ll b/llvm/test/CodeGen/NVPTX/intrinsics-sm90.ll
index a157616db9fb4f..181fbf21129102 100644
--- a/llvm/test/CodeGen/NVPTX/intrinsics-sm90.ll
+++ b/llvm/test/CodeGen/NVPTX/intrinsics-sm90.ll
@@ -133,6 +133,16 @@ define void @test_barrier_cluster() {
ret void
}
+; CHECK-LABEL: test_barrier_cluster_aligned(
+define void @test_barrier_cluster_aligned() {
+; CHECK: barrier.cluster.arrive.aligned;
+ call void @llvm.nvvm.barrier.cluster.arrive.aligned()
+; CHECK: barrier.cluster.arrive.relaxed.aligned;
+ call void @llvm.nvvm.barrier.cluster.arrive.relaxed.aligned()
+; CHECK: barrier.cluster.wait.aligned;
+ call void @llvm.nvvm.barrier.cluster.wait.aligned()
+ ret void
+}
declare i1 @llvm.nvvm.isspacep.shared.cluster(ptr %p);
declare ptr @llvm.nvvm.mapa(ptr %p, i32 %r);
@@ -153,4 +163,7 @@ declare i1 @llvm.nvvm.is_explicit_cluster()
declare void @llvm.nvvm.barrier.cluster.arrive()
declare void @llvm.nvvm.barrier.cluster.arrive.relaxed()
declare void @llvm.nvvm.barrier.cluster.wait()
+declare void @llvm.nvvm.barrier.cluster.arrive.aligned()
+declare void @llvm.nvvm.barrier.cluster.arrive.relaxed.aligned()
+declare void @llvm.nvvm.barrier.cluster.wait.aligned()
declare void @llvm.nvvm.fence.sc.cluster()
Looks good to me, but I would wait for @Artem-B to review.
Could one of you please merge it?