Skip to content

Commit 4d71cbb

Browse files
[LLVM][NVPTX] Add support for griddepcontrol instruction
This commit adds support for griddepcontrol PTX instruction with tests under griddepcontrol.ll
1 parent 6ab9daf commit 4d71cbb

File tree

4 files changed

+56
-0
lines changed

4 files changed

+56
-0
lines changed

llvm/docs/NVPTXUsage.rst

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -911,6 +911,29 @@ including that ``wgmma.mma_async`` instruction is undefined behavior.
911911
For more information, refer PTX ISA
912912
`<https://docs.nvidia.com/cuda/parallel-thread-execution/#asynchronous-warpgroup-level-matrix-instructions-wgmma-wait-group>`_.
913913

914+
'``llvm.nvvm.griddepcontrol.*``'
915+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
916+
917+
Syntax:
918+
"""""""
919+
920+
.. code-block:: llvm
921+
922+
declare void @llvm.nvvm.griddepcontrol.launch_dependents()
923+
declare void @llvm.nvvm.griddepcontrol.wait()
924+
925+
Overview:
926+
"""""""""
927+
928+
The ``griddepcontrol`` intrinsics allows the dependent grids and prerequisite grids as defined by the runtime, to control execution in the following way:
929+
930+
``griddepcontrol.launch_dependents`` intrinsic signals that the dependents can be scheduled, before the current grid completes. The intrinsic can be invoked by multiple threads in the current CTA and repeated invocations of the intrinsic will have no additional side effects past that of the first invocation.
931+
932+
``griddepcontrol.wait`` intrinsic causes the executing thread to wait until all prerequisite grids in flight have completed and all the memory operations from the prerequisite grids are performed and made visible to the current grid.
933+
934+
For more information, refer
935+
`PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol>`__.
936+
914937
Other Intrinsics
915938
----------------
916939

llvm/include/llvm/IR/IntrinsicsNVVM.td

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5033,4 +5033,8 @@ def int_nvvm_cp_async_bulk_shared_cta_to_global
50335033
NoCapture<ArgIndex<0>>, NoCapture<ArgIndex<1>>,
50345034
ImmArg<ArgIndex<4>>]>;
50355035

5036+
5037+
def int_nvvm_griddepcontrol_launch_dependents: Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects]>;
5038+
def int_nvvm_griddepcontrol_wait: Intrinsic<[], [], [IntrNoMem, IntrHasSideEffects]>;
5039+
50365040
} // let TargetPrefix = "nvvm"

llvm/lib/Target/NVPTX/NVPTXIntrinsics.td

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7557,4 +7557,16 @@ def INT_NVVM_WGMMA_WAIT_GROUP_SYNC_ALIGNED : NVPTXInst<(outs), (ins i64imm:$n),
75577557
[(int_nvvm_wgmma_wait_group_sync_aligned timm:$n)]>, Requires<[hasSM90a, hasPTX<80>]>;
75587558
} // isConvergent = true
75597559

7560+
def GRIDDEPCONTROL_LAUNCH_DEPENDENTS :
7561+
NVPTXInst<(outs), (ins),
7562+
"griddepcontrol.launch_dependents;",
7563+
[(int_nvvm_griddepcontrol_launch_dependents)]>,
7564+
Requires<[hasSM<90>, hasPTX<78>]>;
7565+
7566+
def GRIDDEPCONTROL_WAIT :
7567+
NVPTXInst<(outs), (ins),
7568+
"griddepcontrol.wait;",
7569+
[(int_nvvm_griddepcontrol_wait)]>,
7570+
Requires<[hasSM<90>, hasPTX<78>]>;
7571+
75607572
def INT_EXIT : NVPTXInst<(outs), (ins), "exit;", [(int_nvvm_exit)]>;
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
2+
; RUN: llc < %s -mcpu=sm_90 -march=nvptx64 | FileCheck %s
3+
; RUN: %if ptxas-11.8 %{ llc < %s -mcpu=sm_90 -march=nvptx64 | %ptxas-verify %}
4+
5+
define void @griddepcontrol() {
6+
; CHECK-LABEL: griddepcontrol(
7+
; CHECK: {
8+
; CHECK-EMPTY:
9+
; CHECK-EMPTY:
10+
; CHECK-NEXT: // %bb.0:
11+
; CHECK-NEXT: griddepcontrol.launch_dependents;
12+
; CHECK-NEXT: griddepcontrol.wait;
13+
; CHECK-NEXT: ret;
14+
call void @llvm.nvvm.griddepcontrol.launch.dependents()
15+
call void @llvm.nvvm.griddepcontrol.wait()
16+
ret void
17+
}

0 commit comments

Comments
 (0)