[MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops #95245

Merged
merged 2 commits into from
Jun 14, 2024

Conversation

schwarzschild-radius
Contributor

@schwarzschild-radius schwarzschild-radius commented Jun 12, 2024

This commit adds the gpu.cluster_dim_blocks and gpu.cluster_block_id ops, which represent the number of blocks per cluster and the block id within a cluster, respectively. It also fixes the description of the gpu.cluster_dim op and updates the cga_cluster.mlir test file to use gpu.cluster_dim_blocks.
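
To make the distinction between the four cluster-related quantities concrete, here is a minimal sketch for a single dimension. It is not MLIR API: `ClusterView` and `describeBlock` are hypothetical names, and the arithmetic assumes the usual CUDA/PTX convention that the grid size in blocks is a multiple of the cluster size and that a block's grid-wide index decomposes as `cluster_id * cluster_dim_blocks + cluster_block_id`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the values the ops would return along one
// dimension. None of these names exist in MLIR.
struct ClusterView {
  uint64_t clusterDim;       // gpu.cluster_dim: clusters per grid
  uint64_t clusterDimBlocks; // gpu.cluster_dim_blocks: blocks per cluster
  uint64_t clusterId;        // gpu.cluster_id: index of the cluster in the grid
  uint64_t clusterBlockId;   // gpu.cluster_block_id: index of the block in its cluster
};

// gridDimBlocks:    number of blocks in the grid along this dimension
// blocksPerCluster: number of blocks in each cluster along this dimension
// blockId:          grid-wide block index along this dimension
inline ClusterView describeBlock(uint64_t gridDimBlocks,
                                 uint64_t blocksPerCluster,
                                 uint64_t blockId) {
  // Assumption: the grid is evenly divisible into clusters.
  assert(blocksPerCluster > 0 && gridDimBlocks % blocksPerCluster == 0);
  assert(blockId < gridDimBlocks);
  return ClusterView{gridDimBlocks / blocksPerCluster, blocksPerCluster,
                     blockId / blocksPerCluster, blockId % blocksPerCluster};
}
```

For example, with 8 blocks in the grid and 2 blocks per cluster, block 5 would sit in cluster 2 (of 4) at position 1 within that cluster.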

@llvmbot
Member

llvmbot commented Jun 12, 2024

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir-gpu

Author: Pradeep Kumar (schwarzschild-radius)

Changes

This commit adds the gpu.cluster_dim_blocks op to represent the number of blocks per cluster and updates the description of the gpu.cluster_dim op. It also updates the cga_cluster.mlir test file to use gpu.cluster_dim_blocks.


Full diff: https://github.com/llvm/llvm-project/pull/95245.diff

5 Files Affected:

  • (modified) mlir/include/mlir/Dialect/GPU/IR/GPUOps.td (+14-1)
  • (modified) mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td (+3-3)
  • (modified) mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp (+2)
  • (modified) mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp (+5)
  • (modified) mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir (+3-3)
diff --git a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
index eb81b6469746f..e7e55ab42d51f 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+++ b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
@@ -70,7 +70,7 @@ class GPU_IndexOp<string mnemonic, list<Trait> traits = []> :
 
 def GPU_ClusterDimOp : GPU_IndexOp<"cluster_dim"> {
   let description = [{
-    Returns the number of thread blocks in the cluster along
+    Returns the number of cluster identifiers per grid along
     the x, y, or z `dimension`.
 
     Example:
@@ -81,6 +81,19 @@ def GPU_ClusterDimOp : GPU_IndexOp<"cluster_dim"> {
   }];
 }
 
+def GPU_ClusterDimBlocksOp : GPU_IndexOp<"cluster_dim_blocks"> {
+  let description = [{
+    Returns the number of thread blocks in the cluster along
+    the x, y, or z `dimension`.
+
+    Example:
+
+    ```mlir
+    %cDimBlocksX = gpu.cluster_dim_blocks x
+    ```
+  }];
+}
+
 def GPU_ClusterIdOp : GPU_IndexOp<"cluster_id"> {
   let description = [{
     Returns the cluster id, i.e. the index of the current cluster within the
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 4daeeab093863..4d48b3de7a57e 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -160,9 +160,9 @@ def NVVM_ClusterDimZOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.nclusterid.z">;
 def NVVM_BlockInClusterIdXOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.ctaid.x">;
 def NVVM_BlockInClusterIdYOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.ctaid.y">;
 def NVVM_BlockInClusterIdZOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.ctaid.z">;
-def NVVM_GridInClusterDimXOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.nctaid.x">;
-def NVVM_GridInClusterDimYOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.nctaid.y">;
-def NVVM_GridInClusterDimZOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.nctaid.z">;
+def NVVM_ClusterDimBlocksXOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.nctaid.x">;
+def NVVM_ClusterDimBlocksYOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.nctaid.y">;
+def NVVM_ClusterDimBlocksZOp : NVVM_SpecialRegisterOp<"read.ptx.sreg.cluster.nctaid.z">;
 
 //===----------------------------------------------------------------------===//
 // CTA index and across Cluster dimensions
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index b95fba20a00cb..811f9efb62951 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -344,6 +344,8 @@ void mlir::populateGpuToNVVMConversionPatterns(LLVMTypeConverter &converter,
                                   NVVM::ClusterIdYOp, NVVM::ClusterIdZOp>,
       GPUIndexIntrinsicOpLowering<gpu::ClusterDimOp, NVVM::ClusterDimXOp,
                                   NVVM::ClusterDimYOp, NVVM::ClusterDimZOp>,
+      GPUIndexIntrinsicOpLowering<gpu::ClusterDimBlocksOp, NVVM::ClusterDimBlocksXOp,
+                                  NVVM::ClusterDimBlocksYOp, NVVM::ClusterDimBlocksZOp>,
       GPUIndexIntrinsicOpLowering<gpu::BlockIdOp, NVVM::BlockIdXOp,
                                   NVVM::BlockIdYOp, NVVM::BlockIdZOp>,
       GPUIndexIntrinsicOpLowering<gpu::GridDimOp, NVVM::GridDimXOp,
diff --git a/mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp b/mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp
index 69017efb9a0e6..80ea102c03bd2 100644
--- a/mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp
+++ b/mlir/lib/Dialect/GPU/IR/InferIntRangeInterfaceImpls.cpp
@@ -86,6 +86,11 @@ static std::optional<uint64_t> getKnownLaunchDim(Op op, LaunchDims type) {
 
 void ClusterDimOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
                                      SetIntRangeFn setResultRange) {
+  setResultRange(getResult(), getIndexRange(1, kMaxDim));
+}
+
+void ClusterDimBlocksOp::inferResultRanges(ArrayRef<ConstantIntRanges>,
+                                     SetIntRangeFn setResultRange) {
   setResultRange(getResult(), getIndexRange(1, kMaxClusterDim));
 }
 
diff --git a/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir b/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
index 025282ec0d688..5c11d80178f72 100644
--- a/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
+++ b/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
@@ -22,9 +22,9 @@ module attributes {gpu.container_module} {
       %cidX = gpu.cluster_id  x
       %cidY = gpu.cluster_id  y
       %cidZ = gpu.cluster_id  z
-      %cdimX = gpu.cluster_dim  x
-      %cdimY = gpu.cluster_dim  y
-      %cdimZ = gpu.cluster_dim  z
+      %cdimX = gpu.cluster_dim_blocks  x
+      %cdimY = gpu.cluster_dim_blocks  y
+      %cdimZ = gpu.cluster_dim_blocks  z
       %bidX = gpu.block_id  x
       %bidY = gpu.block_id  y
       %bidZ = gpu.block_id  z
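
The range-inference hunk above gives the two ops different upper bounds: `gpu.cluster_dim` is now bounded by `kMaxDim`, while `gpu.cluster_dim_blocks` takes over the `kMaxClusterDim` bound. The sketch below illustrates the effect. The diff does not show the constants' definitions, so the values used here (the maximum `uint32_t` and 8) are assumptions, and `IndexRange`, `clusterDimRange`, `clusterDimBlocksRange`, and `contains` are hypothetical stand-ins rather than MLIR API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed values: the diff references kMaxDim and kMaxClusterDim but does
// not show how they are defined.
constexpr uint64_t kAssumedMaxDim = std::numeric_limits<uint32_t>::max();
constexpr uint64_t kAssumedMaxClusterDim = 8;

// Hypothetical closed interval [min, max] standing in for an index range.
struct IndexRange {
  uint64_t min;
  uint64_t max;
};

// Mirrors the bound given to gpu.cluster_dim (clusters per grid).
constexpr IndexRange clusterDimRange() { return {1, kAssumedMaxDim}; }

// Mirrors the bound given to gpu.cluster_dim_blocks (blocks per cluster).
constexpr IndexRange clusterDimBlocksRange() {
  return {1, kAssumedMaxClusterDim};
}

// True when v lies inside the closed interval r.
constexpr bool contains(IndexRange r, uint64_t v) {
  return r.min <= v && v <= r.max;
}
```

Under these assumed values, a result of 9 would be a legal `gpu.cluster_dim` value but would fall outside the inferred range of `gpu.cluster_dim_blocks`.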


github-actions bot commented Jun 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@schwarzschild-radius
Contributor Author

CC += @durga4github for visibility

Member

@grypp grypp left a comment

Looks great. I left some comments.

@schwarzschild-radius schwarzschild-radius changed the title [MLIR][GPU] Add gpu.cluster_dim_blocks Op to represent number of blocks per cluster [MLIR][GPU] Add gpu.cluster_dim_blocks and gpu.cluster_block_id Ops Jun 12, 2024
Contributor

@durga4github durga4github left a comment

Looks good to me.

@schwarzschild-radius
Contributor Author

@durga4github can you please merge it?

@durga4github durga4github merged commit bd6568c into llvm:main Jun 14, 2024
7 checks passed
EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this pull request Aug 13, 2024
…lvm#95245)

This commit adds support for `gpu.cluster_dim_blocks` and
`gpu.cluster_block_id` Ops to represent number of blocks per cluster and
block id inside a cluster respectively. Also, fixed the description of
`gpu.cluster_dim` Op and updated the `cga_cluster.mlir` test file to use
`gpu.cluster_dim_blocks`

Co-authored-by: pradeepku <[email protected]>
Co-authored-by: Guray Ozen <[email protected]>