Skip to content

[MLIR][NVGPU] Fix the cga_cluster.mlir test #112191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 14, 2024

Conversation

durga4github
Copy link
Contributor

This patch fixes the sm90 cluster test by:

  • Fixing a typo in LowerGpuOpsToNVVMOps where one of the ClusterDim Op conversion pattern should actually be for the
    ClusterDimBlocks Op. This addresses the compilation error for this test.
  • The grid-size should be (4,4,1) instead of (2,2,1). This passes the scf-if check against the threshold of 3 below and actually
    generates the required prints from the GPU.

@llvmbot
Copy link
Member

llvmbot commented Oct 14, 2024

@llvm/pr-subscribers-mlir-gpu

@llvm/pr-subscribers-mlir

Author: Durgadoss R (durga4github)

Changes

This patch fixes the sm90 cluster test by:

  • Fixing a typo in LowerGpuOpsToNVVMOps where one of the ClusterDim Op conversion pattern should actually be for the
    ClusterDimBlocks Op. This addresses the compilation error for this test.
  • The grid-size should be (4,4,1) instead of (2,2,1). This passes the scf-if check against the threshold of 3 below and actually
    generates the required prints from the GPU.

Full diff: https://github.com/llvm/llvm-project/pull/112191.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp (+2-2)
  • (modified) mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir (+1-1)
diff --git a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
index e83574b7342725..8638ae603a0bea 100644
--- a/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
+++ b/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
@@ -373,8 +373,8 @@ void mlir::populateGpuToNVVMConversionPatterns(
       NVVM::BlockInClusterIdYOp, NVVM::BlockInClusterIdZOp>>(
       converter, IndexKind::Other, IntrType::Id);
   patterns.add<gpu::index_lowering::OpLowering<
-      gpu::ClusterDimOp, NVVM::ClusterDimXOp, NVVM::ClusterDimYOp,
-      NVVM::ClusterDimZOp>>(converter, IndexKind::Other, IntrType::Dim);
+      gpu::ClusterDimBlocksOp, NVVM::ClusterDimBlocksXOp, NVVM::ClusterDimBlocksYOp,
+      NVVM::ClusterDimBlocksZOp>>(converter, IndexKind::Other, IntrType::Dim);
   patterns.add<gpu::index_lowering::OpLowering<
       gpu::BlockIdOp, NVVM::BlockIdXOp, NVVM::BlockIdYOp, NVVM::BlockIdZOp>>(
       converter, IndexKind::Grid, IntrType::Id);
diff --git a/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir b/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
index 5c11d80178f727..c70c940564a264 100644
--- a/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
+++ b/mlir/test/Integration/GPU/CUDA/sm90/cga_cluster.mlir
@@ -18,7 +18,7 @@ module attributes {gpu.container_module} {
     return
   }
   gpu.module @gpumodule {
-    gpu.func @kernel_cluster() kernel attributes {gpu.known_block_size = array<i32: 1, 1, 1>, gpu.known_grid_size = array<i32: 2, 2, 1>} {
+    gpu.func @kernel_cluster() kernel attributes {gpu.known_block_size = array<i32: 1, 1, 1>, gpu.known_grid_size = array<i32: 4, 4, 1>} {
       %cidX = gpu.cluster_id  x
       %cidY = gpu.cluster_id  y
       %cidZ = gpu.cluster_id  z

@durga4github
Copy link
Contributor Author

@grypp, Please help review this change.

Copy link

github-actions bot commented Oct 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

This patch fixes the sm90 cluster test by:
* Fixing a typo in LowerGpuOpsToNVVMOps where
  one of the ClusterDim Op conversion pattern should
  actually be for the ClusterDimBlocks Op.
  This addresses the compilation error for this test.
* The grid-size should be (4,4,1) instead of (2,2,1).
  This passes the scf-if check against the threshold of
  3 below and actually generates the required prints
  from the GPU.

Signed-off-by: Durgadoss R <[email protected]>
@durga4github durga4github force-pushed the durgadossr/nvgpu_test_fix2 branch from fc903a4 to 56fcfdf Compare October 14, 2024 13:02
@durga4github
Copy link
Contributor Author

Addressed clang-format issues,

@durga4github
Copy link
Contributor Author

Builds are clean, merging this.

@durga4github durga4github merged commit a8b5115 into llvm:main Oct 14, 2024
8 checks passed
@durga4github durga4github deleted the durgadossr/nvgpu_test_fix2 branch October 14, 2024 14:14
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
This patch fixes the sm90 cluster test by:
* Fixing a typo in LowerGpuOpsToNVVMOps where one of the ClusterDim Op
   conversion pattern should actually be for the
   ClusterDimBlocks Op. This addresses the compilation error for this test.
* The grid-size should be (4,4,1) instead of (2,2,1). This passes the
   scf-if check against the threshold of 3 below and actually
   generates the required prints from the GPU.

Signed-off-by: Durgadoss R <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants