Skip to content

[mlir][nvvm] Fix the PTX lowering of wgmma.mma_async #76150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 22, 2023

Conversation

apaszke
Copy link
Member

@apaszke apaszke commented Dec 21, 2023

The default layout of A and B matrices is row- and column-major respectively, meaning that the transpose flags have opposite meanings between those two operands.

@llvmbot
Copy link
Member

llvmbot commented Dec 21, 2023

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Adam Paszke (apaszke)

Changes

The default layout of A and B matrices is row- and column-major respectively, meaning that the transpose flags have opposite meanings between those two operands.


Full diff: https://github.com/llvm/llvm-project/pull/76150.diff

1 Files Affected:

  • (modified) mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp (+1-1)
diff --git a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
index 4f5d71e10f68c1..a4de89d928e1be 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
@@ -1003,7 +1003,7 @@ void NVVM::WgmmaMmaAsyncOp::getAsmValues(
         {makeConstantI32(rewriter, static_cast<int>(getLayoutA())),
          mlir::NVVM::PTXRegisterMod::Read});
     asmValues.push_back(
-        {makeConstantI32(rewriter, static_cast<int>(getLayoutB())),
+        {makeConstantI32(rewriter, 1 - static_cast<int>(getLayoutB())),
          mlir::NVVM::PTXRegisterMod::Read});
   }
 }

@joker-eph
Copy link
Collaborator

Can this be exercise by a unit-test?

@apaszke
Copy link
Member Author

apaszke commented Dec 21, 2023

Yeah seems like Conversion/NVVMToLLVM/nvvm-to-llvm.mlir is catching this change, so I'll need to update that as well.

The default layout of A and B matrices is row- and column-major
respectively, meaning that the transpose flags have opposite meanings
between those two operands.
@apaszke
Copy link
Member Author

apaszke commented Dec 21, 2023

Ok the test should be updated now. It does a col-col matmul, so the right transpose args are 1, 0, not 1, 1 as it did previously.

@grypp grypp self-requested a review December 21, 2023 21:19
@grypp
Copy link
Member

grypp commented Dec 21, 2023

Good catch.
We need to change the lowering nvgpu.warpgroup.mma as well.

@apaszke
Copy link
Member Author

apaszke commented Dec 22, 2023

Seems like the Windows failure is unrelated to this PR?

@grypp
Copy link
Member

grypp commented Dec 22, 2023

Yes, let me merge this

@grypp grypp merged commit 85b2327 into llvm:main Dec 22, 2023
grypp added a commit to grypp/llvm-project that referenced this pull request Jan 24, 2024
The llvm#76150 fixed meaning of `transposeB` in NVVM dialect which was initially implemented with opposite meaning.

This PR fixes the lowering of `nvgpu.warpgroup.mma` to NVVM dialect.

This will fix two integration tests:
gemm_f32_f16_f16_128x128x128.mlir
gemm_pred_f32_f16_f16_128x128x128.mlir
grypp added a commit that referenced this pull request Jan 25, 2024
The #76150 fixed meaning of `transposeB` in NVVM dialect which was
initially implemented with opposite meaning.

This PR fixes the lowering of `nvgpu.warpgroup.mma` to NVVM dialect.

This will fix two integration tests:
gemm_f32_f16_f16_128x128x128.mlir
gemm_pred_f32_f16_f16_128x128x128.mlir
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants