-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[mlir][nvvm] Fix the PTX lowering of wgmma.mma_async #76150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5eef4c0
to
0aff6ae
Compare
@llvm/pr-subscribers-mlir-llvm @llvm/pr-subscribers-mlir Author: Adam Paszke (apaszke) ChangesThe default layout of A and B matrices is row- and column-major respectively, meaning that the transpose flags have opposite meanings between those two operands. Full diff: https://github.com/llvm/llvm-project/pull/76150.diff 1 Files Affected:
diff --git a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
index 4f5d71e10f68c1..a4de89d928e1be 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/NVVMDialect.cpp
@@ -1003,7 +1003,7 @@ void NVVM::WgmmaMmaAsyncOp::getAsmValues(
{makeConstantI32(rewriter, static_cast<int>(getLayoutA())),
mlir::NVVM::PTXRegisterMod::Read});
asmValues.push_back(
- {makeConstantI32(rewriter, static_cast<int>(getLayoutB())),
+ {makeConstantI32(rewriter, 1 - static_cast<int>(getLayoutB())),
mlir::NVVM::PTXRegisterMod::Read});
}
}
|
Can this be exercise by a unit-test? |
Yeah seems like |
The default layout of A and B matrices is row- and column-major respectively, meaning that the transpose flags have opposite meanings between those two operands.
0aff6ae
to
3c22209
Compare
Ok the test should be updated now. It does a col-col matmul, so the right transpose args are |
Good catch. |
Seems like the Windows failure is unrelated to this PR? |
Yes, let me merge this |
The llvm#76150 fixed meaning of `transposeB` in NVVM dialect which was initially implemented with opposite meaning. This PR fixes the lowering of `nvgpu.warpgroup.mma` to NVVM dialect. This will fix two integration tests: gemm_f32_f16_f16_128x128x128.mlir gemm_pred_f32_f16_f16_128x128x128.mlir
The #76150 fixed meaning of `transposeB` in NVVM dialect which was initially implemented with opposite meaning. This PR fixes the lowering of `nvgpu.warpgroup.mma` to NVVM dialect. This will fix two integration tests: gemm_f32_f16_f16_128x128x128.mlir gemm_pred_f32_f16_f16_128x128x128.mlir
The default layout of A and B matrices is row- and column-major respectively, meaning that the transpose flags have opposite meanings between those two operands.