[NVPTX] Make nvptx mma instructions convergent. #96521

weiweichen · 2024-06-24T17:38:44Z

We are running into the NVPTX backend generating wrong code for the following input:

%0 = llvm.nvvm.mma.m?n?k?.row.col.??? (...)
if laneid == 0:
  ret
else:
  store %0

The backend reorders the instruction (as an effect of the MachineSink pass) to:

if laneid == 0:
  ret
else:
  %0 = llvm.nvvm.mma.m?n?k?.row.col.??? (...)
  store %0

This is incorrect because mma is a warp-level instruction: all threads in the warp must execute it together, so it must not be guarded by a condition on a specific thread id. In terms of warp-level synchronization it is similar to the shuffle instruction shfl, which is already marked isConvergent = true.

Apply isConvergent = true to mma instructions.
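To make the failure mode concrete, here is a toy C++ model. It is not LLVM or CUDA code, and every name in it (`warpCollectiveSum`, `originalOrder`, `sunkOrder`) is hypothetical. It treats a warp as 32 lanes and models a warp-collective operation that, like `mma.sync`, is only well-defined when every lane participates. Running the collective before the `laneid == 0` exit gives every lane the full result; running it after the exit, as the sunk code does, leaves lane 0 out.

```cpp
#include <array>
#include <cassert>
#include <optional>

constexpr int kWarpSize = 32;
using LaneInts = std::array<int, kWarpSize>;
using LaneMask = std::array<bool, kWarpSize>;

// Hypothetical warp-collective: every lane receives the sum of all lanes'
// inputs, so each lane's result depends on every other lane's data (as with
// mma fragments). Returns std::nullopt if any lane is inactive, standing in
// for the undefined behaviour of a .sync.aligned instruction under divergence.
std::optional<LaneInts> warpCollectiveSum(const LaneInts& in,
                                          const LaneMask& active) {
  int sum = 0;
  for (int lane = 0; lane < kWarpSize; ++lane) {
    if (!active[lane]) return std::nullopt;  // not all lanes converged
    sum += in[lane];
  }
  LaneInts out{};
  out.fill(sum);
  return out;
}

// Original program order:
//   %0 = collective(...)   ; all 32 lanes active
//   if laneid == 0: ret
//   else: store %0
std::optional<LaneInts> originalOrder(const LaneInts& in) {
  LaneMask all{};
  all.fill(true);
  // Lane 0 returns only after this point, so it still participates.
  return warpCollectiveSum(in, all);
}

// Order after sinking the collective into the else-branch:
//   if laneid == 0: ret
//   else: %0 = collective(...); store %0
std::optional<LaneInts> sunkOrder(const LaneInts& in) {
  LaneMask active{};
  active.fill(true);
  active[0] = false;  // lane 0 has already returned
  return warpCollectiveSum(in, active);
}
```

With inputs `1..32`, `originalOrder` gives every lane 528, while `sunkOrder` returns `std::nullopt`. Real hardware does not report an error in this situation; executing `mma.sync.aligned` in divergent code is simply undefined behaviour, which the model approximates as an empty result.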

llvmbot · 2024-06-24T17:39:13Z

@llvm/pr-subscribers-backend-nvptx

Author: weiwei chen (weiweichen)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/96521.diff

1 Files Affected:

(modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+4)

diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index a65170e56aa24..a19ec21826b82 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -6724,6 +6724,7 @@ class WMMA_MMA<WMMA_REGINFO FragA, WMMA_REGINFO FragB,
                   # FragC.regstring # ";";
 }
 
+let isConvergent = true in {
 defset list<WMMA_INSTR> WMMAs  = {
   foreach layout_a = ["row", "col"] in {
     foreach layout_b = ["row", "col"] in {
@@ -6745,6 +6746,7 @@ defset list<WMMA_INSTR> WMMAs  = {
     } // layout_b
   } // layout_a
 } // defset
+}
 
 // MMA
 class MMA<WMMA_REGINFO FragA, WMMA_REGINFO FragB,
@@ -6774,6 +6776,7 @@ class MMA<WMMA_REGINFO FragA, WMMA_REGINFO FragB,
                   # FragC.regstring # ";";
 }
 
+let isConvergent = true in {
 defset list<WMMA_INSTR> MMAs  = {
   foreach layout_a = ["row", "col"] in {
     foreach layout_b = ["row", "col"] in {
@@ -6793,6 +6796,7 @@ defset list<WMMA_INSTR> MMAs  = {
     } // layout_b
   } // layout_a
 } // defset
+}
 
 //
 // ldmatrix.sync.aligned.m8n8[|.trans][|.shared].b16

Mogball

This makes sense to me. These instructions can't be sunk across conditional boundaries. Please make sure to get a review from someone who normally touches the NVPTX backend!
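The fix works because the sinking pass declines to move convergent instructions into a successor block. As I understand it, LLVM's MachineSink checks `MachineInstr::isConvergent()` before sinking; the predicate below is a simplified, hypothetical model of that rule only. `ToyInstr` and `canSinkIntoSuccessor` are invented for illustration, and the real pass checks many more conditions.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a machine instruction (hypothetical; not LLVM's MachineInstr).
struct ToyInstr {
  std::string name;
  bool isConvergent;    // models the TableGen `isConvergent` bit
  bool hasSideEffects;  // models the hasSideEffects bit
};

// Simplified sinking legality check: an instruction may be moved from a block
// into one of its successors only if it has no side effects and is not
// convergent. A convergent operation must not become control-dependent on an
// additional condition such as `laneid == 0`.
bool canSinkIntoSuccessor(const ToyInstr& mi) {
  if (mi.hasSideEffects) return false;
  if (mi.isConvergent) return false;
  return true;
}
```

Before this PR the mma definitions behave like `ToyInstr{"mma", false, false}` (sinkable); wrapping them in `let isConvergent = true in { ... }` turns them into `ToyInstr{"mma", true, false}`, which the check rejects.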

qcolombet

LGTM.

isConvergent is indeed missing on these instructions.

Nice finding!

qcolombet · 2024-06-24T18:45:25Z

llvm/test/CodeGen/NVPTX/mma-no-sink-after-laneid-check.ll

+
+; COM: llvm.nvvm.mma should not be sunk into the next block, i.e. it must not be reordered to after the laneid check.
+; CHECK-LABEL: no_reorder_mma_and_laneid_check
+define dso_local void @no_reorder_mma_and_laneid_check(ptr %0, ptr %1, i64 %2) #0 {


Please get rid of the implicit variables (run opt -passes=instnamer on your input IR and update the file)

Yep, updated! Thank you for the opt tip!

Make nvptx mma instructions convergent.

2f63676

weiweichen added the backend:NVPTX label Jun 24, 2024

Add a test.

924a2fb

Mogball approved these changes Jun 24, 2024

Mogball reviewed Jun 24, 2024

justinfargnoli requested review from jlebar and durga4github June 24, 2024 18:31

justinfargnoli assigned weiweichen Jun 24, 2024

qcolombet approved these changes Jun 24, 2024

weiweichen added 2 commits June 24, 2024 14:57

Update test file to simplify it.

5d29fb2

Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…

f3a72b6

…eic/mark-nvvm-mma-with-side-effect

weiweichen merged commit b0e9b00 into llvm:main Jun 25, 2024
7 checks passed