AMDGPU: Teach isOperandLegal about SALU literal restrictions #127626

arsenm · 2025-02-18T13:00:11Z

isOperandLegal mostly implemented the VALU operand rules, and
largely ignored SALU restrictions. This theoretically avoids
folding literals into SALU insts which already have a literal
operand. This issue is currently avoided due to a bug in
SIFoldOperands; this change will allow using raw operand
legality rules.

This breaks the formation of s_fmaak_f32 in SIFoldOperands,
but it probably should not have been forming there in the first
place. TwoAddressInsts or RA should generally handle that,
and this only worked by accident.

isOperandLegal mostly implemented the VALU operand rules, and largely ignored SALU restrictions. This theoretically avoids folding literals into SALU insts which already have a literal operand. This issue is currently avoided due to a bug in SIFoldOperands; this change will allow using raw operand legality rules. This breaks the formation of s_fmaak_f32 in SIFoldOperands, but it probably should not have been forming there in the first place. TwoAddressInsts or RA should generally handle that, and this only worked by accident.

arsenm · 2025-02-18T13:00:29Z

AMDGPU: Teach isOperandLegal about SALU literal restrictions #127626 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2025-02-18T13:00:49Z

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

isOperandLegal mostly implemented the VALU operand rules, and
largely ignored SALU restrictions. This theoretically avoids
folding literals into SALU insts which already have a literal
operand. This issue is currently avoided due to a bug in
SIFoldOperands; this change will allow using raw operand
legality rules.

This breaks the formation of s_fmaak_f32 in SIFoldOperands,
but it probably should not have been forming there in the first
place. TwoAddressInsts or RA should generally handle that,
and this only worked by accident.

Full diff: https://github.com/llvm/llvm-project/pull/127626.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+21-7)
(modified) llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir (+6-3)
(modified) llvm/test/CodeGen/AMDGPU/fold-sgpr-multi-imm.mir (+199)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index ceab6c9dcca34..7dace11d208a0 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -5931,11 +5931,15 @@ bool SIInstrInfo::isOperandLegal(const MachineInstr &MI, unsigned OpIdx,
   if (!MO)
     MO = &MI.getOperand(OpIdx);
 
-  const MachineOperand *UsedLiteral = nullptr;
+  const bool IsInlineConst = !MO->isReg() && isInlineConstant(*MO, OpInfo);
 
-  int ConstantBusLimit = ST.getConstantBusLimit(MI.getOpcode());
-  int LiteralLimit = !isVOP3(MI) || ST.hasVOP3Literal() ? 1 : 0;
-  if (isVALU(MI) && usesConstantBus(MRI, *MO, OpInfo)) {
+  if (isVALU(MI) && !IsInlineConst && usesConstantBus(MRI, *MO, OpInfo)) {
+    const MachineOperand *UsedLiteral = nullptr;
+
+    int ConstantBusLimit = ST.getConstantBusLimit(MI.getOpcode());
+    int LiteralLimit = !isVOP3(MI) || ST.hasVOP3Literal() ? 1 : 0;
+
+    // TODO: Be more permissive with frame indexes.
     if (!MO->isReg() && !isInlineConstant(*MO, OpInfo)) {
       if (!LiteralLimit--)
         return false;
@@ -5974,9 +5978,19 @@ bool SIInstrInfo::isOperandLegal(const MachineInstr &MI, unsigned OpIdx,
           return false;
       }
     }
-  } else if (ST.hasNoF16PseudoScalarTransInlineConstants() && !MO->isReg() &&
-             isF16PseudoScalarTrans(MI.getOpcode()) &&
-             isInlineConstant(*MO, OpInfo)) {
+  } else if (!IsInlineConst && !MO->isReg() && isSALU(MI)) {
+    // There can be at most one literal operand, but it can be repeated.
+    for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
+      if (i == OpIdx)
+        continue;
+      const MachineOperand &Op = MI.getOperand(i);
+      if (!Op.isReg() && !Op.isFI() &&
+          !isInlineConstant(Op, InstDesc.operands()[i]) &&
+          !Op.isIdenticalTo(*MO))
+        return false;
+    }
+  } else if (IsInlineConst && ST.hasNoF16PseudoScalarTransInlineConstants() &&
+             isF16PseudoScalarTrans(MI.getOpcode())) {
     return false;
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir b/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir
index 08693ec9db1d4..2492eb2982aac 100644
--- a/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir
+++ b/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir
@@ -133,7 +133,8 @@ body:             |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
-    ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAMK_F32 [[COPY]], 1234567890, 1056964608, implicit $mode
+    ; CHECK-NEXT: %noninlinable:sreg_32 = S_MOV_B32 1234567890
+    ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAAK_F32 %noninlinable, [[COPY]], 1056964608, implicit $mode
     ; CHECK-NEXT: $sgpr0 = COPY %fma
     %0:sreg_32 = COPY $sgpr0
     %inlinable:sreg_32 = S_MOV_B32 1056964608
@@ -152,7 +153,8 @@ body:             |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
-    ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAMK_F32 [[COPY]], 1234567890, 1056964608, implicit $mode
+    ; CHECK-NEXT: %noninlinable:sreg_32 = S_MOV_B32 1234567890
+    ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAAK_F32 [[COPY]], %noninlinable, 1056964608, implicit $mode
     ; CHECK-NEXT: $sgpr0 = COPY %fma
     %0:sreg_32 = COPY $sgpr0
     %inlinable:sreg_32 = S_MOV_B32 1056964608
@@ -210,7 +212,8 @@ body:             |
     ; CHECK: liveins: $sgpr0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
-    ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAAK_F32 [[COPY]], 1056964608, 1234567890, implicit $mode
+    ; CHECK-NEXT: %noninlinable:sreg_32 = S_MOV_B32 1234567890
+    ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAMK_F32 [[COPY]], 1056964608, %noninlinable, implicit $mode
     ; CHECK-NEXT: $sgpr0 = COPY %fma
     %0:sreg_32 = COPY $sgpr0
     %inlinable:sreg_32 = S_MOV_B32 1056964608
diff --git a/llvm/test/CodeGen/AMDGPU/fold-sgpr-multi-imm.mir b/llvm/test/CodeGen/AMDGPU/fold-sgpr-multi-imm.mir
index 5f985605c082d..c8afb89aa272a 100644
--- a/llvm/test/CodeGen/AMDGPU/fold-sgpr-multi-imm.mir
+++ b/llvm/test/CodeGen/AMDGPU/fold-sgpr-multi-imm.mir
@@ -69,3 +69,202 @@ body: |
     %0:sreg_32 = S_MOV_B32 63
     %1:sreg_32 = S_ADD_I32 %stack.0, %0, implicit-def $scc
 ...
+
+# GCN-LABEL: name: test_no_fold_literal_already_inline_lhs{{$}}
+# GCN: %0:sreg_32 = S_MOV_B32 80
+# GCN-NEXT: %1:sreg_32 = S_ADD_I32 70, %0
+---
+name: test_no_fold_literal_already_inline_lhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    %1:sreg_32 = S_ADD_I32 70, %0, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_no_fold_literal_already_inline_rhs{{$}}
+# GCN: %0:sreg_32 = S_MOV_B32 80
+# GCN-NEXT: %1:sreg_32 = S_ADD_I32 %0, 70
+---
+name: test_no_fold_literal_already_inline_rhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    %1:sreg_32 = S_ADD_I32 %0, 70, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_fold_literal_inlineimm_lhs{{$}}
+# GCN: %1:sreg_32 = S_ADD_I32 64, 80
+---
+name: test_fold_literal_inlineimm_lhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    %1:sreg_32 = S_ADD_I32 64, %0, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_fold_literal_inlineimm_rhs{{$}}
+# GCN: %1:sreg_32 = S_ADD_I32 80, 64
+---
+name: test_fold_literal_inlineimm_rhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    %1:sreg_32 = S_ADD_I32 %0, 64, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_fold_same_literal_2x{{$}}
+# GCN: %2:sreg_32 = S_ADD_I32 70, %1
+---
+name: test_fold_same_literal_2x
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 70
+    %1:sreg_32 = S_MOV_B32 70
+    %2:sreg_32 = S_ADD_I32 %0, %1, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_fold_same_literal_lhs{{$}}
+# GCN: %1:sreg_32 = S_ADD_I32 70, %0
+---
+name: test_fold_same_literal_lhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 70
+    %1:sreg_32 = S_ADD_I32 70, %0, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_fold_same_literal_rhs{{$}}
+# GCN: %1:sreg_32 = S_ADD_I32 %0, 70
+---
+name: test_fold_same_literal_rhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 70
+    %1:sreg_32 = S_ADD_I32 %0, 70, implicit-def $scc
+...
+
+
+# GCN-LABEL: name: test_s_cselect_b32_2x_literal_fold{{$}}
+# GCN: %2:sreg_32 = S_CSELECT_B32 70, %1, implicit $scc
+---
+name: test_s_cselect_b32_2x_literal_fold
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 70
+    %1:sreg_32 = S_MOV_B32 80
+    $scc = IMPLICIT_DEF
+    %2:sreg_32 = S_CSELECT_B32 %0, %1, implicit $scc
+...
+
+# GCN-LABEL: name: test_s_cselect_b32_fold_literal_literal_lhs{{$}}
+# GCN: %1:sreg_32 = S_CSELECT_B32 70, %0, implicit $scc
+---
+name: test_s_cselect_b32_fold_literal_literal_lhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    $scc = IMPLICIT_DEF
+    %1:sreg_32 = S_CSELECT_B32 70, %0, implicit $scc
+...
+
+# GCN-LABEL: name: test_s_cselect_b32_fold_literal_literal_rhs{{$}}
+# GCN: %1:sreg_32 = S_CSELECT_B32 %0, 70, implicit $scc
+---
+name: test_s_cselect_b32_fold_literal_literal_rhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    $scc = IMPLICIT_DEF
+    %1:sreg_32 = S_CSELECT_B32 %0, 70, implicit $scc
+...
+
+# GCN-LABEL: name: test_s_cselect_b32_fold_literal_inlineimm_lhs{{$}}
+# GCN: %1:sreg_32 = S_CSELECT_B32 64, 80, implicit $scc
+---
+name: test_s_cselect_b32_fold_literal_inlineimm_lhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    $scc = IMPLICIT_DEF
+    %1:sreg_32 = S_CSELECT_B32 64, %0, implicit $scc
+...
+
+# GCN-LABEL: name: test_s_cselect_b32_fold_literal_inlineimm_rhs{{$}}
+# GCN: %1:sreg_32 = S_CSELECT_B32 80, 64, implicit $scc
+---
+name: test_s_cselect_b32_fold_literal_inlineimm_rhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    $scc = IMPLICIT_DEF
+    %1:sreg_32 = S_CSELECT_B32 %0, 64, implicit $scc
+...
+
+# GCN-LABEL: name: test_s_cmp_b32_2x_literal_fold{{$}}
+# GCN: S_CMP_EQ_U32 70, %1, implicit-def $scc
+---
+name: test_s_cmp_b32_2x_literal_fold
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 70
+    %1:sreg_32 = S_MOV_B32 80
+    $scc = IMPLICIT_DEF
+    S_CMP_EQ_U32 %0, %1, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_s_cmp_b32_literal_literal_lhs{{$}}
+# GCN: S_CMP_EQ_U32 70, %0, implicit-def $scc
+---
+name: test_s_cmp_b32_literal_literal_lhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    S_CMP_EQ_U32 70, %0, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_s_cmp_b32_literal_literal_rhs{{$}}
+# GCN: S_CMP_EQ_U32 %0, 70, implicit-def $scc
+---
+name: test_s_cmp_b32_literal_literal_rhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    S_CMP_EQ_U32 %0, 70, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_s_cmp_b32_literal_inlineimm_lhs{{$}}
+# GCN: S_CMP_EQ_U32 64, 80, implicit-def $scc
+---
+name: test_s_cmp_b32_literal_inlineimm_lhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    S_CMP_EQ_U32 64, %0, implicit-def $scc
+...
+
+# GCN-LABEL: name: test_s_cmp_b32_literal_inlineimm_rhs{{$}}
+# GCN: S_CMP_EQ_U32 80, 64, implicit-def $scc
+---
+name: test_s_cmp_b32_literal_inlineimm_rhs
+tracksRegLiveness: true
+body: |
+  bb.0:
+    %0:sreg_32 = S_MOV_B32 80
+    S_CMP_EQ_U32 %0, 64, implicit-def $scc
+...

arsenm added the backend:AMDGPU label Feb 18, 2025 — with Graphite App

arsenm requested a review from cdevadas February 18, 2025 13:00

arsenm requested review from jayfoad, jrbyrnes, kerbowa, mbrkusanin, perlfu, Pierre-vh, rampitec and shiltian February 18, 2025 13:00

arsenm marked this pull request as ready for review February 18, 2025 13:00

rampitec approved these changes Feb 18, 2025

View reviewed changes

arsenm merged commit 22d65d8 into main Feb 19, 2025
12 checks passed

arsenm deleted the users/arsenm/amdgpu/avoid-double-literal-salu-isOperandLegal branch February 19, 2025 03:53

joaosaffran mentioned this pull request Feb 26, 2025

[libc++] Set feature-test macro __cpp_lib_atomic_float (#127559) joaosaffran/llvm-project#1

Closed

jayfoad mentioned this pull request May 21, 2025

AMDGPU: Delete seemingly dead s_fmaak_f32/s_fmamk_f32 folding code #140580

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AMDGPU: Teach isOperandLegal about SALU literal restrictions #127626

AMDGPU: Teach isOperandLegal about SALU literal restrictions #127626

Uh oh!

arsenm commented Feb 18, 2025

Uh oh!

arsenm commented Feb 18, 2025

Uh oh!

llvmbot commented Feb 18, 2025

Uh oh!

Uh oh!

Uh oh!

AMDGPU: Teach isOperandLegal about SALU literal restrictions #127626

AMDGPU: Teach isOperandLegal about SALU literal restrictions #127626

Uh oh!

Conversation

arsenm commented Feb 18, 2025

Uh oh!

arsenm commented Feb 18, 2025

Uh oh!

llvmbot commented Feb 18, 2025

Uh oh!

Uh oh!

Uh oh!