AMDGPU: Do not try to commute instruction with same input register #127562

Merged
merged 1 commit into main from users/arsenm/do-not-commute-instruction-with-same-reg on Feb 27, 2025

Conversation

arsenm (Contributor) commented Feb 18, 2025

There's little point in trying to commute an instruction if its two
operands are already the same.

This avoids an assertion in a future patch, but it likely isn't the
correct fix. The worklist management in SIFoldOperands is dodgy, and
we should probably fix it to work like PeepholeOpt (i.e., stop looking
at use lists and fold from the users instead). This is an extension of
the already-handled special case that avoids folding an instruction
which is already being folded.

llvmbot (Member) commented Feb 18, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/127562.diff

6 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+11-1)
  • (modified) llvm/test/CodeGen/AMDGPU/dag-divergence.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/div_i128.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/div_v2i128.ll (+16-16)
  • (modified) llvm/test/CodeGen/AMDGPU/rem_i128.ll (+4-4)
  • (added) llvm/test/CodeGen/AMDGPU/si-fold-operands-commute-same-operands-assert.mir (+65)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 8492bb2c3518b..84773349e0ca0 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -691,11 +691,21 @@ bool SIFoldOperandsImpl::tryAddToFoldList(
     if (!CanCommute)
       return false;
 
+    MachineOperand &Op = MI->getOperand(OpNo);
+    MachineOperand &CommutedOp = MI->getOperand(CommuteOpNo);
+
     // One of operands might be an Imm operand, and OpNo may refer to it after
     // the call of commuteInstruction() below. Such situations are avoided
     // here explicitly as OpNo must be a register operand to be a candidate
     // for memory folding.
-    if (!MI->getOperand(OpNo).isReg() || !MI->getOperand(CommuteOpNo).isReg())
+    if (!Op.isReg() || !CommutedOp.isReg())
+      return false;
+
+    // The same situation with an immediate could reproduce if both inputs are
+    // the same register.
+    if (Op.isReg() && CommutedOp.isReg() &&
+        (Op.getReg() == CommutedOp.getReg() &&
+         Op.getSubReg() == CommutedOp.getSubReg()))
       return false;
 
     if (!TII->commuteInstruction(*MI, false, OpNo, CommuteOpNo))
diff --git a/llvm/test/CodeGen/AMDGPU/dag-divergence.ll b/llvm/test/CodeGen/AMDGPU/dag-divergence.ll
index dfc28539ea814..0f573fcc6deaa 100644
--- a/llvm/test/CodeGen/AMDGPU/dag-divergence.ll
+++ b/llvm/test/CodeGen/AMDGPU/dag-divergence.ll
@@ -37,8 +37,8 @@ define amdgpu_kernel void @flat_load_maybe_divergent(ptr addrspace(4) %k, ptr %f
 ; GCN-LABEL: {{^}}wide_carry_divergence_error:
 ; GCN: v_sub_u32_e32
 ; GCN: v_subb_u32_e32
-; GCN: v_subbrev_u32_e32
-; GCN: v_subbrev_u32_e32
+; GCN: v_subb_u32_e32
+; GCN: v_subb_u32_e32
 define <2 x i128> @wide_carry_divergence_error(i128 %arg) {
   %i = call i128 @llvm.ctlz.i128(i128 %arg, i1 false)
   %i1 = sub i128 0, %i
diff --git a/llvm/test/CodeGen/AMDGPU/div_i128.ll b/llvm/test/CodeGen/AMDGPU/div_i128.ll
index 59bc7f332bf1e..3d9043d30c1ce 100644
--- a/llvm/test/CodeGen/AMDGPU/div_i128.ll
+++ b/llvm/test/CodeGen/AMDGPU/div_i128.ll
@@ -65,8 +65,8 @@ define i128 @v_sdiv_i128_vv(i128 %lhs, i128 %rhs) {
 ; GFX9-NEXT:    v_cndmask_b32_e64 v7, v7, 0, vcc
 ; GFX9-NEXT:    v_sub_co_u32_e32 v2, vcc, v2, v3
 ; GFX9-NEXT:    v_subb_co_u32_e32 v3, vcc, v4, v7, vcc
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v4, vcc, 0, v5, vcc
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v5, vcc, 0, v5, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v4, vcc, 0, v5, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v5, vcc, 0, v5, vcc
 ; GFX9-NEXT:    s_mov_b64 s[6:7], 0x7f
 ; GFX9-NEXT:    v_cmp_lt_u64_e32 vcc, s[6:7], v[2:3]
 ; GFX9-NEXT:    v_mov_b32_e32 v18, v16
@@ -2355,8 +2355,8 @@ define i128 @v_udiv_i128_vv(i128 %lhs, i128 %rhs) {
 ; GFX9-NEXT:    v_sub_co_u32_e32 v12, vcc, v8, v9
 ; GFX9-NEXT:    v_subb_co_u32_e32 v13, vcc, v10, v13, vcc
 ; GFX9-NEXT:    v_mov_b32_e32 v8, 0
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v14, vcc, 0, v8, vcc
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v15, vcc, 0, v8, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v14, vcc, 0, v8, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v15, vcc, 0, v8, vcc
 ; GFX9-NEXT:    v_cmp_lt_u64_e32 vcc, s[6:7], v[12:13]
 ; GFX9-NEXT:    v_or_b32_e32 v10, v13, v15
 ; GFX9-NEXT:    v_cndmask_b32_e64 v8, 0, 1, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/div_v2i128.ll b/llvm/test/CodeGen/AMDGPU/div_v2i128.ll
index 41999b249a0e8..a58c1e7883b0b 100644
--- a/llvm/test/CodeGen/AMDGPU/div_v2i128.ll
+++ b/llvm/test/CodeGen/AMDGPU/div_v2i128.ll
@@ -66,10 +66,10 @@ define <2 x i128> @v_sdiv_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v2, vcc, v2, v10
 ; SDAG-NEXT:    v_subb_u32_e32 v3, vcc, v8, v9, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v8, 0x7f, v2
-; SDAG-NEXT:    v_subbrev_u32_e32 v10, vcc, 0, v18, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v10, vcc, 0, v18, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[10:11], v[2:3]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v11, vcc, 0, v18, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v11, vcc, 0, v18, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v8, v8, v10
 ; SDAG-NEXT:    v_or_b32_e32 v9, v3, v11
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[10:11]
@@ -263,10 +263,10 @@ define <2 x i128> @v_sdiv_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v4, vcc, v4, v13
 ; SDAG-NEXT:    v_subb_u32_e32 v5, vcc, v9, v12, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v9, 0x7f, v4
-; SDAG-NEXT:    v_subbrev_u32_e32 v10, vcc, 0, v8, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v10, vcc, 0, v8, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[10:11], v[4:5]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v12, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v11, vcc, 0, v8, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v11, vcc, 0, v8, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v8, v9, v10
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[10:11]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v13, 0, 1, vcc
@@ -872,10 +872,10 @@ define <2 x i128> @v_udiv_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v22, vcc, v16, v18
 ; SDAG-NEXT:    v_subb_u32_e32 v23, vcc, v20, v17, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v16, 0x7f, v22
-; SDAG-NEXT:    v_subbrev_u32_e32 v24, vcc, 0, v28, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v24, vcc, 0, v28, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[8:9], v[22:23]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v18, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v25, vcc, 0, v28, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v25, vcc, 0, v28, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v16, v16, v24
 ; SDAG-NEXT:    v_or_b32_e32 v17, v23, v25
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[24:25]
@@ -1047,10 +1047,10 @@ define <2 x i128> @v_udiv_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v0, vcc, v0, v2
 ; SDAG-NEXT:    v_subb_u32_e32 v1, vcc, v8, v1, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v2, 0x7f, v0
-; SDAG-NEXT:    v_subbrev_u32_e32 v20, vcc, 0, v24, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v20, vcc, 0, v24, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[8:9], v[0:1]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v8, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v21, vcc, 0, v24, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v21, vcc, 0, v24, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v2, v2, v20
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[20:21]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v9, 0, 1, vcc
@@ -1619,10 +1619,10 @@ define <2 x i128> @v_srem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v10, vcc, v8, v10
 ; SDAG-NEXT:    v_subb_u32_e32 v11, vcc, v11, v18, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v8, 0x7f, v10
-; SDAG-NEXT:    v_subbrev_u32_e32 v18, vcc, 0, v19, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v18, vcc, 0, v19, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[10:11], v[10:11]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v19, vcc, 0, v19, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v19, vcc, 0, v19, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v8, v8, v18
 ; SDAG-NEXT:    v_or_b32_e32 v9, v11, v19
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[18:19]
@@ -1814,10 +1814,10 @@ define <2 x i128> @v_srem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v10, vcc, v10, v19
 ; SDAG-NEXT:    v_subb_u32_e32 v11, vcc, v13, v12, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v14, 0x7f, v10
-; SDAG-NEXT:    v_subbrev_u32_e32 v12, vcc, 0, v18, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v12, vcc, 0, v18, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[10:11], v[10:11]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v13, vcc, 0, v18, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v13, vcc, 0, v18, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v14, v14, v12
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[12:13]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v18, 0, 1, vcc
@@ -2502,10 +2502,10 @@ define <2 x i128> @v_urem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v18, vcc, v16, v18
 ; SDAG-NEXT:    v_subb_u32_e32 v19, vcc, v20, v17, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v16, 0x7f, v18
-; SDAG-NEXT:    v_subbrev_u32_e32 v20, vcc, 0, v28, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v20, vcc, 0, v28, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[8:9], v[18:19]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v21, vcc, 0, v28, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v21, vcc, 0, v28, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v16, v16, v20
 ; SDAG-NEXT:    v_or_b32_e32 v17, v19, v21
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[20:21]
@@ -2677,10 +2677,10 @@ define <2 x i128> @v_urem_v2i128_vv(<2 x i128> %lhs, <2 x i128> %rhs) {
 ; SDAG-NEXT:    v_sub_i32_e32 v16, vcc, v16, v18
 ; SDAG-NEXT:    v_subb_u32_e32 v17, vcc, v20, v17, vcc
 ; SDAG-NEXT:    v_xor_b32_e32 v18, 0x7f, v16
-; SDAG-NEXT:    v_subbrev_u32_e32 v20, vcc, 0, v28, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v20, vcc, 0, v28, vcc
 ; SDAG-NEXT:    v_cmp_lt_u64_e64 s[4:5], s[8:9], v[16:17]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[4:5]
-; SDAG-NEXT:    v_subbrev_u32_e32 v21, vcc, 0, v28, vcc
+; SDAG-NEXT:    v_subb_u32_e32 v21, vcc, 0, v28, vcc
 ; SDAG-NEXT:    v_or_b32_e32 v18, v18, v20
 ; SDAG-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[20:21]
 ; SDAG-NEXT:    v_cndmask_b32_e64 v23, 0, 1, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/rem_i128.ll b/llvm/test/CodeGen/AMDGPU/rem_i128.ll
index afe1f33d15e42..d25226d15a029 100644
--- a/llvm/test/CodeGen/AMDGPU/rem_i128.ll
+++ b/llvm/test/CodeGen/AMDGPU/rem_i128.ll
@@ -66,8 +66,8 @@ define i128 @v_srem_i128_vv(i128 %lhs, i128 %rhs) {
 ; GFX9-NEXT:    v_cndmask_b32_e64 v11, v11, 0, vcc
 ; GFX9-NEXT:    v_sub_co_u32_e32 v6, vcc, v6, v7
 ; GFX9-NEXT:    v_subb_co_u32_e32 v7, vcc, v8, v11, vcc
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v8, vcc, 0, v9, vcc
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v9, vcc, 0, v9, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v8, vcc, 0, v9, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v9, vcc, 0, v9, vcc
 ; GFX9-NEXT:    s_mov_b64 s[6:7], 0x7f
 ; GFX9-NEXT:    v_cmp_lt_u64_e32 vcc, s[6:7], v[6:7]
 ; GFX9-NEXT:    v_or_b32_e32 v12, v7, v9
@@ -1541,8 +1541,8 @@ define i128 @v_urem_i128_vv(i128 %lhs, i128 %rhs) {
 ; GFX9-NEXT:    v_sub_co_u32_e32 v8, vcc, v8, v9
 ; GFX9-NEXT:    v_subb_co_u32_e32 v9, vcc, v10, v12, vcc
 ; GFX9-NEXT:    v_mov_b32_e32 v11, 0
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v10, vcc, 0, v11, vcc
-; GFX9-NEXT:    v_subbrev_co_u32_e32 v11, vcc, 0, v11, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v10, vcc, 0, v11, vcc
+; GFX9-NEXT:    v_subb_co_u32_e32 v11, vcc, 0, v11, vcc
 ; GFX9-NEXT:    v_cmp_lt_u64_e32 vcc, s[6:7], v[8:9]
 ; GFX9-NEXT:    v_cndmask_b32_e64 v12, 0, 1, vcc
 ; GFX9-NEXT:    v_cmp_ne_u64_e32 vcc, 0, v[10:11]
diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-operands-commute-same-operands-assert.mir b/llvm/test/CodeGen/AMDGPU/si-fold-operands-commute-same-operands-assert.mir
new file mode 100644
index 0000000000000..8d20ada8a14d5
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-operands-commute-same-operands-assert.mir
@@ -0,0 +1,65 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -run-pass=peephole-opt,si-fold-operands -o - %s | FileCheck %s
+
+# Check for assert when trying later folds after commuting an
+# instruction where both operands are the same register. This depended
+# on use list ordering, so we need to run peephole-opt first to
+# reproduce.
+
+---
+name:            commute_add_same_inputs_assert
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+
+    ; CHECK-LABEL: name: commute_add_same_inputs_assert
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; CHECK-NEXT: [[V_ADD_CO_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADD_CO_U32_e32 0, killed [[COPY]], implicit-def $vcc, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_ADDC_U32_e32_:%[0-9]+]]:vgpr_32 = V_ADDC_U32_e32 0, [[V_MOV_B32_e32_]], implicit-def $vcc, implicit $vcc, implicit $exec
+    ; CHECK-NEXT: $vgpr4 = COPY [[V_ADDC_U32_e32_]]
+    ; CHECK-NEXT: SI_RETURN implicit $vgpr4
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_64 = S_MOV_B64 0
+    %2:sreg_32 = COPY %1.sub0
+    %3:vgpr_32 = V_ADD_CO_U32_e32 %2, killed %0, implicit-def $vcc, implicit $exec
+    %4:vgpr_32 = COPY %2
+    %5:vgpr_32 = COPY %2
+    %6:vgpr_32 = V_ADDC_U32_e32 %4, %5, implicit-def $vcc, implicit $vcc, implicit $exec
+    $vgpr4 = COPY %6
+    SI_RETURN implicit $vgpr4
+
+...
+
+---
+name:            commute_sub_same_inputs_assert
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+
+    ; CHECK-LABEL: name: commute_sub_same_inputs_assert
+    ; CHECK: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
+    ; CHECK-NEXT: [[V_SUB_CO_U32_e32_:%[0-9]+]]:vgpr_32 = V_SUB_CO_U32_e32 0, killed [[COPY]], implicit-def $vcc, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_SUBB_U32_e32_:%[0-9]+]]:vgpr_32 = V_SUBB_U32_e32 0, [[V_MOV_B32_e32_]], implicit-def $vcc, implicit $vcc, implicit $exec
+    ; CHECK-NEXT: $vgpr4 = COPY [[V_SUBB_U32_e32_]]
+    ; CHECK-NEXT: SI_RETURN implicit $vgpr4
+    %0:vgpr_32 = COPY $vgpr0
+    %1:sreg_64 = S_MOV_B64 0
+    %2:sreg_32 = COPY %1.sub0
+    %3:vgpr_32 = V_SUB_CO_U32_e32 %2, killed %0, implicit-def $vcc, implicit $exec
+    %4:vgpr_32 = COPY %2
+    %5:vgpr_32 = COPY %2
+    %6:vgpr_32 = V_SUBB_U32_e32 %5, %4, implicit-def $vcc, implicit $vcc, implicit $exec
+    $vgpr4 = COPY %6
+    SI_RETURN implicit $vgpr4
+
+...
+
+

arsenm force-pushed the users/arsenm/amdgpu/handle-subregisters-immediate-folds-si-peephole-opt branch from d978a96 to e5b8e8e on February 18, 2025 04:26
arsenm force-pushed the users/arsenm/do-not-commute-instruction-with-same-reg branch from b28280b to 2f11ad0 on February 18, 2025 04:26
arsenm force-pushed the users/arsenm/amdgpu/handle-subregisters-immediate-folds-si-peephole-opt branch from e5b8e8e to e929391 on February 18, 2025 10:17
Base automatically changed from users/arsenm/amdgpu/handle-subregisters-immediate-folds-si-peephole-opt to main on February 18, 2025 10:19
arsenm force-pushed the users/arsenm/do-not-commute-instruction-with-same-reg branch from 2f11ad0 to 1d3dce0 on February 18, 2025 10:21
arsenm added a commit that referenced this pull request Feb 18, 2025
There were 2 parallel fold check mechanisms, so consistently use the
fold list. The worklist management here is still not good. Other types
of folds are not using it, and we should probably rewrite the pass to
look more like peephole-opt.

This should be an alternative fix to skipping commute if the operands
are the same (#127562). The new test is still not broken as-is, but
demonstrates failures in a future patch.
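The "look more like peephole-opt" suggestion amounts to reversing the traversal direction. A rough sketch of the two styles, with hypothetical function names and the actual folding logic elided:

  #include "llvm/CodeGen/MachineBasicBlock.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"

  using namespace llvm;

  // Current SIFoldOperands shape: start from a def and walk its use list,
  // which leaves the worklist sensitive to use-list ordering.
  static void foldFromDef(MachineRegisterInfo &MRI, Register DefReg) {
    for (MachineOperand &UseMO : MRI.use_nodbg_operands(DefReg)) {
      (void)UseMO; // try to fold the def's value into this use ...
    }
  }

  // PeepholeOpt shape: visit each instruction once and look back through
  // its register inputs, so use lists are never scanned.
  static void foldFromUsers(MachineBasicBlock &MBB) {
    for (MachineInstr &MI : MBB) {
      (void)MI; // for each register input, check whether its def can fold ...
    }
  }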
arsenm added a commit that referenced this pull request Feb 25, 2025
#127612)

arsenm force-pushed the users/arsenm/do-not-commute-instruction-with-same-reg branch from 1d3dce0 to e9a741f on February 26, 2025 16:37
arsenm force-pushed the users/arsenm/do-not-commute-instruction-with-same-reg branch from e9a741f to 7e136af on February 27, 2025 00:18
arsenm (Contributor, Author) commented Feb 27, 2025

Merge activity

  • Feb 26, 8:34 PM EST: A user started a stack merge that includes this pull request via Graphite.
  • Feb 26, 8:36 PM EST: Graphite rebased this pull request as part of a merge.
  • Feb 26, 8:39 PM EST: A user merged this pull request with Graphite.

arsenm force-pushed the users/arsenm/do-not-commute-instruction-with-same-reg branch from 7e136af to 0edaa87 on February 27, 2025 01:35
arsenm merged commit 4be4133 into main on Feb 27, 2025
6 of 10 checks passed
arsenm deleted the users/arsenm/do-not-commute-instruction-with-same-reg branch on February 27, 2025 01:39
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Mar 3, 2025
…lvm#127562)
