[AMDGPU] Allow selection of BITOP3 for some 2 opcodes and B32 cases #122267

jchlanda · 2025-01-09T12:41:01Z

This came up in downstream static analysis, which flagged the block as dead code.

The if (NumOpcodes == 2 && IsB32) check sits after an early return on if (NumOpcodes < 4), so its body can never execute. The right fix depends on what that check was originally meant to do. I took a guess that BITOP3 selection should take place for certain two-opcode B32 cases, and reordered the checks accordingly.

If that guess is incorrect, the whole if (NumOpcodes == 2 && IsB32) statement can be removed instead.
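
For illustration, the control flow before and after the change can be modelled with two small stand-alone functions. This is only a sketch, not the code in AMDGPUInstructionSelector.cpp: the names passesThresholdBefore, passesThresholdAfter and the excluded flag are hypothetical, the Src.empty() check is left out, and excluded stands in for the pattern matches that reject OR3, XOR3 and AND_OR.

```cpp
#include <cassert>

// Hypothetical model of the early-exit checks in selectBITOP3().
// `excluded` stands in for the OR3 / XOR3 / AND_OR pattern matches.

// Before the patch: the NumOpcodes < 4 bail-out runs first, so the
// NumOpcodes == 2 branch below it can never execute.
bool passesThresholdBefore(unsigned numOpcodes, bool isB32, bool excluded) {
  if (numOpcodes < 2)
    return false;
  if (numOpcodes < 4)
    return false;
  if (numOpcodes == 2 && isB32) { // unreachable: numOpcodes >= 4 here
    if (excluded)
      return false;
  }
  return true;
}

// After the patch: the two-opcode B32 case is checked first, and the
// higher threshold only applies to everything else.
bool passesThresholdAfter(unsigned numOpcodes, bool isB32, bool excluded) {
  if (numOpcodes < 2)
    return false;
  if (numOpcodes == 2 && isB32) {
    if (excluded)
      return false;
  } else if (numOpcodes < 4) {
    return false;
  }
  return true;
}

// Small self-check showing where the two versions differ.
void checkModel() {
  assert(!passesThresholdBefore(2, true, false)); // old code always bailed out
  assert(passesThresholdAfter(2, true, false));   // new code lets it through
  assert(!passesThresholdAfter(2, true, true));   // excluded patterns still rejected
  assert(!passesThresholdAfter(3, true, false));  // higher threshold still applies
}
```

In this model the only inputs whose result changes are two-opcode B32 cases that do not match one of the excluded patterns.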

llvmbot · 2025-01-09T12:42:25Z

@llvm/pr-subscribers-backend-amdgpu

Author: Jakub Chlanda (jchlanda)

Full diff: https://github.com/llvm/llvm-project/pull/122267.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (+6-8)
(modified) llvm/test/CodeGen/AMDGPU/bitop3.ll (+11-15)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 041b9b4d66f63f..72d43ac9fbc245 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -3782,13 +3782,7 @@ bool AMDGPUInstructionSelector::selectBITOP3(MachineInstr &MI) const {
   if (NumOpcodes < 2 || Src.empty())
     return false;
 
-  // For a uniform case threshold should be higher to account for moves between
-  // VGPRs and SGPRs. It needs one operand in a VGPR, rest two can be in SGPRs
-  // and a readtfirstlane after.
-  if (NumOpcodes < 4)
-    return false;
-
-  bool IsB32 = MRI->getType(DstReg) == LLT::scalar(32);
+  const bool IsB32 = MRI->getType(DstReg) == LLT::scalar(32);
   if (NumOpcodes == 2 && IsB32) {
     // Avoid using BITOP3 for OR3, XOR3, AND_OR. This is not faster but makes
     // asm more readable. This cannot be modeled with AddedComplexity because
@@ -3797,7 +3791,11 @@ bool AMDGPUInstructionSelector::selectBITOP3(MachineInstr &MI) const {
         mi_match(MI, *MRI, m_GOr(m_GOr(m_Reg(), m_Reg()), m_Reg())) ||
         mi_match(MI, *MRI, m_GOr(m_GAnd(m_Reg(), m_Reg()), m_Reg())))
       return false;
-  }
+  } else if (NumOpcodes < 4)
+    // For a uniform case threshold should be higher to account for moves
+    // between VGPRs and SGPRs. It needs one operand in a VGPR, rest two can be
+    // in SGPRs and a readtfirstlane after.
+    return false;
 
   unsigned Opc = IsB32 ? AMDGPU::V_BITOP3_B32_e64 : AMDGPU::V_BITOP3_B16_e64;
   unsigned CBL = STI.getConstantBusLimit(Opc);
diff --git a/llvm/test/CodeGen/AMDGPU/bitop3.ll b/llvm/test/CodeGen/AMDGPU/bitop3.ll
index b08ab5a2dc4223..eb149a93ee3288 100644
--- a/llvm/test/CodeGen/AMDGPU/bitop3.ll
+++ b/llvm/test/CodeGen/AMDGPU/bitop3.ll
@@ -52,8 +52,7 @@ define amdgpu_ps float @not_and_and_and(i32 %a, i32 %b, i32 %c) {
 ;
 ; GFX950-GISEL-LABEL: not_and_and_and:
 ; GFX950-GISEL:       ; %bb.0:
-; GFX950-GISEL-NEXT:    v_not_b32_e32 v0, v0
-; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v2
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v0, v0, v2, v0 bitop3:0xc
 ; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v1
 ; GFX950-GISEL-NEXT:    ; return to shader part epilog
   %nota = xor i32 %a, -1
@@ -103,8 +102,7 @@ define amdgpu_ps float @and_and_not_and(i32 %a, i32 %b, i32 %c) {
 ;
 ; GFX950-GISEL-LABEL: and_and_not_and:
 ; GFX950-GISEL:       ; %bb.0:
-; GFX950-GISEL-NEXT:    v_not_b32_e32 v2, v2
-; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v2
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v0, v0, v2, v0 bitop3:0x30
 ; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v1
 ; GFX950-GISEL-NEXT:    ; return to shader part epilog
   %notc = xor i32 %c, -1
@@ -122,8 +120,7 @@ define amdgpu_ps float @and_and_and(i32 %a, i32 %b, i32 %c) {
 ;
 ; GFX950-GISEL-LABEL: and_and_and:
 ; GFX950-GISEL:       ; %bb.0:
-; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v2
-; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v1
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v0, v0, v1, v2 bitop3:0x80
 ; GFX950-GISEL-NEXT:    ; return to shader part epilog
   %and1 = and i32 %a, %c
   %and2 = and i32 %and1, %b
@@ -141,8 +138,7 @@ define amdgpu_ps float @test_12(i32 %a, i32 %b) {
 ;
 ; GFX950-GISEL-LABEL: test_12:
 ; GFX950-GISEL:       ; %bb.0:
-; GFX950-GISEL-NEXT:    v_not_b32_e32 v0, v0
-; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v1
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v0, v0, v1, v0 bitop3:0xc
 ; GFX950-GISEL-NEXT:    ; return to shader part epilog
   %nota = xor i32 %a, -1
   %and1 = and i32 %nota, %b
@@ -214,9 +210,11 @@ define amdgpu_ps float @test_12_src_overflow(i32 %a, i32 %b, i32 %c) {
 ;
 ; GFX950-GISEL-LABEL: test_12_src_overflow:
 ; GFX950-GISEL:       ; %bb.0:
-; GFX950-GISEL-NEXT:    v_not_b32_e32 v0, v0
-; GFX950-GISEL-NEXT:    v_bfi_b32 v0, v2, v0, v0
-; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v0, v1
+; GFX950-GISEL-NEXT:    v_not_b32_e32 v3, v0
+; GFX950-GISEL-NEXT:    v_not_b32_e32 v4, v2
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v0, v0, v2, v0 bitop3:0xc
+; GFX950-GISEL-NEXT:    v_and_b32_e32 v2, v3, v4
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v0, v0, v1, v2 bitop3:0xc8
 ; GFX950-GISEL-NEXT:    ; return to shader part epilog
   %nota = xor i32 %a, -1
   %notc = xor i32 %c, -1
@@ -242,11 +240,9 @@ define amdgpu_ps float @test_100_src_overflow(i32 %a, i32 %b, i32 %c) {
 ;
 ; GFX950-GISEL-LABEL: test_100_src_overflow:
 ; GFX950-GISEL:       ; %bb.0:
-; GFX950-GISEL-NEXT:    v_or_b32_e32 v3, v2, v0
-; GFX950-GISEL-NEXT:    v_not_b32_e32 v3, v3
-; GFX950-GISEL-NEXT:    v_not_b32_e32 v4, v1
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v3, v2, v0, v2 bitop3:3
 ; GFX950-GISEL-NEXT:    v_and_b32_e32 v3, v1, v3
-; GFX950-GISEL-NEXT:    v_and_b32_e32 v4, v0, v4
+; GFX950-GISEL-NEXT:    v_bitop3_b32 v4, v0, v1, v0 bitop3:0x30
 ; GFX950-GISEL-NEXT:    v_and_b32_e32 v0, v1, v0
 ; GFX950-GISEL-NEXT:    v_not_b32_e32 v1, v2
 ; GFX950-GISEL-NEXT:    v_and_b32_e32 v4, v4, v2
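
As a reading aid for the bitop3: immediates in the CHECK lines above, the sketch below derives them from per-operand masks. It assumes the masks 0xF0, 0xCC and 0xAA for the first, second and third source operand, which is consistent with the immediates in this diff (for example 0x80 for a three-way AND). The bitwise evaluator applyBitop3 and the constant names are hypothetical, and the evaluator is an assumed model of the instruction rather than a statement of the hardware definition.

```cpp
#include <cassert>
#include <cstdint>

// Assumed truth-table masks for the three source operands.
constexpr uint8_t kSrc0 = 0xF0;
constexpr uint8_t kSrc1 = 0xCC;
constexpr uint8_t kSrc2 = 0xAA;

// Immediates as they appear in the updated CHECK lines.
constexpr uint8_t kAndAndAnd = kSrc0 & kSrc1 & kSrc2;                  // a & b & c
constexpr uint8_t kNotAAndB = static_cast<uint8_t>(~kSrc0 & kSrc1);    // ~a & b
constexpr uint8_t kAAndNotB = static_cast<uint8_t>(kSrc0 & ~kSrc1);    // a & ~b
constexpr uint8_t kNor = static_cast<uint8_t>(~(kSrc0 | kSrc1));       // ~(a | b)

// Assumed model: output bit i is looked up in the 8-entry truth table
// using the three input bits at position i as the index.
uint32_t applyBitop3(uint32_t a, uint32_t b, uint32_t c, uint8_t ttbl) {
  uint32_t result = 0;
  for (unsigned i = 0; i < 32; ++i) {
    unsigned idx = (((a >> i) & 1u) << 2) | (((b >> i) & 1u) << 1) |
                   ((c >> i) & 1u);
    result |= static_cast<uint32_t>((ttbl >> idx) & 1u) << i;
  }
  return result;
}
```

Under these assumptions, kAndAndAnd is 0x80, kNotAAndB is 0xC, kAAndNotB is 0x30 and kNor is 3, matching and_and_and, test_12, and_and_not_and and the first instruction of test_100_src_overflow respectively.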

llvm-ci · 2025-01-10T09:37:14Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/11751

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-merged-funcs-dwarf.yaml' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
Input file: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/test/tools/llvm-gsymutil/ARM_AArch64/Output/macho-merged-funcs-dwarf.yaml.tmp.dSYM
Output file (aarch64): /b/1/llvm-clang-x86_64-expensive-checks-debian/build/test/tools/llvm-gsymutil/ARM_AArch64/Output/macho-merged-funcs-dwarf.yaml.tmp.default.gSYM
Loaded 3 functions from DWARF.
Loaded 3 functions from symbol table.
warning: same address range contains different debug info. Removing:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000047
addr=0x0000000000000248, file=  3, line=  5
addr=0x0000000000000254, file=  3, line=  7
addr=0x0000000000000258, file=  3, line=  9
addr=0x000000000000025c, file=  3, line=  8
addr=0x0000000000000260, file=  3, line= 11
addr=0x0000000000000264, file=  3, line= 10
addr=0x0000000000000268, file=  3, line=  6


In favor of this one:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000001
addr=0x0000000000000248, file=  1, line=  5
addr=0x0000000000000254, file=  1, line=  7
addr=0x0000000000000258, file=  1, line=  9
addr=0x000000000000025c, file=  1, line=  8
addr=0x0000000000000260, file=  1, line= 11
addr=0x0000000000000264, file=  1, line= 10
addr=0x0000000000000268, file=  1, line=  6


warning: same address range contains different debug info. Removing:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000001
addr=0x0000000000000248, file=  1, line=  5
addr=0x0000000000000254, file=  1, line=  7
addr=0x0000000000000258, file=  1, line=  9
addr=0x000000000000025c, file=  1, line=  8
addr=0x0000000000000260, file=  1, line= 11
addr=0x0000000000000264, file=  1, line= 10
addr=0x0000000000000268, file=  1, line=  6


In favor of this one:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000030
addr=0x0000000000000248, file=  2, line=  5
addr=0x0000000000000254, file=  2, line=  7
addr=0x0000000000000258, file=  2, line=  9
addr=0x000000000000025c, file=  2, line=  8
addr=0x0000000000000260, file=  2, line= 11
addr=0x0000000000000264, file=  2, line= 10
...

jchlanda · 2025-01-10T12:03:27Z

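Unless I'm missing something obvious, this failure does not look related to this PR: the failing test is tools/llvm-gsymutil/ARM_AArch64/macho-merged-funcs-dwarf.yaml, while this PR only touches the AMDGPU instruction selector and llvm/test/CodeGen/AMDGPU/bitop3.ll.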

llvmbot added the backend:AMDGPU label Jan 9, 2025

jchlanda requested a review from arsenm January 9, 2025 12:41

arsenm requested a review from rampitec January 9, 2025 12:46

arsenm approved these changes Jan 9, 2025

jchlanda force-pushed the jakub/bitop3 branch from fb0013a to 06053dd Compare January 9, 2025 12:53

[AMDGPU] Allow selection of some BITOP3 for 2 opcodes and B32 cases

06053dd

jchlanda merged commit 01a7d4e into llvm:main Jan 10, 2025
8 checks passed
