
[DAG] getNode - convert scalar i1 arithmetic calls to bitwise instructions #125486


Merged
merged 1 commit into llvm:main from dag-boolean-arithmetic on Feb 3, 2025

Conversation

RKSimon (Collaborator) commented Feb 3, 2025

We already do this for vector vXi1 types; this patch removes the vector constraint to handle it for all bool types.
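
For reference, the folds rest on Boolean identities over i1: add and sub wrap modulo 2 (XOR), mul is AND, saturating add clamps to OR, and with i1 treated as signed 0/-1, smin is OR and smax is AND. A minimal standalone C++ sketch (illustration only, not part of the patch) that exhaustively checks these identities:

```cpp
// Illustration only, not patch code: exhaustively verify the i1 identities
// that getNode now applies to scalar bool types as well as vXi1 vectors.
#include <cassert>

int main() {
  for (unsigned a = 0; a <= 1; ++a) {
    for (unsigned b = 0; b <= 1; ++b) {
      // add/sub i1 wrap modulo 2, which is XOR.
      assert(((a + b) & 1) == (a ^ b));
      assert(((a - b) & 1) == (a ^ b));
      // mul i1 is AND.
      assert((a * b) == (a & b));
      // Saturating add clamps at the one-bit maximum, which is OR.
      unsigned sat = (a + b > 1) ? 1 : a + b;
      assert(sat == (a | b));
      // Signed i1 values are 0 and -1, so smin picks -1 when either bit is
      // set (OR) and smax picks 0 unless both bits are set (AND).
      int sa = a ? -1 : 0, sb = b ? -1 : 0;
      assert((sa < sb ? sa : sb) == ((a | b) ? -1 : 0));
      assert((sa > sb ? sa : sb) == ((a & b) ? -1 : 0));
    }
  }
  return 0;
}
```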

[DAG] getNode - convert scalar i1 arithmetic calls to bitwise instructions

We already do this for vector vXi1 types; this patch removes the vector constraint to handle it for all bool types.
llvmbot (Member) commented Feb 3, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-nvptx
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Simon Pilgrim (RKSimon)

Changes

We already do this for vector vXi1 types; this patch removes the vector constraint to handle it for all bool types.


Patch is 32.50 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125486.diff

20 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+11-12)
  • (modified) llvm/test/CodeGen/AMDGPU/add_i1.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/mul.ll (+17-17)
  • (modified) llvm/test/CodeGen/AMDGPU/sub_i1.ll (+6-6)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/add.ll (+4-4)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/mul.ll (+2-2)
  • (modified) llvm/test/CodeGen/LoongArch/ir-instruction/sub.ll (+2-2)
  • (modified) llvm/test/CodeGen/Mips/llvm-ir/add.ll (+8-22)
  • (modified) llvm/test/CodeGen/Mips/llvm-ir/mul.ll (+10-24)
  • (modified) llvm/test/CodeGen/Mips/llvm-ir/sub.ll (+2-7)
  • (modified) llvm/test/CodeGen/NVPTX/boolean-patterns.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/avx512-regcall-NoMask.ll (+9-9)
  • (modified) llvm/test/CodeGen/X86/bitcast-vector-bool.ll (+3-3)
  • (modified) llvm/test/CodeGen/X86/combine-add.ll (+1-3)
  • (modified) llvm/test/CodeGen/X86/fast-isel-select.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/gpr-to-mask.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/setcc-combine.ll (+3-8)
  • (modified) llvm/test/CodeGen/X86/sse-regcall.ll (+9-9)
  • (modified) llvm/test/CodeGen/X86/sse-regcall4.ll (+9-9)
  • (modified) llvm/test/CodeGen/X86/subcarry.ll (+3-8)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 8f50a14da25a8b..16c3b295426c64 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7297,15 +7297,15 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
     // it's worth handling here.
     if (N2CV && N2CV->isZero())
       return N1;
-    if ((Opcode == ISD::ADD || Opcode == ISD::SUB) && VT.isVector() &&
-        VT.getVectorElementType() == MVT::i1)
+    if ((Opcode == ISD::ADD || Opcode == ISD::SUB) &&
+        VT.getScalarType() == MVT::i1)
       return getNode(ISD::XOR, DL, VT, N1, N2);
     break;
   case ISD::MUL:
     assert(VT.isInteger() && "This operator does not apply to FP types!");
     assert(N1.getValueType() == N2.getValueType() &&
            N1.getValueType() == VT && "Binary operator types must match!");
-    if (VT.isVector() && VT.getVectorElementType() == MVT::i1)
+    if (VT.getScalarType() == MVT::i1)
       return getNode(ISD::AND, DL, VT, N1, N2);
     if (N2C && (N1.getOpcode() == ISD::VSCALE) && Flags.hasNoSignedWrap()) {
       const APInt &MulImm = N1->getConstantOperandAPInt(0);
@@ -7326,7 +7326,7 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
     assert(VT.isInteger() && "This operator does not apply to FP types!");
     assert(N1.getValueType() == N2.getValueType() &&
            N1.getValueType() == VT && "Binary operator types must match!");
-    if (VT.isVector() && VT.getVectorElementType() == MVT::i1) {
+    if (VT.getScalarType() == MVT::i1) {
       // fold (add_sat x, y) -> (or x, y) for bool types.
       if (Opcode == ISD::SADDSAT || Opcode == ISD::UADDSAT)
         return getNode(ISD::OR, DL, VT, N1, N2);
@@ -7359,7 +7359,7 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
     assert(VT.isInteger() && "This operator does not apply to FP types!");
     assert(N1.getValueType() == N2.getValueType() &&
            N1.getValueType() == VT && "Binary operator types must match!");
-    if (VT.isVector() && VT.getVectorElementType() == MVT::i1)
+    if (VT.getScalarType() == MVT::i1)
       return getNode(ISD::XOR, DL, VT, N1, N2);
     break;
   case ISD::SMIN:
@@ -7367,7 +7367,7 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
     assert(VT.isInteger() && "This operator does not apply to FP types!");
     assert(N1.getValueType() == N2.getValueType() &&
            N1.getValueType() == VT && "Binary operator types must match!");
-    if (VT.isVector() && VT.getVectorElementType() == MVT::i1)
+    if (VT.getScalarType() == MVT::i1)
       return getNode(ISD::OR, DL, VT, N1, N2);
     break;
   case ISD::SMAX:
@@ -7375,7 +7375,7 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
     assert(VT.isInteger() && "This operator does not apply to FP types!");
     assert(N1.getValueType() == N2.getValueType() &&
            N1.getValueType() == VT && "Binary operator types must match!");
-    if (VT.isVector() && VT.getVectorElementType() == MVT::i1)
+    if (VT.getScalarType() == MVT::i1)
       return getNode(ISD::AND, DL, VT, N1, N2);
     break;
   case ISD::FADD:
@@ -10399,12 +10399,12 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
   case ISD::VP_ADD:
   case ISD::VP_SUB:
     // If it is VP_ADD/VP_SUB mask operation then turn it to VP_XOR
-    if (VT.isVector() && VT.getVectorElementType() == MVT::i1)
+    if (VT.getScalarType() == MVT::i1)
       Opcode = ISD::VP_XOR;
     break;
   case ISD::VP_MUL:
     // If it is VP_MUL mask operation then turn it to VP_AND
-    if (VT.isVector() && VT.getVectorElementType() == MVT::i1)
+    if (VT.getScalarType() == MVT::i1)
       Opcode = ISD::VP_AND;
     break;
   case ISD::VP_REDUCE_MUL:
@@ -10509,9 +10509,8 @@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, SDVTList VTList,
       return getNode(ISD::MERGE_VALUES, DL, VTList, {N1, ZeroOverFlow}, Flags);
     }
 
-    if (VTList.VTs[0].isVector() &&
-        VTList.VTs[0].getVectorElementType() == MVT::i1 &&
-        VTList.VTs[1].getVectorElementType() == MVT::i1) {
+    if (VTList.VTs[0].getScalarType() == MVT::i1 &&
+        VTList.VTs[1].getScalarType() == MVT::i1) {
       SDValue F1 = getFreeze(N1);
       SDValue F2 = getFreeze(N2);
       // {vXi1,vXi1} (u/s)addo(vXi1 x, vXi1y) -> {xor(x,y),and(x,y)}
diff --git a/llvm/test/CodeGen/AMDGPU/add_i1.ll b/llvm/test/CodeGen/AMDGPU/add_i1.ll
index e9e3fa765b52fa..ff1a3ee38be1d0 100644
--- a/llvm/test/CodeGen/AMDGPU/add_i1.ll
+++ b/llvm/test/CodeGen/AMDGPU/add_i1.ll
@@ -3,8 +3,8 @@
 ; RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX10 %s
 
 ; GCN-LABEL: {{^}}add_var_var_i1:
-; GFX9:  s_xor_b64
-; GFX10: s_xor_b32
+; GFX9:  v_xor_b32_e32
+; GFX10: v_xor_b32_e32
 define amdgpu_kernel void @add_var_var_i1(ptr addrspace(1) %out, ptr addrspace(1) %in0, ptr addrspace(1) %in1) {
   %a = load volatile i1, ptr addrspace(1) %in0
   %b = load volatile i1, ptr addrspace(1) %in1
@@ -14,8 +14,8 @@ define amdgpu_kernel void @add_var_var_i1(ptr addrspace(1) %out, ptr addrspace(1
 }
 
 ; GCN-LABEL: {{^}}add_var_imm_i1:
-; GFX9:  s_not_b64
-; GFX10: s_not_b32
+; GFX9:  s_xor_b64
+; GFX10: s_xor_b32
 define amdgpu_kernel void @add_var_imm_i1(ptr addrspace(1) %out, ptr addrspace(1) %in) {
   %a = load volatile i1, ptr addrspace(1) %in
   %add = add i1 %a, 1
@@ -25,8 +25,8 @@ define amdgpu_kernel void @add_var_imm_i1(ptr addrspace(1) %out, ptr addrspace(1
 
 ; GCN-LABEL: {{^}}add_i1_cf:
 ; GCN: ; %endif
-; GFX9: s_not_b64
-; GFX10: s_not_b32
+; GFX9: s_xor_b64
+; GFX10: s_xor_b32
 define amdgpu_kernel void @add_i1_cf(ptr addrspace(1) %out, ptr addrspace(1) %a, ptr addrspace(1) %b) {
 entry:
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
diff --git a/llvm/test/CodeGen/AMDGPU/mul.ll b/llvm/test/CodeGen/AMDGPU/mul.ll
index 2003cb163a985c..9b4693f61147aa 100644
--- a/llvm/test/CodeGen/AMDGPU/mul.ll
+++ b/llvm/test/CodeGen/AMDGPU/mul.ll
@@ -1459,8 +1459,8 @@ define amdgpu_kernel void @s_mul_i1(ptr addrspace(1) %out, [8 x i32], i1 %a, [8
 ; SI-NEXT:    s_mov_b32 s3, 0xf000
 ; SI-NEXT:    s_mov_b32 s2, -1
 ; SI-NEXT:    s_waitcnt lgkmcnt(0)
-; SI-NEXT:    s_mul_i32 s6, s6, s7
-; SI-NEXT:    s_and_b32 s4, s6, 1
+; SI-NEXT:    s_and_b32 s4, s6, s7
+; SI-NEXT:    s_and_b32 s4, s4, 1
 ; SI-NEXT:    v_mov_b32_e32 v0, s4
 ; SI-NEXT:    buffer_store_byte v0, off, s[0:3], 0
 ; SI-NEXT:    s_endpgm
@@ -1473,8 +1473,8 @@ define amdgpu_kernel void @s_mul_i1(ptr addrspace(1) %out, [8 x i32], i1 %a, [8
 ; VI-NEXT:    s_mov_b32 s3, 0xf000
 ; VI-NEXT:    s_mov_b32 s2, -1
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    s_mul_i32 s6, s6, s7
-; VI-NEXT:    s_and_b32 s4, s6, 1
+; VI-NEXT:    s_and_b32 s4, s6, s7
+; VI-NEXT:    s_and_b32 s4, s4, 1
 ; VI-NEXT:    v_mov_b32_e32 v0, s4
 ; VI-NEXT:    buffer_store_byte v0, off, s[0:3], 0
 ; VI-NEXT:    s_endpgm
@@ -1487,8 +1487,8 @@ define amdgpu_kernel void @s_mul_i1(ptr addrspace(1) %out, [8 x i32], i1 %a, [8
 ; GFX9-NEXT:    s_mov_b32 s3, 0xf000
 ; GFX9-NEXT:    s_mov_b32 s2, -1
 ; GFX9-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX9-NEXT:    s_mul_i32 s6, s6, s7
-; GFX9-NEXT:    s_and_b32 s4, s6, 1
+; GFX9-NEXT:    s_and_b32 s4, s6, s7
+; GFX9-NEXT:    s_and_b32 s4, s4, 1
 ; GFX9-NEXT:    v_mov_b32_e32 v0, s4
 ; GFX9-NEXT:    buffer_store_byte v0, off, s[0:3], 0
 ; GFX9-NEXT:    s_endpgm
@@ -1500,7 +1500,7 @@ define amdgpu_kernel void @s_mul_i1(ptr addrspace(1) %out, [8 x i32], i1 %a, [8
 ; GFX10-NEXT:    s_load_dword s3, s[4:5], 0x70
 ; GFX10-NEXT:    s_load_dwordx2 s[0:1], s[4:5], 0x24
 ; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX10-NEXT:    s_mul_i32 s2, s2, s3
+; GFX10-NEXT:    s_and_b32 s2, s2, s3
 ; GFX10-NEXT:    s_mov_b32 s3, 0x31016000
 ; GFX10-NEXT:    s_and_b32 s2, s2, 1
 ; GFX10-NEXT:    v_mov_b32_e32 v0, s2
@@ -1515,7 +1515,7 @@ define amdgpu_kernel void @s_mul_i1(ptr addrspace(1) %out, [8 x i32], i1 %a, [8
 ; GFX11-NEXT:    s_load_b32 s3, s[4:5], 0x70
 ; GFX11-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
-; GFX11-NEXT:    s_mul_i32 s2, s2, s3
+; GFX11-NEXT:    s_and_b32 s2, s2, s3
 ; GFX11-NEXT:    s_mov_b32 s3, 0x31016000
 ; GFX11-NEXT:    s_and_b32 s2, s2, 1
 ; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -1531,7 +1531,7 @@ define amdgpu_kernel void @s_mul_i1(ptr addrspace(1) %out, [8 x i32], i1 %a, [8
 ; GFX12-NEXT:    s_load_b32 s3, s[4:5], 0x70
 ; GFX12-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_mul_i32 s2, s2, s3
+; GFX12-NEXT:    s_and_b32 s2, s2, s3
 ; GFX12-NEXT:    s_mov_b32 s3, 0x31016000
 ; GFX12-NEXT:    s_and_b32 s2, s2, 1
 ; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -1555,7 +1555,7 @@ define amdgpu_kernel void @s_mul_i1(ptr addrspace(1) %out, [8 x i32], i1 %a, [8
 ; EG-NEXT:     MOV * T0.X, 0.0,
 ; EG-NEXT:    ALU clause starting at 11:
 ; EG-NEXT:     AND_INT T0.W, KC0[2].Y, literal.x,
-; EG-NEXT:     MULLO_INT * T0.X, T1.X, T0.X,
+; EG-NEXT:     AND_INT * T1.W, T1.X, T0.X,
 ; EG-NEXT:    3(4.203895e-45), 0(0.000000e+00)
 ; EG-NEXT:     AND_INT T1.W, PS, 1,
 ; EG-NEXT:     LSHL * T0.W, PV.W, literal.x,
@@ -1589,7 +1589,7 @@ define amdgpu_kernel void @v_mul_i1(ptr addrspace(1) %out, ptr addrspace(1) %in)
 ; SI-NEXT:    s_mov_b32 s4, s0
 ; SI-NEXT:    s_mov_b32 s5, s1
 ; SI-NEXT:    s_waitcnt vmcnt(0)
-; SI-NEXT:    v_mul_lo_u32 v0, v0, v1
+; SI-NEXT:    v_and_b32_e32 v0, v0, v1
 ; SI-NEXT:    v_and_b32_e32 v0, 1, v0
 ; SI-NEXT:    buffer_store_byte v0, off, s[4:7], 0
 ; SI-NEXT:    s_endpgm
@@ -1609,7 +1609,7 @@ define amdgpu_kernel void @v_mul_i1(ptr addrspace(1) %out, ptr addrspace(1) %in)
 ; VI-NEXT:    s_mov_b32 s4, s0
 ; VI-NEXT:    s_mov_b32 s5, s1
 ; VI-NEXT:    s_waitcnt vmcnt(0)
-; VI-NEXT:    v_mul_lo_u32 v0, v0, v1
+; VI-NEXT:    v_and_b32_e32 v0, v0, v1
 ; VI-NEXT:    v_and_b32_e32 v0, 1, v0
 ; VI-NEXT:    buffer_store_byte v0, off, s[4:7], 0
 ; VI-NEXT:    s_endpgm
@@ -1629,7 +1629,7 @@ define amdgpu_kernel void @v_mul_i1(ptr addrspace(1) %out, ptr addrspace(1) %in)
 ; GFX9-NEXT:    s_mov_b32 s4, s0
 ; GFX9-NEXT:    s_mov_b32 s5, s1
 ; GFX9-NEXT:    s_waitcnt vmcnt(0)
-; GFX9-NEXT:    v_mul_lo_u32 v0, v0, v1
+; GFX9-NEXT:    v_and_b32_e32 v0, v0, v1
 ; GFX9-NEXT:    v_and_b32_e32 v0, 1, v0
 ; GFX9-NEXT:    buffer_store_byte v0, off, s[4:7], 0
 ; GFX9-NEXT:    s_endpgm
@@ -1650,7 +1650,7 @@ define amdgpu_kernel void @v_mul_i1(ptr addrspace(1) %out, ptr addrspace(1) %in)
 ; GFX10-NEXT:    s_mov_b32 s4, s0
 ; GFX10-NEXT:    s_mov_b32 s5, s1
 ; GFX10-NEXT:    s_waitcnt vmcnt(0)
-; GFX10-NEXT:    v_mul_lo_u32 v0, v0, v1
+; GFX10-NEXT:    v_and_b32_e32 v0, v0, v1
 ; GFX10-NEXT:    v_and_b32_e32 v0, 1, v0
 ; GFX10-NEXT:    buffer_store_byte v0, off, s[4:7], 0
 ; GFX10-NEXT:    s_endpgm
@@ -1671,7 +1671,7 @@ define amdgpu_kernel void @v_mul_i1(ptr addrspace(1) %out, ptr addrspace(1) %in)
 ; GFX11-NEXT:    s_mov_b32 s4, s0
 ; GFX11-NEXT:    s_mov_b32 s5, s1
 ; GFX11-NEXT:    s_waitcnt vmcnt(0)
-; GFX11-NEXT:    v_mul_lo_u32 v0, v0, v1
+; GFX11-NEXT:    v_and_b32_e32 v0, v0, v1
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-NEXT:    v_and_b32_e32 v0, 1, v0
 ; GFX11-NEXT:    buffer_store_b8 v0, off, s[4:7], 0
@@ -1693,7 +1693,7 @@ define amdgpu_kernel void @v_mul_i1(ptr addrspace(1) %out, ptr addrspace(1) %in)
 ; GFX12-NEXT:    s_mov_b32 s4, s0
 ; GFX12-NEXT:    s_mov_b32 s5, s1
 ; GFX12-NEXT:    s_wait_loadcnt 0x0
-; GFX12-NEXT:    v_mul_lo_u32 v0, v0, v1
+; GFX12-NEXT:    v_and_b32_e32 v0, v0, v1
 ; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX12-NEXT:    v_and_b32_e32 v0, 1, v0
 ; GFX12-NEXT:    buffer_store_b8 v0, off, s[4:7], null
@@ -1714,7 +1714,7 @@ define amdgpu_kernel void @v_mul_i1(ptr addrspace(1) %out, ptr addrspace(1) %in)
 ; EG-NEXT:     MOV * T0.X, KC0[2].Z,
 ; EG-NEXT:    ALU clause starting at 11:
 ; EG-NEXT:     AND_INT T0.W, KC0[2].Y, literal.x,
-; EG-NEXT:     MULLO_INT * T0.X, T0.X, T1.X,
+; EG-NEXT:     AND_INT * T1.W, T0.X, T1.X,
 ; EG-NEXT:    3(4.203895e-45), 0(0.000000e+00)
 ; EG-NEXT:     AND_INT T1.W, PS, 1,
 ; EG-NEXT:     LSHL * T0.W, PV.W, literal.x,
diff --git a/llvm/test/CodeGen/AMDGPU/sub_i1.ll b/llvm/test/CodeGen/AMDGPU/sub_i1.ll
index a6ab1bd9e19f1f..19d012fc074f8d 100644
--- a/llvm/test/CodeGen/AMDGPU/sub_i1.ll
+++ b/llvm/test/CodeGen/AMDGPU/sub_i1.ll
@@ -4,8 +4,8 @@
 
 
 ; GCN-LABEL: {{^}}sub_var_var_i1:
-; WAVE32: s_xor_b32
-; WAVE64: s_xor_b64
+; WAVE32: v_xor_b32_e32
+; WAVE64: v_xor_b32_e32
 define amdgpu_kernel void @sub_var_var_i1(ptr addrspace(1) %out, ptr addrspace(1) %in0, ptr addrspace(1) %in1) {
   %a = load volatile i1, ptr addrspace(1) %in0
   %b = load volatile i1, ptr addrspace(1) %in1
@@ -15,8 +15,8 @@ define amdgpu_kernel void @sub_var_var_i1(ptr addrspace(1) %out, ptr addrspace(1
 }
 
 ; GCN-LABEL: {{^}}sub_var_imm_i1:
-; WAVE32: s_not_b32
-; WAVE64: s_not_b64
+; WAVE32: s_xor_b32
+; WAVE64: s_xor_b64
 define amdgpu_kernel void @sub_var_imm_i1(ptr addrspace(1) %out, ptr addrspace(1) %in) {
   %a = load volatile i1, ptr addrspace(1) %in
   %sub = sub i1 %a, 1
@@ -26,8 +26,8 @@ define amdgpu_kernel void @sub_var_imm_i1(ptr addrspace(1) %out, ptr addrspace(1
 
 ; GCN-LABEL: {{^}}sub_i1_cf:
 ; GCN: ; %endif
-; WAVE32: s_not_b32
-; WAVE64: s_not_b64
+; WAVE32: s_xor_b32
+; WAVE64: s_xor_b64
 define amdgpu_kernel void @sub_i1_cf(ptr addrspace(1) %out, ptr addrspace(1) %a, ptr addrspace(1) %b) {
 entry:
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
diff --git a/llvm/test/CodeGen/LoongArch/ir-instruction/add.ll b/llvm/test/CodeGen/LoongArch/ir-instruction/add.ll
index f156f8d6afce5d..69db3790fc1b37 100644
--- a/llvm/test/CodeGen/LoongArch/ir-instruction/add.ll
+++ b/llvm/test/CodeGen/LoongArch/ir-instruction/add.ll
@@ -7,12 +7,12 @@
 define i1 @add_i1(i1 %x, i1 %y) {
 ; LA32-LABEL: add_i1:
 ; LA32:       # %bb.0:
-; LA32-NEXT:    add.w $a0, $a0, $a1
+; LA32-NEXT:    xor $a0, $a0, $a1
 ; LA32-NEXT:    ret
 ;
 ; LA64-LABEL: add_i1:
 ; LA64:       # %bb.0:
-; LA64-NEXT:    add.d $a0, $a0, $a1
+; LA64-NEXT:    xor $a0, $a0, $a1
 ; LA64-NEXT:    ret
   %add = add i1 %x, %y
   ret i1 %add
@@ -97,12 +97,12 @@ define i64 @add_i64(i64 %x, i64 %y) {
 define i1 @add_i1_3(i1 %x) {
 ; LA32-LABEL: add_i1_3:
 ; LA32:       # %bb.0:
-; LA32-NEXT:    addi.w $a0, $a0, 1
+; LA32-NEXT:    xori $a0, $a0, 1
 ; LA32-NEXT:    ret
 ;
 ; LA64-LABEL: add_i1_3:
 ; LA64:       # %bb.0:
-; LA64-NEXT:    addi.d $a0, $a0, 1
+; LA64-NEXT:    xori $a0, $a0, 1
 ; LA64-NEXT:    ret
   %add = add i1 %x, 3
   ret i1 %add
diff --git a/llvm/test/CodeGen/LoongArch/ir-instruction/mul.ll b/llvm/test/CodeGen/LoongArch/ir-instruction/mul.ll
index 58cc0e7d6484a2..3a0cfd00940c51 100644
--- a/llvm/test/CodeGen/LoongArch/ir-instruction/mul.ll
+++ b/llvm/test/CodeGen/LoongArch/ir-instruction/mul.ll
@@ -7,12 +7,12 @@
 define i1 @mul_i1(i1 %a, i1 %b) {
 ; LA32-LABEL: mul_i1:
 ; LA32:       # %bb.0: # %entry
-; LA32-NEXT:    mul.w $a0, $a0, $a1
+; LA32-NEXT:    and $a0, $a0, $a1
 ; LA32-NEXT:    ret
 ;
 ; LA64-LABEL: mul_i1:
 ; LA64:       # %bb.0: # %entry
-; LA64-NEXT:    mul.d $a0, $a0, $a1
+; LA64-NEXT:    and $a0, $a0, $a1
 ; LA64-NEXT:    ret
 entry:
   %r = mul i1 %a, %b
diff --git a/llvm/test/CodeGen/LoongArch/ir-instruction/sub.ll b/llvm/test/CodeGen/LoongArch/ir-instruction/sub.ll
index 12543f857a1984..ce4a199b57c3d8 100644
--- a/llvm/test/CodeGen/LoongArch/ir-instruction/sub.ll
+++ b/llvm/test/CodeGen/LoongArch/ir-instruction/sub.ll
@@ -7,12 +7,12 @@
 define i1 @sub_i1(i1 %x, i1 %y) {
 ; LA32-LABEL: sub_i1:
 ; LA32:       # %bb.0:
-; LA32-NEXT:    sub.w $a0, $a0, $a1
+; LA32-NEXT:    xor $a0, $a0, $a1
 ; LA32-NEXT:    ret
 ;
 ; LA64-LABEL: sub_i1:
 ; LA64:       # %bb.0:
-; LA64-NEXT:    sub.d $a0, $a0, $a1
+; LA64-NEXT:    xor $a0, $a0, $a1
 ; LA64-NEXT:    ret
   %sub = sub i1 %x, %y
   ret i1 %sub
diff --git a/llvm/test/CodeGen/Mips/llvm-ir/add.ll b/llvm/test/CodeGen/Mips/llvm-ir/add.ll
index f6b3b96aaa0cee..a21477acd53413 100644
--- a/llvm/test/CodeGen/Mips/llvm-ir/add.ll
+++ b/llvm/test/CodeGen/Mips/llvm-ir/add.ll
@@ -38,18 +38,11 @@ define signext i1 @add_i1(i1 signext %a, i1 signext %b) {
 entry:
 ; ALL-LABEL: add_i1:
 
-  ; NOT-R2-R6:  addu   $[[T0:[0-9]+]], $4, $5
-  ; NOT-R2-R6:  andi   $[[T0]], $[[T0]], 1
-  ; NOT-R2-R6:  negu   $2, $[[T0]]
+  ; NOT-R2-R6:  xor    $[[T0:[0-9]+]], $4, $5
 
-  ; R2-R6:      addu   $[[T0:[0-9]+]], $4, $5
-  ; R2-R6:      andi   $[[T0]], $[[T0]], 1
-  ; R2-R6:      negu   $2, $[[T0]]
+  ; R2-R6:      xor    $[[T0:[0-9]+]], $4, $5
 
-  ; MMR6:       addu16  $[[T0:[0-9]+]], $4, $5
-  ; MMR6:       andi16  $[[T0]], $[[T0]], 1
-  ; MMR6:       li16    $[[T1:[0-9]+]], 0
-  ; MMR6:       subu16  $[[T0]], $[[T1]], $[[T0]]
+  ; MMR6:       xor    $[[T0:[0-9]+]], $4, $5
 
   %r = add i1 %a, %b
   ret i1 %r
@@ -368,18 +361,11 @@ define signext i128 @add_i128_4(i128 signext %a) {
 
 define signext i1 @add_i1_3(i1 signext %a) {
 ; ALL-LABEL: add_i1_3:
-  ; GP32:        addiu  $[[T0:[0-9]+]], $4, 1
-  ; GP32:        andi   $[[T0]], $[[T0]], 1
-  ; GP32:        negu   $2, $[[T0]]
-
-  ; GP64:        addiu  $[[T0:[0-9]+]], $4, 1
-  ; GP64:        andi   $[[T0]], $[[T0]], 1
-  ; GP64:        negu   $2, $[[T0]]
-
-  ; MMR6:        addiur2 $[[T0:[0-9]+]], $4, 1
-  ; MMR6:        andi16  $[[T0]], $[[T0]], 1
-  ; MMR6:        li16    $[[T1:[0-9]+]], 0
-  ; MMR6:        subu16  $2, $[[T1]], $[[T0]]
+  ; GP32:        not    $[[T0:[0-9]+]], $4
+
+  ; GP64:        not    $[[T0:[0-9]+]], $4
+
+  ; MMR6:        not16  $[[T0:[0-9]+]], $4
 
   %r = add i1 3, %a
   ret i1 %r
diff --git a/llvm/test/CodeGen/Mips/llvm-ir/mul.ll b/llvm/test/CodeGen/Mips/llvm-ir/mul.ll
index 00b91d1413cfe6..2735d53f5fe060 100644
--- a/llvm/test/CodeGen/Mips/llvm-ir/mul.ll
+++ b/llvm/test/CodeGen/Mips/llvm-ir/mul.ll
@@ -31,36 +31,22 @@ define signext i1 @mul_i1(i1 signext %a, i1 signext %b) {
 entry:
 ; ALL-LABEL: mul_i1:
 
-  ; M2:         mult    $4, $5
-  ; M2:         mflo    $[[T0:[0-9]+]]
-  ; M2:         andi    $[[T0]], $[[T0]], 1
-  ; M2:         negu    $2, $[[T0]]
+  ; M2:         and     $[[T0:[0-9]+]], $4, $5
 
-  ; 32R1-R5:    mul     $[[T0:[0-9]+]], $4, $5
-  ; 32R1-R5:    andi    $[[T0]], $[[T0]], 1
-  ; 32R1-R5:    negu    $2, $[[T0]]
+  ; 32R1-R5:    and     $[[T0:[0-9]+]], $4, $5
 
-  ; 32R6:       mul     $[[T0:[0-9]+]], $4, $5
-  ; 32R6:       andi    $[[T0]], $[[T0]], 1
-  ; 32R6:       negu    $2, $[[T0]]
+  ; 32R6:       and     $[[T0:[0-9]+]], $4, $5
 
-  ; M4:         mult    $4, $5
-  ; M4:         mflo    $[[T0:[0-9]+]]
-  ; M4:         andi    $[[T0]], $[[T0]], 1
-  ; M4:         negu    $2, $[[T0]]
+  ; M4:         and     $[[T0:[0-9]+]], $4, $5
 
-  ; 64R1-R5:    mul     $[[T0:[0-9]+]], $4, $5
-  ; 64R1-R5:    andi    $[[T0]], $[[T0]], 1
-  ; 64R1-R5:    negu    $2, $[[T0]]
+  ; 64R1-R5:    and     $[[T0:[0-9]+]], $4, $5
 
-  ; 64R6:       mul     $[[T0:[0-9]+]], $4, $5
-  ; 64R6:       andi    $[[T0]], $[[T0]], 1
-  ; 64R6:       negu    $2, $[[T0]]
+  ; 64R6:       and     $[[T0:[0-9]+]], $4, $5
 
-  ; MM32:       mul     $[[T0:[0-9]+]], $4, $5
-  ; MM32:       andi16  $[[T0]], $[[T0]], 1
-  ; MM32:       li16    $[[T1:[0-9]+]], 0
-  ; MM32:       subu16  $2, $[[T1]], $[[T0]]
+  ; MM32R3:     and16 $4, $5
+  ; MM32R3:     move $2, $4
+
+  ; MM32R6:     and     $[[T0:[0-9]+]], $4, $5
 
   %r = mul i1 %a, %b
   ret i1 %r
diff --git a/llvm/test/CodeGen/Mips/llvm-ir/sub.ll b/llvm/test/CodeGen/Mips/llvm-ir/sub.ll
index b465e24d47a059..dd5e6e957245d0 100644
--- a/llvm/test/CodeGen/Mips/llvm-ir/sub.ll
+++ b/llvm/test/CodeGen/Mips/llvm-ir/sub.ll
@@ -33,14 +33,9 @@ define signext i1 @sub_i1(i1 signext %a, i1 signext %b) {
 entry:
 ; ALL-LABEL: sub_i1:
 
-  ; NOT-MM:         subu    $[[T0:[0-9]+]], $4, $5
-  ; NOT-MM:         andi    $[[T0]], $[[T0]], 1
-  ; NOT-MM:         negu    $2, $[[T0]]
+  ; NOT-MM:         xor     $[[T0:[0-9]+]], $4, $5
 
-  ; MM:             subu16  $[[T0:[0-9]+]], $4, $5
-  ; MM:             andi16  $[[T0]], $[[T0]], 1
-  ; MM:             li16    $[[T1:[0-9]+]], 0
-  ; MM:             subu16  $2, $[[T1]], $[[T0]]
+  ; MM:             xor16   $[[T0:[0-9]+]], $4, $5
 
   %r = sub i1 %a, %b
   ret i1 %r
diff --git a/llvm/test/CodeGen/NVPTX/boolean-patterns.ll b/llvm/test/CodeGen/NVPTX/boolean-patterns.ll
index 6ed98906108269..fd4d325ae9374d 100644
--- a/llvm/test/CodeGen/NVPTX/boolean-patterns.ll
+++ b/llv...
[truncated]
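
The overflow hunk in SelectionDAG.cpp applies the same reasoning: for bool result types, the {sum, overflow} pair of (u/s)addo(x, y) becomes {xor(x,y), and(x,y)}, i.e. a half adder. A standalone check (illustration only, assuming plain unsigned one-bit semantics):

```cpp
// Illustration only: a one-bit unsigned add-with-overflow is a half adder,
// so uaddo(a, b) -> {a ^ b, a & b} for i1 operands.
#include <cassert>

int main() {
  for (unsigned a = 0; a <= 1; ++a)
    for (unsigned b = 0; b <= 1; ++b) {
      unsigned sum = (a + b) & 1;    // result truncated to one bit
      unsigned carry = (a + b) >> 1; // overflow out of bit 0
      assert(sum == (a ^ b));
      assert(carry == (a & b));
    }
  return 0;
}
```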

llvmbot (Member) commented Feb 3, 2025

@llvm/pr-subscribers-backend-loongarch

RKSimon merged commit b7c8271 into llvm:main on Feb 3, 2025 (14 checks passed).
RKSimon deleted the dag-boolean-arithmetic branch on February 3, 2025 at 16:36.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
[DAG] getNode - convert scalar i1 arithmetic calls to bitwise instructions (llvm#125486)

We already do this for vector vXi1 types; this patch removes the vector constraint to handle it for all bool types.