[DAGCombiner] Add basic support for `trunc nsw/nuw` #113808

dtcxzyw · 2024-10-27T13:46:14Z

This patch adds basic support for trunc nsw/nuw in SDAG. It will allow DAGCombiner to further eliminate in-reg zext/sext instructions.

PR Link: llvm/llvm-project#113808
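
To make the two flags concrete, here is a minimal sketch of their meaning for an i64 to i32 truncation on plain integers. It is only a model: the helper names (`trunc64to32`, `truncIsNUW`, `truncIsNSW`, `zextOfTruncNUW`) are hypothetical and are not LLVM APIs. The assumption, taken from the LLVM language reference, is that `nuw` means the dropped bits are all zero and `nsw` means the dropped bits all equal the new sign bit; otherwise the result is poison.

```cpp
#include <cassert>
#include <cstdint>

// Plain truncation i64 -> i32: keep the low 32 bits.
inline uint32_t trunc64to32(uint64_t X) { return static_cast<uint32_t>(X); }

// `nuw` holds when zero-extending the truncated value gives back the original.
inline bool truncIsNUW(uint64_t X) {
  return static_cast<uint64_t>(trunc64to32(X)) == X;
}

// `nsw` holds when sign-extending the truncated value gives back the original.
inline bool truncIsNSW(uint64_t X) {
  int64_t SExt = static_cast<int32_t>(trunc64to32(X));
  return static_cast<uint64_t>(SExt) == X;
}

// The kind of fold the flag permits: zext(trunc nuw X) -> X.
// The assert models "flag violated means poison" for this sketch only.
inline uint64_t zextOfTruncNUW(uint64_t X) {
  assert(truncIsNUW(X) && "nuw violated: the real result would be poison");
  return X;
}
```

For example, `0x00000000FFFFFFFF` satisfies `nuw` but not `nsw`, while `0xFFFFFFFFFFFFFFFF` satisfies `nsw` but not `nuw`.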

llvmbot · 2024-10-27T14:26:01Z

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch adds basic support for trunc nsw/nuw in SDAG. It will allow DAGCombiner to further eliminate in-reg zext/sext instructions.

Full diff: https://github.com/llvm/llvm-project/pull/113808.diff

5 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+27-27)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+7-1)
(added) llvm/test/CodeGen/AArch64/trunc-nsw-nuw.ll (+60)
(added) llvm/test/CodeGen/RISCV/trunc-nsw-nuw.ll (+78)
(added) llvm/test/CodeGen/X86/trunc-nsw-nuw.ll (+58)

diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index ad2d2ede302af8..0b249f2f7267bd 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -2329,6 +2329,8 @@ static bool isTruncateOf(SelectionDAG &DAG, SDValue N, SDValue &Op,
   if (N->getOpcode() == ISD::TRUNCATE) {
     Op = N->getOperand(0);
     Known = DAG.computeKnownBits(Op);
+    if (N->getFlags().hasNoUnsignedWrap())
+      Known.Zero.setBitsFrom(N.getScalarValueSizeInBits());
     return true;
   }
 
@@ -13793,23 +13795,22 @@ SDValue DAGCombiner::visitSIGN_EXTEND(SDNode *N) {
     unsigned OpBits   = Op.getScalarValueSizeInBits();
     unsigned MidBits  = N0.getScalarValueSizeInBits();
     unsigned DestBits = VT.getScalarSizeInBits();
-    unsigned NumSignBits = DAG.ComputeNumSignBits(Op);
 
-    if (OpBits == DestBits) {
-      // Op is i32, Mid is i8, and Dest is i32.  If Op has more than 24 sign
-      // bits, it is already ready.
-      if (NumSignBits > DestBits-MidBits)
+    if (N0->getFlags().hasNoSignedWrap() ||
+        DAG.ComputeNumSignBits(Op) > OpBits - MidBits) {
+      if (OpBits == DestBits) {
+        // Op is i32, Mid is i8, and Dest is i32.  If Op has more than 24 sign
+        // bits, it is already ready.
         return Op;
-    } else if (OpBits < DestBits) {
-      // Op is i32, Mid is i8, and Dest is i64.  If Op has more than 24 sign
-      // bits, just sext from i32.
-      if (NumSignBits > OpBits-MidBits)
+      } else if (OpBits < DestBits) {
+        // Op is i32, Mid is i8, and Dest is i64.  If Op has more than 24 sign
+        // bits, just sext from i32.
         return DAG.getNode(ISD::SIGN_EXTEND, DL, VT, Op);
-    } else {
-      // Op is i64, Mid is i8, and Dest is i32.  If Op has more than 56 sign
-      // bits, just truncate to i32.
-      if (NumSignBits > OpBits-MidBits)
+      } else {
+        // Op is i64, Mid is i8, and Dest is i32.  If Op has more than 56 sign
+        // bits, just truncate to i32.
         return DAG.getNode(ISD::TRUNCATE, DL, VT, Op);
+      }
     }
 
     // fold (sext (truncate x)) -> (sextinreg x).
@@ -14083,24 +14084,23 @@ SDValue DAGCombiner::visitZERO_EXTEND(SDNode *N) {
       unsigned OpBits = SrcVT.getScalarSizeInBits();
       unsigned MidBits = MinVT.getScalarSizeInBits();
       unsigned DestBits = VT.getScalarSizeInBits();
-      unsigned NumSignBits = DAG.ComputeNumSignBits(Op);
 
-      if (OpBits == DestBits) {
-        // Op is i32, Mid is i8, and Dest is i32.  If Op has more than 24 sign
-        // bits, it is already ready.
-        if (NumSignBits > DestBits - MidBits)
+      if (N0->getFlags().hasNoSignedWrap() ||
+          DAG.ComputeNumSignBits(Op) > OpBits - MidBits) {
+        if (OpBits == DestBits) {
+          // Op is i32, Mid is i8, and Dest is i32.  If Op has more than 24 sign
+          // bits, it is already ready.
           return Op;
-      } else if (OpBits < DestBits) {
-        // Op is i32, Mid is i8, and Dest is i64.  If Op has more than 24 sign
-        // bits, just sext from i32.
-        // FIXME: This can probably be ZERO_EXTEND nneg?
-        if (NumSignBits > OpBits - MidBits)
+        } else if (OpBits < DestBits) {
+          // Op is i32, Mid is i8, and Dest is i64.  If Op has more than 24 sign
+          // bits, just sext from i32.
+          // FIXME: This can probably be ZERO_EXTEND nneg?
           return DAG.getNode(ISD::SIGN_EXTEND, DL, VT, Op);
-      } else {
-        // Op is i64, Mid is i8, and Dest is i32.  If Op has more than 56 sign
-        // bits, just truncate to i32.
-        if (NumSignBits > OpBits - MidBits)
+        } else {
+          // Op is i64, Mid is i8, and Dest is i32.  If Op has more than 56 sign
+          // bits, just truncate to i32.
           return DAG.getNode(ISD::TRUNCATE, DL, VT, Op);
+        }
       }
     }
 
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 8450553743074c..e1e4db79627ef6 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3823,7 +3823,13 @@ void SelectionDAGBuilder::visitTrunc(const User &I) {
   SDValue N = getValue(I.getOperand(0));
   EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(),
                                                         I.getType());
-  setValue(&I, DAG.getNode(ISD::TRUNCATE, getCurSDLoc(), DestVT, N));
+  SDNodeFlags Flags;
+  if (auto *Trunc = dyn_cast<TruncInst>(&I)) {
+    Flags.setNoSignedWrap(Trunc->hasNoSignedWrap());
+    Flags.setNoUnsignedWrap(Trunc->hasNoUnsignedWrap());
+  }
+
+  setValue(&I, DAG.getNode(ISD::TRUNCATE, getCurSDLoc(), DestVT, N, Flags));
 }
 
 void SelectionDAGBuilder::visitZExt(const User &I) {
diff --git a/llvm/test/CodeGen/AArch64/trunc-nsw-nuw.ll b/llvm/test/CodeGen/AArch64/trunc-nsw-nuw.ll
new file mode 100644
index 00000000000000..6041db74639f32
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/trunc-nsw-nuw.ll
@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s
+
+define zeroext i32 @trunc_nuw_nsw_urem(i64 %x) nounwind {
+; CHECK-LABEL: trunc_nuw_nsw_urem:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov w8, #5977 // =0x1759
+; CHECK-NEXT:    mov w9, #10000 // =0x2710
+; CHECK-NEXT:    movk w8, #53687, lsl #16
+; CHECK-NEXT:    mul x8, x0, x8
+; CHECK-NEXT:    lsr x8, x8, #45
+; CHECK-NEXT:    msub w0, w8, w9, w0
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw nsw i64 %x to i32
+  %rem = urem i32 %trunc, 10000
+  ret i32 %rem
+}
+
+define i64 @zext_nneg_udiv_trunc_nuw(i64 %x) nounwind {
+; CHECK-LABEL: zext_nneg_udiv_trunc_nuw:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov w8, #52429 // =0xcccd
+; CHECK-NEXT:    mul w8, w0, w8
+; CHECK-NEXT:    lsr w0, w8, #23
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw i64 %x to i16
+  %div = udiv i16 %trunc, 160
+  %ext = zext nneg i16 %div to i64
+  ret i64 %ext
+}
+
+define i64 @sext_udiv_trunc_nuw(i64 %x) nounwind {
+; CHECK-LABEL: sext_udiv_trunc_nuw:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    mov w8, #52429 // =0xcccd
+; CHECK-NEXT:    mul w8, w0, w8
+; CHECK-NEXT:    lsr w0, w8, #23
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw i64 %x to i16
+  %div = udiv i16 %trunc, 160
+  %ext = sext i16 %div to i64
+  ret i64 %ext
+}
+
+define ptr @gep_nusw_zext_nneg_add_trunc_nuw_nsw(ptr %p, i64 %x) nounwind {
+; CHECK-LABEL: gep_nusw_zext_nneg_add_trunc_nuw_nsw:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    add w8, w1, #5
+; CHECK-NEXT:    add x0, x0, w8, uxtw #2
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw nsw i64 %x to i32
+  %add = add nuw nsw i32 %trunc, 5
+  %offset = zext nneg i32 %add to i64
+  %gep = getelementptr nusw float, ptr %p, i64 %offset
+  ret ptr %gep
+}
diff --git a/llvm/test/CodeGen/RISCV/trunc-nsw-nuw.ll b/llvm/test/CodeGen/RISCV/trunc-nsw-nuw.ll
new file mode 100644
index 00000000000000..f270775adcc155
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/trunc-nsw-nuw.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=riscv64 -mattr=+m | FileCheck %s
+
+define signext i8 @trunc_nsw_add(i32 signext %x) nounwind {
+; CHECK-LABEL: trunc_nsw_add:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    addiw a0, a0, 1
+; CHECK-NEXT:    ret
+entry:
+  %add = add nsw i32 %x, 1
+  %trunc = trunc nsw i32 %add to i8
+  ret i8 %trunc
+}
+
+define signext i32 @trunc_nuw_nsw_urem(i64 %x) nounwind {
+; CHECK-LABEL: trunc_nuw_nsw_urem:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lui a1, 210
+; CHECK-NEXT:    addiw a1, a1, -1167
+; CHECK-NEXT:    slli a1, a1, 12
+; CHECK-NEXT:    addi a1, a1, 1881
+; CHECK-NEXT:    mul a1, a0, a1
+; CHECK-NEXT:    srli a1, a1, 45
+; CHECK-NEXT:    lui a2, 2
+; CHECK-NEXT:    addi a2, a2, 1808
+; CHECK-NEXT:    mul a1, a1, a2
+; CHECK-NEXT:    subw a0, a0, a1
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw nsw i64 %x to i32
+  %rem = urem i32 %trunc, 10000
+  ret i32 %rem
+}
+
+define i64 @zext_nneg_udiv_trunc_nuw(i64 %x) nounwind {
+; CHECK-LABEL: zext_nneg_udiv_trunc_nuw:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lui a1, 13
+; CHECK-NEXT:    addi a1, a1, -819
+; CHECK-NEXT:    mul a0, a0, a1
+; CHECK-NEXT:    srliw a0, a0, 23
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw i64 %x to i16
+  %div = udiv i16 %trunc, 160
+  %ext = zext nneg i16 %div to i64
+  ret i64 %ext
+}
+
+define i64 @sext_udiv_trunc_nuw(i64 %x) nounwind {
+; CHECK-LABEL: sext_udiv_trunc_nuw:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    lui a1, 13
+; CHECK-NEXT:    addi a1, a1, -819
+; CHECK-NEXT:    mul a0, a0, a1
+; CHECK-NEXT:    srliw a0, a0, 23
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw i64 %x to i16
+  %div = udiv i16 %trunc, 160
+  %ext = sext i16 %div to i64
+  ret i64 %ext
+}
+
+define ptr @gep_nusw_zext_nneg_add_trunc_nuw_nsw(ptr %p, i64 %x) nounwind {
+; CHECK-LABEL: gep_nusw_zext_nneg_add_trunc_nuw_nsw:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    slli a1, a1, 2
+; CHECK-NEXT:    add a0, a1, a0
+; CHECK-NEXT:    addi a0, a0, 20
+; CHECK-NEXT:    ret
+entry:
+  %trunc = trunc nuw nsw i64 %x to i32
+  %add = add nuw nsw i32 %trunc, 5
+  %offset = zext nneg i32 %add to i64
+  %gep = getelementptr nusw float, ptr %p, i64 %offset
+  ret ptr %gep
+}
diff --git a/llvm/test/CodeGen/X86/trunc-nsw-nuw.ll b/llvm/test/CodeGen/X86/trunc-nsw-nuw.ll
new file mode 100644
index 00000000000000..40b48bec8fffd7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/trunc-nsw-nuw.ll
@@ -0,0 +1,58 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=x86_64 | FileCheck %s
+
+define zeroext i32 @trunc_nuw_nsw_urem(i64 %x) nounwind {
+; CHECK-LABEL: trunc_nuw_nsw_urem:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    movq %rdi, %rax
+; CHECK-NEXT:    movl $3518437209, %ecx # imm = 0xD1B71759
+; CHECK-NEXT:    imulq %rdi, %rcx
+; CHECK-NEXT:    shrq $45, %rcx
+; CHECK-NEXT:    imull $10000, %ecx, %ecx # imm = 0x2710
+; CHECK-NEXT:    subl %ecx, %eax
+; CHECK-NEXT:    # kill: def $eax killed $eax killed $rax
+; CHECK-NEXT:    retq
+entry:
+  %trunc = trunc nuw nsw i64 %x to i32
+  %rem = urem i32 %trunc, 10000
+  ret i32 %rem
+}
+
+define i64 @zext_nneg_udiv_trunc_nuw(i64 %x) nounwind {
+; CHECK-LABEL: zext_nneg_udiv_trunc_nuw:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    imull $52429, %edi, %eax # imm = 0xCCCD
+; CHECK-NEXT:    shrl $23, %eax
+; CHECK-NEXT:    retq
+entry:
+  %trunc = trunc nuw i64 %x to i16
+  %div = udiv i16 %trunc, 160
+  %ext = zext nneg i16 %div to i64
+  ret i64 %ext
+}
+
+define i64 @sext_udiv_trunc_nuw(i64 %x) nounwind {
+; CHECK-LABEL: sext_udiv_trunc_nuw:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    imull $52429, %edi, %eax # imm = 0xCCCD
+; CHECK-NEXT:    shrl $23, %eax
+; CHECK-NEXT:    retq
+entry:
+  %trunc = trunc nuw i64 %x to i16
+  %div = udiv i16 %trunc, 160
+  %ext = sext i16 %div to i64
+  ret i64 %ext
+}
+
+define ptr @gep_nusw_zext_nneg_add_trunc_nuw_nsw(ptr %p, i64 %x) nounwind {
+; CHECK-LABEL: gep_nusw_zext_nneg_add_trunc_nuw_nsw:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    leaq 20(%rdi,%rsi,4), %rax
+; CHECK-NEXT:    retq
+entry:
+  %trunc = trunc nuw nsw i64 %x to i32
+  %add = add nuw nsw i32 %trunc, 5
+  %offset = zext nneg i32 %add to i64
+  %gep = getelementptr nusw float, ptr %p, i64 %offset
+  ret ptr %gep
+}
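
The `isTruncateOf` hunk above records that a `trunc nuw` guarantees the dropped high bits of its operand are zero. A rough sketch of that bookkeeping on a 64-bit mask follows. `Known64`, `setZeroBitsFrom`, and `bitsKnownZeroFrom` are hypothetical stand-ins written for illustration; they only imitate the part of `llvm::KnownBits` and `APInt::setBitsFrom` that the hunk relies on.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::KnownBits on a 64-bit value.
struct Known64 {
  uint64_t Zero = 0; // bits known to be 0
  uint64_t One = 0;  // bits known to be 1
};

// Mark every bit at position >= FromBit as known zero, mirroring the effect
// of `Known.Zero.setBitsFrom(...)` in the patch.
inline void setZeroBitsFrom(Known64 &K, unsigned FromBit) {
  if (FromBit < 64)
    K.Zero |= ~uint64_t{0} << FromBit;
}

// True when every bit at position >= FromBit is known zero, which is the
// condition under which zext(trunc X) can be replaced by X.
inline bool bitsKnownZeroFrom(const Known64 &K, unsigned FromBit) {
  uint64_t Mask = FromBit < 64 ? ~uint64_t{0} << FromBit : 0;
  return (K.Zero & Mask) == Mask;
}
```

With nothing known about an i64 operand, calling `setZeroBitsFrom(K, 16)` for a `trunc nuw` to i16 is enough for the "truncated bits are zero" check to succeed.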

llvmbot · 2024-10-27T14:26:02Z

@llvm/pr-subscribers-backend-x86


goldsteinn · 2024-10-27T14:55:55Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

-        // bits, it is already ready.
-        if (NumSignBits > DestBits - MidBits)
+      if (N0->getFlags().hasNoSignedWrap() ||
+          DAG.ComputeNumSignBits(Op) > OpBits - MidBits) {


Should probably just set nsw in visitTrunc if computeNumSignBits >= OldSize - NewSize.
Likewise nuw if the high bits are zero.
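
A small sketch of the inference suggested here, on concrete 32-bit values rather than DAG nodes. The helpers `numSignBits32`, `canInferTruncNSW`, and `canInferTruncNUW` are hypothetical. The sketch assumes the strict comparison used by the patch itself (`ComputeNumSignBits(Op) > OpBits - MidBits`), because a value with exactly `OldSize - NewSize` sign bits, such as 128 truncated to i8, does not survive a signed round trip.

```cpp
#include <cstdint>

// Number of leading bits equal to the sign bit, counting the sign bit itself.
// For a concrete value this is exact; LLVM's analysis returns a lower bound.
inline unsigned numSignBits32(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  uint32_t Sign = U >> 31;
  unsigned N = 1;
  for (int Bit = 30; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// nsw can be inferred when all dropped bits plus the new sign bit agree.
inline bool canInferTruncNSW(int32_t V, unsigned NewBits) {
  return numSignBits32(V) > 32 - NewBits;
}

// nuw can be inferred when all dropped bits are zero.
inline bool canInferTruncNUW(uint32_t V, unsigned NewBits) {
  return NewBits >= 32 || (V >> NewBits) == 0;
}
```

Truncating 127 to i8 qualifies for both flags, 128 qualifies only for `nuw`, and -128 qualifies only for `nsw`.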

dtcxzyw · 2024-10-28T05:57:40Z

Miscompilation:

define i32 @test(i1 zeroext %x, i1 zeroext %y) {
entry:
  %sel = select i1 %y, i64 4, i64 0
  %conv0 = sext i1 %x to i64
  %xor = xor i64 %sel, %conv0
  %conv1 = trunc nsw i64 %xor to i32
  %div = sdiv i32 %conv1, -10765
  ret i32 %div
}

Before:

Optimized legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 27 nodes:
  t0: ch,glue = EntryToken
            t7: i32,ch = CopyFromReg t0, Register:i32 %1
          t71: i32 = AssertZext t7, ValueType:ch:i1
        t57: i32 = shl t71, Constant:i8<2>
            t2: i32,ch = CopyFromReg t0, Register:i32 %0
          t73: i32 = AssertZext t2, ValueType:ch:i1
        t78: i32 = sub Constant:i32<0>, t73
      t48: i32 = xor t57, t78
    t62: i64 = sign_extend t48
  t42: i64 = mul t62, Constant:i64<-1634202141>
        t64: i64 = sra t42, Constant:i8<44>
      t65: i32 = truncate t64
        t68: i64 = srl t42, Constant:i8<63>
      t69: i32 = truncate t68
    t37: i32 = add t65, t69
  t25: ch,glue = CopyToReg t0, Register:i32 $eax, t37
  t26: ch = X86ISD::RET_GLUE t25, TargetConstant:i32<0>, Register:i32 $eax, t25:1

After:

Optimized legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 27 nodes:
  t0: ch,glue = EntryToken
            t7: i32,ch = CopyFromReg t0, Register:i32 %1
          t66: i32 = AssertZext t7, ValueType:ch:i1
        t57: i32 = shl t66, Constant:i8<2>
            t2: i32,ch = CopyFromReg t0, Register:i32 %0
          t68: i32 = AssertZext t2, ValueType:ch:i1
        t73: i32 = sub Constant:i32<0>, t68
      t48: i32 = xor t57, t73
    t49: i64 = any_extend t48
  t42: i64 = mul t49, Constant:i64<-1634202141>
        t61: i64 = sra t42, Constant:i8<44>
      t62: i32 = truncate t61
        t64: i64 = srl t42, Constant:i8<63>
      t65: i32 = truncate t64
    t37: i32 = add t62, t65
  t25: ch,glue = CopyToReg t0, Register:i32 $eax, t37
  t26: ch = X86ISD::RET_GLUE t25, TargetConstant:i32<0>, Register:i32 $eax, t25:1

dtcxzyw · 2024-10-28T06:12:21Z

These flags should be cleared in TargetLowering::SimplifyDemandedBits.
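
One plausible reading of the reproducer, sketched on plain integers: once demanded-bits simplification turns the `sign_extend` feeding the truncate into an `any_extend`, the upper bits of the operand are no longer sign bits, so a leftover `nsw` flag makes the fold `sext(trunc nsw X) -> X` unsound. All helper names below are hypothetical, and `anyextTo64AsZero` picks zeros for the unspecified upper bits purely for illustration.

```cpp
#include <cstdint>

// i32 -> i64 sign extension.
inline uint64_t sextTo64(uint32_t V) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(V)));
}

// any_extend leaves the upper bits unspecified; this sketch chooses zeros.
inline uint64_t anyextTo64AsZero(uint32_t V) { return V; }

// i64 -> i32 truncation.
inline uint32_t truncTo32(uint64_t V) { return static_cast<uint32_t>(V); }

// The fold enabled by the flag: sext(trunc nsw X) -> X. It trusts the flag.
inline uint64_t foldSextOfTruncNSW(uint64_t X) { return X; }

// Reference semantics: actually truncate, then sign-extend.
inline uint64_t sextOfTrunc(uint64_t X) { return sextTo64(truncTo32(X)); }
```

With `%x = 1` and `%y = 0` the narrow xor is `0xFFFFFFFF`. Sign-extending it gives an operand for which the fold and the reference agree; any-extending it with zeros gives an operand whose low 32 bits are unchanged but for which the fold returns `0x00000000FFFFFFFF` instead of `0xFFFFFFFFFFFFFFFF`.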

goldsteinn · 2024-10-31T14:18:22Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

-      // bits, it is already ready.
-      if (NumSignBits > DestBits-MidBits)
+    if (N0->getFlags().hasNoSignedWrap() ||
+        DAG.ComputeNumSignBits(Op) > OpBits - MidBits) {


Still think the way to do this is to just set nsw/nuw in visitTRUNC and drop the compute...

dtcxzyw · 2024-10-31

Unlike InstCombine, SDAG does not infer poison-generating flags. Without recursive analysis we may miss some optimization opportunities.

goldsteinn · 2024-11-01

I guess you're right. Is there a good reason for that?

dtcxzyw · 2024-11-05T09:58:45Z

Ping.

nikic

LGTM

dtcxzyw mentioned this pull request Oct 27, 2024

Fuzz PR113808 dtcxzyw/llvm-fuzz-service#28

Closed

dtcxzyw added a commit to dtcxzyw/llvm-codegen-benchmark that referenced this pull request Oct 27, 2024

pre-commit: test PR113808

9cb360d

PR Link: llvm/llvm-project#113808

dtcxzyw mentioned this pull request Oct 27, 2024

pre-commit: test PR113808 dtcxzyw/llvm-codegen-benchmark#142

Closed

dtcxzyw changed the title ~~Perf/trunc nsw nuw codegen~~ [DAGCombiner] Add basic supprt for trunc nsw/nuw Oct 27, 2024

dtcxzyw changed the title ~~[DAGCombiner] Add basic supprt for trunc nsw/nuw~~ [DAGCombiner] Add basic support for trunc nsw/nuw Oct 27, 2024

dtcxzyw requested review from asb, nikic, preames, RKSimon, antoniofrighetto, davemgreen, topperc, goldsteinn and wangpc-pp October 27, 2024 14:25

dtcxzyw marked this pull request as ready for review October 27, 2024 14:25

llvmbot added the backend:AArch64, backend:X86, and llvm:SelectionDAG (SelectionDAGISel as well) labels Oct 27, 2024

goldsteinn reviewed Oct 27, 2024

arsenm reviewed Oct 29, 2024

dtcxzyw mentioned this pull request Oct 29, 2024

[SDAG] Simplify SDNodeFlags with bitwise logic #114061

Merged

dtcxzyw added a commit that referenced this pull request Oct 31, 2024

[SDAG] Simplify SDNodeFlags with bitwise logic (#114061)

cf9d1c1

This patch allows using enumeration values directly and simplifies the implementation with bitwise logic. It addresses the comment in #113808 (comment).

dtcxzyw force-pushed the perf/trunc-nsw-nuw-codegen branch from fced38f to 9137580 Compare October 31, 2024 01:12

topperc reviewed Oct 31, 2024

goldsteinn reviewed Oct 31, 2024

nikic approved these changes Nov 5, 2024

dtcxzyw added 8 commits November 6, 2024 22:19

[DAGCombiner] Add pre-commit tests. NFC.

a64ac3f

[DAGCombiner] Add basic support for trunc nsw/nuw

708a6ee

[DAGCombiner] Propagate nsw/nuw flags

5648aa3

[DAGCombiner] Add miscompilation reproducer. NFC.

1e928cd

[TargetLowering] Drop nuw/nsw flags in `TargetLowering::SimplifyDemandedBits`

52981c1

[TargetLowering] Use dropFlags API. NFC.

6a6d6e5

[SDAG] Remove else after return. NFC.

6bc000c

[TargetLowering] Remove unused variable. NFC.

4b2274a

dtcxzyw force-pushed the perf/trunc-nsw-nuw-codegen branch from c6cfcbf to 4b2274a Compare November 6, 2024 15:10

dtcxzyw merged commit f74aed7 into llvm:main Nov 6, 2024
6 of 7 checks passed

dtcxzyw deleted the perf/trunc-nsw-nuw-codegen branch November 6, 2024 16:23

dtcxzyw mentioned this pull request Dec 1, 2024

[CodeGenPrepare] Drop nsw flags in optimizeLoadExt #118180

Merged

RKSimon mentioned this pull request Feb 4, 2025

[DAG] SelectionDAGBuilder - add ISD::TRUNCATE NSW/NUW flags support and initial testing #89237

Closed

dlee992 mentioned this pull request Jun 18, 2025

[DAGCombiner] zext (trunc nuw nsw (x)) incorrectly optimized away #144736

Closed
