[RISCV] Enable the TypePromotion pass from AArch64/ARM. #81574


Closed
wants to merge 2 commits

Conversation

Collaborator

@topperc topperc commented Feb 13, 2024

This pass looks for unsigned icmps that have illegal types and tries
to widen the use/def graph to improve the placement of the zero
extends that type legalization would need to insert.

I've explicitly disabled it for i32 by adding a check for
isSExtCheaperThanZExt to the pass.

The generated code isn't perfect on the lit test, but my data shows a net
dynamic instruction count improvement on spec2017 for both base and
Zba+Zbb+Zbs.
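
To illustrate the shape of the rewrite (a hypothetical example in the
spirit of the pass, not code taken from this patch): for an
illegal-typed unsigned compare chain, the zero extension is hoisted to
the producers so that type legalization no longer has to insert one
directly in front of the icmp.

; Before: i8 is illegal on RV64, so legalization would zero-extend
; %add immediately before the unsigned compare.
define i1 @before(i8 %a, i8 %b) {
  %add = add i8 %a, %b
  %cmp = icmp ult i8 %add, 200
  ret i1 %cmp
}

; After promotion (conceptually): the whole chain is widened to i64,
; with the i8 wrap-around of the add preserved by masking, so the
; compare operates on an already-extended value.
define i1 @after(i8 %a, i8 %b) {
  %za = zext i8 %a to i64
  %zb = zext i8 %b to i64
  %add = add i64 %za, %zb
  %masked = and i64 %add, 255
  %cmp = icmp ult i64 %masked, 200
  ret i1 %cmp
}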

Spec2017 dynamic count for rv64gc_zba_zbb_zbs

Metric           Before         After          After/Before
500.perlbench_r  3032622956407  3032363566062   99.991%
502.gcc_r        1231223971793  1226644351851   99.628%
505.mcf_r         696461864012   696461368989  100.000%
520.omnetpp_r    1142136357540  1142234473240  100.009%
523.xalancbmk_r  1157112062701  1148472443956   99.253%
525.x264_r       3970723342408  3970671717801   99.999%
531.deepsjeng_r  1833000391081  1833000391091  100.000%
541.leela_r      2059859108461  2059858783398  100.000%
557.xz_r         1806198505643  1806198505653  100.000%

Member

llvmbot commented Feb 13, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes


Patch is 23.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81574.diff

7 Files Affected:

  • (modified) llvm/lib/CodeGen/TypePromotion.cpp (+2)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+7)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll (+64-28)
  • (modified) llvm/test/CodeGen/RISCV/signbit-test.ll (+24-6)
  • (modified) llvm/test/CodeGen/RISCV/signed-truncation-check.ll (+72-32)
  • (added) llvm/test/CodeGen/RISCV/typepromotion-overflow.ll (+387)
diff --git a/llvm/lib/CodeGen/TypePromotion.cpp b/llvm/lib/CodeGen/TypePromotion.cpp
index 053caf518bd1f7..7a3bc6c2043f4c 100644
--- a/llvm/lib/CodeGen/TypePromotion.cpp
+++ b/llvm/lib/CodeGen/TypePromotion.cpp
@@ -937,6 +937,8 @@ bool TypePromotionImpl::run(Function &F, const TargetMachine *TM,
       return 0;
 
     EVT PromotedVT = TLI->getTypeToTransformTo(*Ctx, SrcVT);
+    if (TLI->isSExtCheaperThanZExt(SrcVT, PromotedVT))
+      return 0;
     if (RegisterBitWidth < PromotedVT.getFixedSizeInBits()) {
       LLVM_DEBUG(dbgs() << "IR Promotion: Couldn't find target register "
                         << "for promoted type\n");
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 4c3da3ad311168..adef40e19cba4a 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -366,6 +366,7 @@ class RISCVPassConfig : public TargetPassConfig {
 
   void addIRPasses() override;
   bool addPreISel() override;
+  void addCodeGenPrepare() override;
   bool addInstSelector() override;
   bool addIRTranslator() override;
   void addPreLegalizeMachineIR() override;
@@ -452,6 +453,12 @@ bool RISCVPassConfig::addPreISel() {
   return false;
 }
 
+void RISCVPassConfig::addCodeGenPrepare() {
+  if (getOptLevel() != CodeGenOptLevel::None)
+    addPass(createTypePromotionLegacyPass());
+  TargetPassConfig::addCodeGenPrepare();
+}
+
 bool RISCVPassConfig::addInstSelector() {
   addPass(createRISCVISelDag(getRISCVTargetMachine(), getOptLevel()));
 
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index e7db8ef9d5aff3..364c1e430b9156 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -68,6 +68,7 @@
 ; CHECK-NEXT:       Expand reduction intrinsics
 ; CHECK-NEXT:       Natural Loop Information
 ; CHECK-NEXT:       TLS Variable Hoist
+; CHECK-NEXT:       Type Promotion
 ; CHECK-NEXT:       CodeGen Prepare
 ; CHECK-NEXT:       Dominator Tree Construction
 ; CHECK-NEXT:       Exception handling preparation
diff --git a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
index 9e7f2e9525d3b4..6e3a50542939f1 100644
--- a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
@@ -254,21 +254,39 @@ define i1 @shifts_necmp_i64_i8(i64 %x) nounwind {
 ; ---------------------------------------------------------------------------- ;
 
 define i1 @add_ultcmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ultcmp_i16_i8:
-; RV32:       # %bb.0:
-; RV32-NEXT:    addi a0, a0, -128
-; RV32-NEXT:    slli a0, a0, 16
-; RV32-NEXT:    srli a0, a0, 24
-; RV32-NEXT:    sltiu a0, a0, 255
-; RV32-NEXT:    ret
+; RV32I-LABEL: add_ultcmp_i16_i8:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    slli a0, a0, 16
+; RV32I-NEXT:    srli a0, a0, 16
+; RV32I-NEXT:    addi a0, a0, -128
+; RV32I-NEXT:    srli a0, a0, 8
+; RV32I-NEXT:    sltiu a0, a0, 255
+; RV32I-NEXT:    ret
 ;
-; RV64-LABEL: add_ultcmp_i16_i8:
-; RV64:       # %bb.0:
-; RV64-NEXT:    addi a0, a0, -128
-; RV64-NEXT:    slli a0, a0, 48
-; RV64-NEXT:    srli a0, a0, 56
-; RV64-NEXT:    sltiu a0, a0, 255
-; RV64-NEXT:    ret
+; RV64I-LABEL: add_ultcmp_i16_i8:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    slli a0, a0, 48
+; RV64I-NEXT:    srli a0, a0, 48
+; RV64I-NEXT:    addi a0, a0, -128
+; RV64I-NEXT:    srli a0, a0, 8
+; RV64I-NEXT:    sltiu a0, a0, 255
+; RV64I-NEXT:    ret
+;
+; RV32ZBB-LABEL: add_ultcmp_i16_i8:
+; RV32ZBB:       # %bb.0:
+; RV32ZBB-NEXT:    zext.h a0, a0
+; RV32ZBB-NEXT:    addi a0, a0, -128
+; RV32ZBB-NEXT:    srli a0, a0, 8
+; RV32ZBB-NEXT:    sltiu a0, a0, 255
+; RV32ZBB-NEXT:    ret
+;
+; RV64ZBB-LABEL: add_ultcmp_i16_i8:
+; RV64ZBB:       # %bb.0:
+; RV64ZBB-NEXT:    zext.h a0, a0
+; RV64ZBB-NEXT:    addi a0, a0, -128
+; RV64ZBB-NEXT:    srli a0, a0, 8
+; RV64ZBB-NEXT:    sltiu a0, a0, 255
+; RV64ZBB-NEXT:    ret
   %tmp0 = add i16 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp ult i16 %tmp0, -256 ; ~0U << 8
   ret i1 %tmp1
@@ -421,21 +439,39 @@ define i1 @add_ultcmp_i64_i8(i64 %x) nounwind {
 
 ; Slightly more canonical variant
 define i1 @add_ulecmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ulecmp_i16_i8:
-; RV32:       # %bb.0:
-; RV32-NEXT:    addi a0, a0, -128
-; RV32-NEXT:    slli a0, a0, 16
-; RV32-NEXT:    srli a0, a0, 24
-; RV32-NEXT:    sltiu a0, a0, 255
-; RV32-NEXT:    ret
+; RV32I-LABEL: add_ulecmp_i16_i8:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    slli a0, a0, 16
+; RV32I-NEXT:    srli a0, a0, 16
+; RV32I-NEXT:    addi a0, a0, -128
+; RV32I-NEXT:    srli a0, a0, 8
+; RV32I-NEXT:    sltiu a0, a0, 255
+; RV32I-NEXT:    ret
 ;
-; RV64-LABEL: add_ulecmp_i16_i8:
-; RV64:       # %bb.0:
-; RV64-NEXT:    addi a0, a0, -128
-; RV64-NEXT:    slli a0, a0, 48
-; RV64-NEXT:    srli a0, a0, 56
-; RV64-NEXT:    sltiu a0, a0, 255
-; RV64-NEXT:    ret
+; RV64I-LABEL: add_ulecmp_i16_i8:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    slli a0, a0, 48
+; RV64I-NEXT:    srli a0, a0, 48
+; RV64I-NEXT:    addi a0, a0, -128
+; RV64I-NEXT:    srli a0, a0, 8
+; RV64I-NEXT:    sltiu a0, a0, 255
+; RV64I-NEXT:    ret
+;
+; RV32ZBB-LABEL: add_ulecmp_i16_i8:
+; RV32ZBB:       # %bb.0:
+; RV32ZBB-NEXT:    zext.h a0, a0
+; RV32ZBB-NEXT:    addi a0, a0, -128
+; RV32ZBB-NEXT:    srli a0, a0, 8
+; RV32ZBB-NEXT:    sltiu a0, a0, 255
+; RV32ZBB-NEXT:    ret
+;
+; RV64ZBB-LABEL: add_ulecmp_i16_i8:
+; RV64ZBB:       # %bb.0:
+; RV64ZBB-NEXT:    zext.h a0, a0
+; RV64ZBB-NEXT:    addi a0, a0, -128
+; RV64ZBB-NEXT:    srli a0, a0, 8
+; RV64ZBB-NEXT:    sltiu a0, a0, 255
+; RV64ZBB-NEXT:    ret
   %tmp0 = add i16 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp ule i16 %tmp0, -257 ; ~0U << 8 - 1
   ret i1 %tmp1
diff --git a/llvm/test/CodeGen/RISCV/signbit-test.ll b/llvm/test/CodeGen/RISCV/signbit-test.ll
index 69a9026d9af9e2..4e10fae06d8860 100644
--- a/llvm/test/CodeGen/RISCV/signbit-test.ll
+++ b/llvm/test/CodeGen/RISCV/signbit-test.ll
@@ -303,7 +303,10 @@ define i16 @test_clear_mask_i16_i8(i16 %x) nounwind {
 ; RV32-NEXT:    bnez a1, .LBB10_2
 ; RV32-NEXT:  # %bb.1: # %t
 ; RV32-NEXT:    li a0, 42
-; RV32-NEXT:  .LBB10_2: # %f
+; RV32-NEXT:    ret
+; RV32-NEXT:  .LBB10_2:
+; RV32-NEXT:    slli a0, a0, 16
+; RV32-NEXT:    srli a0, a0, 16
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: test_clear_mask_i16_i8:
@@ -312,7 +315,10 @@ define i16 @test_clear_mask_i16_i8(i16 %x) nounwind {
 ; RV64-NEXT:    bnez a1, .LBB10_2
 ; RV64-NEXT:  # %bb.1: # %t
 ; RV64-NEXT:    li a0, 42
-; RV64-NEXT:  .LBB10_2: # %f
+; RV64-NEXT:    ret
+; RV64-NEXT:  .LBB10_2:
+; RV64-NEXT:    slli a0, a0, 48
+; RV64-NEXT:    srli a0, a0, 48
 ; RV64-NEXT:    ret
 entry:
   %a = and i16 %x, 128
@@ -332,7 +338,10 @@ define i16 @test_set_mask_i16_i8(i16 %x) nounwind {
 ; RV32-NEXT:    beqz a1, .LBB11_2
 ; RV32-NEXT:  # %bb.1: # %t
 ; RV32-NEXT:    li a0, 42
-; RV32-NEXT:  .LBB11_2: # %f
+; RV32-NEXT:    ret
+; RV32-NEXT:  .LBB11_2:
+; RV32-NEXT:    slli a0, a0, 16
+; RV32-NEXT:    srli a0, a0, 16
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: test_set_mask_i16_i8:
@@ -341,7 +350,10 @@ define i16 @test_set_mask_i16_i8(i16 %x) nounwind {
 ; RV64-NEXT:    beqz a1, .LBB11_2
 ; RV64-NEXT:  # %bb.1: # %t
 ; RV64-NEXT:    li a0, 42
-; RV64-NEXT:  .LBB11_2: # %f
+; RV64-NEXT:    ret
+; RV64-NEXT:  .LBB11_2:
+; RV64-NEXT:    slli a0, a0, 48
+; RV64-NEXT:    srli a0, a0, 48
 ; RV64-NEXT:    ret
 entry:
   %a = and i16 %x, 128
@@ -361,7 +373,10 @@ define i16 @test_set_mask_i16_i7(i16 %x) nounwind {
 ; RV32-NEXT:    beqz a1, .LBB12_2
 ; RV32-NEXT:  # %bb.1: # %t
 ; RV32-NEXT:    li a0, 42
-; RV32-NEXT:  .LBB12_2: # %f
+; RV32-NEXT:    ret
+; RV32-NEXT:  .LBB12_2:
+; RV32-NEXT:    slli a0, a0, 16
+; RV32-NEXT:    srli a0, a0, 16
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: test_set_mask_i16_i7:
@@ -370,7 +385,10 @@ define i16 @test_set_mask_i16_i7(i16 %x) nounwind {
 ; RV64-NEXT:    beqz a1, .LBB12_2
 ; RV64-NEXT:  # %bb.1: # %t
 ; RV64-NEXT:    li a0, 42
-; RV64-NEXT:  .LBB12_2: # %f
+; RV64-NEXT:    ret
+; RV64-NEXT:  .LBB12_2:
+; RV64-NEXT:    slli a0, a0, 48
+; RV64-NEXT:    srli a0, a0, 48
 ; RV64-NEXT:    ret
 entry:
   %a = and i16 %x, 64
diff --git a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
index 0860853ae9c0af..de36bcdb910609 100644
--- a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
@@ -254,23 +254,43 @@ define i1 @shifts_eqcmp_i64_i8(i64 %x) nounwind {
 ; ---------------------------------------------------------------------------- ;
 
 define i1 @add_ugecmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ugecmp_i16_i8:
-; RV32:       # %bb.0:
-; RV32-NEXT:    addi a0, a0, -128
-; RV32-NEXT:    slli a0, a0, 16
-; RV32-NEXT:    srli a0, a0, 24
-; RV32-NEXT:    sltiu a0, a0, 255
-; RV32-NEXT:    xori a0, a0, 1
-; RV32-NEXT:    ret
+; RV32I-LABEL: add_ugecmp_i16_i8:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    slli a0, a0, 16
+; RV32I-NEXT:    srli a0, a0, 16
+; RV32I-NEXT:    addi a0, a0, -128
+; RV32I-NEXT:    srli a0, a0, 8
+; RV32I-NEXT:    sltiu a0, a0, 255
+; RV32I-NEXT:    xori a0, a0, 1
+; RV32I-NEXT:    ret
 ;
-; RV64-LABEL: add_ugecmp_i16_i8:
-; RV64:       # %bb.0:
-; RV64-NEXT:    addi a0, a0, -128
-; RV64-NEXT:    slli a0, a0, 48
-; RV64-NEXT:    srli a0, a0, 56
-; RV64-NEXT:    sltiu a0, a0, 255
-; RV64-NEXT:    xori a0, a0, 1
-; RV64-NEXT:    ret
+; RV64I-LABEL: add_ugecmp_i16_i8:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    slli a0, a0, 48
+; RV64I-NEXT:    srli a0, a0, 48
+; RV64I-NEXT:    addi a0, a0, -128
+; RV64I-NEXT:    srli a0, a0, 8
+; RV64I-NEXT:    sltiu a0, a0, 255
+; RV64I-NEXT:    xori a0, a0, 1
+; RV64I-NEXT:    ret
+;
+; RV32ZBB-LABEL: add_ugecmp_i16_i8:
+; RV32ZBB:       # %bb.0:
+; RV32ZBB-NEXT:    zext.h a0, a0
+; RV32ZBB-NEXT:    addi a0, a0, -128
+; RV32ZBB-NEXT:    srli a0, a0, 8
+; RV32ZBB-NEXT:    sltiu a0, a0, 255
+; RV32ZBB-NEXT:    xori a0, a0, 1
+; RV32ZBB-NEXT:    ret
+;
+; RV64ZBB-LABEL: add_ugecmp_i16_i8:
+; RV64ZBB:       # %bb.0:
+; RV64ZBB-NEXT:    zext.h a0, a0
+; RV64ZBB-NEXT:    addi a0, a0, -128
+; RV64ZBB-NEXT:    srli a0, a0, 8
+; RV64ZBB-NEXT:    sltiu a0, a0, 255
+; RV64ZBB-NEXT:    xori a0, a0, 1
+; RV64ZBB-NEXT:    ret
   %tmp0 = add i16 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp uge i16 %tmp0, -256 ; ~0U << 8
   ret i1 %tmp1
@@ -471,23 +491,43 @@ define i1 @add_ugecmp_i64_i8(i64 %x) nounwind {
 
 ; Slightly more canonical variant
 define i1 @add_ugtcmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ugtcmp_i16_i8:
-; RV32:       # %bb.0:
-; RV32-NEXT:    addi a0, a0, -128
-; RV32-NEXT:    slli a0, a0, 16
-; RV32-NEXT:    srli a0, a0, 24
-; RV32-NEXT:    sltiu a0, a0, 255
-; RV32-NEXT:    xori a0, a0, 1
-; RV32-NEXT:    ret
+; RV32I-LABEL: add_ugtcmp_i16_i8:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    slli a0, a0, 16
+; RV32I-NEXT:    srli a0, a0, 16
+; RV32I-NEXT:    addi a0, a0, -128
+; RV32I-NEXT:    srli a0, a0, 8
+; RV32I-NEXT:    sltiu a0, a0, 255
+; RV32I-NEXT:    xori a0, a0, 1
+; RV32I-NEXT:    ret
 ;
-; RV64-LABEL: add_ugtcmp_i16_i8:
-; RV64:       # %bb.0:
-; RV64-NEXT:    addi a0, a0, -128
-; RV64-NEXT:    slli a0, a0, 48
-; RV64-NEXT:    srli a0, a0, 56
-; RV64-NEXT:    sltiu a0, a0, 255
-; RV64-NEXT:    xori a0, a0, 1
-; RV64-NEXT:    ret
+; RV64I-LABEL: add_ugtcmp_i16_i8:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    slli a0, a0, 48
+; RV64I-NEXT:    srli a0, a0, 48
+; RV64I-NEXT:    addi a0, a0, -128
+; RV64I-NEXT:    srli a0, a0, 8
+; RV64I-NEXT:    sltiu a0, a0, 255
+; RV64I-NEXT:    xori a0, a0, 1
+; RV64I-NEXT:    ret
+;
+; RV32ZBB-LABEL: add_ugtcmp_i16_i8:
+; RV32ZBB:       # %bb.0:
+; RV32ZBB-NEXT:    zext.h a0, a0
+; RV32ZBB-NEXT:    addi a0, a0, -128
+; RV32ZBB-NEXT:    srli a0, a0, 8
+; RV32ZBB-NEXT:    sltiu a0, a0, 255
+; RV32ZBB-NEXT:    xori a0, a0, 1
+; RV32ZBB-NEXT:    ret
+;
+; RV64ZBB-LABEL: add_ugtcmp_i16_i8:
+; RV64ZBB:       # %bb.0:
+; RV64ZBB-NEXT:    zext.h a0, a0
+; RV64ZBB-NEXT:    addi a0, a0, -128
+; RV64ZBB-NEXT:    srli a0, a0, 8
+; RV64ZBB-NEXT:    sltiu a0, a0, 255
+; RV64ZBB-NEXT:    xori a0, a0, 1
+; RV64ZBB-NEXT:    ret
   %tmp0 = add i16 %x, -128 ; ~0U << (8-1)
   %tmp1 = icmp ugt i16 %tmp0, -257 ; ~0U << 8 - 1
   ret i1 %tmp1
diff --git a/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
new file mode 100644
index 00000000000000..3740dc675949fa
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
@@ -0,0 +1,387 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m %s -o - | FileCheck %s
+
+define zeroext i16 @overflow_add(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_add:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    add a0, a1, a0
+; CHECK-NEXT:    ori a0, a0, 1
+; CHECK-NEXT:    slli a0, a0, 48
+; CHECK-NEXT:    srli a1, a0, 48
+; CHECK-NEXT:    li a2, 1024
+; CHECK-NEXT:    li a0, 2
+; CHECK-NEXT:    bltu a2, a1, .LBB0_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:  .LBB0_2:
+; CHECK-NEXT:    ret
+  %add = add i16 %b, %a
+  %or = or i16 %add, 1
+  %cmp = icmp ugt i16 %or, 1024
+  %res = select i1 %cmp, i16 2, i16 5
+  ret i16 %res
+}
+
+define zeroext i16 @overflow_sub(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_sub:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    subw a0, a0, a1
+; CHECK-NEXT:    ori a0, a0, 1
+; CHECK-NEXT:    slli a0, a0, 48
+; CHECK-NEXT:    srli a1, a0, 48
+; CHECK-NEXT:    li a2, 1024
+; CHECK-NEXT:    li a0, 2
+; CHECK-NEXT:    bltu a2, a1, .LBB1_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:  .LBB1_2:
+; CHECK-NEXT:    ret
+  %add = sub i16 %a, %b
+  %or = or i16 %add, 1
+  %cmp = icmp ugt i16 %or, 1024
+  %res = select i1 %cmp, i16 2, i16 5
+  ret i16 %res
+}
+
+define zeroext i16 @overflow_mul(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_mul:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    mul a0, a1, a0
+; CHECK-NEXT:    ori a0, a0, 1
+; CHECK-NEXT:    slli a0, a0, 48
+; CHECK-NEXT:    srli a1, a0, 48
+; CHECK-NEXT:    li a2, 1024
+; CHECK-NEXT:    li a0, 2
+; CHECK-NEXT:    bltu a2, a1, .LBB2_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:  .LBB2_2:
+; CHECK-NEXT:    ret
+  %add = mul i16 %b, %a
+  %or = or i16 %add, 1
+  %cmp = icmp ugt i16 %or, 1024
+  %res = select i1 %cmp, i16 2, i16 5
+  ret i16 %res
+}
+
+define zeroext i16 @overflow_shl(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_shl:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    sll a0, a0, a1
+; CHECK-NEXT:    ori a0, a0, 1
+; CHECK-NEXT:    slli a0, a0, 48
+; CHECK-NEXT:    srli a1, a0, 48
+; CHECK-NEXT:    li a2, 1024
+; CHECK-NEXT:    li a0, 2
+; CHECK-NEXT:    bltu a2, a1, .LBB3_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 5
+; CHECK-NEXT:  .LBB3_2:
+; CHECK-NEXT:    ret
+  %add = shl i16 %a, %b
+  %or = or i16 %add, 1
+  %cmp = icmp ugt i16 %or, 1024
+  %res = select i1 %cmp, i16 2, i16 5
+  ret i16 %res
+}
+
+define i32 @overflow_add_no_consts(i8 zeroext %a, i8 zeroext %b, i8 zeroext %limit) {
+; CHECK-LABEL: overflow_add_no_consts:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    add a0, a1, a0
+; CHECK-NEXT:    andi a1, a0, 255
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    bltu a2, a1, .LBB4_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB4_2:
+; CHECK-NEXT:    ret
+  %add = add i8 %b, %a
+  %cmp = icmp ugt i8 %add, %limit
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @overflow_add_const_limit(i8 zeroext %a, i8 zeroext %b) {
+; CHECK-LABEL: overflow_add_const_limit:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    add a0, a1, a0
+; CHECK-NEXT:    andi a1, a0, 255
+; CHECK-NEXT:    li a2, 128
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    bltu a2, a1, .LBB5_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB5_2:
+; CHECK-NEXT:    ret
+  %add = add i8 %b, %a
+  %cmp = icmp ugt i8 %add, -128
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @overflow_add_positive_const_limit(i8 zeroext %a) {
+; CHECK-LABEL: overflow_add_positive_const_limit:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    slli a0, a0, 56
+; CHECK-NEXT:    srai a1, a0, 56
+; CHECK-NEXT:    li a2, -1
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    blt a1, a2, .LBB6_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB6_2:
+; CHECK-NEXT:    ret
+  %cmp = icmp slt i8 %a, -1
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @unsafe_add_underflow(i8 zeroext %a) {
+; CHECK-LABEL: unsafe_add_underflow:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    li a2, 1
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    beq a1, a2, .LBB7_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB7_2:
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i8 %a, 1
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @safe_add_underflow(i8 zeroext %a) {
+; CHECK-LABEL: safe_add_underflow:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    beqz a1, .LBB8_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB8_2:
+; CHECK-NEXT:    ret
+  %cmp = icmp eq i8 %a, 0
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @safe_add_underflow_neg(i8 zeroext %a) {
+; CHECK-LABEL: safe_add_underflow_neg:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi a1, a0, -2
+; CHECK-NEXT:    li a2, 251
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    bltu a1, a2, .LBB9_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB9_2:
+; CHECK-NEXT:    ret
+  %add = add i8 %a, -2
+  %cmp = icmp ult i8 %add, -5
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @overflow_sub_negative_const_limit(i8 zeroext %a) {
+; CHECK-LABEL: overflow_sub_negative_const_limit:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    slli a0, a0, 56
+; CHECK-NEXT:    srai a1, a0, 56
+; CHECK-NEXT:    li a2, -1
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    blt a1, a2, .LBB10_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB10_2:
+; CHECK-NEXT:    ret
+  %cmp = icmp slt i8 %a, -1
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; This is valid so long as the icmp immediate is sext.
+define i32 @sext_sub_underflow(i8 zeroext %a) {
+; CHECK-LABEL: sext_sub_underflow:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi a1, a0, -6
+; CHECK-NEXT:    li a2, -6
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    bltu a2, a1, .LBB11_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB11_2:
+; CHECK-NEXT:    ret
+  %sub = add i8 %a, -6
+  %cmp = icmp ugt i8 %sub, -6
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @safe_sub_underflow(i8 zeroext %a) {
+; CHECK-LABEL: safe_sub_underflow:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    mv a1, a0
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:    beqz a1, .LBB12_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:  .LBB12_2:
+; CHECK-NEXT:    ret
+  %cmp.not = icmp eq i8 %a, 0
+  %res = select i1 %cmp.not, i32 16, i32 8
+  ret i32 %res
+}
+
+define i32 @safe_sub_underflow_neg(i8 zeroext %a) {
+; CHECK-LABEL: safe_sub_underflow_neg:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi a1, a0, -4
+; CHECK-NEXT:    li a2, 250
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    bltu a2, a1, .LBB13_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB13_2:
+; CHECK-NEXT:    ret
+  %sub = add i8 %a, -4
+  %cmp = icmp ugt i8 %sub, -6
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+; This is valid so long as the icmp immediate is sext.
+define i32 @sext_sub_underflow_neg(i8 zeroext %a) {
+; CHECK-LABEL: sext_sub_underflow_neg:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    addi a1, a0, -4
+; CHECK-NEXT:    li a2, -3
+; CHECK-NEXT:    li a0, 8
+; CHECK-NEXT:    bltu a1, a2, .LBB14_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    li a0, 16
+; CHECK-NEXT:  .LBB14_2:
+; CHECK-NEXT:    ret
+  %sub = add i8 %a, -4
+  %cmp = icmp ult i8 %sub, -3
+  %res = select i1 %cmp, i32 8, i32 16
+  ret i32 %res
+}
+
+define i32 @safe_sub_imm_var(ptr nocapture readonly %b) local_unnamed_addr #1 {
+; CHECK-LABEL: safe_sub_imm_var:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    li a0, 0
+; CHECK-NEXT:    ret
+...
[truncated]

@dtcxzyw dtcxzyw requested review from wangpc-pp and removed request for pcwang-thead February 13, 2024 06:05
Member

dtcxzyw commented Feb 13, 2024

> The generated code isn't perfect on the lit test, but my data shows a net
> dynamic instruction count improvement on spec2017 for both base and
> Zba+Zbb+Zbs.

I agree.

@preames preames left a comment

LGTM

; RV64-NEXT: .LBB10_2: # %f
; RV64-NEXT: ret
; RV64-NEXT: .LBB10_2:
; RV64-NEXT: slli a0, a0, 48

Collaborator

This delta is interesting and looks like a missed optimization. After moving the zext, we appear to lose track of the fact that x was already used in the ABI, and thus should have the required properties for the return.

Collaborator Author

We moved the zext to the argument and updated both uses. Then in SelectionDAG we type-legalize that to an AND. There's already an AND with 128 after it feeding the compare, so we decide we don't need the other AND on that path. That leaves the AND serving only the return, which doesn't care about the upper bits.

Seems like there was no reason to move the zext for this case.
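
For reference, a simplified sketch of the case under discussion in LLVM
IR (a hypothetical reduction of test_clear_mask_i16_i8; the exact
promoted form the pass produces may differ):

; Original: an i16 chain with an illegal type feeding the compare.
define i16 @test_clear_mask_i16_i8(i16 %x) nounwind {
entry:
  %a = and i16 %x, 128
  %cmp = icmp eq i16 %a, 0
  br i1 %cmp, label %t, label %f
t:
  ret i16 42
f:
  ret i16 %x
}

; After promotion (conceptually): %x is widened once at the source and
; both uses are rewritten. Legalization lowers the zext to an AND; the
; compare path already masks with 128, so the remaining AND serves only
; the return, which doesn't care about the upper bits.
define i16 @promoted(i16 %x) nounwind {
entry:
  %wide = zext i16 %x to i64
  %a = and i64 %wide, 128
  %cmp = icmp eq i64 %a, 0
  br i1 %cmp, label %t, label %f
t:
  ret i16 42
f:
  %ret = trunc i64 %wide to i16
  ret i16 %ret
}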

Collaborator Author

topperc commented Feb 13, 2024

Committed as 9838c85 and 7d40ea8
