[RISCV] Enable the TypePromotion pass from AArch64/ARM. #81574

Conversation
This pass looks for unsigned icmps that have illegal types and tries to widen the use/def graph to improve the placement of the zero extends that type legalization would need to insert. I've explicitly disabled it for i32 by adding a check for isSExtCheaperThanZExt to the pass. The generated code isn't perfect, but my data shows a net dynamic instruction count improvement on spec2017 for both base and Zba+Zbb+Zbs.
@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

This pass looks for unsigned icmps that have illegal types and tries to widen the use/def graph to improve the placement of the zero extends that type legalization would need to insert.

I've explicitly disabled it for i32 by adding a check for isSExtCheaperThanZExt to the pass.

The generated code isn't perfect on the lit tests, but my data shows a net dynamic instruction count improvement on spec2017 for both base and Zba+Zbb+Zbs.

Spec2017 dynamic count for rv64gc_zba_zbb_zbs
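For readers unfamiliar with the pass, here is a rough sketch (illustrative only, not taken from this patch or dumped from the pass) of the kind of rewrite it performs on an unsigned compare with an illegal type:

; Input: i16 is illegal on riscv64, so type legalization would have to
; zero-extend the i16 values around the compare.
define i1 @cmp_i16(i16 %a, i16 %b) {
  %add = add i16 %a, 1
  %cmp = icmp ult i16 %add, %b
  ret i1 %cmp
}

; Approximately what TypePromotion produces: the operands are
; zero-extended once, the arithmetic is done in the promoted type, and a
; single mask preserves the i16 wrap-around semantics, giving
; legalization a better placement for the extends.
define i1 @cmp_i16_promoted(i16 %a, i16 %b) {
  %za = zext i16 %a to i64
  %zb = zext i16 %b to i64
  %add = add i64 %za, 1
  %mask = and i64 %add, 65535    ; re-truncate modulo 2^16
  %cmp = icmp ult i64 %mask, %zb
  ret i1 %cmp
}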
Patch is 23.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81574.diff

7 Files Affected:
diff --git a/llvm/lib/CodeGen/TypePromotion.cpp b/llvm/lib/CodeGen/TypePromotion.cpp
index 053caf518bd1f7..7a3bc6c2043f4c 100644
--- a/llvm/lib/CodeGen/TypePromotion.cpp
+++ b/llvm/lib/CodeGen/TypePromotion.cpp
@@ -937,6 +937,8 @@ bool TypePromotionImpl::run(Function &F, const TargetMachine *TM,
return 0;
EVT PromotedVT = TLI->getTypeToTransformTo(*Ctx, SrcVT);
+ if (TLI->isSExtCheaperThanZExt(SrcVT, PromotedVT))
+ return 0;
if (RegisterBitWidth < PromotedVT.getFixedSizeInBits()) {
LLVM_DEBUG(dbgs() << "IR Promotion: Couldn't find target register "
<< "for promoted type\n");
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 4c3da3ad311168..adef40e19cba4a 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -366,6 +366,7 @@ class RISCVPassConfig : public TargetPassConfig {
void addIRPasses() override;
bool addPreISel() override;
+ void addCodeGenPrepare() override;
bool addInstSelector() override;
bool addIRTranslator() override;
void addPreLegalizeMachineIR() override;
@@ -452,6 +453,12 @@ bool RISCVPassConfig::addPreISel() {
return false;
}
+void RISCVPassConfig::addCodeGenPrepare() {
+ if (getOptLevel() != CodeGenOptLevel::None)
+ addPass(createTypePromotionLegacyPass());
+ TargetPassConfig::addCodeGenPrepare();
+}
+
bool RISCVPassConfig::addInstSelector() {
addPass(createRISCVISelDag(getRISCVTargetMachine(), getOptLevel()));
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index e7db8ef9d5aff3..364c1e430b9156 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -68,6 +68,7 @@
; CHECK-NEXT: Expand reduction intrinsics
; CHECK-NEXT: Natural Loop Information
; CHECK-NEXT: TLS Variable Hoist
+; CHECK-NEXT: Type Promotion
; CHECK-NEXT: CodeGen Prepare
; CHECK-NEXT: Dominator Tree Construction
; CHECK-NEXT: Exception handling preparation
diff --git a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
index 9e7f2e9525d3b4..6e3a50542939f1 100644
--- a/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll
@@ -254,21 +254,39 @@ define i1 @shifts_necmp_i64_i8(i64 %x) nounwind {
; ---------------------------------------------------------------------------- ;
define i1 @add_ultcmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ultcmp_i16_i8:
-; RV32: # %bb.0:
-; RV32-NEXT: addi a0, a0, -128
-; RV32-NEXT: slli a0, a0, 16
-; RV32-NEXT: srli a0, a0, 24
-; RV32-NEXT: sltiu a0, a0, 255
-; RV32-NEXT: ret
+; RV32I-LABEL: add_ultcmp_i16_i8:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 16
+; RV32I-NEXT: srli a0, a0, 16
+; RV32I-NEXT: addi a0, a0, -128
+; RV32I-NEXT: srli a0, a0, 8
+; RV32I-NEXT: sltiu a0, a0, 255
+; RV32I-NEXT: ret
;
-; RV64-LABEL: add_ultcmp_i16_i8:
-; RV64: # %bb.0:
-; RV64-NEXT: addi a0, a0, -128
-; RV64-NEXT: slli a0, a0, 48
-; RV64-NEXT: srli a0, a0, 56
-; RV64-NEXT: sltiu a0, a0, 255
-; RV64-NEXT: ret
+; RV64I-LABEL: add_ultcmp_i16_i8:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 48
+; RV64I-NEXT: srli a0, a0, 48
+; RV64I-NEXT: addi a0, a0, -128
+; RV64I-NEXT: srli a0, a0, 8
+; RV64I-NEXT: sltiu a0, a0, 255
+; RV64I-NEXT: ret
+;
+; RV32ZBB-LABEL: add_ultcmp_i16_i8:
+; RV32ZBB: # %bb.0:
+; RV32ZBB-NEXT: zext.h a0, a0
+; RV32ZBB-NEXT: addi a0, a0, -128
+; RV32ZBB-NEXT: srli a0, a0, 8
+; RV32ZBB-NEXT: sltiu a0, a0, 255
+; RV32ZBB-NEXT: ret
+;
+; RV64ZBB-LABEL: add_ultcmp_i16_i8:
+; RV64ZBB: # %bb.0:
+; RV64ZBB-NEXT: zext.h a0, a0
+; RV64ZBB-NEXT: addi a0, a0, -128
+; RV64ZBB-NEXT: srli a0, a0, 8
+; RV64ZBB-NEXT: sltiu a0, a0, 255
+; RV64ZBB-NEXT: ret
%tmp0 = add i16 %x, -128 ; ~0U << (8-1)
%tmp1 = icmp ult i16 %tmp0, -256 ; ~0U << 8
ret i1 %tmp1
@@ -421,21 +439,39 @@ define i1 @add_ultcmp_i64_i8(i64 %x) nounwind {
; Slightly more canonical variant
define i1 @add_ulecmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ulecmp_i16_i8:
-; RV32: # %bb.0:
-; RV32-NEXT: addi a0, a0, -128
-; RV32-NEXT: slli a0, a0, 16
-; RV32-NEXT: srli a0, a0, 24
-; RV32-NEXT: sltiu a0, a0, 255
-; RV32-NEXT: ret
+; RV32I-LABEL: add_ulecmp_i16_i8:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 16
+; RV32I-NEXT: srli a0, a0, 16
+; RV32I-NEXT: addi a0, a0, -128
+; RV32I-NEXT: srli a0, a0, 8
+; RV32I-NEXT: sltiu a0, a0, 255
+; RV32I-NEXT: ret
;
-; RV64-LABEL: add_ulecmp_i16_i8:
-; RV64: # %bb.0:
-; RV64-NEXT: addi a0, a0, -128
-; RV64-NEXT: slli a0, a0, 48
-; RV64-NEXT: srli a0, a0, 56
-; RV64-NEXT: sltiu a0, a0, 255
-; RV64-NEXT: ret
+; RV64I-LABEL: add_ulecmp_i16_i8:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 48
+; RV64I-NEXT: srli a0, a0, 48
+; RV64I-NEXT: addi a0, a0, -128
+; RV64I-NEXT: srli a0, a0, 8
+; RV64I-NEXT: sltiu a0, a0, 255
+; RV64I-NEXT: ret
+;
+; RV32ZBB-LABEL: add_ulecmp_i16_i8:
+; RV32ZBB: # %bb.0:
+; RV32ZBB-NEXT: zext.h a0, a0
+; RV32ZBB-NEXT: addi a0, a0, -128
+; RV32ZBB-NEXT: srli a0, a0, 8
+; RV32ZBB-NEXT: sltiu a0, a0, 255
+; RV32ZBB-NEXT: ret
+;
+; RV64ZBB-LABEL: add_ulecmp_i16_i8:
+; RV64ZBB: # %bb.0:
+; RV64ZBB-NEXT: zext.h a0, a0
+; RV64ZBB-NEXT: addi a0, a0, -128
+; RV64ZBB-NEXT: srli a0, a0, 8
+; RV64ZBB-NEXT: sltiu a0, a0, 255
+; RV64ZBB-NEXT: ret
%tmp0 = add i16 %x, -128 ; ~0U << (8-1)
%tmp1 = icmp ule i16 %tmp0, -257 ; ~0U << 8 - 1
ret i1 %tmp1
diff --git a/llvm/test/CodeGen/RISCV/signbit-test.ll b/llvm/test/CodeGen/RISCV/signbit-test.ll
index 69a9026d9af9e2..4e10fae06d8860 100644
--- a/llvm/test/CodeGen/RISCV/signbit-test.ll
+++ b/llvm/test/CodeGen/RISCV/signbit-test.ll
@@ -303,7 +303,10 @@ define i16 @test_clear_mask_i16_i8(i16 %x) nounwind {
; RV32-NEXT: bnez a1, .LBB10_2
; RV32-NEXT: # %bb.1: # %t
; RV32-NEXT: li a0, 42
-; RV32-NEXT: .LBB10_2: # %f
+; RV32-NEXT: ret
+; RV32-NEXT: .LBB10_2:
+; RV32-NEXT: slli a0, a0, 16
+; RV32-NEXT: srli a0, a0, 16
; RV32-NEXT: ret
;
; RV64-LABEL: test_clear_mask_i16_i8:
@@ -312,7 +315,10 @@ define i16 @test_clear_mask_i16_i8(i16 %x) nounwind {
; RV64-NEXT: bnez a1, .LBB10_2
; RV64-NEXT: # %bb.1: # %t
; RV64-NEXT: li a0, 42
-; RV64-NEXT: .LBB10_2: # %f
+; RV64-NEXT: ret
+; RV64-NEXT: .LBB10_2:
+; RV64-NEXT: slli a0, a0, 48
+; RV64-NEXT: srli a0, a0, 48
; RV64-NEXT: ret
entry:
%a = and i16 %x, 128
@@ -332,7 +338,10 @@ define i16 @test_set_mask_i16_i8(i16 %x) nounwind {
; RV32-NEXT: beqz a1, .LBB11_2
; RV32-NEXT: # %bb.1: # %t
; RV32-NEXT: li a0, 42
-; RV32-NEXT: .LBB11_2: # %f
+; RV32-NEXT: ret
+; RV32-NEXT: .LBB11_2:
+; RV32-NEXT: slli a0, a0, 16
+; RV32-NEXT: srli a0, a0, 16
; RV32-NEXT: ret
;
; RV64-LABEL: test_set_mask_i16_i8:
@@ -341,7 +350,10 @@ define i16 @test_set_mask_i16_i8(i16 %x) nounwind {
; RV64-NEXT: beqz a1, .LBB11_2
; RV64-NEXT: # %bb.1: # %t
; RV64-NEXT: li a0, 42
-; RV64-NEXT: .LBB11_2: # %f
+; RV64-NEXT: ret
+; RV64-NEXT: .LBB11_2:
+; RV64-NEXT: slli a0, a0, 48
+; RV64-NEXT: srli a0, a0, 48
; RV64-NEXT: ret
entry:
%a = and i16 %x, 128
@@ -361,7 +373,10 @@ define i16 @test_set_mask_i16_i7(i16 %x) nounwind {
; RV32-NEXT: beqz a1, .LBB12_2
; RV32-NEXT: # %bb.1: # %t
; RV32-NEXT: li a0, 42
-; RV32-NEXT: .LBB12_2: # %f
+; RV32-NEXT: ret
+; RV32-NEXT: .LBB12_2:
+; RV32-NEXT: slli a0, a0, 16
+; RV32-NEXT: srli a0, a0, 16
; RV32-NEXT: ret
;
; RV64-LABEL: test_set_mask_i16_i7:
@@ -370,7 +385,10 @@ define i16 @test_set_mask_i16_i7(i16 %x) nounwind {
; RV64-NEXT: beqz a1, .LBB12_2
; RV64-NEXT: # %bb.1: # %t
; RV64-NEXT: li a0, 42
-; RV64-NEXT: .LBB12_2: # %f
+; RV64-NEXT: ret
+; RV64-NEXT: .LBB12_2:
+; RV64-NEXT: slli a0, a0, 48
+; RV64-NEXT: srli a0, a0, 48
; RV64-NEXT: ret
entry:
%a = and i16 %x, 64
diff --git a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
index 0860853ae9c0af..de36bcdb910609 100644
--- a/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
+++ b/llvm/test/CodeGen/RISCV/signed-truncation-check.ll
@@ -254,23 +254,43 @@ define i1 @shifts_eqcmp_i64_i8(i64 %x) nounwind {
; ---------------------------------------------------------------------------- ;
define i1 @add_ugecmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ugecmp_i16_i8:
-; RV32: # %bb.0:
-; RV32-NEXT: addi a0, a0, -128
-; RV32-NEXT: slli a0, a0, 16
-; RV32-NEXT: srli a0, a0, 24
-; RV32-NEXT: sltiu a0, a0, 255
-; RV32-NEXT: xori a0, a0, 1
-; RV32-NEXT: ret
+; RV32I-LABEL: add_ugecmp_i16_i8:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 16
+; RV32I-NEXT: srli a0, a0, 16
+; RV32I-NEXT: addi a0, a0, -128
+; RV32I-NEXT: srli a0, a0, 8
+; RV32I-NEXT: sltiu a0, a0, 255
+; RV32I-NEXT: xori a0, a0, 1
+; RV32I-NEXT: ret
;
-; RV64-LABEL: add_ugecmp_i16_i8:
-; RV64: # %bb.0:
-; RV64-NEXT: addi a0, a0, -128
-; RV64-NEXT: slli a0, a0, 48
-; RV64-NEXT: srli a0, a0, 56
-; RV64-NEXT: sltiu a0, a0, 255
-; RV64-NEXT: xori a0, a0, 1
-; RV64-NEXT: ret
+; RV64I-LABEL: add_ugecmp_i16_i8:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 48
+; RV64I-NEXT: srli a0, a0, 48
+; RV64I-NEXT: addi a0, a0, -128
+; RV64I-NEXT: srli a0, a0, 8
+; RV64I-NEXT: sltiu a0, a0, 255
+; RV64I-NEXT: xori a0, a0, 1
+; RV64I-NEXT: ret
+;
+; RV32ZBB-LABEL: add_ugecmp_i16_i8:
+; RV32ZBB: # %bb.0:
+; RV32ZBB-NEXT: zext.h a0, a0
+; RV32ZBB-NEXT: addi a0, a0, -128
+; RV32ZBB-NEXT: srli a0, a0, 8
+; RV32ZBB-NEXT: sltiu a0, a0, 255
+; RV32ZBB-NEXT: xori a0, a0, 1
+; RV32ZBB-NEXT: ret
+;
+; RV64ZBB-LABEL: add_ugecmp_i16_i8:
+; RV64ZBB: # %bb.0:
+; RV64ZBB-NEXT: zext.h a0, a0
+; RV64ZBB-NEXT: addi a0, a0, -128
+; RV64ZBB-NEXT: srli a0, a0, 8
+; RV64ZBB-NEXT: sltiu a0, a0, 255
+; RV64ZBB-NEXT: xori a0, a0, 1
+; RV64ZBB-NEXT: ret
%tmp0 = add i16 %x, -128 ; ~0U << (8-1)
%tmp1 = icmp uge i16 %tmp0, -256 ; ~0U << 8
ret i1 %tmp1
@@ -471,23 +491,43 @@ define i1 @add_ugecmp_i64_i8(i64 %x) nounwind {
; Slightly more canonical variant
define i1 @add_ugtcmp_i16_i8(i16 %x) nounwind {
-; RV32-LABEL: add_ugtcmp_i16_i8:
-; RV32: # %bb.0:
-; RV32-NEXT: addi a0, a0, -128
-; RV32-NEXT: slli a0, a0, 16
-; RV32-NEXT: srli a0, a0, 24
-; RV32-NEXT: sltiu a0, a0, 255
-; RV32-NEXT: xori a0, a0, 1
-; RV32-NEXT: ret
+; RV32I-LABEL: add_ugtcmp_i16_i8:
+; RV32I: # %bb.0:
+; RV32I-NEXT: slli a0, a0, 16
+; RV32I-NEXT: srli a0, a0, 16
+; RV32I-NEXT: addi a0, a0, -128
+; RV32I-NEXT: srli a0, a0, 8
+; RV32I-NEXT: sltiu a0, a0, 255
+; RV32I-NEXT: xori a0, a0, 1
+; RV32I-NEXT: ret
;
-; RV64-LABEL: add_ugtcmp_i16_i8:
-; RV64: # %bb.0:
-; RV64-NEXT: addi a0, a0, -128
-; RV64-NEXT: slli a0, a0, 48
-; RV64-NEXT: srli a0, a0, 56
-; RV64-NEXT: sltiu a0, a0, 255
-; RV64-NEXT: xori a0, a0, 1
-; RV64-NEXT: ret
+; RV64I-LABEL: add_ugtcmp_i16_i8:
+; RV64I: # %bb.0:
+; RV64I-NEXT: slli a0, a0, 48
+; RV64I-NEXT: srli a0, a0, 48
+; RV64I-NEXT: addi a0, a0, -128
+; RV64I-NEXT: srli a0, a0, 8
+; RV64I-NEXT: sltiu a0, a0, 255
+; RV64I-NEXT: xori a0, a0, 1
+; RV64I-NEXT: ret
+;
+; RV32ZBB-LABEL: add_ugtcmp_i16_i8:
+; RV32ZBB: # %bb.0:
+; RV32ZBB-NEXT: zext.h a0, a0
+; RV32ZBB-NEXT: addi a0, a0, -128
+; RV32ZBB-NEXT: srli a0, a0, 8
+; RV32ZBB-NEXT: sltiu a0, a0, 255
+; RV32ZBB-NEXT: xori a0, a0, 1
+; RV32ZBB-NEXT: ret
+;
+; RV64ZBB-LABEL: add_ugtcmp_i16_i8:
+; RV64ZBB: # %bb.0:
+; RV64ZBB-NEXT: zext.h a0, a0
+; RV64ZBB-NEXT: addi a0, a0, -128
+; RV64ZBB-NEXT: srli a0, a0, 8
+; RV64ZBB-NEXT: sltiu a0, a0, 255
+; RV64ZBB-NEXT: xori a0, a0, 1
+; RV64ZBB-NEXT: ret
%tmp0 = add i16 %x, -128 ; ~0U << (8-1)
%tmp1 = icmp ugt i16 %tmp0, -257 ; ~0U << 8 - 1
ret i1 %tmp1
diff --git a/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
new file mode 100644
index 00000000000000..3740dc675949fa
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/typepromotion-overflow.ll
@@ -0,0 +1,387 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv64 -mattr=+m %s -o - | FileCheck %s
+
+define zeroext i16 @overflow_add(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_add:
+; CHECK: # %bb.0:
+; CHECK-NEXT: add a0, a1, a0
+; CHECK-NEXT: ori a0, a0, 1
+; CHECK-NEXT: slli a0, a0, 48
+; CHECK-NEXT: srli a1, a0, 48
+; CHECK-NEXT: li a2, 1024
+; CHECK-NEXT: li a0, 2
+; CHECK-NEXT: bltu a2, a1, .LBB0_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: .LBB0_2:
+; CHECK-NEXT: ret
+ %add = add i16 %b, %a
+ %or = or i16 %add, 1
+ %cmp = icmp ugt i16 %or, 1024
+ %res = select i1 %cmp, i16 2, i16 5
+ ret i16 %res
+}
+
+define zeroext i16 @overflow_sub(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_sub:
+; CHECK: # %bb.0:
+; CHECK-NEXT: subw a0, a0, a1
+; CHECK-NEXT: ori a0, a0, 1
+; CHECK-NEXT: slli a0, a0, 48
+; CHECK-NEXT: srli a1, a0, 48
+; CHECK-NEXT: li a2, 1024
+; CHECK-NEXT: li a0, 2
+; CHECK-NEXT: bltu a2, a1, .LBB1_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: .LBB1_2:
+; CHECK-NEXT: ret
+ %add = sub i16 %a, %b
+ %or = or i16 %add, 1
+ %cmp = icmp ugt i16 %or, 1024
+ %res = select i1 %cmp, i16 2, i16 5
+ ret i16 %res
+}
+
+define zeroext i16 @overflow_mul(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_mul:
+; CHECK: # %bb.0:
+; CHECK-NEXT: mul a0, a1, a0
+; CHECK-NEXT: ori a0, a0, 1
+; CHECK-NEXT: slli a0, a0, 48
+; CHECK-NEXT: srli a1, a0, 48
+; CHECK-NEXT: li a2, 1024
+; CHECK-NEXT: li a0, 2
+; CHECK-NEXT: bltu a2, a1, .LBB2_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: .LBB2_2:
+; CHECK-NEXT: ret
+ %add = mul i16 %b, %a
+ %or = or i16 %add, 1
+ %cmp = icmp ugt i16 %or, 1024
+ %res = select i1 %cmp, i16 2, i16 5
+ ret i16 %res
+}
+
+define zeroext i16 @overflow_shl(i16 zeroext %a, i16 zeroext %b) {
+; CHECK-LABEL: overflow_shl:
+; CHECK: # %bb.0:
+; CHECK-NEXT: sll a0, a0, a1
+; CHECK-NEXT: ori a0, a0, 1
+; CHECK-NEXT: slli a0, a0, 48
+; CHECK-NEXT: srli a1, a0, 48
+; CHECK-NEXT: li a2, 1024
+; CHECK-NEXT: li a0, 2
+; CHECK-NEXT: bltu a2, a1, .LBB3_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 5
+; CHECK-NEXT: .LBB3_2:
+; CHECK-NEXT: ret
+ %add = shl i16 %a, %b
+ %or = or i16 %add, 1
+ %cmp = icmp ugt i16 %or, 1024
+ %res = select i1 %cmp, i16 2, i16 5
+ ret i16 %res
+}
+
+define i32 @overflow_add_no_consts(i8 zeroext %a, i8 zeroext %b, i8 zeroext %limit) {
+; CHECK-LABEL: overflow_add_no_consts:
+; CHECK: # %bb.0:
+; CHECK-NEXT: add a0, a1, a0
+; CHECK-NEXT: andi a1, a0, 255
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: bltu a2, a1, .LBB4_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB4_2:
+; CHECK-NEXT: ret
+ %add = add i8 %b, %a
+ %cmp = icmp ugt i8 %add, %limit
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @overflow_add_const_limit(i8 zeroext %a, i8 zeroext %b) {
+; CHECK-LABEL: overflow_add_const_limit:
+; CHECK: # %bb.0:
+; CHECK-NEXT: add a0, a1, a0
+; CHECK-NEXT: andi a1, a0, 255
+; CHECK-NEXT: li a2, 128
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: bltu a2, a1, .LBB5_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB5_2:
+; CHECK-NEXT: ret
+ %add = add i8 %b, %a
+ %cmp = icmp ugt i8 %add, -128
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @overflow_add_positive_const_limit(i8 zeroext %a) {
+; CHECK-LABEL: overflow_add_positive_const_limit:
+; CHECK: # %bb.0:
+; CHECK-NEXT: slli a0, a0, 56
+; CHECK-NEXT: srai a1, a0, 56
+; CHECK-NEXT: li a2, -1
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: blt a1, a2, .LBB6_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB6_2:
+; CHECK-NEXT: ret
+ %cmp = icmp slt i8 %a, -1
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @unsafe_add_underflow(i8 zeroext %a) {
+; CHECK-LABEL: unsafe_add_underflow:
+; CHECK: # %bb.0:
+; CHECK-NEXT: mv a1, a0
+; CHECK-NEXT: li a2, 1
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: beq a1, a2, .LBB7_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB7_2:
+; CHECK-NEXT: ret
+ %cmp = icmp eq i8 %a, 1
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @safe_add_underflow(i8 zeroext %a) {
+; CHECK-LABEL: safe_add_underflow:
+; CHECK: # %bb.0:
+; CHECK-NEXT: mv a1, a0
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: beqz a1, .LBB8_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB8_2:
+; CHECK-NEXT: ret
+ %cmp = icmp eq i8 %a, 0
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @safe_add_underflow_neg(i8 zeroext %a) {
+; CHECK-LABEL: safe_add_underflow_neg:
+; CHECK: # %bb.0:
+; CHECK-NEXT: addi a1, a0, -2
+; CHECK-NEXT: li a2, 251
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: bltu a1, a2, .LBB9_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB9_2:
+; CHECK-NEXT: ret
+ %add = add i8 %a, -2
+ %cmp = icmp ult i8 %add, -5
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @overflow_sub_negative_const_limit(i8 zeroext %a) {
+; CHECK-LABEL: overflow_sub_negative_const_limit:
+; CHECK: # %bb.0:
+; CHECK-NEXT: slli a0, a0, 56
+; CHECK-NEXT: srai a1, a0, 56
+; CHECK-NEXT: li a2, -1
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: blt a1, a2, .LBB10_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB10_2:
+; CHECK-NEXT: ret
+ %cmp = icmp slt i8 %a, -1
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+; This is valid so long as the icmp immediate is sext.
+define i32 @sext_sub_underflow(i8 zeroext %a) {
+; CHECK-LABEL: sext_sub_underflow:
+; CHECK: # %bb.0:
+; CHECK-NEXT: addi a1, a0, -6
+; CHECK-NEXT: li a2, -6
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: bltu a2, a1, .LBB11_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB11_2:
+; CHECK-NEXT: ret
+ %sub = add i8 %a, -6
+ %cmp = icmp ugt i8 %sub, -6
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @safe_sub_underflow(i8 zeroext %a) {
+; CHECK-LABEL: safe_sub_underflow:
+; CHECK: # %bb.0:
+; CHECK-NEXT: mv a1, a0
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: beqz a1, .LBB12_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: .LBB12_2:
+; CHECK-NEXT: ret
+ %cmp.not = icmp eq i8 %a, 0
+ %res = select i1 %cmp.not, i32 16, i32 8
+ ret i32 %res
+}
+
+define i32 @safe_sub_underflow_neg(i8 zeroext %a) {
+; CHECK-LABEL: safe_sub_underflow_neg:
+; CHECK: # %bb.0:
+; CHECK-NEXT: addi a1, a0, -4
+; CHECK-NEXT: li a2, 250
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: bltu a2, a1, .LBB13_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB13_2:
+; CHECK-NEXT: ret
+ %sub = add i8 %a, -4
+ %cmp = icmp ugt i8 %sub, -6
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+; This is valid so long as the icmp immediate is sext.
+define i32 @sext_sub_underflow_neg(i8 zeroext %a) {
+; CHECK-LABEL: sext_sub_underflow_neg:
+; CHECK: # %bb.0:
+; CHECK-NEXT: addi a1, a0, -4
+; CHECK-NEXT: li a2, -3
+; CHECK-NEXT: li a0, 8
+; CHECK-NEXT: bltu a1, a2, .LBB14_2
+; CHECK-NEXT: # %bb.1:
+; CHECK-NEXT: li a0, 16
+; CHECK-NEXT: .LBB14_2:
+; CHECK-NEXT: ret
+ %sub = add i8 %a, -4
+ %cmp = icmp ult i8 %sub, -3
+ %res = select i1 %cmp, i32 8, i32 16
+ ret i32 %res
+}
+
+define i32 @safe_sub_imm_var(ptr nocapture readonly %b) local_unnamed_addr #1 {
+; CHECK-LABEL: safe_sub_imm_var:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: li a0, 0
+; CHECK-NEXT: ret
+...
[truncated]
I agree.
LGTM
; RV64-NEXT: .LBB10_2: # %f
; RV64-NEXT: ret
; RV64-NEXT: .LBB10_2:
; RV64-NEXT: slli a0, a0, 48
This delta is interesting and looks like a missed optimization. After moving the zext, we appear to lose track of the fact that x was already used in the ABI, and thus should have the required properties for the return.
We moved the zext to the argument and updated both uses. Then in SelectionDAG we type-legalize that to an AND. There's already an AND with 128 after it going to the compare, so we decide we don't need the other AND on that path. That leaves the AND only serving the return, which doesn't care about the upper bits.

Seems like there was no reason to move the zext for this case.
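A rough reconstruction of the IR being discussed (the promoted form is approximate, inferred from the comments above rather than dumped from the pass):

; Before: %x feeds the mask compare and, on the %f path, the return.
define i16 @test_clear_mask_i16_i8(i16 %x) nounwind {
entry:
  %a = and i16 %x, 128
  %cmp = icmp eq i16 %a, 0
  br i1 %cmp, label %t, label %f
t:
  ret i16 42
f:
  ret i16 %x
}

; After TypePromotion (approximate): the zext moves to the argument and
; both uses are rewritten. SelectionDAG legalizes the zext to an AND with
; 0xffff; the compare path already masks with 128 and drops it, leaving
; the 0xffff AND (the slli/srli pair in the asm) serving only the return,
; which does not actually need the upper bits cleared.
define i16 @test_clear_mask_i16_i8_promoted(i16 %x) nounwind {
entry:
  %zx = zext i16 %x to i64
  %a = and i64 %zx, 128
  %cmp = icmp eq i64 %a, 0
  br i1 %cmp, label %t, label %f
t:
  ret i16 42
f:
  %r = trunc i64 %zx to i16
  ret i16 %r
}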