[RISCV] Allow tail memcmp expansion #121460
Conversation
This optimization was introduced by llvm#70469. Like AArch64, we allow tail expansions for size 3 on RV32 and sizes 3/5/6 on RV64. This can simplify the comparison and reduce the number of basic blocks.
@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes:

This optimization was introduced by #70469. Like AArch64, we allow tail expansions for 3 on RV32 and 3/5/6 on RV64. This can simplify the comparison and reduce the number of blocks.

Patch is 28.93 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121460.diff

3 Files Affected:
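For a concrete picture of what the size-3 tail expansion computes, here is a minimal C++ sketch (not LLVM code; `memcmp3_expanded` is a made-up name for illustration). One 16-bit load plus one 8-bit load per operand are merged into a single 24-bit big-endian value, so the comparison becomes branchless:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the branchless memcmp(p, q, 3) expansion: merging the two
// loads into one 24-bit big-endian value per operand lets the result be
// computed as (a > b) - (a < b) -- the sltu/sltu/sub sequence in the
// tests below -- with no second load block and no res_block.
int memcmp3_expanded(const unsigned char *p, const unsigned char *q) {
  uint32_t a = (uint32_t(p[0]) << 16) | (uint32_t(p[1]) << 8) | p[2];
  uint32_t b = (uint32_t(q[0]) << 16) | (uint32_t(q[1]) << 8) | q[2];
  return (a > b) - (a < b);
}
```

On little-endian RISC-V the backend instead loads little-endian (lhu + lbu, or pack with Zbkb) and byte-reverses with rev8; the value compared is the same.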
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 0abb270edcabc8..50e4b5f378932a 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -2565,9 +2565,12 @@ RISCVTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
Options.AllowOverlappingLoads = true;
Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
Options.NumLoadsPerBlock = Options.MaxNumLoads;
- if (ST->is64Bit())
+ if (ST->is64Bit()) {
Options.LoadSizes = {8, 4, 2, 1};
- else
+ Options.AllowedTailExpansions = {3, 5, 6, 7};
+ } else {
Options.LoadSizes = {4, 2, 1};
+ Options.AllowedTailExpansions = {3};
+ }
return Options;
}
diff --git a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
index d529ae6ecd0aba..b9a27b9d0c9e70 100644
--- a/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
+++ b/llvm/test/CodeGen/RISCV/memcmp-optsize.ll
@@ -2449,82 +2449,72 @@ define i32 @memcmp_size_3(ptr %s1, ptr %s2) nounwind optsize {
;
; CHECK-UNALIGNED-RV32-ZBB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV32-ZBB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lh a2, 0(a0)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lh a3, 0(a1)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: srli a2, a2, 16
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: srli a3, a3, 16
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: bne a2, a3, .LBB24_2
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a0, 2(a0)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a1, 2(a1)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: ret
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: .LBB24_2: # %res_block
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a2, 2(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lhu a0, 0(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a3, 2(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lhu a1, 0(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: slli a2, a2, 16
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: or a0, a0, a2
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: slli a3, a3, 16
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: or a1, a1, a3
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV32-ZBB-NEXT: ret
;
; CHECK-UNALIGNED-RV64-ZBB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV64-ZBB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lh a2, 0(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lh a3, 0(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a2, a2, 48
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a3, a3, 48
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: bne a2, a3, .LBB24_2
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a0, 2(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a1, 2(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: .LBB24_2: # %res_block
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a2, 2(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lhu a0, 0(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a3, 2(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lhu a1, 0(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a2, a2, 16
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a0, a0, a2
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a3, a3, 16
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a1, a1, a3
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a0, a0, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a1, a1, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
;
; CHECK-UNALIGNED-RV32-ZBKB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV32-ZBKB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lh a2, 0(a0)
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lh a3, 0(a1)
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: srli a2, a2, 16
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: srli a3, a3, 16
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: bne a2, a3, .LBB24_2
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: # %bb.1: # %loadbb1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lhu a2, 0(a0)
; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lbu a0, 2(a0)
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lhu a3, 0(a1)
; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lbu a1, 2(a1)
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: ret
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: .LBB24_2: # %res_block
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: pack a0, a2, a0
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: pack a1, a3, a1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV32-ZBKB-NEXT: ret
;
; CHECK-UNALIGNED-RV64-ZBKB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV64-ZBKB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lh a2, 0(a0)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lh a3, 0(a1)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a2, a2, 48
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a3, a3, 48
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: bne a2, a3, .LBB24_2
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lbu a0, 2(a0)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lbu a1, 2(a1)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ret
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: .LBB24_2: # %res_block
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lbu a2, 2(a0)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lhu a0, 0(a0)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lbu a3, 2(a1)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lhu a1, 0(a1)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: slli a2, a2, 16
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: or a0, a0, a2
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: slli a3, a3, 16
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: or a1, a1, a3
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a0, a0, 32
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a1, a1, 32
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ret
;
; CHECK-UNALIGNED-RV32-V-LABEL: memcmp_size_3:
@@ -2845,22 +2835,19 @@ define i32 @memcmp_size_5(ptr %s1, ptr %s2) nounwind optsize {
;
; CHECK-UNALIGNED-RV64-ZBB-LABEL: memcmp_size_5:
; CHECK-UNALIGNED-RV64-ZBB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lw a2, 0(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lw a3, 0(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a2, a2, 32
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a3, a3, 32
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: bne a2, a3, .LBB26_2
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a0, 4(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a1, 4(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: .LBB26_2: # %res_block
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a2, 4(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lwu a0, 0(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a3, 4(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lwu a1, 0(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a2, a2, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a0, a0, a2
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a3, a3, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a1, a1, a3
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
;
; CHECK-UNALIGNED-RV32-ZBKB-LABEL: memcmp_size_5:
@@ -2883,22 +2870,17 @@ define i32 @memcmp_size_5(ptr %s1, ptr %s2) nounwind optsize {
;
; CHECK-UNALIGNED-RV64-ZBKB-LABEL: memcmp_size_5:
; CHECK-UNALIGNED-RV64-ZBKB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lw a2, 0(a0)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lw a3, 0(a1)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a2, a2, 32
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a3, a3, 32
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: bne a2, a3, .LBB26_2
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: # %bb.1: # %loadbb1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lwu a2, 0(a0)
; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lbu a0, 4(a0)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lwu a3, 0(a1)
; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lbu a1, 4(a1)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ret
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: .LBB26_2: # %res_block
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: pack a0, a2, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: pack a1, a3, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ret
;
; CHECK-UNALIGNED-RV32-V-LABEL: memcmp_size_5:
@@ -3052,28 +3034,19 @@ define i32 @memcmp_size_6(ptr %s1, ptr %s2) nounwind optsize {
;
; CHECK-UNALIGNED-RV64-ZBB-LABEL: memcmp_size_6:
; CHECK-UNALIGNED-RV64-ZBB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lw a2, 0(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lw a3, 0(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a2, a2, 32
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a3, a3, 32
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: bne a2, a3, .LBB27_3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lh a0, 4(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lh a1, 4(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a2, a0
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a3, a1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a2, a2, 48
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a3, a3, 48
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: bne a2, a3, .LBB27_3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: # %bb.2:
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: li a0, 0
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: .LBB27_3: # %res_block
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lhu a2, 4(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lwu a0, 0(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lhu a3, 4(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lwu a1, 0(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a2, a2, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a0, a0, a2
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a3, a3, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a1, a1, a3
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
;
; CHECK-UNALIGNED-RV32-ZBKB-LABEL: memcmp_size_6:
@@ -3102,28 +3075,17 @@ define i32 @memcmp_size_6(ptr %s1, ptr %s2) nounwind optsize {
;
; CHECK-UNALIGNED-RV64-ZBKB-LABEL: memcmp_size_6:
; CHECK-UNALIGNED-RV64-ZBKB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lw a2, 0(a0)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lw a3, 0(a1)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a2, a2, 32
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a3, a3, 32
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: bne a2, a3, .LBB27_3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lh a0, 4(a0)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lh a1, 4(a1)
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a2, a0
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a3, a1
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a2, a2, 48
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: srli a3, a3, 48
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: bne a2, a3, .LBB27_3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: # %bb.2:
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: li a0, 0
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ret
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: .LBB27_3: # %res_block
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lwu a2, 0(a0)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lhu a0, 4(a0)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lwu a3, 0(a1)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: lhu a1, 4(a1)
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: pack a0, a2, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: pack a1, a3, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV64-ZBKB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV64-ZBKB-NEXT: ret
;
; CHECK-UNALIGNED-RV32-V-LABEL: memcmp_size_6:
diff --git a/llvm/test/CodeGen/RISCV/memcmp.ll b/llvm/test/CodeGen/RISCV/memcmp.ll
index 860c3a94abc0a7..629a9298ee469d 100644
--- a/llvm/test/CodeGen/RISCV/memcmp.ll
+++ b/llvm/test/CodeGen/RISCV/memcmp.ll
@@ -3145,82 +3145,72 @@ define i32 @memcmp_size_3(ptr %s1, ptr %s2) nounwind {
;
; CHECK-UNALIGNED-RV32-ZBB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV32-ZBB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lh a2, 0(a0)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lh a3, 0(a1)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: srli a2, a2, 16
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: srli a3, a3, 16
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: bne a2, a3, .LBB24_2
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a0, 2(a0)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a1, 2(a1)
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: ret
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: .LBB24_2: # %res_block
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV32-ZBB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a2, 2(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lhu a0, 0(a0)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lbu a3, 2(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: lhu a1, 0(a1)
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: slli a2, a2, 16
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: or a0, a0, a2
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: slli a3, a3, 16
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: or a1, a1, a3
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV32-ZBB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV32-ZBB-NEXT: ret
;
; CHECK-UNALIGNED-RV64-ZBB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV64-ZBB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lh a2, 0(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lh a3, 0(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a2, a2, 48
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a3, a3, 48
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: bne a2, a3, .LBB24_2
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: # %bb.1: # %loadbb1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a0, 2(a0)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a1, 2(a1)
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: .LBB24_2: # %res_block
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV64-ZBB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a2, 2(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lhu a0, 0(a0)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lbu a3, 2(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: lhu a1, 0(a1)
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a2, a2, 16
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a0, a0, a2
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: slli a3, a3, 16
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: or a1, a1, a3
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a0, a0, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: srli a1, a1, 32
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV64-ZBB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV64-ZBB-NEXT: ret
;
; CHECK-UNALIGNED-RV32-ZBKB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV32-ZBKB: # %bb.0: # %entry
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lh a2, 0(a0)
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lh a3, 0(a1)
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a2, a2
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a3, a3
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: srli a2, a2, 16
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: srli a3, a3, 16
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: bne a2, a3, .LBB24_2
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: # %bb.1: # %loadbb1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lhu a2, 0(a0)
; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lbu a0, 2(a0)
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lhu a3, 0(a1)
; CHECK-UNALIGNED-RV32-ZBKB-NEXT: lbu a1, 2(a1)
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sub a0, a0, a1
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: ret
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: .LBB24_2: # %res_block
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sltu a0, a2, a3
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: neg a0, a0
-; CHECK-UNALIGNED-RV32-ZBKB-NEXT: ori a0, a0, 1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: pack a0, a2, a0
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: pack a1, a3, a1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a0, a0
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: rev8 a1, a1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sltu a2, a1, a0
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sltu a0, a0, a1
+; CHECK-UNALIGNED-RV32-ZBKB-NEXT: sub a0, a2, a0
; CHECK-UNALIGNED-RV32-ZBKB-NEXT: ret
;
; CHECK-UNALIGNED-RV64-ZBKB-LABEL: memcmp_size_3:
; CHECK-UNALIGNED-RV64-ZBKB: # %bb.0: # %entry
-;...
[truncated]
As far as I can tell, the reference optimization assumes that unaligned loads are legal and fast. This is not true for us. As such, this optimization is not sound.
Please stop posting additional memcmp enable patches until you've addressed the soundness issues in that code with unaligned loads. It's a waste of your time, and a waste of reviewers' time.
What are the soundness issues? I thought we only had costing issues? The code in
Gah, you're correct! I read the check code too fast and apparently mentally placed the negate in the wrong place. Please ignore my comment on this review entirely.
(I had accidentally added a comment instead of deleting it, please ignore the email. Deleted from the web view to (hopefully) avoid confusion.)
LGTM
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/8415
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/4948
This optimization was introduced by #70469.
Like AArch64, we allow tail expansions for 3 on RV32 and 3/5/6
on RV64.
This can simplify the comparison and reduce the number of blocks.