
[RISCV] Add isel special case for (and (shl X, c2), c1) -> (slli_uw (srli x, c4-c2), c4). #91638


Merged: 5 commits merged into llvm:main from pr/shift-isel on May 9, 2024

Conversation

@topperc (Collaborator) commented on May 9, 2024

Where c1 is a shifted mask with 32 set bits and c4 trailing zeros.

This is an alternative to #91626.
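
For a concrete instance (the constants used by the new rv64zba.ll test in the diff below; variable names here are illustrative), take c1 = 68719476720 = 0xFFFFFFFF0, a shifted mask with 32 set bits and c4 = 4 trailing zeros, and c2 = 2. A minimal standalone sketch of the identity, modelling slli.uw as "zero-extend the low 32 bits, then shift left":

#include <cassert>
#include <cstdint>

int main() {
  const uint64_t C1 = 0xFFFFFFFF0ULL; // shifted mask: 32 set bits, 4 trailing zeros
  const unsigned C2 = 2, C4 = 4;
  for (uint64_t X : {0x0ULL, 0x123456789ABCDEF0ULL, ~0ULL}) {
    uint64_t AndShl = (X << C2) & C1;                // (and (shl X, c2), c1)
    uint64_t SlliUw =
        (uint64_t)(uint32_t)(X >> (C4 - C2)) << C4;  // (slli_uw (srli X, c4-c2), c4)
    assert(AndShl == SlliUw);
  }
  return 0;
}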

@topperc requested review from preames and dtcxzyw on May 9, 2024 at 18:22
@llvmbot (Member) commented on May 9, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Craig Topper (topperc)

Changes

Where c1 is a shifted mask with 32 set bits and c4 trailing zeros.

This is an alternative to #91626.


Full diff: https://github.com/llvm/llvm-project/pull/91638.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp (+17-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+23-4)
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index e73a3af92af6f..6fd16210aade9 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -1322,11 +1322,11 @@ void RISCVDAGToDAGISel::Select(SDNode *Node) {
         }
       }
 
-      // Turn (and (shl x, c2), c1) -> (srli (slli c2+c3), c3) if c1 is a mask
-      // shifted by c2 bits with c3 leading zeros.
       if (LeftShift && isShiftedMask_64(C1)) {
         unsigned Leading = XLen - llvm::bit_width(C1);
 
+        // Turn (and (shl x, c2), c1) -> (srli (slli c2+c3), c3) if c1 is a mask
+        // shifted by c2 bits with c3 leading zeros.
         if (C2 + Leading < XLen &&
             C1 == (maskTrailingOnes<uint64_t>(XLen - (C2 + Leading)) << C2)) {
           // Use slli.uw when possible.
@@ -1350,6 +1350,21 @@ void RISCVDAGToDAGISel::Select(SDNode *Node) {
             return;
           }
         }
+
+        // Turn (and (shl x, c2), c1) -> (slli_uw (srli x, c4-c2), c4) where c1
+        // is shifted mask with 32 set bits and c4 trailing zeros.
+        unsigned Trailing = llvm::countr_zero(C1);
+        if (Leading + Trailing == 32 && C2 < Trailing &&
+            Subtarget->hasStdExtZba() && OneUseOrZExtW) {
+          SDNode *SRLI = CurDAG->getMachineNode(
+              RISCV::SRLI, DL, VT, X,
+              CurDAG->getTargetConstant(Trailing - C2, DL, VT));
+          SDNode *SLLI_UW = CurDAG->getMachineNode(
+              RISCV::SLLI_UW, DL, VT, SDValue(SRLI, 0),
+              CurDAG->getTargetConstant(Trailing, DL, VT));
+          ReplaceNode(Node, SLLI_UW);
+          return;
+        }
       }
 
       // Turn (and (shr x, c2), c1) -> (slli (srli x, c2+c3), c3) if c1 is a
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index 8fe221f2a297a..867775452e0c0 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -2866,8 +2866,7 @@ define ptr @gep_lshr_i32(ptr %0, i64 %1) {
 ;
 ; RV64ZBA-LABEL: gep_lshr_i32:
 ; RV64ZBA:       # %bb.0: # %entry
-; RV64ZBA-NEXT:    slli a1, a1, 2
-; RV64ZBA-NEXT:    srli a1, a1, 4
+; RV64ZBA-NEXT:    srli a1, a1, 2
 ; RV64ZBA-NEXT:    slli.uw a1, a1, 4
 ; RV64ZBA-NEXT:    sh2add a1, a1, a1
 ; RV64ZBA-NEXT:    add a0, a0, a1
@@ -2891,8 +2890,7 @@ define i64 @srli_slliw(i64 %1) {
 ;
 ; RV64ZBA-LABEL: srli_slliw:
 ; RV64ZBA:       # %bb.0: # %entry
-; RV64ZBA-NEXT:    slli a0, a0, 2
-; RV64ZBA-NEXT:    srli a0, a0, 4
+; RV64ZBA-NEXT:    srli a0, a0, 2
 ; RV64ZBA-NEXT:    slli.uw a0, a0, 4
 ; RV64ZBA-NEXT:    ret
 entry:
@@ -2902,6 +2900,27 @@ entry:
   ret i64 %4
 }
 
+define i64 @srli_slliw_canonical(i64 %0) {
+; RV64I-LABEL: srli_slliw_canonical:
+; RV64I:       # %bb.0: # %entry
+; RV64I-NEXT:    slli a0, a0, 2
+; RV64I-NEXT:    li a1, 1
+; RV64I-NEXT:    slli a1, a1, 36
+; RV64I-NEXT:    addi a1, a1, -16
+; RV64I-NEXT:    and a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: srli_slliw_canonical:
+; RV64ZBA:       # %bb.0: # %entry
+; RV64ZBA-NEXT:    srli a0, a0, 2
+; RV64ZBA-NEXT:    slli.uw a0, a0, 4
+; RV64ZBA-NEXT:    ret
+entry:
+  %1 = shl i64 %0, 2
+  %2 = and i64 %1, 68719476720
+  ret i64 %2
+}
+
 define i64 @srli_slli_i16(i64 %1) {
 ; CHECK-LABEL: srli_slli_i16:
 ; CHECK:       # %bb.0: # %entry
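
The guard Leading + Trailing == 32 in the new hunk is how the patch encodes "shifted mask with 32 set bits": once isShiftedMask_64(C1) guarantees a single contiguous run of ones, the leading and trailing zero counts account for every bit outside that run. A small standalone illustration, using C++20 <bit> in place of LLVM's bit_width/countr_zero helpers:

#include <bit>
#include <cassert>
#include <cstdint>

int main() {
  const uint64_t C1 = 0xFFFFFFFF0ULL;           // mask from the new test case
  const unsigned XLen = 64;
  unsigned Leading = XLen - std::bit_width(C1);  // zeros above the run of ones
  unsigned Trailing = std::countr_zero(C1);      // zeros below the run of ones
  // 64 total bits minus 28 leading and 4 trailing zeros leaves a 32-bit run.
  assert(Leading + Trailing == 32);
  assert(std::popcount(C1) == 32);
  return 0;
}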

dtcxzyw added a commit to dtcxzyw/llvm-codegen-benchmark that referenced this pull request May 9, 2024
@preames (Collaborator) left a comment

LGTM w/minor comment

// Turn (and (shl x, c2), c1) -> (slli_uw (srli x, c4-c2), c4) where c1
// is shifted mask with 32 set bits and c4 trailing zeros.
unsigned Trailing = llvm::countr_zero(C1);
if (Leading + Trailing == 32 && C2 < Trailing &&
@preames (Collaborator) commented on the lines above:
Do we need a !IsCANDI check here?

@dtcxzyw (Member) commented on May 9, 2024

Emm, #91626 seems to perform better than this patch. BTW this patch converts some slli insts into slli.uw, which makes them less compressible.

@topperc (Collaborator, Author) commented on May 9, 2024

> Emm, #91626 seems to perform better than this patch. BTW this patch converts some slli insts into slli.uw, which makes them less compressible.

Can you get reproducers?

@dtcxzyw (Member) commented on May 9, 2024

> Emm, #91626 seems to perform better than this patch. BTW this patch converts some slli insts into slli.uw, which makes them less compressible.
>
> Can you get reproducers?

godbolt: https://godbolt.org/z/sMv9fzWbY

; llc -mtriple=riscv64 -mattr=+c,+m,+zba -o -
define i64 @func0000000000000000(i64 %0) #0 {
entry:
  %1 = mul i64 %0, 100
  %2 = udiv i64 %1, 70
  %3 = shl i64 %2, 32
  ret i64 %3
}

Before:

func0000000000000000:                   # @func0000000000000000
        lui     a1, %hi(.LCPI0_0)
        ld      a1, %lo(.LCPI0_0)(a1)
        li      a2, 100
        mul     a0, a0, a2
        mulhu   a0, a0, a1
        srli    a0, a0, 6
        slli    a0, a0, 32
        ret

After:

func0000000000000000:                   # @func0000000000000000
        lui     a1, %hi(.LCPI1_0)
        ld      a1, %lo(.LCPI1_0)(a1)
        li      a2, 100
        mul     a0, a0, a2
        mulhu   a0, a0, a1
        srli    a0, a0, 6
        slli.uw a0, a0, 32              # *!!!* slli became slli.uw
        ret

@topperc (Collaborator, Author) commented on May 9, 2024

Need a check that Leading is non-zero or Trailing is less than 32
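
Since the code already requires Leading + Trailing == 32, those two conditions are equivalent; either way the guard rules out the mask 0xFFFFFFFF00000000 hit by the reproducer above, where the slli.uw shift amount becomes 32. In that case zero-extending the low 32 bits before shifting changes nothing, so the plain slli (which has a compressed c.slli form, unlike slli.uw) is strictly better. A minimal sketch of that equivalence:

#include <cassert>
#include <cstdint>

int main() {
  // A left shift by 32 already discards the upper 32 bits of the source,
  // so the zero-extension performed by slli.uw is redundant here.
  for (uint64_t X : {0x0ULL, 0x123456789ABCDEF0ULL, ~0ULL}) {
    uint64_t Slli = X << 32;
    uint64_t SlliUw = (uint64_t)(uint32_t)X << 32;
    assert(Slli == SlliUw);
  }
  return 0;
}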

@dtcxzyw (Member) left a comment

LGTM.

@topperc (Collaborator, Author) commented on May 9, 2024

> LGTM.

Are the results better now? Was that the only issue?

I'll add a test for that case before I commit.

@topperc merged commit dfff57e into llvm:main on May 9, 2024
@topperc deleted the pr/shift-isel branch on May 9, 2024 at 21:39
@dtcxzyw (Member) commented on May 10, 2024

> LGTM.
>
> Are the results better now? Was that the only issue?

Yeah, I confirmed that all regressions have been fixed :)
See dtcxzyw/llvm-codegen-benchmark@5c1d002
