
[RISCV] Inhibit DAG folding shl through zext.w pattern with zba #91626


Closed
wants to merge 2 commits

Conversation

preames
Collaborator

@preames preames commented May 9, 2024

If we allow the fold, the zext.w pattern becomes an and by a shifted 32-bit mask. In practice, we can't undo this during ISEL, resulting in worse code in some cases. There is a cost to inhibiting the generic transform -- we lose out on the possibility of folds enabled by pushing the shift earlier.
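For illustration only (not code from the patch; the constants are made up): a minimal standalone sketch of the arithmetic behind the generic fold. Both shapes compute the same value, but the second one masks with a shifted 32-bit constant and no longer looks like zext.w/slli.uw.

#include <cassert>
#include <cstdint>

int main() {
  const uint64_t X = 0x123456789abcdef0; // arbitrary input
  const unsigned C = 4;                  // arbitrary shift amount

  // zext.w-friendly shape: shl (and X, 0xffffffff), C
  uint64_t before = (X & 0xffffffffULL) << C;

  // shape after the generic fold: and (shl X, C), (0xffffffff << C)
  uint64_t after = (X << C) & (0xffffffffULL << C);

  assert(before == after); // same value, but a harder-to-match mask
  return 0;
}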

@preames preames requested review from dtcxzyw and topperc May 9, 2024 17:12
@llvmbot
Member

llvmbot commented May 9, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

If we allow the fold, the zext.w pattern becomes an and by a shifted 32-bit mask. In practice, we can't undo this during ISEL, resulting in worse code in some cases. There is a cost to inhibiting the generic transform -- we lose out on the possibility of folds enabled by pushing the shift earlier.


Full diff: https://github.com/llvm/llvm-project/pull/91626.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+7)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+2-4)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 846768f6d631e..60b21fb508990 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17141,6 +17141,13 @@ bool RISCVTargetLowering::isDesirableToCommuteWithShift(
         return false;
     }
   }
+
+  // Don't break slli.uw patterns.
+  if (Subtarget.hasStdExtZba() && Ty.isScalarInteger() && N->getOpcode() == ISD::SHL &&
+      N0.getOpcode() == ISD::AND && isa<ConstantSDNode>(N0.getOperand(1)) &&
+      N0.getConstantOperandVal(1) == UINT64_C(0xffffffff))
+    return false;
+
   return true;
 }
 
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index 8fe221f2a297a..a0a7db538e835 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -2866,8 +2866,7 @@ define ptr @gep_lshr_i32(ptr %0, i64 %1) {
 ;
 ; RV64ZBA-LABEL: gep_lshr_i32:
 ; RV64ZBA:       # %bb.0: # %entry
-; RV64ZBA-NEXT:    slli a1, a1, 2
-; RV64ZBA-NEXT:    srli a1, a1, 4
+; RV64ZBA-NEXT:    srli a1, a1, 2
 ; RV64ZBA-NEXT:    slli.uw a1, a1, 4
 ; RV64ZBA-NEXT:    sh2add a1, a1, a1
 ; RV64ZBA-NEXT:    add a0, a0, a1
@@ -2891,8 +2890,7 @@ define i64 @srli_slliw(i64 %1) {
 ;
 ; RV64ZBA-LABEL: srli_slliw:
 ; RV64ZBA:       # %bb.0: # %entry
-; RV64ZBA-NEXT:    slli a0, a0, 2
-; RV64ZBA-NEXT:    srli a0, a0, 4
+; RV64ZBA-NEXT:    srli a0, a0, 2
 ; RV64ZBA-NEXT:    slli.uw a0, a0, 4
 ; RV64ZBA-NEXT:    ret
 entry:


github-actions bot commented May 9, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

dtcxzyw added a commit to dtcxzyw/llvm-codegen-benchmark that referenced this pull request May 9, 2024
@topperc
Collaborator

topperc commented May 9, 2024

In practice, we can't undo this during ISEL resulting in worse code in some cases.

Why can't we undo it? Does it move too far away?

@preames
Collaborator Author

preames commented May 9, 2024

In practice, we can't undo this during ISEL resulting in worse code in some cases.

Why can't we undo it? Does it move too far away?

Sorry, I hadn't meant this as "we can't" so much as "we don't". We could write a non-trivial pattern match here, but we end up having to match the whole (and (shl X, C), ShiftedMask) expression and then have to check that the shift amount C and ShiftedMask are compatible. I looked at doing that, but it didn't seem particularly clean in tablegen.
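For what it's worth, the value-level condition such a match would have to verify is just that the mask equals the 32-bit mask shifted left by the shl amount. A hedged C++ sketch of that predicate (isSlliUwCompatible is a made-up name, not an existing LLVM helper, and this is not the actual tablegen pattern):

#include <cassert>
#include <cstdint>

// Hypothetical predicate: does (and (shl X, C), Mask) compute the same value
// as ((X & 0xffffffff) << C), i.e. the slli.uw semantics?
bool isSlliUwCompatible(uint64_t Mask, unsigned C) {
  return C < 64 && Mask == (0xffffffffULL << C);
}

int main() {
  assert(isSlliUwCompatible(0xffffffff0ULL, 4));    // mask == 0xffffffff << 4
  assert(!isSlliUwCompatible(0xffffffff00ULL, 4));  // mask for shift 8, shl by 4
  assert(!isSlliUwCompatible(0xffffffff0ULL, 8));   // mask for shift 4, shl by 8
  return 0;
}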

@@ -2891,8 +2890,7 @@ define i64 @srli_slliw(i64 %1) {
;
; RV64ZBA-LABEL: srli_slliw:
; RV64ZBA: # %bb.0: # %entry
-; RV64ZBA-NEXT:    slli a0, a0, 2
-; RV64ZBA-NEXT:    srli a0, a0, 4
+; RV64ZBA-NEXT:    srli a0, a0, 2
; RV64ZBA-NEXT: slli.uw a0, a0, 4
; RV64ZBA-NEXT: ret
entry:
Collaborator

Isn't this IR non-canonical per InstCombine?

Member

The original test case is canonical: https://godbolt.org/z/ee3YPfY4s

define ptr @test(ptr %0, i64 %1) {
entry:
  %2 = lshr exact i64 %1, 2
  %3 = and i64 %2, 4294967295
  %4 = getelementptr inbounds i8, ptr %0, i64 600
  %5 = getelementptr [80 x i8], ptr %4, i64 %3
  ret ptr %5
}

Collaborator

Yes, with a GEP it's canonical, but with a shl it isn't.

Collaborator Author

Yeah, this was excessive reduction apparently.

The fact that instcombine prefers the opposite form does hint that we should maybe (also?) do the late match. I was really hoping not to have to write that code...

Collaborator

I just posted it. We had the majority of the code already. Is there a 3-shift version of this we should do without Zba?

topperc added a commit to topperc/llvm-project that referenced this pull request May 9, 2024
…srli x, c4-c2), c4).

Where c1 is a shifted mask with 32 set bits and c4 trailing zeros.

This is an alternative to llvm#91626.
@preames
Collaborator Author

preames commented May 9, 2024

Abandoning in favor of Craig's alternative.

@preames preames closed this May 9, 2024
@preames preames deleted the pr-riscv-desirable-shift-op branch May 9, 2024 19:01