[RISCV] Inhibit DAG folding shl through zext.w pattern with zba #91626
Conversation
If we allow the fold, the zext.w pattern becomes an and with a shifted 32-bit mask. In practice, we can't undo this during ISel, resulting in worse code in some cases. There is a cost to inhibiting the generic transform -- we lose out on the possibility of folds enabled by pushing the shift earlier.
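As an illustration (this example is not from the patch; the function name and shift amount are made up), the form the backend wants to keep looks roughly like:

define i64 @slli_uw_friendly(i64 %x) {
entry:
  %a = and i64 %x, 4294967295   ; the zext.w pattern (mask 0xffffffff)
  %b = shl i64 %a, 4            ; (shl (and X, 0xffffffff), 4) can select to slli.uw
  ret i64 %b
}

Commuting the shift past the and rewrites this as an and of (shl %x, 4) with 68719476720 (0xffffffff shifted left by 4), which the current ISel patterns do not turn back into slli.uw.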
@llvm/pr-subscribers-backend-risc-v Author: Philip Reames (preames). Changes: same as the description above. Full diff: https://github.com/llvm/llvm-project/pull/91626.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 846768f6d631e..60b21fb508990 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17141,6 +17141,13 @@ bool RISCVTargetLowering::isDesirableToCommuteWithShift(
return false;
}
}
+
+ // Don't break slli.uw patterns.
+ if (Subtarget.hasStdExtZba() && Ty.isScalarInteger() && N->getOpcode() == ISD::SHL &&
+ N0.getOpcode() == ISD::AND && isa<ConstantSDNode>(N0.getOperand(1)) &&
+ N0.getConstantOperandVal(1) == UINT64_C(0xffffffff))
+ return false;
+
return true;
}
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index 8fe221f2a297a..a0a7db538e835 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -2866,8 +2866,7 @@ define ptr @gep_lshr_i32(ptr %0, i64 %1) {
;
; RV64ZBA-LABEL: gep_lshr_i32:
; RV64ZBA: # %bb.0: # %entry
-; RV64ZBA-NEXT: slli a1, a1, 2
-; RV64ZBA-NEXT: srli a1, a1, 4
+; RV64ZBA-NEXT: srli a1, a1, 2
; RV64ZBA-NEXT: slli.uw a1, a1, 4
; RV64ZBA-NEXT: sh2add a1, a1, a1
; RV64ZBA-NEXT: add a0, a0, a1
@@ -2891,8 +2890,7 @@ define i64 @srli_slliw(i64 %1) {
;
; RV64ZBA-LABEL: srli_slliw:
; RV64ZBA: # %bb.0: # %entry
-; RV64ZBA-NEXT: slli a0, a0, 2
-; RV64ZBA-NEXT: srli a0, a0, 4
+; RV64ZBA-NEXT: srli a0, a0, 2
; RV64ZBA-NEXT: slli.uw a0, a0, 4
; RV64ZBA-NEXT: ret
entry:
✅ With the latest revision this PR passed the C/C++ code formatter.
Why can't we undo it? Does it move too far away?
Sorry, I hadn't meant this as "we can't", more "we don't". We could write a non-trivial pattern match here, but we end up having to match the whole (and (shl X, C), ShiftedMask) expression and then check that C and the shift are compatible. I looked at doing that, but it didn't seem particularly clean in tablegen.
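To make the shape concrete, here is a hand-written sketch (names and constants are illustrative, not from the patch) of the expression such a late match would have to recognize; 68719476720 is 0xffffffff shifted left by 4, i.e. a mask that lines up with an outer shift of 4:

define i64 @late_match_shape(i64 %x) {
entry:
  %s = shl i64 %x, 4
  %r = and i64 %s, 68719476720   ; ShiftedMask: for slli.uw this must equal 0xffffffff << C (here C = 4)
  ret i64 %r
}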
@@ -2891,8 +2890,7 @@ define i64 @srli_slliw(i64 %1) {
;
; RV64ZBA-LABEL: srli_slliw:
; RV64ZBA: # %bb.0: # %entry
-; RV64ZBA-NEXT: slli a0, a0, 2
-; RV64ZBA-NEXT: srli a0, a0, 4
+; RV64ZBA-NEXT: srli a0, a0, 2
; RV64ZBA-NEXT: slli.uw a0, a0, 4
; RV64ZBA-NEXT: ret
entry:
Isn't this IR non-canonical per InstCombine?
The original test case is canonical: https://godbolt.org/z/ee3YPfY4s
define ptr @test(ptr %0, i64 %1) {
entry:
%2 = lshr exact i64 %1, 2
%3 = and i64 %2, 4294967295
%4 = getelementptr inbounds i8, ptr %0, i64 600
%5 = getelementptr [80 x i8], ptr %4, i64 %3
ret ptr %5
}
Yes, with a GEP it's canonical, but with a shl it isn't.
Yeah, this was excessive reduction apparently.
The fact instcombine prefers the opposite form does hint that we should maybe (also?) do the late match. I was really hoping not to have to write that code...
I just posted it. We had the majority of the code already. Is there a 3 shift version of this we should do without Zba?
…srli x, c4-c2), c4). Where c1 is a shifted mask with 32 set bits and c4 trailing zeros. This is an alternative to llvm#91626.
Abandon in favor of Craig's alternative.