-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[LLVM][AArch64] Correctly lower funnel shifts by constants. #140058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Paul Walker (paulwalker-arm) ChangesPrevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we shoudl not rely on this and so this PR mirror the same optimisation. NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled). Full diff: https://github.com/llvm/llvm-project/pull/140058.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index d6e288a59b2ee..2b752498f64a1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -149,6 +149,10 @@ static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
cl::desc("DAG combiner enable load/<replace bytes>/store with "
"a narrower store"));
+static cl::opt<bool> DisableCombines("combiner-disabled", cl::Hidden,
+ cl::init(false),
+ cl::desc("Disable the DAG combiner"));
+
namespace {
class DAGCombiner {
@@ -248,7 +252,8 @@ namespace {
STI(D.getSubtarget().getSelectionDAGInfo()), OptLevel(OL),
BatchAA(BatchAA) {
ForCodeSize = DAG.shouldOptForSize();
- DisableGenericCombines = STI && STI->disableGenericCombines(OptLevel);
+ DisableGenericCombines =
+ DisableCombines || (STI && STI->disableGenericCombines(OptLevel));
MaximumLegalStoreInBits = 0;
// We use the minimum store size here, since that's all we can guarantee
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index fb7f7d6f7537d..7206a619cb767 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -7266,12 +7266,18 @@ static SDValue LowerFunnelShift(SDValue Op, SelectionDAG &DAG) {
MVT VT = Op.getSimpleValueType();
if (Op.getOpcode() == ISD::FSHL) {
+ if (ShiftNo->isZero())
+ return Op.getOperand(0);
+
unsigned int NewShiftNo =
VT.getFixedSizeInBits() - ShiftNo->getZExtValue();
return DAG.getNode(
ISD::FSHR, DL, VT, Op.getOperand(0), Op.getOperand(1),
DAG.getConstant(NewShiftNo, DL, Shifts.getValueType()));
} else if (Op.getOpcode() == ISD::FSHR) {
+ if (ShiftNo->isZero())
+ return Op.getOperand(1);
+
return Op;
}
}
|
@llvm/pr-subscribers-llvm-selectiondag Author: Paul Walker (paulwalker-arm) ChangesPrevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we shoudl not rely on this and so this PR mirror the same optimisation. NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled). Full diff: https://github.com/llvm/llvm-project/pull/140058.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index d6e288a59b2ee..2b752498f64a1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -149,6 +149,10 @@ static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
cl::desc("DAG combiner enable load/<replace bytes>/store with "
"a narrower store"));
+static cl::opt<bool> DisableCombines("combiner-disabled", cl::Hidden,
+ cl::init(false),
+ cl::desc("Disable the DAG combiner"));
+
namespace {
class DAGCombiner {
@@ -248,7 +252,8 @@ namespace {
STI(D.getSubtarget().getSelectionDAGInfo()), OptLevel(OL),
BatchAA(BatchAA) {
ForCodeSize = DAG.shouldOptForSize();
- DisableGenericCombines = STI && STI->disableGenericCombines(OptLevel);
+ DisableGenericCombines =
+ DisableCombines || (STI && STI->disableGenericCombines(OptLevel));
MaximumLegalStoreInBits = 0;
// We use the minimum store size here, since that's all we can guarantee
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index fb7f7d6f7537d..7206a619cb767 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -7266,12 +7266,18 @@ static SDValue LowerFunnelShift(SDValue Op, SelectionDAG &DAG) {
MVT VT = Op.getSimpleValueType();
if (Op.getOpcode() == ISD::FSHL) {
+ if (ShiftNo->isZero())
+ return Op.getOperand(0);
+
unsigned int NewShiftNo =
VT.getFixedSizeInBits() - ShiftNo->getZExtValue();
return DAG.getNode(
ISD::FSHR, DL, VT, Op.getOperand(0), Op.getOperand(1),
DAG.getConstant(NewShiftNo, DL, Shifts.getValueType()));
} else if (Op.getOpcode() == ISD::FSHR) {
+ if (ShiftNo->isZero())
+ return Op.getOperand(1);
+
return Op;
}
}
|
This doesn't look right. fshl and fshr interpret the shift amount as modulo the bitwidth of op0 (or op1), so there is no bug here unless you are worried about non-power-of-two bitwidths. |
Prevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we should not rely on this and so this PR mirrors the same optimisation. Ensure LowerFunnelShift normalises constant shift amounts because isel rules expect them to be in the range [0, src bit length). NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled).
Updated to normalise the shift amounts for both I did try marking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getNode() already handles zero shift amounts for shifts and rotates - why not just add it for fshl/fshr as well?
I'll give it a go and see what drops out. |
@RKSimon - If getNode() performs this canonicalisation, can I be sure there's no way for the lowering code to see If that can happen then the custom lowering code must still handle it? |
If it wasn't constant and a later combine manages to fold the node to a constant then it might not be regenerated/combined and technically you could see it at lowering - DAG combines still aren't properly topologically ordered :( But I'm not entirely clear what you're trying to handle here - does arm64 only support FSHR nodes with constant shift amounts? What happens with zero / modulo amounts with that instruction that is so bad? |
This is #139866. The sequence of events is there is i128 shl by load -> i64 shl_parts by load -> i64 shl_parts by 0 (the load is optimized to a 0, via rauw I would guess) -> (there is no combine for shl_part by 0 but does get visited) -> lowered to i64 fshl by 0 via custom/expandShiftParts -> lower fshl that goes wrong. |
Continuing on from what David said above, it's not that FSHR does not like zero, but that the custom lowering code tries to rewrite all FSHLs with constant shift amounts as FSHRs because isel only exists for the latter. The problem being the transformation is invalid when the modulo shift amount is 0. Whilst fixing that I spotted the FSHR AArch64 isel rules only support [0-BitWidth) so I figured I'd fix that as well so that we can be sure all legitimate shift amounts can be lowered to something we can isel. |
) Prevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we should not rely on this and so this PR mirrors the same optimisation. Ensure LowerFunnelShift normalises constant shift amounts because isel rules expect them to be in the range [0, src bit length). NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled).
) Prevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we should not rely on this and so this PR mirrors the same optimisation. Ensure LowerFunnelShift normalises constant shift amounts because isel rules expect them to be in the range [0, src bit length). NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled).
Prevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we should not rely on this and so this PR mirrors the same optimisation.
Ensure LowerFunnelShift normalises constant shift amounts because isel rules expect them to be in the range [0, src bit length).
NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled).