Skip to content

[LLVM][AArch64] Correctly lower funnel shifts by constants. #140058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 20, 2025

Conversation

paulwalker-arm
Copy link
Collaborator

@paulwalker-arm paulwalker-arm commented May 15, 2025

Prevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we should not rely on this and so this PR mirrors the same optimisation.

Ensure LowerFunnelShift normalises constant shift amounts because isel rules expect them to be in the range [0, src bit length).

NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled).

@llvmbot llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels May 15, 2025
@llvmbot
Copy link
Member

llvmbot commented May 15, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Paul Walker (paulwalker-arm)

Changes

Prevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we shoudl not rely on this and so this PR mirror the same optimisation.

NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled).


Full diff: https://github.com/llvm/llvm-project/pull/140058.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+6-1)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+6)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index d6e288a59b2ee..2b752498f64a1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -149,6 +149,10 @@ static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
     cl::desc("DAG combiner enable load/<replace bytes>/store with "
              "a narrower store"));
 
+static cl::opt<bool> DisableCombines("combiner-disabled", cl::Hidden,
+                                     cl::init(false),
+                                     cl::desc("Disable the DAG combiner"));
+
 namespace {
 
   class DAGCombiner {
@@ -248,7 +252,8 @@ namespace {
           STI(D.getSubtarget().getSelectionDAGInfo()), OptLevel(OL),
           BatchAA(BatchAA) {
       ForCodeSize = DAG.shouldOptForSize();
-      DisableGenericCombines = STI && STI->disableGenericCombines(OptLevel);
+      DisableGenericCombines =
+          DisableCombines || (STI && STI->disableGenericCombines(OptLevel));
 
       MaximumLegalStoreInBits = 0;
       // We use the minimum store size here, since that's all we can guarantee
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index fb7f7d6f7537d..7206a619cb767 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -7266,12 +7266,18 @@ static SDValue LowerFunnelShift(SDValue Op, SelectionDAG &DAG) {
     MVT VT = Op.getSimpleValueType();
 
     if (Op.getOpcode() == ISD::FSHL) {
+      if (ShiftNo->isZero())
+        return Op.getOperand(0);
+
       unsigned int NewShiftNo =
           VT.getFixedSizeInBits() - ShiftNo->getZExtValue();
       return DAG.getNode(
           ISD::FSHR, DL, VT, Op.getOperand(0), Op.getOperand(1),
           DAG.getConstant(NewShiftNo, DL, Shifts.getValueType()));
     } else if (Op.getOpcode() == ISD::FSHR) {
+      if (ShiftNo->isZero())
+        return Op.getOperand(1);
+
       return Op;
     }
   }

@llvmbot
Copy link
Member

llvmbot commented May 15, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Paul Walker (paulwalker-arm)

Changes

Prevent LowerFunnelShift from creating an invalid ISD::FSHR when lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP that DAGCombiner will optimise away. However, we shoudl not rely on this and so this PR mirror the same optimisation.

NOTE: To simiplify testing, this PR also adds a command line option to disable the DAG combiner (-combiner-disabled).


Full diff: https://github.com/llvm/llvm-project/pull/140058.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+6-1)
  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+6)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index d6e288a59b2ee..2b752498f64a1 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -149,6 +149,10 @@ static cl::opt<bool> EnableShrinkLoadReplaceStoreWithStore(
     cl::desc("DAG combiner enable load/<replace bytes>/store with "
              "a narrower store"));
 
+static cl::opt<bool> DisableCombines("combiner-disabled", cl::Hidden,
+                                     cl::init(false),
+                                     cl::desc("Disable the DAG combiner"));
+
 namespace {
 
   class DAGCombiner {
@@ -248,7 +252,8 @@ namespace {
           STI(D.getSubtarget().getSelectionDAGInfo()), OptLevel(OL),
           BatchAA(BatchAA) {
       ForCodeSize = DAG.shouldOptForSize();
-      DisableGenericCombines = STI && STI->disableGenericCombines(OptLevel);
+      DisableGenericCombines =
+          DisableCombines || (STI && STI->disableGenericCombines(OptLevel));
 
       MaximumLegalStoreInBits = 0;
       // We use the minimum store size here, since that's all we can guarantee
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index fb7f7d6f7537d..7206a619cb767 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -7266,12 +7266,18 @@ static SDValue LowerFunnelShift(SDValue Op, SelectionDAG &DAG) {
     MVT VT = Op.getSimpleValueType();
 
     if (Op.getOpcode() == ISD::FSHL) {
+      if (ShiftNo->isZero())
+        return Op.getOperand(0);
+
       unsigned int NewShiftNo =
           VT.getFixedSizeInBits() - ShiftNo->getZExtValue();
       return DAG.getNode(
           ISD::FSHR, DL, VT, Op.getOperand(0), Op.getOperand(1),
           DAG.getConstant(NewShiftNo, DL, Shifts.getValueType()));
     } else if (Op.getOpcode() == ISD::FSHR) {
+      if (ShiftNo->isZero())
+        return Op.getOperand(1);
+
       return Op;
     }
   }

@jayfoad
Copy link
Contributor

jayfoad commented May 15, 2025

This doesn't look right. fshl and fshr interpret the shift amount as modulo the bitwidth of op0 (or op1), so there is no bug here unless you are worried about non-power-of-two bitwidths.

Prevent LowerFunnelShift from creating an invalid ISD::FSHR when
lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP
that DAGCombiner will optimise away. However, we should not rely on
this and so this PR mirrors the same optimisation.

Ensure LowerFunnelShift normalises constant shift amounts because isel
rules expect them to be in the range [0, src bit length).

NOTE: To simiplify testing, this PR also adds a command line option to
disable the DAG combiner (-combiner-disabled).
@paulwalker-arm
Copy link
Collaborator Author

Updated to normalise the shift amounts for both fshl and fshr, with new tests that trigger isel failures prior to this PR.

I did try marking ISD::FSHL as needing expansion, but that triggered other asserts and worse code generation so I'd rather get the bug fix out of the way and revisit whether the custom lowering code can be removed at a later date.

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getNode() already handles zero shift amounts for shifts and rotates - why not just add it for fshl/fshr as well?

@paulwalker-arm
Copy link
Collaborator Author

getNode() already handles zero shift amounts for shifts and rotates - why not just add it for fshl/fshr as well?

I'll give it a go and see what drops out.

@paulwalker-arm
Copy link
Collaborator Author

@RKSimon - If getNode() performs this canonicalisation, can I be sure there's no way for the lowering code to see fshl(x,y,0)? I'm wondering if the lowering could end up generating the "zero" shift amount without there being a new ISD::FHSL construction to canonicalise it.

If that can happen then the custom lowering code must still handle it?

@RKSimon
Copy link
Collaborator

RKSimon commented May 15, 2025

If it wasn't constant and a later combine manages to fold the node to a constant then it might not be regenerated/combined and technically you could see it at lowering - DAG combines still aren't properly topologically ordered :(

But I'm not entirely clear what you're trying to handle here - does arm64 only support FSHR nodes with constant shift amounts? What happens with zero / modulo amounts with that instruction that is so bad?

@davemgreen
Copy link
Collaborator

This is #139866. The sequence of events is there is i128 shl by load -> i64 shl_parts by load -> i64 shl_parts by 0 (the load is optimized to a 0, via rauw I would guess) -> (there is no combine for shl_part by 0 but does get visited) -> lowered to i64 fshl by 0 via custom/expandShiftParts -> lower fshl that goes wrong.

@paulwalker-arm
Copy link
Collaborator Author

paulwalker-arm commented May 15, 2025

But I'm not entirely clear what you're trying to handle here - does arm64 only support FSHR nodes with constant shift amounts? What happens with zero / modulo amounts with that instruction that is so bad?

Continuing on from what David said above, it's not that FSHR does not like zero, but that the custom lowering code tries to rewrite all FSHLs with constant shift amounts as FSHRs because isel only exists for the latter. The problem being the transformation is invalid when the modulo shift amount is 0. Whilst fixing that I spotted the FSHR AArch64 isel rules only support [0-BitWidth) so I figured I'd fix that as well so that we can be sure all legitimate shift amounts can be lowered to something we can isel.

@paulwalker-arm paulwalker-arm changed the title [LLVM][AArch64] Correctly lower funnel shifts by zero. [LLVM][AArch64] Correctly lower funnel shifts by constants. May 20, 2025
@paulwalker-arm paulwalker-arm merged commit 5dfaf84 into llvm:main May 20, 2025
11 checks passed
@paulwalker-arm paulwalker-arm deleted the fshl-fix branch May 20, 2025 10:15
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Jun 3, 2025
)

Prevent LowerFunnelShift from creating an invalid ISD::FSHR when
lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP
that DAGCombiner will optimise away. However, we should not rely on this
and so this PR mirrors the same optimisation.
    
Ensure LowerFunnelShift normalises constant shift amounts because isel
rules expect them to be in the range [0, src bit length).
    
NOTE: To simiplify testing, this PR also adds a command line option to
disable the DAG combiner (-combiner-disabled).
ajaden-codes pushed a commit to Jaddyen/llvm-project that referenced this pull request Jun 6, 2025
)

Prevent LowerFunnelShift from creating an invalid ISD::FSHR when
lowering "ISD::FSHL X, Y, 0". Such inputs are rare because it's a NOP
that DAGCombiner will optimise away. However, we should not rely on this
and so this PR mirrors the same optimisation.
    
Ensure LowerFunnelShift normalises constant shift amounts because isel
rules expect them to be in the range [0, src bit length).
    
NOTE: To simiplify testing, this PR also adds a command line option to
disable the DAG combiner (-combiner-disabled).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants