Skip to content

[DAGCombiner][X86] Improve chain handling in fold (fshl ld1, ld0, c) -> (ld0[ofs]) combine. #124871

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

topperc
Copy link
Collaborator

@topperc topperc commented Jan 29, 2025

Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode* which will check users of the data and the chain. I think this was trying to check that the data had one use so one of the loads would definitely be removed by the transform. Load chains don't always have users so our testing may not have noticed that the chains being used would block the transform.

The code makes all users of ld1's chain use the new load's chain, but we don't know that ld1 becomes dead. This can cause incorrect dependencies if ld1's chain is used and it isn't deleted. I think the better thing to do is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1 depend on the new load and the original loads. If the olds loads become dead, their chain will be cleaned up later.

I'm having trouble getting a test for any ordering issue. areNonVolatileConsecutiveLoads requires the two loads to have the same input chain. Given that I don't know how to use one of the load chain results without also using the other. If they are both used we don't do the transform because hasOneUse will return false for both.

Please let me know if I've gotten any of this wrong.

@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Jan 29, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 29, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Craig Topper (topperc)

Changes

Happened to notice some odd things related to chains in this code. I don't have a good test case yet.

The code calls hasOneUse on LoadSDNode* which will check users of the data and the chain. I think this was trying to check that the data had one use so one of the loads would definitely be removed by the transform. Load chains don't always have users so our testing may not have noticed that the chains being used would block the transform.

The code makes all users of ld1's chain use the new load's chain, but we don't know that ld1 becomes dead. This can cause incorrect dependencies if ld1's chain is used and it isn't deleted. I think the better thing to do is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1 depend on the new load and the original loads. If the olds loads become dead, their chain will be cleaned up later.

Please let me know if I've gotten any of this wrong. If it looks good I'll work on getting a test.


Full diff: https://github.com/llvm/llvm-project/pull/124871.diff

1 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+4-5)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index a0c703d2df8a2f..925da2f0ff2732 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -11005,8 +11005,8 @@ SDValue DAGCombiner::visitFunnelShift(SDNode *N) {
       auto *RHS = dyn_cast<LoadSDNode>(N1);
       if (LHS && RHS && LHS->isSimple() && RHS->isSimple() &&
           LHS->getAddressSpace() == RHS->getAddressSpace() &&
-          (LHS->hasOneUse() || RHS->hasOneUse()) && ISD::isNON_EXTLoad(RHS) &&
-          ISD::isNON_EXTLoad(LHS)) {
+          (LHS->hasNUsesOfValue(1, 0) || RHS->hasNUsesOfValue(1, 0)) &&
+          ISD::isNON_EXTLoad(RHS) && ISD::isNON_EXTLoad(LHS)) {
         if (DAG.areNonVolatileConsecutiveLoads(LHS, RHS, BitWidth / 8, 1)) {
           SDLoc DL(RHS);
           uint64_t PtrOff =
@@ -11024,9 +11024,8 @@ SDValue DAGCombiner::visitFunnelShift(SDNode *N) {
                 VT, DL, RHS->getChain(), NewPtr,
                 RHS->getPointerInfo().getWithOffset(PtrOff), NewAlign,
                 RHS->getMemOperand()->getFlags(), RHS->getAAInfo());
-            // Replace the old load's chain with the new load's chain.
-            WorklistRemover DeadNodes(*this);
-            DAG.ReplaceAllUsesOfValueWith(N1.getValue(1), Load.getValue(1));
+            DAG.makeEquivalentMemoryOrdering(LHS, Load.getValue(1));
+            DAG.makeEquivalentMemoryOrdering(RHS, Load.getValue(1));
             return Load;
           }
         }

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most hasOneUse checks on SDNode and not SDValue are probably wrong

…d0[ofs]) combine.

Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode* which will check users
of the data and the chain. I think this was trying to check that
the data had one use so one of the loads would definitely be
removed by the transform. Load chains don't always have users so
our testing may not have noticed that the chains being used would
block the transform.

The code makes all users of ld1's chain use the new load's chain, but
we don't know that ld1 becomes dead. This can cause incorrect dependencies if
ld1's chain is used and it isn't deleted. I think the better thing to do
is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1
depend on the new load and the original loads. If the olds loads become
dead, their chain will be cleaned up later.

I'm having trouble getting a test for any ordering issue.
areNonVolatileConsecutiveLoads requires the two loads to have the same
input chain. Given that I don't know how to use one of the load chain
results without also using the other. If they are both used we don't
do the transform because hasOneUse will return false for both.
@topperc
Copy link
Collaborator Author

topperc commented Jan 30, 2025

I haven't been able to get a test that shows a ordering problem. See the description.

I was able to test that we didn't do the transform previously if a store occurs after the loads. I'll pre-commit if it looks good.

@topperc topperc changed the title [DAGCombiner] Improve chain handling in fold (fshl ld1, ld0, c) -> (ld0[ofs]) combine. [DAGCombiner][X86] Improve chain handling in fold (fshl ld1, ld0, c) -> (ld0[ofs]) combine. Jan 31, 2025
@topperc topperc requested a review from s-barannikov February 2, 2025 06:24
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - cheers. (I think DAGCombiner::CombineConsecutiveLoads has a similar issue?)

topperc added a commit that referenced this pull request Feb 3, 2025
This shows missed opportunity to fold (fshl ld1, ld0, c) -> (ld0[ofs])
if the load chain results are used.
topperc added a commit that referenced this pull request Feb 3, 2025
…d0[ofs]) combine. (#124871)

Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode* which will check users
of the data and the chain. I think this was trying to check that
the data had one use so one of the loads would definitely be
removed by the transform. Load chains don't always have users so
our testing may not have noticed that the chains being used would
block the transform.

The code makes all users of ld1's chain use the new load's chain, but
we don't know that ld1 becomes dead. This can cause incorrect dependencies if
ld1's chain is used and it isn't deleted. I think the better thing to do
is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1
depend on the new load and the original loads. If the olds loads become
dead, their chain will be cleaned up later.

I'm having trouble getting a test for any ordering issue with the current code.
areNonVolatileConsecutiveLoads requires the two loads to have the same
input chain. Given that, I don't know how to use one of the load chain
results without also using the other. If they are both used we don't
do the transform because SDNode::hasOneUse will return false for both.
@topperc
Copy link
Collaborator Author

topperc commented Feb 3, 2025

Committed as 788bbd2

@topperc topperc closed this Feb 3, 2025
@topperc topperc deleted the pr/load-chain branch February 5, 2025 00:09
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
This shows missed opportunity to fold (fshl ld1, ld0, c) -> (ld0[ofs])
if the load chain results are used.
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…d0[ofs]) combine. (llvm#124871)

Happened to notice some odd things related to chains in this code.

The code calls hasOneUse on LoadSDNode* which will check users
of the data and the chain. I think this was trying to check that
the data had one use so one of the loads would definitely be
removed by the transform. Load chains don't always have users so
our testing may not have noticed that the chains being used would
block the transform.

The code makes all users of ld1's chain use the new load's chain, but
we don't know that ld1 becomes dead. This can cause incorrect dependencies if
ld1's chain is used and it isn't deleted. I think the better thing to do
is use makeEquivalentMemoryOrdering to make all users of ld0 and ld1
depend on the new load and the original loads. If the olds loads become
dead, their chain will be cleaned up later.

I'm having trouble getting a test for any ordering issue with the current code.
areNonVolatileConsecutiveLoads requires the two loads to have the same
input chain. Given that, I don't know how to use one of the load chain
results without also using the other. If they are both used we don't
do the transform because SDNode::hasOneUse will return false for both.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants