[DAGCombine] Add all users of the instruction recursively into worklist when an instruction is simplified #91772

Draft: KanRobert wants to merge 1 commit into main

Conversation

KanRobert
Contributor

No description provided.

@llvmbot added the backend:X86 and llvm:SelectionDAG (SelectionDAGISel as well) labels on May 10, 2024
@llvmbot
Member

llvmbot commented May 10, 2024

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: Shengchen Kan (KanRobert)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/91772.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+3-1)
  • (modified) llvm/test/CodeGen/X86/addcarry.ll (+8-21)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 4589d201d6203..796264394c046 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -205,8 +205,10 @@ namespace {
     /// When an instruction is simplified, add all users of the instruction to
     /// the work lists because they might get more simplified now.
     void AddUsersToWorklist(SDNode *N) {
-      for (SDNode *Node : N->uses())
+      for (SDNode *Node : N->uses()) {
         AddToWorklist(Node);
+        AddUsersToWorklist(Node);
+      }
     }
 
     /// Convenient shorthand to add a node and all of its user to the worklist.
diff --git a/llvm/test/CodeGen/X86/addcarry.ll b/llvm/test/CodeGen/X86/addcarry.ll
index f8d32fc2d2925..3895d3a51b366 100644
--- a/llvm/test/CodeGen/X86/addcarry.ll
+++ b/llvm/test/CodeGen/X86/addcarry.ll
@@ -317,21 +317,13 @@ define %S @readd(ptr nocapture readonly %this, %S %arg.b) nounwind {
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    movq %rdi, %rax
 ; CHECK-NEXT:    addq (%rsi), %rdx
-; CHECK-NEXT:    movq 8(%rsi), %rdi
-; CHECK-NEXT:    adcq $0, %rdi
-; CHECK-NEXT:    setb %r10b
-; CHECK-NEXT:    movzbl %r10b, %r10d
-; CHECK-NEXT:    addq %rcx, %rdi
-; CHECK-NEXT:    adcq 16(%rsi), %r10
-; CHECK-NEXT:    setb %cl
-; CHECK-NEXT:    movzbl %cl, %ecx
-; CHECK-NEXT:    addq %r8, %r10
-; CHECK-NEXT:    adcq 24(%rsi), %rcx
-; CHECK-NEXT:    addq %r9, %rcx
-; CHECK-NEXT:    movq %rdx, (%rax)
-; CHECK-NEXT:    movq %rdi, 8(%rax)
-; CHECK-NEXT:    movq %r10, 16(%rax)
-; CHECK-NEXT:    movq %rcx, 24(%rax)
+; CHECK-NEXT:    adcq 8(%rsi), %rcx
+; CHECK-NEXT:    adcq 16(%rsi), %r8
+; CHECK-NEXT:    adcq 24(%rsi), %r9
+; CHECK-NEXT:    movq %rdx, (%rdi)
+; CHECK-NEXT:    movq %rcx, 8(%rdi)
+; CHECK-NEXT:    movq %r8, 16(%rdi)
+; CHECK-NEXT:    movq %r9, 24(%rdi)
 ; CHECK-NEXT:    retq
 entry:
   %0 = extractvalue %S %arg.b, 0
@@ -422,14 +414,9 @@ define i128 @addcarry_to_subcarry(i64 %a, i64 %b) nounwind {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    movq %rdi, %rax
 ; CHECK-NEXT:    cmpq %rsi, %rdi
-; CHECK-NEXT:    notq %rsi
+; CHECK-NEXT:    sbbq %rsi, %rax
 ; CHECK-NEXT:    setae %cl
-; CHECK-NEXT:    addb $-1, %cl
-; CHECK-NEXT:    adcq $0, %rax
-; CHECK-NEXT:    setb %cl
 ; CHECK-NEXT:    movzbl %cl, %edx
-; CHECK-NEXT:    addq %rsi, %rax
-; CHECK-NEXT:    adcq $0, %rdx
 ; CHECK-NEXT:    retq
   %notb = xor i64 %b, -1
   %notb128 = zext i64 %notb to i128

@KanRobert
Contributor Author

This affects a lot of test cases. I updated only one (addcarry.ll) to show that the change can improve the generated code. However, it will also increase compile time.

@KanRobert requested a review from goldsteinn on May 10, 2024 17:29
@KanRobert
Contributor Author

Maybe we can add a flag --dagcombine-max-depth= ? @RKSimon @topperc @phoebewang @e-kud @goldsteinn
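
As a rough illustration of what a depth bound could mean here, the sketch below walks users recursively but stops after a fixed number of levels. It is a toy model, not LLVM code: `ToyDAG`, `addUsersUpToDepth`, and the `MaxDepth` parameter are hypothetical names, and the flag mentioned above is only a proposal in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy stand-in for a DAG: Users[i] lists the nodes that use node i.
// Hypothetical model for illustration; this is not LLVM's SDNode API.
using ToyDAG = std::vector<std::vector<std::size_t>>;

// Add the users of N to Worklist, following user edges at most MaxDepth
// levels deep. MaxDepth == 1 corresponds to adding direct users only.
void addUsersUpToDepth(const ToyDAG &Users, std::size_t N, unsigned MaxDepth,
                       std::unordered_set<std::size_t> &Worklist) {
  assert(N < Users.size() && "node index out of range");
  if (MaxDepth == 0)
    return;
  for (std::size_t U : Users[N]) {
    Worklist.insert(U); // the set ignores nodes that are already queued
    addUsersUpToDepth(Users, U, MaxDepth - 1, Worklist);
  }
}
```

In this model the bound caps the work per call at roughly the fan-out raised to `MaxDepth`, at the price of missing users that sit further away.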

@RKSimon
Collaborator

RKSimon commented May 10, 2024

This is trying to achieve the same thing as the topological dag patches (https://github.com/RKSimon/llvm-project/tree/perf/topological-dag / #77475)

Those patches result in a reduction in compile time: (https://llvm-compile-time-tracker.com/?config=Overview&stat=instructions%3Au&remote=RKSimon)

Those patches have the same problem as this PR: massive test churn, including a large number of DAG combines that need fixing because they haven't always had to take into account that their operands might already have been combined further.

I'm not sure how best to split this work tbh.

@KanRobert
Contributor Author

> This is trying to achieve the same thing as the topological dag patches (https://github.com/RKSimon/llvm-project/tree/perf/topological-dag / #77475)

Question: what's the motivation of https://github.com/RKSimon/llvm-project/tree/perf/topological-dag / #77475 ? To reduce the compile time?

Contributor

@nikic nikic left a comment

> This is trying to achieve the same thing as the topological dag patches (https://github.com/RKSimon/llvm-project/tree/perf/topological-dag / #77475)
>
> Question: what's the motivation of https://github.com/RKSimon/llvm-project/tree/perf/topological-dag / #77475 ? To reduce the compile time?

Not primarily at least. The motivation for that change is the same motivation as for this pull request, just implemented properly. If we visit nodes in the correct order, then there is no need to requeue them recursively.
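
The ordering idea can be sketched on a toy graph: if every node is visited only after all of its operands, each node needs to be processed once and nothing has to be requeued recursively. The code below is a generic Kahn-style topological sort under that assumption. `ToyDAG` and `topologicalOrder` are hypothetical names, and the thread does not say whether the referenced patches compute their order this way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a DAG: Users[i] lists the nodes that use node i.
// Hypothetical model for illustration; this is not LLVM's SDNode API.
using ToyDAG = std::vector<std::vector<std::size_t>>;

// Return the nodes in an order where every node appears after all of its
// operands (Kahn's algorithm). Assumes the graph is acyclic.
std::vector<std::size_t> topologicalOrder(const ToyDAG &Users) {
  // Count, for each node, how many operand edges point at it.
  std::vector<std::size_t> PendingOperands(Users.size(), 0);
  for (const auto &UserList : Users)
    for (std::size_t U : UserList) {
      assert(U < Users.size() && "user index out of range");
      ++PendingOperands[U];
    }

  // Order doubles as the FIFO queue: entries at or after Head are ready
  // but have not yet released their users.
  std::vector<std::size_t> Order;
  Order.reserve(Users.size());
  for (std::size_t N = 0; N < Users.size(); ++N)
    if (PendingOperands[N] == 0)
      Order.push_back(N);

  for (std::size_t Head = 0; Head < Order.size(); ++Head)
    for (std::size_t U : Users[Order[Head]])
      if (--PendingOperands[U] == 0)
        Order.push_back(U); // all operands of U have been emitted

  assert(Order.size() == Users.size() && "graph must be acyclic");
  return Order;
}
```

The cost is linear in the number of nodes plus edges, in contrast to a recursive walk that follows every path.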

(I tried to measure the compile-time impact of this patch, but it's not even possible because this basically makes time to compile stage2 clang infinite.)
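
One plausible way to see why the recursive version can blow up: when users are shared, an unguarded recursive walk follows every path through the graph rather than visiting every node once. The toy model below counts how many add calls a walk shaped like the patched `AddUsersToWorklist` would make on a chain of diamonds. `ToyDAG`, `makeDiamondChain`, and `countRecursiveAdds` are hypothetical names, and this does not model LLVM's actual worklist bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a DAG: Users[i] lists the nodes that use node i.
// Hypothetical model for illustration; this is not LLVM's SDNode API.
using ToyDAG = std::vector<std::vector<std::size_t>>;

// Build a chain of `Levels` diamonds. Node 3*K has two users (3*K+1 and
// 3*K+2), and both of those are used by node 3*K+3.
ToyDAG makeDiamondChain(std::size_t Levels) {
  ToyDAG Users(3 * Levels + 1);
  for (std::size_t K = 0; K < Levels; ++K) {
    Users[3 * K] = {3 * K + 1, 3 * K + 2};
    Users[3 * K + 1] = {3 * K + 3};
    Users[3 * K + 2] = {3 * K + 3};
  }
  return Users;
}

// Same shape as the patched loop: add each user, then recurse into it.
// Returns the number of add calls made, with no visited-set pruning.
std::uint64_t countRecursiveAdds(const ToyDAG &Users, std::size_t N) {
  assert(N < Users.size() && "node index out of range");
  std::uint64_t Adds = 0;
  for (std::size_t U : Users[N]) {
    ++Adds;
    Adds += countRecursiveAdds(Users, U);
  }
  return Adds;
}
```

On this graph the count starting from node 0 is 4 * (2^Levels - 1), so 10 diamonds (31 nodes) already cost 4092 calls and every extra diamond roughly doubles that. In the real combiner the walk would also be triggered again on each simplification.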

@KanRobert marked this pull request as draft on May 15, 2024 02:15
Labels
backend:X86 llvm:SelectionDAG SelectionDAGISel as well
4 participants