Skip to content

[llvm][dfa-jump-threading] Allow DFAJumpThreading with optsize #83049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ilovepi
Copy link
Contributor

@ilovepi ilovepi commented Feb 26, 2024

Though DFAJumpThreadingPass typically introduces size increases, targets
optimizing for size may want to enable the pass for certain TUs and
workloads. This change adds options to override the default behavior,
and enable the DFAJumpThreadingPass to run, even when compiling for
size.

To experiment with DFAJumpThreading and with minsize,
pass -enable-dfa-jump-optsize to the backend.

Created using spr 1.3.4
ilovepi added a commit to ilovepi/llvm-project that referenced this pull request Feb 26, 2024
@ilovepi ilovepi changed the title [llvm][dfa-jump-threading] Add option to allow DFAJumpThreading when [llvm][dfa-jump-threading] Allow DFAJumpThreading with optsize Feb 26, 2024
Created using spr 1.3.4
Comment on lines 225 to 229
static cl::opt<bool>
DFAJumpThreadingOptSize("dfa-jump-thread-optsize",
cl::desc("Enable DFA jump threading when optimizing for size"),
cl::init(false), cl::Hidden);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its rather unfortunate that we need 2 flags to enable a single behavior ... Is there a common place we can put this flag and use it here and in DFAJumpThreading.cpp?

@ilovepi ilovepi requested review from nikic, aeubanks and fhahn February 26, 2024 22:30
@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 26, 2024

cc: @petrhosek @PiJoules since I think they're interested.

@aeubanks
Copy link
Contributor

Can we have a more principled approach to determine when to allow this pass to run than a cl::opt? For example, change the CostThreshold in the pass depending on whether or not the function has optsize.

@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 26, 2024

Can we have a more principled approach to determine when to allow this pass to run than a cl::opt? For example, change the CostThreshold in the pass depending on whether or not the function has optsize.

Sure. I can take a look at doing it that way instead. it will probably be much cleaner.

@nikic
Copy link
Contributor

nikic commented Feb 27, 2024

This pass is not part of the default pipeline -- do you enable this pass downstream, or do you want to add options for all uses of minsize, even if there is no evidence of usefulness?

@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 27, 2024

This pass is not part of the default pipeline -- do you enable this pass downstream, or do you want to add options for all uses of minsize, even if there is no evidence of usefulness?

We’d like to see if this can be useful to some of our size constrained users in the embedded space. Particularly because it has been proposed to turn this on in the default pipeline. https://discourse.llvm.org/t/rfc-enable-dfa-jumpthreading-pass-by-default/77231. Currently we don’t have evidence that this is beneficial, but for these targets we also don’t have a way to vet that with the current implementation.

So to be concrete, I’d like the ability to enable this pass somehow even if compiling for size. If we can control this behavior with a threshold, then mechanically I don’t see much difference between that an the flags I’ve added in this patch. But right now we don’t have a way to evaluate if this may be useful at any threshold, and I think it’s useful if we can turn this knob in some way without changing the default behavior.

@aeubanks
Copy link
Contributor

also, we should remove pipeline checks for "optsize" and move those checks into the passes themselves

@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 27, 2024

also, we should remove pipeline checks for "optsize" and move those checks into the passes themselves

done in #83188

@ilovepi ilovepi marked this pull request as ready for review February 28, 2024 18:53
@llvmbot
Copy link
Member

llvmbot commented Feb 28, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Paul Kirth (ilovepi)

Changes

Though DFAJumpThreadingPass typically introduces size increases, targets
optimizing for size may want to enable the pass for certain TUs and
workloads. This change adds options to override the default behavior,
and enable the DFAJumpThreadingPass to run, even when compiling for
size.


Full diff: https://github.com/llvm/llvm-project/pull/83049.diff

3 Files Affected:

  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+5)
  • (modified) llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp (+6-1)
  • (modified) llvm/test/Transforms/DFAJumpThreading/negative.ll (+49)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index cbbbec0ccc8c4d..3b2d4523ef7e8e 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -222,6 +222,11 @@ static cl::opt<bool>
     EnableDFAJumpThreading("enable-dfa-jump-thread",
                            cl::desc("Enable DFA jump threading"),
                            cl::init(false), cl::Hidden);
+static cl::opt<bool>
+    DFAJumpThreadingOptSize("dfa-jump-thread-optsize",
+                           cl::desc("Enable DFA jump threading when optimizing for size"),
+                           cl::init(false), cl::Hidden);
+
 
 // TODO: turn on and remove flag
 static cl::opt<bool> EnablePGOForceFunctionAttrs(
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index 85d4065286e41f..ce5e823e531e0f 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -110,6 +110,11 @@ static cl::opt<unsigned>
                   cl::desc("Maximum cost accepted for the transformation"),
                   cl::Hidden, cl::init(50));
 
+static cl::opt<bool> DFAJumpThreadIgnoreOptSize(
+    "dfa-jump-ignore-optsize",
+    cl::desc("Enable dfa jump threading, even when optimizing for size"),
+    cl::Hidden, cl::init(false));
+
 namespace {
 
 class SelectInstToUnfold {
@@ -1244,7 +1249,7 @@ struct TransformDFA {
 bool DFAJumpThreading::run(Function &F) {
   LLVM_DEBUG(dbgs() << "\nDFA Jump threading: " << F.getName() << "\n");
 
-  if (F.hasOptSize()) {
+  if (!DFAJumpThreadIgnoreOptSize && F.hasOptSize()) {
     LLVM_DEBUG(dbgs() << "Skipping due to the 'minsize' attribute\n");
     return false;
   }
diff --git a/llvm/test/Transforms/DFAJumpThreading/negative.ll b/llvm/test/Transforms/DFAJumpThreading/negative.ll
index a9642814276992..c822b6613f7334 100644
--- a/llvm/test/Transforms/DFAJumpThreading/negative.ll
+++ b/llvm/test/Transforms/DFAJumpThreading/negative.ll
@@ -1,6 +1,7 @@
 ; RUN: opt -passes=dfa-jump-threading -dfa-cost-threshold=25 -pass-remarks-missed='dfa-jump-threading' -pass-remarks-output=%t -disable-output %s
 ; RUN: FileCheck --input-file %t --check-prefix=REMARK %s
 ; RUN: opt -S -passes=dfa-jump-threading %s | FileCheck %s
+; RUN: opt -S -passes=dfa-jump-threading -dfa-jump-ignore-optsize %s | FileCheck %s --check-prefix=IGNORESIZE
 
 ; This negative test case checks that the optimization doesn't trigger
 ; when the code size cost is too high.
@@ -186,6 +187,54 @@ define i32 @negative5(i32 %num) minsize {
 ; CHECK:       for.end:
 ; CHECK-NEXT:    ret i32 0
 ;
+; IGNORESIZE-LABEL: define i32 @negative5(
+; IGNORESIZE-SAME: i32 [[NUM:%.*]]) #[[ATTR0:[0-9]+]] {
+; IGNORESIZE-NEXT:  entry:
+; IGNORESIZE-NEXT:    br label [[FOR_BODY:%.*]]
+; IGNORESIZE:       for.body:
+; IGNORESIZE-NEXT:    [[COUNT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
+; IGNORESIZE-NEXT:    [[STATE:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ poison, [[FOR_INC]] ]
+; IGNORESIZE-NEXT:    switch i32 [[STATE]], label [[FOR_INC_JT1:%.*]] [
+; IGNORESIZE-NEXT:    i32 1, label [[CASE1:%.*]]
+; IGNORESIZE-NEXT:    i32 2, label [[CASE2:%.*]]
+; IGNORESIZE-NEXT:    ]
+; IGNORESIZE:       for.body.jt2:
+; IGNORESIZE-NEXT:    [[COUNT_JT2:%.*]] = phi i32 [ [[INC_JT2:%.*]], [[FOR_INC_JT2:%.*]] ]
+; IGNORESIZE-NEXT:    [[STATE_JT2:%.*]] = phi i32 [ [[STATE_NEXT_JT2:%.*]], [[FOR_INC_JT2]] ]
+; IGNORESIZE-NEXT:    br label [[CASE2]]
+; IGNORESIZE:       for.body.jt1:
+; IGNORESIZE-NEXT:    [[COUNT_JT1:%.*]] = phi i32 [ [[INC_JT1:%.*]], [[FOR_INC_JT1]] ]
+; IGNORESIZE-NEXT:    [[STATE_JT1:%.*]] = phi i32 [ [[STATE_NEXT_JT1:%.*]], [[FOR_INC_JT1]] ]
+; IGNORESIZE-NEXT:    br label [[CASE1]]
+; IGNORESIZE:       case1:
+; IGNORESIZE-NEXT:    [[COUNT2:%.*]] = phi i32 [ [[COUNT_JT1]], [[FOR_BODY_JT1:%.*]] ], [ [[COUNT]], [[FOR_BODY]] ]
+; IGNORESIZE-NEXT:    br label [[FOR_INC_JT2]]
+; IGNORESIZE:       case2:
+; IGNORESIZE-NEXT:    [[COUNT1:%.*]] = phi i32 [ [[COUNT_JT2]], [[FOR_BODY_JT2:%.*]] ], [ [[COUNT]], [[FOR_BODY]] ]
+; IGNORESIZE-NEXT:    [[CMP:%.*]] = icmp eq i32 [[COUNT1]], 50
+; IGNORESIZE-NEXT:    br i1 [[CMP]], label [[FOR_INC_JT1]], label [[SI_UNFOLD_FALSE:%.*]]
+; IGNORESIZE:       si.unfold.false:
+; IGNORESIZE-NEXT:    br label [[FOR_INC_JT2]]
+; IGNORESIZE:       for.inc:
+; IGNORESIZE-NEXT:    [[INC]] = add nsw i32 undef, 1
+; IGNORESIZE-NEXT:    [[CMP_EXIT:%.*]] = icmp slt i32 [[INC]], [[NUM]]
+; IGNORESIZE-NEXT:    br i1 [[CMP_EXIT]], label [[FOR_BODY]], label [[FOR_END:%.*]]
+; IGNORESIZE:       for.inc.jt2:
+; IGNORESIZE-NEXT:    [[COUNT4:%.*]] = phi i32 [ [[COUNT1]], [[SI_UNFOLD_FALSE]] ], [ [[COUNT2]], [[CASE1]] ]
+; IGNORESIZE-NEXT:    [[STATE_NEXT_JT2]] = phi i32 [ 2, [[CASE1]] ], [ 2, [[SI_UNFOLD_FALSE]] ]
+; IGNORESIZE-NEXT:    [[INC_JT2]] = add nsw i32 [[COUNT4]], 1
+; IGNORESIZE-NEXT:    [[CMP_EXIT_JT2:%.*]] = icmp slt i32 [[INC_JT2]], [[NUM]]
+; IGNORESIZE-NEXT:    br i1 [[CMP_EXIT_JT2]], label [[FOR_BODY_JT2]], label [[FOR_END]]
+; IGNORESIZE:       for.inc.jt1:
+; IGNORESIZE-NEXT:    [[COUNT3:%.*]] = phi i32 [ [[COUNT1]], [[CASE2]] ], [ [[COUNT]], [[FOR_BODY]] ]
+; IGNORESIZE-NEXT:    [[STATE_NEXT_JT1]] = phi i32 [ 1, [[CASE2]] ], [ 1, [[FOR_BODY]] ]
+; IGNORESIZE-NEXT:    [[INC_JT1]] = add nsw i32 [[COUNT3]], 1
+; IGNORESIZE-NEXT:    [[CMP_EXIT_JT1:%.*]] = icmp slt i32 [[INC_JT1]], [[NUM]]
+; IGNORESIZE-NEXT:    br i1 [[CMP_EXIT_JT1]], label [[FOR_BODY_JT1]], label [[FOR_END]]
+; IGNORESIZE:       for.end:
+; IGNORESIZE-NEXT:    ret i32 0
+;
+
 entry:
   br label %for.body
 

@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 28, 2024

Can we have a more principled approach to determine when to allow this pass to run than a cl::opt? For example, change the CostThreshold in the pass depending on whether or not the function has optsize.

So, I've experimented with this, and I'm not sure this is the right approach to take right away. I think its premature to suppose that at -Oz we would ever want this pass on by default. I don't think its a good idea to try and guess what the threshold should be. Likely, that is something specific targets would need to tune on their own via -dfa-cost-threshold=. I don't see a better way to allow us to experiment with this, than adding this flag, which is simple(allow the pass w/ optsize), and we can still use -dfa-cost-threshold= to determine what we should choose.

I'd like to avoid running afoul of this pass being on by default in the future for size optimized targets when we're still in the experimental stage of evaluating this passes usefulness when combined with size optimizations. I agree that if we believe this is a good default in the future, we should adjust the logic to modify the threshold for optsize, but right now I don't have confidence that is the correct way forward.

I'm looking to see if there is a way to use -enable-dfa-jump-threading to control this behavior, instead of introducing a new cl::opt.

Created using spr 1.3.4
@ilovepi
Copy link
Contributor Author

ilovepi commented Feb 28, 2024

So, if DFAJumpThreading is ever on by default, then I think its pretty simple to repurpose -enable-dfa-jump-thread to enable the behavior in the pass, and we can drop -enable-dfa-jump-optsize @aeubanks, does that sound like a better plan?

@aeubanks
Copy link
Contributor

aeubanks commented Mar 5, 2024

I don't think its a good idea to try and guess what the threshold should be. Likely, that is something specific targets would need to tune on their own via -dfa-cost-threshold=.
The inliner has a target-independent threshold and it's much more important than this pass, so I don't see why this pass can't do the same thing.

As for experimenting, I was hoping that you could just experiment locally with local changes to LLVM. Is it much easier to experiment with a flag in LLVM release? If so then this is fine.

This pass is already experimental (given it's not enabled by default) so I'm not too worried about regressing it in terms of binary size.

@nikic
Copy link
Contributor

nikic commented Mar 6, 2024

As for experimenting, I was hoping that you could just experiment locally with local changes to LLVM. Is it much easier to experiment with a flag in LLVM release? If so then this is fine.

Agree that experimentation should be done with local changes to LLVM.

Options like these tend to get added and never removed, even if they have outlived their usefulness. I could kind of see the rationale for adding it to LoopRotate (which is well-known to be problematic in this regard, and it's plausible to assume that more than one organization would have interest in testing the option), but I wouldn't want to encourage this as a general pattern for exploration of minsize optimization changes.

@ilovepi
Copy link
Contributor Author

ilovepi commented Mar 19, 2024

Well, since I'm not going to have time to work on this in the near future, and it will be quite a while before I can roll out an experimental toolchain to partners, I'm going to close this for now. Its easy enough to re-open when we have better feeling for its value, but I do still think its good to have a way to opt into a transform irrespective of the optlevel.

@ilovepi ilovepi closed this Mar 19, 2024
@ilovepi ilovepi deleted the users/ilovepi/spr/llvmdfa-jump-threading-add-option-to-allow-dfajumpthreading-when branch March 19, 2024 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants