[llvm][dfa-jump-threading] Allow DFAJumpThreading with optsize #83049
Conversation
Created using spr 1.3.4
optimizing for size Pull Request: llvm#83049
Created using spr 1.3.4
Created using spr 1.3.4
static cl::opt<bool>
    DFAJumpThreadingOptSize("dfa-jump-thread-optsize",
        cl::desc("Enable DFA jump threading when optimizing for size"),
        cl::init(false), cl::Hidden);
It's rather unfortunate that we need 2 flags to enable a single behavior... Is there a common place we can put this flag and use it here and in DFAJumpThreading.cpp?
cc: @petrhosek @PiJoules since I think they're interested.
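One possible "common place" is the extern-declaration pattern LLVM uses in a few spots to share a cl::opt across translation units. The sketch below only illustrates that pattern under assumed names and file locations; it is not something proposed in this PR.

// Hypothetical shared header (name/location illustrative), included by both
// PassBuilderPipelines.cpp and DFAJumpThreading.cpp. Requires
// llvm/Support/CommandLine.h.
namespace llvm {
extern cl::opt<bool> DFAJumpThreadingOptSize;
} // namespace llvm

// Single definition in one .cpp file (which, as is usual for LLVM, is
// assumed to have `using namespace llvm;` in effect):
cl::opt<bool> llvm::DFAJumpThreadingOptSize(
    "dfa-jump-thread-optsize",
    cl::desc("Enable DFA jump threading when optimizing for size"),
    cl::init(false), cl::Hidden);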
Can we have a more principled approach to determine when to allow this pass to run than a flag?
Sure. I can take a look at doing it that way instead. It will probably be much cleaner.
This pass is not part of the default pipeline -- do you enable this pass downstream, or do you want to add options for all uses of minsize, even if there is no evidence of usefulness?
We'd like to see if this can be useful to some of our size-constrained users in the embedded space, particularly because it has been proposed to turn this on in the default pipeline: https://discourse.llvm.org/t/rfc-enable-dfa-jumpthreading-pass-by-default/77231. Currently we don't have evidence that this is beneficial, but for these targets we also don't have a way to vet that with the current implementation. So, to be concrete, I'd like the ability to enable this pass somehow even if compiling for size. If we can control this behavior with a threshold, then mechanically I don't see much difference between that and the flags I've added in this patch. But right now we don't have a way to evaluate whether this may be useful at any threshold, and I think it's useful if we can turn this knob in some way without changing the default behavior.
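To make the threshold idea concrete, here is a minimal, self-contained sketch of what "control this behavior with a threshold" could mean: shrink the pass's cost budget under optsize instead of disabling the pass outright. The helper name and the divisor are illustrative assumptions, not code from this patch or from DFAJumpThreading.cpp.

#include <cstdint>

// Hedged sketch: under optsize, use a much smaller cost budget (the analogue
// of -dfa-cost-threshold) rather than skipping the transformation entirely.
// The divisor of 10 is an arbitrary illustrative choice.
static uint64_t effectiveCostThreshold(uint64_t BaseThreshold, bool HasOptSize) {
  return HasOptSize ? BaseThreshold / 10 : BaseThreshold;
}

// With the default budget of 50, an optsize function would then only accept
// candidates whose estimated duplication cost is at most 5.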
Also, we should remove pipeline checks for "optsize" and move those checks into the passes themselves.
…ilder Created using spr 1.3.4
Done in #83188.
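For readers following along, the generic pattern for "checks inside the passes" (as opposed to deciding at pipeline-construction time) is to bail out at the top of run(). The sketch below uses a placeholder pass name and is not the code from #83188; it only illustrates the shape of such a check for a new-PM function pass.

// Placeholder pass name; only the shape of the check is the point here.
PreservedAnalyses SomeSizeSensitivePass::run(Function &F,
                                             FunctionAnalysisManager &AM) {
  // Decide per function, based on its attributes, instead of relying on the
  // pipeline builder to omit the pass for size-optimized builds.
  if (F.hasOptSize())
    return PreservedAnalyses::all();
  // ... perform the transformation ...
  return PreservedAnalyses::none();
}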
Created using spr 1.3.4
@llvm/pr-subscribers-llvm-transforms

Author: Paul Kirth (ilovepi)

Changes

Though DFAJumpThreadingPass typically introduces size increases, targets optimizing for size may want to enable the pass for certain TUs and workloads. This change adds options to override the default behavior, and enable the DFAJumpThreadingPass to run, even when compiling for size.

Full diff: https://github.com/llvm/llvm-project/pull/83049.diff

3 Files Affected:
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index cbbbec0ccc8c4d..3b2d4523ef7e8e 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -222,6 +222,11 @@ static cl::opt<bool>
EnableDFAJumpThreading("enable-dfa-jump-thread",
cl::desc("Enable DFA jump threading"),
cl::init(false), cl::Hidden);
+static cl::opt<bool>
+ DFAJumpThreadingOptSize("dfa-jump-thread-optsize",
+ cl::desc("Enable DFA jump threading when optimizing for size"),
+ cl::init(false), cl::Hidden);
+
// TODO: turn on and remove flag
static cl::opt<bool> EnablePGOForceFunctionAttrs(
diff --git a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
index 85d4065286e41f..ce5e823e531e0f 100644
--- a/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/DFAJumpThreading.cpp
@@ -110,6 +110,11 @@ static cl::opt<unsigned>
cl::desc("Maximum cost accepted for the transformation"),
cl::Hidden, cl::init(50));
+static cl::opt<bool> DFAJumpThreadIgnoreOptSize(
+ "dfa-jump-ignore-optsize",
+ cl::desc("Enable dfa jump threading, even when optimizing for size"),
+ cl::Hidden, cl::init(false));
+
namespace {
class SelectInstToUnfold {
@@ -1244,7 +1249,7 @@ struct TransformDFA {
bool DFAJumpThreading::run(Function &F) {
LLVM_DEBUG(dbgs() << "\nDFA Jump threading: " << F.getName() << "\n");
- if (F.hasOptSize()) {
+ if (!DFAJumpThreadIgnoreOptSize && F.hasOptSize()) {
LLVM_DEBUG(dbgs() << "Skipping due to the 'minsize' attribute\n");
return false;
}
diff --git a/llvm/test/Transforms/DFAJumpThreading/negative.ll b/llvm/test/Transforms/DFAJumpThreading/negative.ll
index a9642814276992..c822b6613f7334 100644
--- a/llvm/test/Transforms/DFAJumpThreading/negative.ll
+++ b/llvm/test/Transforms/DFAJumpThreading/negative.ll
@@ -1,6 +1,7 @@
; RUN: opt -passes=dfa-jump-threading -dfa-cost-threshold=25 -pass-remarks-missed='dfa-jump-threading' -pass-remarks-output=%t -disable-output %s
; RUN: FileCheck --input-file %t --check-prefix=REMARK %s
; RUN: opt -S -passes=dfa-jump-threading %s | FileCheck %s
+; RUN: opt -S -passes=dfa-jump-threading -dfa-jump-ignore-optsize %s | FileCheck %s --check-prefix=IGNORESIZE
; This negative test case checks that the optimization doesn't trigger
; when the code size cost is too high.
@@ -186,6 +187,54 @@ define i32 @negative5(i32 %num) minsize {
; CHECK: for.end:
; CHECK-NEXT: ret i32 0
;
+; IGNORESIZE-LABEL: define i32 @negative5(
+; IGNORESIZE-SAME: i32 [[NUM:%.*]]) #[[ATTR0:[0-9]+]] {
+; IGNORESIZE-NEXT: entry:
+; IGNORESIZE-NEXT: br label [[FOR_BODY:%.*]]
+; IGNORESIZE: for.body:
+; IGNORESIZE-NEXT: [[COUNT:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
+; IGNORESIZE-NEXT: [[STATE:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ poison, [[FOR_INC]] ]
+; IGNORESIZE-NEXT: switch i32 [[STATE]], label [[FOR_INC_JT1:%.*]] [
+; IGNORESIZE-NEXT: i32 1, label [[CASE1:%.*]]
+; IGNORESIZE-NEXT: i32 2, label [[CASE2:%.*]]
+; IGNORESIZE-NEXT: ]
+; IGNORESIZE: for.body.jt2:
+; IGNORESIZE-NEXT: [[COUNT_JT2:%.*]] = phi i32 [ [[INC_JT2:%.*]], [[FOR_INC_JT2:%.*]] ]
+; IGNORESIZE-NEXT: [[STATE_JT2:%.*]] = phi i32 [ [[STATE_NEXT_JT2:%.*]], [[FOR_INC_JT2]] ]
+; IGNORESIZE-NEXT: br label [[CASE2]]
+; IGNORESIZE: for.body.jt1:
+; IGNORESIZE-NEXT: [[COUNT_JT1:%.*]] = phi i32 [ [[INC_JT1:%.*]], [[FOR_INC_JT1]] ]
+; IGNORESIZE-NEXT: [[STATE_JT1:%.*]] = phi i32 [ [[STATE_NEXT_JT1:%.*]], [[FOR_INC_JT1]] ]
+; IGNORESIZE-NEXT: br label [[CASE1]]
+; IGNORESIZE: case1:
+; IGNORESIZE-NEXT: [[COUNT2:%.*]] = phi i32 [ [[COUNT_JT1]], [[FOR_BODY_JT1:%.*]] ], [ [[COUNT]], [[FOR_BODY]] ]
+; IGNORESIZE-NEXT: br label [[FOR_INC_JT2]]
+; IGNORESIZE: case2:
+; IGNORESIZE-NEXT: [[COUNT1:%.*]] = phi i32 [ [[COUNT_JT2]], [[FOR_BODY_JT2:%.*]] ], [ [[COUNT]], [[FOR_BODY]] ]
+; IGNORESIZE-NEXT: [[CMP:%.*]] = icmp eq i32 [[COUNT1]], 50
+; IGNORESIZE-NEXT: br i1 [[CMP]], label [[FOR_INC_JT1]], label [[SI_UNFOLD_FALSE:%.*]]
+; IGNORESIZE: si.unfold.false:
+; IGNORESIZE-NEXT: br label [[FOR_INC_JT2]]
+; IGNORESIZE: for.inc:
+; IGNORESIZE-NEXT: [[INC]] = add nsw i32 undef, 1
+; IGNORESIZE-NEXT: [[CMP_EXIT:%.*]] = icmp slt i32 [[INC]], [[NUM]]
+; IGNORESIZE-NEXT: br i1 [[CMP_EXIT]], label [[FOR_BODY]], label [[FOR_END:%.*]]
+; IGNORESIZE: for.inc.jt2:
+; IGNORESIZE-NEXT: [[COUNT4:%.*]] = phi i32 [ [[COUNT1]], [[SI_UNFOLD_FALSE]] ], [ [[COUNT2]], [[CASE1]] ]
+; IGNORESIZE-NEXT: [[STATE_NEXT_JT2]] = phi i32 [ 2, [[CASE1]] ], [ 2, [[SI_UNFOLD_FALSE]] ]
+; IGNORESIZE-NEXT: [[INC_JT2]] = add nsw i32 [[COUNT4]], 1
+; IGNORESIZE-NEXT: [[CMP_EXIT_JT2:%.*]] = icmp slt i32 [[INC_JT2]], [[NUM]]
+; IGNORESIZE-NEXT: br i1 [[CMP_EXIT_JT2]], label [[FOR_BODY_JT2]], label [[FOR_END]]
+; IGNORESIZE: for.inc.jt1:
+; IGNORESIZE-NEXT: [[COUNT3:%.*]] = phi i32 [ [[COUNT1]], [[CASE2]] ], [ [[COUNT]], [[FOR_BODY]] ]
+; IGNORESIZE-NEXT: [[STATE_NEXT_JT1]] = phi i32 [ 1, [[CASE2]] ], [ 1, [[FOR_BODY]] ]
+; IGNORESIZE-NEXT: [[INC_JT1]] = add nsw i32 [[COUNT3]], 1
+; IGNORESIZE-NEXT: [[CMP_EXIT_JT1:%.*]] = icmp slt i32 [[INC_JT1]], [[NUM]]
+; IGNORESIZE-NEXT: br i1 [[CMP_EXIT_JT1]], label [[FOR_BODY_JT1]], label [[FOR_END]]
+; IGNORESIZE: for.end:
+; IGNORESIZE-NEXT: ret i32 0
+;
+
entry:
br label %for.body
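Note that the excerpt above only shows the declarations of the two new flags; the hunk where DFAJumpThreadingOptSize is actually consumed in the pipeline builder is not included. A plausible shape for that use site in PassBuilderPipelines.cpp is sketched below; the exact condition and its placement are assumptions, not lines quoted from the patch.

// Hypothetical gating of the pass on both flags. Level is the
// OptimizationLevel passed to the pipeline builder, and FPM is the
// FunctionPassManager being populated.
if (EnableDFAJumpThreading &&
    (Level.getSizeLevel() == 0 || DFAJumpThreadingOptSize))
  FPM.addPass(DFAJumpThreadingPass());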
So, I've experimented with this, and I'm not sure this is the right approach to take right away. I think it's premature to assume that adjusting the cost threshold for optsize is the right way to handle this.

I'd like to avoid running afoul of this pass being on by default in the future for size-optimized targets when we're still in the experimental stage of evaluating this pass's usefulness when combined with size optimizations. I agree that if we believe this is a good default in the future, we should adjust the logic to modify the threshold for optsize, but right now I don't have confidence that is the correct way forward.

I'm looking to see if there is a way to use -enable-dfa-jump-threading to control this behavior, instead of introducing a new cl::opt.
Created using spr 1.3.4
So, if DFAJumpThreading is ever on by default, then I think it's pretty simple to repurpose the existing -enable-dfa-jump-threading flag for this.
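As a hedged sketch of what "repurposing" the existing enable flag could look like if the pass ever becomes a default: treat an explicitly passed -enable-dfa-jump-thread as an opt-in that overrides the optsize skip. getNumOccurrences() is standard cl::opt functionality, but the policy below, and making the flag visible to the pass at all (the "common place" issue raised earlier), are assumptions rather than anything implemented in this PR.

// Hypothetical: an explicit -enable-dfa-jump-thread on the command line wins
// over the default optsize skip; with no explicit flag, behavior is unchanged.
bool ExplicitlyRequested = EnableDFAJumpThreading.getNumOccurrences() > 0;
if (F.hasOptSize() && !ExplicitlyRequested) {
  LLVM_DEBUG(dbgs() << "Skipping due to the 'minsize' attribute\n");
  return false;
}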
As for experimenting, I was hoping that you could just experiment locally with local changes to LLVM. Is it much easier to experiment with a flag in an LLVM release? If so, then this is fine. This pass is already experimental (given it's not enabled by default), so I'm not too worried about regressing it in terms of binary size.
Agree that experimentation should be done with local changes to LLVM. Options like these tend to get added and never removed, even if they have outlived their usefulness. I could kind of see the rationale for adding it to LoopRotate (which is well-known to be problematic in this regard, and it's plausible to assume that more than one organization would have interest in testing the option), but I wouldn't want to encourage this as a general pattern for exploration of minsize optimization changes.
Well, since I'm not going to have time to work on this in the near future, and it will be quite a while before I can roll out an experimental toolchain to partners, I'm going to close this for now. It's easy enough to re-open when we have a better feel for its value, but I do still think it's good to have a way to opt into a transform irrespective of the optlevel.
Though DFAJumpThreadingPass typically introduces size increases, targets
optimizing for size may want to enable the pass for certain TUs and
workloads. This change adds options to override the default behavior,
and enable the DFAJumpThreadingPass to run, even when compiling for
size.
To experiment with DFAJumpThreading and with minsize, pass -enable-dfa-jump-optsize to the backend.