-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[X86] Recognize POP/ADD/SUB modifying rsp in getSPAdjust. #114265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This code assumed only PUSHes would appear in call sequences. However, if calls require frame-pointer/base-pointer spills, only the PUSH operations inserted by spillFPBP will be recognized, and the adjustments to frame object offsets in prologepilog will be incorrect. This change correctly reports the SP adjustment for POP and ADD/SUB to rsp, and an assertion for unrecognized instructions that modify rsp. There is no unit test provided as the issue can only be reliably reproduced in MIR (to force a spill/restore outside and inside a call sequence, respectively), but MIR cannot serialize the FPClobberedByCall field that triggers spillFPBP to run.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-x86 Author: Daniel Zabawa (daniel-zabawa) ChangesThis code assumed only PUSHes would appear in call sequences. However, if calls require frame-pointer/base-pointer spills, only the PUSH operations inserted by spillFPBP will be recognized, and the adjustments to frame object offsets in prologepilog will be incorrect. This change correctly reports the SP adjustment for POP and ADD/SUB to rsp, and an assertion for unrecognized instructions that modify rsp. There is no unit test provided as the issue can only be reliably reproduced in MIR (to force a spill/restore outside and inside a call sequence, respectively), but MIR cannot serialize the FPClobberedByCall field that triggers spillFPBP to run. Full diff: https://github.com/llvm/llvm-project/pull/114265.diff 1 Files Affected:
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 38ea1f35be2b9a..e2285417a155e4 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -452,10 +452,13 @@ int X86InstrInfo::getSPAdjust(const MachineInstr &MI) const {
return -(I->getOperand(1).getImm());
}
- // Currently handle only PUSHes we can reasonably expect to see
- // in call sequences
+ // Handle other opcodes we reasonably expect to see in call
+ // sequences. Note this may include spill/restore of FP/BP.
switch (MI.getOpcode()) {
default:
+ assert(!(MI.modifiesRegister(X86::RSP, &RI) ||
+ MI.getDesc().hasImplicitDefOfPhysReg(X86::RSP)) &&
+ "Unhandled opcode in getSPAdjust");
return 0;
case X86::PUSH32r:
case X86::PUSH32rmm:
@@ -467,6 +470,30 @@ int X86InstrInfo::getSPAdjust(const MachineInstr &MI) const {
case X86::PUSH64rmr:
case X86::PUSH64i32:
return 8;
+ case X86::POP32r:
+ case X86::POP32rmm:
+ case X86::POP32rmr:
+ return -4;
+ case X86::POP64r:
+ case X86::POP64rmm:
+ case X86::POP64rmr:
+ return -8;
+ // FIXME: (implement and) use isAddImmediate in the
+ // default case instead of the following ADD/SUB cases.
+ case X86::ADD32ri:
+ case X86::ADD32ri8:
+ case X86::ADD64ri32:
+ if (MI.getOperand(0).getReg() == X86::RSP &&
+ MI.getOperand(1).getReg() == X86::RSP)
+ return -MI.getOperand(2).getImm();
+ return 0;
+ case X86::SUB32ri:
+ case X86::SUB32ri8:
+ case X86::SUB64ri32:
+ if (MI.getOperand(0).getReg() == X86::RSP &&
+ MI.getOperand(1).getReg() == X86::RSP)
+ return MI.getOperand(2).getImm();
+ return 0;
}
}
|
@daniel-zabawa any tests? |
I am open to suggestions on how to cover this. The potential bug in prolog/epilog is contingent on having a frame object referenced in a call sequence, following a previous call sequence which requires the frame pointer to be spilled. I don't know any reliable way to create the code exposing the issue without MIR, but MIR also does not support serializing the attributes (FPClobberedByCall) required for the FP spill/restore to be generated. |
Can we add this attribute to |
Thanks - I did not notice that it would be simple to add that into serialization. The test itself is still taking some time as it has to satisfy rather particular conditions for the FP to be spilled around a call and for that adjustment to be counted in prologepilog. |
pinging @RKSimon @phoebewang for review I have still not resolved the linux failure, but I believe it's unrelated. The only failed tests are in lldb-dap:
but these are also reported as unresolved in the summary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
fa874a0
to
32e26ed
Compare
force-push to respin the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - but please be vigilant as I wonder if other obscure instructions may be used as well and could fire that assert in an external test case.
@daniel-zabawa Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
We have seen some suspected miscompiles after this patch, and I've tried to reduce it a bit. What happens is that we get different code depending on if there is debug info or not (Which seems a bit weird given the code changes here? But maybe it exposes some older problem or I just haven't analyzed this patch enough?). Below is my reduced test case. It is a mir test foo.mir like this:
If I compile it using
|
I've reduced the test program even further:
I suspect the problem somehow is related to the presence of "hasPushSequences: true". |
Ping! Would be nice to know if you need more information (I could probably make a runnable C-level reproducer, but if the already provided reproducers are good enough I rather not spend time on that)? Considering that we've seen lots of failing tests (runtime failures) starting to happen with this patch, maybe it should be reverted until someone has analyzed these problems? @phoebewang, @RKSimon and @e-kud : I can't even see that @daniel-zabawa has a registered email address. So unsure if pinging here actually will lead to anything? I guess we shouldn't accept patches from people that aren't following up on problems found when landing patches? |
@bjope - my apologies as I had missed the recent email notifications. The current reproducer is fine. The only change related to hasPushSequences was to add the serialization as there was no other reliable way to recreate the situation in MIR. That may well have exposed something below. Thanks @phoebewang for reverting. |
I believe I've found the issue. When X86FrameLowering is handling the initial ADJCALLSTACKDOWN, the usual procedure is to insert the SP adjustment (SUB $rsp) at the ADJCALLSTACKDOWN site, with the returned iterator following the inserted SUB, as it's simply supplying the SP adjustment we already counted. But, the insertion point actually skips debug instructions forward, and this is not accounted for in the result. So if debug instructions are present, the SUB inserted to replace the ADJCALLSTACKDOWN is visited again in the replaceFrameIndices loop, and is counted again towards the SP adjustment. This didn't happen before because we only recognized PUSH. This is unfortunate as the prologepilog code apparently assumes the code replacing ADJCALLSTACKDOWN will not be visited in the loop. The original change to skip debug instructions when placing the replacement code traces back to https://bugs.llvm.org/show_bug.cgi?id=31319. The commit only mentions that debug instructions after the erased (replaced?) call-stack adjustment could affect codegen. Both of the test cases from the bug report produce the same results with -g with or without this change. The lits added (CodeGen/X86/frame-lowering-debug-intrinsic.ll, CodeGen/X86/frame-lowering-debug-intrinsic-2.ll) pass with or without this change. I am at a loss why the SUB should appear after debug instructions when the ADJCALLSTACKDOWN originally preceded them. I'll have to dig a bit to figure out a case that justifies this. |
@bjope Do you mean an email address shown in the GitHub? I think it's hidden by default. You can find the email address from
I actually didn't revert it. I clicked the button, but decided to write an email to you and let you decide. So I just gave up creating a PR. I have reverted it now. |
Just a heads up that one of our internal tests started failing which I bisected back to this commit. I am working on a reproducer and like the case @bjope mentioned, the problem only occurs when the program is compiled with debug information. In our case, the program seems to go into an infinite loop after your change. |
Well, I kind of thought that I would see it in the email notification I got myself when I had used @ to cc someone (i.e. that I would see who else got that email). But I guess it is completely hidden then. And I was probably a bit too harsh when talking about "people that aren't following up on problems found when landing patches". We can't expect everyone to be on high alert on any email notification X days after having submitted something (this wasn't even detected by the buildbots afaik). And Daniel did end up responding here. Just frustrating when you have a problem and don't even know if the submitter has gone fishing for the next couple of weeks (and you can't even see who the notifications are sent to). |
We should be more generous to new contributors, they will learn it sooner or later. And you don't necessarily wait for author's response before reverting it. |
@dyung please update here when you have a reproducer, even if it's not compact. That sounds like a different symptom than the existing issues. |
Sorry for the delay, I'm having trouble reducing it without completely breaking the test so I'll see if I can clean it up a bit and post it. |
@daniel-zabawa sorry for the delay, but here is a reduced C++ program which demonstrates the issue we were seeing on our internal tests. extern "C" {
void printf(char *);
void snprintf(char *, long...);
void atexit(void());
}
void a() { printf("{PASS}\n{END}\n\x1a\n"); }
__attribute__((constructor)) void b() { atexit(a); }
void c(char *, int, char *fmt...) {
__builtin_va_list args;
__builtin_va_start(args, fmt);
}
char d[1];
long e;
int main() {
struct f {
f() {
struct m {
m() {
for (long g = -10; g; ++g)
for (long h = -10; h != 10; ++h)
struct i {
long j;
i(long initial) {
char *k;
long l = initial, actual = __sync_lock_test_and_set_8(&l, j);
k = "";
snprintf(d, sizeof(d),
"%%s %%s0 expected %s/%s but got %s/%s", 0, 0, 0, 0,
0, "");
c("", 0, d, 0, "__sync_lock_test_and_set_8", initial, j,
actual, l);
}
} i(g);
struct i {
long j;
i(long initial) {
long actual = __sync_lock_test_and_set_8(&e, j);
c("", 0, "", j, 0, initial, actual, initial);
}
} i(0);
}
} m;
}
} f;
} Compile with |
@dyung - no worries, and I'm able to reproduce the issue. I've confirmed it's the same issue as the other failure: there is a debug instruction following the ADJCALLSTACKDOWN, and when frame lowering is eliminating said ADJCALLSTACKDOWN, the inserted code is placed after the debug instruction, but the iterator it returns is pointing to the debug instruction. The prolog/epilog code is written such that it tracks the SP adjustment of ADJCALLSTACKDOWN based on the pseudo, and frame-lowering is expected to insert code to affect that adjustment at the same location. When the fixup code is inserted after one or more debug instructions, prolog/epilog will visit it again. This worked before by simply not recognizing that certain instructions are SP adjustments. As I mentioned before, I have not been able to reproduce whatever issue was fixed by the frame-lowering change to insert frame pseudo replacement code after debug instructions. If anything, it may have been due to some interaction with other frame-lowering code that is trying to parse the frame setup to do some simplification, but I cannot tell what that code is. Ideally I'd like to avoid the situation where pseudo replacement code is reordered with debug instructions, but without knowing the root issue at the time it was changed, it will require thorough testing. |
This code assumed only PUSHes would appear in call sequences. However, if calls require frame-pointer/base-pointer spills, only the PUSH operations inserted by spillFPBP will be recognized, and the adjustments to frame object offsets in prologepilog will be incorrect.
This change correctly reports the SP adjustment for POP and ADD/SUB to rsp, and an assertion for unrecognized instructions that modify rsp.