-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[CodeGen] Fix MachineModuleInfo
's move constructor to be more safe with MCContext
ownership.
#104834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[CodeGen] Fix MachineModuleInfo
's move constructor to be more safe with MCContext
ownership.
#104834
Conversation
591846a
to
9b6e6a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any context actually using the move constructor?
@@ -124,10 +124,10 @@ class MachineModuleInfo { | |||
const LLVMTargetMachine &getTarget() const { return TM; } | |||
|
|||
const MCContext &getContext() const { | |||
return ExternalContext ? *ExternalContext : Context; | |||
return ExternalContext ? *ExternalContext : *Context; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of this ExternalContext business? I don't understand why it's here. If it's necessary to construct with an external context, I would expect that to be the only way to construct MMI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? I don't understand why it's here either (and my PR is not doing anything with it 😛). Any chance we can decide the fate of ExternalContext
in a separate PR since it's out of the scope of this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's the same scope, for what should own the MCContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it explicitly said This is an external context
which means someone else is owning it instead of this
here, so not quite the same scope, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question that needs to be answered is "who owns the MCContext". I think the answer should be definitely MachineModuleInfo, or definitively an external owner. Having 2 modes here is a bad API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree; See my comments below. IMO it's better to have an external owner. It will bring MMI closer to how llvm::Module
is managed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having 2 modes here is a bad API
I personally agree with this in principal, but again, I don't have much context on why these two are here originally, so I'd prefer not having to decide the fate of this API while not understanding the original intension of the design. If anyone else has more context, I'd really love to hear and happy to make another PR to improve this.
I tried to remove the move constructor and compilation of llvm seems to be still fine with the projects that we are building, that is |
MachineModuleInfo
's move constructor to be more safe with MCContext
ownership.
This function is not used by anyone in the LLVM monorepo AFAIK. I have a related PR (#98770) to this, which exposes the move constructor on the MMIWP side, so that I can inspect its contents after I run passes on it without requiring the PM to stay alive. However, as discussions in the PR shows, when I used it in the use case I showed here, I would get segfaults. I suspected this was because MFs had references to the old, unmoved MMI fields deleted by the PM destructor, which @arsenm has fixed since then. I have not tested the MMIWP constructor since then due to time constraints. TLDR; I think there should be a test for this constructor in LLVM proper (something along the lines of my example use case with the PM explicitly deleted should be enough), to prove that it is working correctly in the first place. |
Thank you for context and suggestion on adding a test! I agree that a test is good to have, though do you have any suggestions where should it be added to? Your PR doesn't seem to have a real one doing the testing in the code but only code snippet in the comment, and I can't find any in |
Actually, nvm, found something in |
✅ With the latest revision this PR passed the C/C++ code formatter. |
@weiweichen @arsenm though a working move constructor will solve the issue here, I think the best solution is to disallow move construction entirely, and treat MMI just like you would an In other words, just like how an llvm-project/llvm/include/llvm/IR/Module.h Line 255 in 52bfb26
An namespace llvm {
class MachineModuleInfo {
explicit MachineModuleInfo(LLVMTargetMachine &TM, MCContext &ExtContext);
} Furthermore, llvm-project/llvm/include/llvm/CodeGen/MachineModuleInfo.h Lines 196 to 224 in b443733
The only downside of this approach is that this change will probably touch all the LLVM backends, but IMO it makes more sense and should address the use cases we already have. But again, I'm happy either way as long as it works. |
This perhaps is a bit overkill for solving my issue in particular where I just need to get the MMI out of the PM's control so that it's not destroyed when the PM is destructed. I need access to the MachineFunctions. I'll leave the suggested change to whoever think is better for their issues and perhaps move forward with this PR that's pretty self-contained and less disruptive without any API changes and also solves my problem 😬 . |
The main issue that's unclear to me for the proposed approach is that who will now be the owner of the |
Both will have to be explicitly managed by the user of the pipeline, just like the In MC-related use cases AFAIK (i.e. MC disassembler, MC AsmParsing, LLVM Bolt, etc) As for how MMI will be managed, the new PM driver seems to be initializing and managing it on its own, as it can be llvm-project/llvm/tools/llc/NewPMDriver.cpp Line 111 in 90a8e5a
So I don't think adding an MCContext to the list of explicitly managed things in the pipeline is too much to ask. Maybe there is a way to make MMI manage its MCContext without this whole internal/external shenanigans; In that case I see the point of keeping management of the MCContext the way it is in the codegen pipeline. |
I don't feel like object to what your intended design/improvement for this part, though that's really not what this PR is for and what the problem I'm trying to solve here. Any chance we can address your design in a separate PR so that I can move on with my downstream project? Happy to review and provide feedbacks on your changes if you'd like to put up a PR to reflect the suggestion! |
Your patch is entirely about the ownership of MachineModuleInfo. Just trying to move along with something that happens to avoid your problem today without a real design decision being made is a path to the project bitrotting as a whole. |
My PR is not really touching much ownership ( Can we please fix the bug first and then the design? Do they have to be done all in one go? It gets a bit frustrating here to be honest. |
Could you also just change the move constructor to move the MCContext in place without switching to the unique_ptr?
This isn't big enough to call for an RFC
Since when is software not frustrating? |
@weiweichen @arsenm I am investigating how much work needs to be done for the context ownership change. |
I don't think this is the right solution for fixing the bug though. Yes, the MCContext can be moved constructed instead of directly constructed in
Not trying to be difficult, but I'm curious what is the criteria for API changing with an RFC. Is there a community guideline about this? Also, regarding the context of |
@weiweichen this is the diff that added it in: |
@hgreving2304, I think you were the original author who puts the |
Sound good! Thank you for investigating this! I'll wait. |
This was done in order to have an external driver owning the MCContext pointer (and the data it points to). I am not at Google anymore so I can't talk for them, but I know that changing this back will create a lot of trouble there. Before this change, a lot of cumbersome copying was needed. |
@weiweichen @arsenm #105541 addresses the issues raised in the review here. |
Thank you very much @hgreving2304 for the quick response and the context about the design! |
@matinraayai, the changes in #105541 makes sense to me that it reflects the design change you suggested here as well. You did mention that the PR is WIP. I'm wondering how soon do you expect it, or at least the first 2 parts to get merged? (selfishness, I do really want the move constructor's bug fixed as soon as possible, either with a small patch or changing the way how MMI is used so that the move constructor is irrelevant with your PR 🙏) |
The only reason I made it a WIP is because I'm not sure if everyone using the API like the changes or not (especially regarding the TargetMachine interface) + I might have forgotten to refactor parts of the LLVM monorepo that uses MMIs, or I may have broken something by accident. Once I get the green light that this change is favored, I will change it to a normal PR. The ETA is dependent on how long the codeowners will take to respond to it. |
I think it would be hard to write a specific rule. This case isn't changing the IR, and is only changing a niche mostly internal API. |
MCContext
inMachineModuleInfo
is an object that is not copyable nor movable, and eachMachineFunction
has a reference to this object. WhileMachineModuleInfo
has a move constructor that can move theMachineFunctions
map to a newMachineModuleInfo
object, theMCContext
is reconstructed instead of keeping the original one. This is dangerous because it's possible that after themove
, theMachineFunction
inMachineFunctions
are now having deadMCContext
which is different from the newly constructed one.Make
MCContext
a unique_ptr inMachineModuleInfo
and move it in the move constructor ofMachineModuleInfo
to preserve theMCContext
in used inMachineFunctions
.