Skip to content

[CodeGen] Fix MachineModuleInfo's move constructor to be more safe with MCContext ownership. #104834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

weiweichen
Copy link
Contributor

MCContext in MachineModuleInfo is an object that is not copyable nor movable, and each MachineFunction has a reference to this object. While MachineModuleInfo has a move constructor that can move the MachineFunctions map to a new MachineModuleInfo object, the MCContext is reconstructed instead of keeping the original one. This is dangerous because it's possible that after the move, the MachineFunction in MachineFunctions are now having dead MCContext which is different from the newly constructed one.

Make MCContext a unique_ptr in MachineModuleInfo and move it in the move constructor of MachineModuleInfo to preserve the MCContext in used in MachineFunctions.

@weiweichen weiweichen changed the title [CodeGen] Make MMI in MachineModuleInfoWrapperPass a unique_ptr. [CodeGen] Make MCContext in MachineModuleInfo a unique_ptr. Aug 19, 2024
@weiweichen weiweichen force-pushed the weiweic/make-mmiwp-mmi-unique-ptr branch from 591846a to 9b6e6a0 Compare August 19, 2024 19:10
@weiweichen weiweichen marked this pull request as ready for review August 19, 2024 19:11
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any context actually using the move constructor?

@@ -124,10 +124,10 @@ class MachineModuleInfo {
const LLVMTargetMachine &getTarget() const { return TM; }

const MCContext &getContext() const {
return ExternalContext ? *ExternalContext : Context;
return ExternalContext ? *ExternalContext : *Context;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get rid of this ExternalContext business? I don't understand why it's here. If it's necessary to construct with an external context, I would expect that to be the only way to construct MMI

Copy link
Contributor Author

@weiweichen weiweichen Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? I don't understand why it's here either (and my PR is not doing anything with it 😛). Any chance we can decide the fate of ExternalContext in a separate PR since it's out of the scope of this one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's the same scope, for what should own the MCContext

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it explicitly said This is an external context which means someone else is owning it instead of this here, so not quite the same scope, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question that needs to be answered is "who owns the MCContext". I think the answer should be definitely MachineModuleInfo, or definitively an external owner. Having 2 modes here is a bad API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; See my comments below. IMO it's better to have an external owner. It will bring MMI closer to how llvm::Module is managed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having 2 modes here is a bad API

I personally agree with this in principal, but again, I don't have much context on why these two are here originally, so I'd prefer not having to decide the fate of this API while not understanding the original intension of the design. If anyone else has more context, I'd really love to hear and happy to make another PR to improve this.

@weiweichen
Copy link
Contributor Author

weiweichen commented Aug 19, 2024

Is there any context actually using the move constructor?

I tried to remove the move constructor and compilation of llvm seems to be still fine with the projects that we are building, that is mlir, clang, compiler-rt and lldb. So I guess it is not actually being used in the upstream repo and maybe some downstream projects are using this API (which we very much like to use it with the fix here for the Mojo compiler 😃 )?

@weiweichen weiweichen changed the title [CodeGen] Make MCContext in MachineModuleInfo a unique_ptr. [CodeGen] Fix MachineModuleInfo's move constructor to be more safe with MCContext ownership. Aug 19, 2024
@matinraayai
Copy link
Contributor

matinraayai commented Aug 19, 2024

This function is not used by anyone in the LLVM monorepo AFAIK. I have a related PR (#98770) to this, which exposes the move constructor on the MMIWP side, so that I can inspect its contents after I run passes on it without requiring the PM to stay alive.

However, as discussions in the PR shows, when I used it in the use case I showed here, I would get segfaults. I suspected this was because MFs had references to the old, unmoved MMI fields deleted by the PM destructor, which @arsenm has fixed since then. I have not tested the MMIWP constructor since then due to time constraints.

TLDR; I think there should be a test for this constructor in LLVM proper (something along the lines of my example use case with the PM explicitly deleted should be enough), to prove that it is working correctly in the first place.

@weiweichen
Copy link
Contributor Author

weiweichen commented Aug 20, 2024

This function is not used by anyone in the LLVM monorepo AFAIK. I have a related PR (#98770) to this, which exposes the move constructor on the MMIWP side, so that I can inspect its contents after I run passes on it without requiring the PM to stay alive.

However, as discussions in the PR shows, when I used it in the use case I showed here, I would get segfaults. I suspected this was because MFs had references to the old, unmoved MMI fields deleted by the PM destructor, which @arsenm has fixed since then. I have not tested the MMIWP constructor since then due to time constraints.

TLDR; I think there should be a test for this constructor in LLVM proper (something along the lines of my example use case with the PM explicitly deleted should be enough), to prove that it is working correctly in the first place.

Thank you for context and suggestion on adding a test! I agree that a test is good to have, though do you have any suggestions where should it be added to? Your PR doesn't seem to have a real one doing the testing in the code but only code snippet in the comment, and I can't find any in llvm/test at all that's remotely testing this sort of thing?

@weiweichen
Copy link
Contributor Author

This function is not used by anyone in the LLVM monorepo AFAIK. I have a related PR (#98770) to this, which exposes the move constructor on the MMIWP side, so that I can inspect its contents after I run passes on it without requiring the PM to stay alive.
However, as discussions in the PR shows, when I used it in the use case I showed here, I would get segfaults. I suspected this was because MFs had references to the old, unmoved MMI fields deleted by the PM destructor, which @arsenm has fixed since then. I have not tested the MMIWP constructor since then due to time constraints.
TLDR; I think there should be a test for this constructor in LLVM proper (something along the lines of my example use case with the PM explicitly deleted should be enough), to prove that it is working correctly in the first place.

Thank you for context and suggestion on adding a test! I agree that a test is good to have, though do you have any suggestions where should it be added to? Your PR doesn't seem to have a real one doing the testing in the code but only code snippet in the comment, and I can't find any in llvm/test at all that's remotely testing this sort of thing?

Actually, nvm, found something in llvm/unittests/CodeGen. Will push a commit to test the change in this PR.

Copy link

github-actions bot commented Aug 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@matinraayai
Copy link
Contributor

matinraayai commented Aug 20, 2024

@weiweichen @arsenm though a working move constructor will solve the issue here, I think the best solution is to disallow move construction entirely, and treat MMI just like you would an llvm::Module.

In other words, just like how an llvm::Module can be constructed only by passing a reference to the llvm::LLVMContext it uses:

explicit Module(StringRef ModuleID, LLVMContext& C);

An llvm::MachineModuleInfo should also be constructed only by passing a reference to the llvm::MCContext and the llvm::LLVMTargetMachine it will use:

namespace llvm {
  class MachineModuleInfo { 
explicit MachineModuleInfo(LLVMTargetMachine &TM, MCContext &ExtContext);
}

Furthermore, llvm::MachineModuleInfoWrapperPass should also not manage the lifetime of its llvm::MachineModuleInfo, just like
the new PM analysis provider of MMI here:

class MachineModuleAnalysis : public AnalysisInfoMixin<MachineModuleAnalysis> {
friend AnalysisInfoMixin<MachineModuleAnalysis>;
static AnalysisKey Key;
MachineModuleInfo &MMI;
public:
class Result {
MachineModuleInfo &MMI;
Result(MachineModuleInfo &MMI) : MMI(MMI) {}
friend class MachineModuleAnalysis;
public:
MachineModuleInfo &getMMI() { return MMI; }
// MMI owes MCContext. It should never be invalidated.
bool invalidate(Module &, const PreservedAnalyses &,
ModuleAnalysisManager::Invalidator &) {
return false;
}
};
MachineModuleAnalysis(MachineModuleInfo &MMI) : MMI(MMI) {}
/// Run the analysis pass and produce machine module information.
Result run(Module &M, ModuleAnalysisManager &);
};
} // end namespace llvm

The only downside of this approach is that this change will probably touch all the LLVM backends, but IMO it makes more sense and should address the use cases we already have. But again, I'm happy either way as long as it works.

@weiweichen
Copy link
Contributor Author

weiweichen commented Aug 20, 2024

@weiweichen @arsenm though a working move constructor will solve the issue here, I think the best solution is to disallow move construction entirely, and treat MMI just like you would an llvm::Module.

In other words, just like how an llvm::Module can be constructed only by passing a reference to the llvm::LLVMContext it uses:

explicit Module(StringRef ModuleID, LLVMContext& C);

An llvm::MachineModuleInfo should also be constructed only by passing a reference to the llvm::MCContext and the llvm::LLVMTargetMachine it will use:

namespace llvm {
  class MachineModuleInfo { 
explicit MachineModuleInfo(LLVMTargetMachine &TM, MCContext &ExtContext);
}

Furthermore, llvm::MachineModuleInfoWrapperPass should also not manage the lifetime of its llvm::MachineModuleInfo, just like the new PM analysis provider of MMI here:

class MachineModuleAnalysis : public AnalysisInfoMixin<MachineModuleAnalysis> {
friend AnalysisInfoMixin<MachineModuleAnalysis>;
static AnalysisKey Key;
MachineModuleInfo &MMI;
public:
class Result {
MachineModuleInfo &MMI;
Result(MachineModuleInfo &MMI) : MMI(MMI) {}
friend class MachineModuleAnalysis;
public:
MachineModuleInfo &getMMI() { return MMI; }
// MMI owes MCContext. It should never be invalidated.
bool invalidate(Module &, const PreservedAnalyses &,
ModuleAnalysisManager::Invalidator &) {
return false;
}
};
MachineModuleAnalysis(MachineModuleInfo &MMI) : MMI(MMI) {}
/// Run the analysis pass and produce machine module information.
Result run(Module &M, ModuleAnalysisManager &);
};
} // end namespace llvm

The only downside of this approach is that this change will probably touch all the LLVM backends, but IMO it makes more sense and should address the use cases we already have. But again, I'm happy either way as long as it works.

This perhaps is a bit overkill for solving my issue in particular where I just need to get the MMI out of the PM's control so that it's not destroyed when the PM is destructed. I need access to the MachineFunctions. I'll leave the suggested change to whoever think is better for their issues and perhaps move forward with this PR that's pretty self-contained and less disruptive without any API changes and also solves my problem 😬 .

@weiweichen
Copy link
Contributor Author

weiweichen commented Aug 20, 2024

@weiweichen @arsenm though a working move constructor will solve the issue here, I think the best solution is to disallow move construction entirely, and treat MMI just like you would an llvm::Module.

In other words, just like how an llvm::Module can be constructed only by passing a reference to the llvm::LLVMContext it uses:

explicit Module(StringRef ModuleID, LLVMContext& C);

An llvm::MachineModuleInfo should also be constructed only by passing a reference to the llvm::MCContext and the llvm::LLVMTargetMachine it will use:

namespace llvm {
  class MachineModuleInfo { 
explicit MachineModuleInfo(LLVMTargetMachine &TM, MCContext &ExtContext);
}

Furthermore, llvm::MachineModuleInfoWrapperPass should also not manage the lifetime of its llvm::MachineModuleInfo, just like the new PM analysis provider of MMI here:

class MachineModuleAnalysis : public AnalysisInfoMixin<MachineModuleAnalysis> {
friend AnalysisInfoMixin<MachineModuleAnalysis>;
static AnalysisKey Key;
MachineModuleInfo &MMI;
public:
class Result {
MachineModuleInfo &MMI;
Result(MachineModuleInfo &MMI) : MMI(MMI) {}
friend class MachineModuleAnalysis;
public:
MachineModuleInfo &getMMI() { return MMI; }
// MMI owes MCContext. It should never be invalidated.
bool invalidate(Module &, const PreservedAnalyses &,
ModuleAnalysisManager::Invalidator &) {
return false;
}
};
MachineModuleAnalysis(MachineModuleInfo &MMI) : MMI(MMI) {}
/// Run the analysis pass and produce machine module information.
Result run(Module &M, ModuleAnalysisManager &);
};
} // end namespace llvm

The only downside of this approach is that this change will probably touch all the LLVM backends, but IMO it makes more sense and should address the use cases we already have. But again, I'm happy either way as long as it works.

The main issue that's unclear to me for the proposed approach is that who will now be the owner of the MCContext if we change the API for llvm::MachineModuleInfo to just take a reference to MCContext. And if llvm::MachineModuleInfoWrapperPass no longer maintain the lifetime of llvm::MachineModuleInfo, then who is going to do that?

@matinraayai
Copy link
Contributor

matinraayai commented Aug 20, 2024

The main issue that's unclear to me for the proposed approach is that who will now be the owner of the MCContext if we change the API for llvm::MachineModuleInfo to just take a reference to MCContext. And if llvm::MachineModuleInfoWrapperPass no longer maintain the lifetime of llvm::MachineModuleInfo, then who is going to do that?

Both will have to be explicitly managed by the user of the pipeline, just like the llvm::LLVMContext and llvm::Module are.

In MC-related use cases AFAIK (i.e. MC disassembler, MC AsmParsing, LLVM Bolt, etc) llvm::MCContext is explicitly created and manged. What I suggested here just extends this to this particular case here.

As for how MMI will be managed, the new PM driver seems to be initializing and managing it on its own, as it can be
seen here in LLC:

MachineModuleInfo MMI(&LLVMTM);

So I don't think adding an MCContext to the list of explicitly managed things in the pipeline is too much to ask.

Maybe there is a way to make MMI manage its MCContext without this whole internal/external shenanigans; In that case I see the point of keeping management of the MCContext the way it is in the codegen pipeline.

@weiweichen
Copy link
Contributor Author

The main issue that's unclear to me for the proposed approach is that who will now be the owner of the MCContext if we change the API for llvm::MachineModuleInfo to just take a reference to MCContext. And if llvm::MachineModuleInfoWrapperPass no longer maintain the lifetime of llvm::MachineModuleInfo, then who is going to do that?

Both will have to be explicitly managed by the user of the pipeline, just like the llvm::LLVMContext and llvm::Module are.

In MC-related use cases AFAIK (i.e. MC disassembler, MC AsmParsing, LLVM Bolt, etc) llvm::MCContext is explicitly created and manged. What I suggested here just extends this to this particular case here.

As for how MMI will be managed, the new PM driver seems to be initializing and managing it on its own, as it can be seen here in LLC:

MachineModuleInfo MMI(&LLVMTM);

So I don't think adding an MCContext to the list of explicitly managed things in the pipeline is too much to ask.

Maybe there is a way to make MMI manage its MCContext without this whole internal/external shenanigans; In that case I see the point of keeping management of the MCContext the way it is in the codegen pipeline.

I don't feel like object to what your intended design/improvement for this part, though that's really not what this PR is for and what the problem I'm trying to solve here. Any chance we can address your design in a separate PR so that I can move on with my downstream project? Happy to review and provide feedbacks on your changes if you'd like to put up a PR to reflect the suggestion!

@arsenm
Copy link
Contributor

arsenm commented Aug 20, 2024

I don't feel like object to what your intended design/improvement for this part, though that's really not what this PR is for and what the problem I'm trying to solve here.

Your patch is entirely about the ownership of MachineModuleInfo. Just trying to move along with something that happens to avoid your problem today without a real design decision being made is a path to the project bitrotting as a whole.

@weiweichen
Copy link
Contributor Author

weiweichen commented Aug 20, 2024

I don't feel like object to what your intended design/improvement for this part, though that's really not what this PR is for and what the problem I'm trying to solve here.

Your patch is entirely about the ownership of MachineModuleInfo. Just trying to move along with something that happens to avoid your problem today without a real design decision being made is a path to the project bitrotting as a whole.

My PR is not really touching much ownership (llvm::MachineModuleInfo is still the owner of MCConext) but more like fixing a dangerous existing bug. I DO NOT DISAGREE that it worthwhile reconsidering the actual ownership of the design if you see fit, but again, it's out of the scope of this bug fixing PR IMO, and I don't feel comfortable to moving ahead without understanding the context of the original design. Also, for API change and design change like this, should there be an RFC to get community inputs before moving forward?

Can we please fix the bug first and then the design? Do they have to be done all in one go? It gets a bit frustrating here to be honest.

@arsenm
Copy link
Contributor

arsenm commented Aug 20, 2024

My PR is not really touching much ownership but more like fixing a dangerous existing bug. I DO NOT DISAGREE that it worthwhile reconsidering the actual ownership of the design if you see fit, but again, it's out of the scope of this bug fixing PR IMO, and I don't feel comfortable to moving ahead without understanding the context of the original design.

Could you also just change the move constructor to move the MCContext in place without switching to the unique_ptr?

Also, for API change and design change like this, should there be an RFC to get community inputs before moving forward?

This isn't big enough to call for an RFC

Can we please fix the bug first and then the design? Do they have to be done all in one go? It gets a bit frustrating here to be honest.

Since when is software not frustrating?

@matinraayai
Copy link
Contributor

@weiweichen @arsenm I am investigating how much work needs to be done for the context ownership change.
I'll let you know by the end of the day (EST) about how it goes.
@weiweichen I understand the frustration, but this is an issue that has been gone unsolved for a while now. Even with the context issue fixed I don't think there's a guarantee the move constructor will work without segfaulting, and as @arsenm said, it will only add to the things patched haphazardly in MMI.

@weiweichen
Copy link
Contributor Author

weiweichen commented Aug 20, 2024

Could you also just change the move constructor to move the MCContext in place without switching to the unique_ptr?

I don't think this is the right solution for fixing the bug though. Yes, the MCContext can be moved constructed instead of directly constructed in llvm::MachineModuleInfo's move constructor. But the MachineFunction in MachineFunctions are still referencing to the original MCContext object (now moved to the new one), that would still have the issue that the now moved MachineFunction in MachineFunctions are referencing a possible dead MCContext depending on the lifetime of the original llvm::MachineModuleInfo. Or am I missing something here??

This isn't big enough to call for an RFC

Not trying to be difficult, but I'm curious what is the criteria for API changing with an RFC. Is there a community guideline about this? Also, regarding the context of ExternContext, what would be the best way to find out why it is there at the first place?

@matinraayai
Copy link
Contributor

matinraayai commented Aug 20, 2024

@weiweichen this is the diff that added it in:
https://reviews.llvm.org/D91313
@arsenm do you know why they don't clear the external context in this patch here?
https://reviews.llvm.org/D91313?id=306289#inline-855997

@weiweichen
Copy link
Contributor Author

weiweichen commented Aug 20, 2024

@hgreving2304, I think you were the original author who puts the ExternContext in llvm::MachineModuleInfo. Would you mind giving us a bit more context on why it was added originally, and if there is any concerns on removing it with the change of make MCContext not owned by llvm::MachineModuleInfo entirely anymore?

@weiweichen
Copy link
Contributor Author

@weiweichen @arsenm I am investigating how much work needs to be done for the context ownership change.
I'll let you know by the end of the day (EST) about how it goes.

Sound good! Thank you for investigating this! I'll wait.

@hgreving2304
Copy link
Member

@hgreving2304, I think you were the original author who puts the ExternContext in llvm::MachineModuleInfo. Would you mind giving us a bit more context on why it was added originally, and if there is any concerns on removing it with the change of make MCContext not owned by llvm::MachineModuleInfo entirely anymore?

This was done in order to have an external driver owning the MCContext pointer (and the data it points to). I am not at Google anymore so I can't talk for them, but I know that changing this back will create a lot of trouble there. Before this change, a lot of cumbersome copying was needed.

@matinraayai
Copy link
Contributor

@weiweichen @arsenm #105541 addresses the issues raised in the review here.
If it proves challenging to merge due to MCContext management issues, we can merge only the first two items on the list regarding MMI's constructors. Let me know what you think.

@weiweichen
Copy link
Contributor Author

@hgreving2304, I think you were the original author who puts the ExternContext in llvm::MachineModuleInfo. Would you mind giving us a bit more context on why it was added originally, and if there is any concerns on removing it with the change of make MCContext not owned by llvm::MachineModuleInfo entirely anymore?

This was done in order to have an external driver owning the MCContext pointer (and the data it points to). I am not at Google anymore so I can't talk for them, but I know that changing this back will create a lot of trouble there. Before this change, a lot of cumbersome copying was needed.

Thank you very much @hgreving2304 for the quick response and the context about the design!

@weiweichen
Copy link
Contributor Author

@weiweichen @arsenm #105541 addresses the issues raised in the review here. If it proves challenging to merge due to MCContext management issues, we can merge only the first two items on the list regarding MMI's constructors. Let me know what you think.

@matinraayai, the changes in #105541 makes sense to me that it reflects the design change you suggested here as well. You did mention that the PR is WIP. I'm wondering how soon do you expect it, or at least the first 2 parts to get merged? (selfishness, I do really want the move constructor's bug fixed as soon as possible, either with a small patch or changing the way how MMI is used so that the move constructor is irrelevant with your PR 🙏)

@matinraayai
Copy link
Contributor

@weiweichen @arsenm #105541 addresses the issues raised in the review here. If it proves challenging to merge due to MCContext management issues, we can merge only the first two items on the list regarding MMI's constructors. Let me know what you think.

@matinraayai, the changes in #105541 makes sense to me that it reflects the design change you suggested here as well. You did mention that the PR is WIP. I'm wondering how soon do you expect it, or at least the first 2 parts to get merged? (selfishness, I do really want the move constructor's bug fixed as soon as possible, either with a small patch or changing the way how MMI is used so that the move constructor is irrelevant with your PR 🙏)

The only reason I made it a WIP is because I'm not sure if everyone using the API like the changes or not (especially regarding the TargetMachine interface) + I might have forgotten to refactor parts of the LLVM monorepo that uses MMIs, or I may have broken something by accident. Once I get the green light that this change is favored, I will change it to a normal PR. The ETA is dependent on how long the codeowners will take to respond to it.

@arsenm
Copy link
Contributor

arsenm commented Aug 21, 2024

This isn't big enough to call for an RFC

Not trying to be difficult, but I'm curious what is the criteria for API changing with an RFC. Is there a community guideline about this?

I think it would be hard to write a specific rule. This case isn't changing the IR, and is only changing a niche mostly internal API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants