-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AST] Don't merge memory locations in AliasSetTracker #65731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AST] Don't merge memory locations in AliasSetTracker #65731
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly it makes more sense to retrieve a MemoryAccess
through a MemoryLocation
rather than a pointer.
@llvm/pr-subscribers-llvm-transforms ChangesThis changes the AliasSetTracker to track memory locations instead of pointers in its alias sets. The motivation for this is outlined in an RFC posted on LLVM discourse. This commit does not yet change method names and code comments to refer to "memory location(s)" instead of "pointer(s)", to keep the changeset focused, and to first gain feedback on the idea. An additional commit can take care of this. In the data structures of the AST implementation, I made the choice to replace the linked list of
|
@llvm/pr-subscribers-llvm-analysis ChangesThis changes the AliasSetTracker to track memory locations instead of pointers in its alias sets. The motivation for this is outlined in an RFC posted on LLVM discourse. This commit does not yet change method names and code comments to refer to "memory location(s)" instead of "pointer(s)", to keep the changeset focused, and to first gain feedback on the idea. An additional commit can take care of this. In the data structures of the AST implementation, I made the choice to replace the linked list of
|
Tagging some possible reviewers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left comments. Generally, I think this is the right way to go. We had the custom rolled stuff that was expanded from MemLoc, and used to create MemLoc all the time anyway.
Do we have compile time data? Maybe like this and w/ a SetVector instead of a plain vector?
void addPointer(AliasSetTracker &AST, PointerRec &Entry, LocationSize Size, | ||
const AAMDNodes &AAInfo, bool KnownMustAlias = false, | ||
bool SkipSizeUpdate = false); | ||
bool isMustAliasMergeWith(AliasSet &AS, BatchAAResults &BatchAA) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function needs documentation. isMustAlias and MergeWith? I'm confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation was added in commit e1c2fbf. Actually, the auxiliary function was only introduced to exit elegantly from a nested loop. I still removed the function from the class, and made it internal to the AliasSetTracker.cpp file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the auxiliary function was only introduced to exit elegantly from a nested loop.
This was perhaps unnecessarily heavy. I still removed the function, and replaced it by an immediately-invoked lambda, so the logic is directly visible in AliasSet::mergeSetIn
.
bool empty() const { return PtrList == nullptr; } | ||
using iterator = std::vector<MemoryLocation>::const_iterator; | ||
iterator begin() const { return MemoryLocs.begin(); } | ||
iterator end() const { return MemoryLocs.end(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it reduce the casts for the user if we had non-const iterators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source of the const-casts is not the const-iterator, it is inside MemoryLocation
.
I think it would be possible to provide a convenience version of getPointers
that integrates the const-cast so the client would not be burdened with this, but the caller would still be responsible to ensure the cast is valid. Is there a precedent of this in LLVM? I'm only aware of methods that intregrate const-casts when we know it is valid (example).
@@ -307,7 +173,7 @@ class AliasSetTracker { | |||
BatchAAResults &AA; | |||
ilist<AliasSet> AliasSets; | |||
|
|||
using PointerMapType = DenseMap<AssertingVH<Value>, AliasSet::PointerRec *>; | |||
using PointerMapType = DenseMap<MemoryLocation, AliasSet *>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A MemoryLocation is not super small, I think 7 or 8 pointer sized objects, roughly. Unsure if this could cause issues at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get why this needs to be indexed by MemoryLocation. I'd still expect all MemoryLocations for the same Value to be in the same AliasSet, so it seems like the existing indexing should work fine...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still expect all MemoryLocations for the same Value to be in the same AliasSet
I was unsure if this always holds. In case of full restrict support, I think the same pointer value is used for the two stores in the following snippet:
int * restrict r = p;
*p = 123;
*r = 456;
But they have different provenance and metadata, and ScopedNoAliasAA will report that they do not alias. But this snippet has undefined behavior (because of course, the accesses do alias).
I guess that BasicAA will quickly return MustAlias for a query with the same pointer Value regardless of LocationSize, metadata, etc.? Then indeed another AA can only say NoAlias in an illegal situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get why this needs to be indexed by MemoryLocation [...] it seems like the existing indexing should work fine...
I still tried this, see this branch. The pointer map can be smaller, but this is traded for extra work in AliasSetTracker::getAliasSetFor
, see commit 23443f3. It turns out that two memory locations for the same UndefValue
do end up in different alias sets (see this BasicAA logic) so I have to exclude them from the pointer map, which complicates the logic in AliasSetTracker::getAliasSetFor
quite a bit. On the plus side, a lots of tests can be reverted, see commit 92f01a2, since we can restore the "pointer values" output in the alias set dump, but I don't know if that is much of an argument.
Let me know if you consider the version on the branch an improvement, then I will include it in this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still realized there was a problem with commit 23443f3. When a memory location with an undef pointer value is added to an alias set, that alias set is not referenced from the pointer map entry of that pointer value (because the pointer value may appear in multiple alias sets). But without the reference to the alias set from some access structure, the reference count of the set may reach zero, and the alias set may be dropped. (For similar reason, the reference count of an alias set is increased by 1 when at least one unknown instruction is registered in it.)
I solved this problem (on the aforementioned branch) by keeping a collection with alias sets that contain undef pointer values. This keeps those alias sets referenced, so they are not dropped. It also allows to scan only those alias sets, when we are trying to find if a memory location with an undef pointer value is already registered or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. Yes, AA treats undef as noalias, but as far as I know AST specifically wants the undefs to be part of the same set, or at least that is how I interpret the comment at
llvm-project/llvm/lib/Analysis/AliasSetTracker.cpp
Lines 346 to 350 in 5f254eb
// If the size changed, we may need to merge several alias sets. | |
// Note that we can *not* return the result of mergeAliasSetsForPointer | |
// due to a quirk of alias analysis behavior. Since alias(undef, undef) | |
// is NoAlias, mergeAliasSetsForPointer(undef, ...) will not find the | |
// the right set for undef, even if it exists. |
More generally I would expect that all MemLocs that have the same pointer will end up in the same alias set, even if AA reports them as NoAlias (as implied by metadata or undef). After all they are also MustAlias, and treating them as such makes things a lot simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that when using a map from pointer values, it is more logical to simply place MemLocs in an existing set of the pointer value, even if AA reports NoAlias for this set. I've updated the ast-pointer-map branch to this effect, in particular, see:
https://github.com/brunodf-snps/llvm-project/blob/9e9ebcd53aff011cff2b4d7ebdf439855806894b/llvm/lib/Analysis/AliasSetTracker.cpp#L312-L319
But be aware that in a program where AA indicates that two instruction using an undef pointer are not aliasing, putting them in the same alias set is information loss (although it may not be relevant):
%val = load i32, ptr undef
...
store ptr null, ptr undef
So I would disagree with the cited code comment, or at least find it misleading:
Since alias(undef, undef) is NoAlias, mergeAliasSetsForPointer(undef, ...) will not find the the right set for undef, even if it exists.
mergeAliasSetsForPointer
is correctly indicating (with a nullptr
result) that there is no set that the second memory location aliases with, according to AA. The existing set for the first memory location with an undef pointer is only a "right" set insofar that the implementation with a pointer map cannot accurately represent the situation. (But it may be a fine tradeoff to be inaccurate here.)
It seems like you are going out of your way to have undefs in different sets -- why?
Well, I have roughly 2 objectives for AST with this patch: (1) avoid information loss and (2) minimal knowledge about AA or MemoryLocation internals. So I am reluctant to add information loss for an implementation organization (the pointer map). But you seem to favor the pointer map for conceptual reasons too, so then I would propose what is on my ast-pointer-map branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Information loss is unavoidable here, because both NoAlias and MustAlias are correct results for two undef pointers. You're always going to lose one or the other. We can chose which one we pick.
The current AST implementation picks MustAlias, and I don't see a reason why we would want to change that. Creating a new set for every undef pointer doesn't make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true that both NoAlias and MustAlias are correct results here and you can choose which one to pick, but my reasoning was that AA already picks NoAlias and so I attempted to represent the same thing in AST (when I say I strive for no information loss, I essentially mean that AST should construct alias sets consistent with AA results -- the motivating example is a case where AA is indicating NoAlias, and current AST is ending up with MayAlias due to implementation organization).
But I am fine with an implementation where all MemoryLocations for the same pointer Value are by construction in the same AliasSet, i.e. where the PointerMap stays. The discussion has shown that the impact is only for undef pointer values, and I expect the difference is of very little practical relevance (as you basically write, creating a new set for every undef pointer does not seem to have added value). So I have merged my ast-pointer-map branch into this PR now, so the combined set of changes that I propose is visible here, and we can move forward with that.
bool AliasSet::isMustAliasMergeWith(AliasSet &AS, BatchAAResults &BatchAA) const { | ||
for (const MemoryLocation &MemLoc : MemoryLocs) | ||
for (const MemoryLocation &ASMemLoc : AS.MemoryLocs) | ||
if (!BatchAA.isMustAlias(MemLoc, ASMemLoc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must-alias is not a transitive relationship? Before we picked a representative and only checked those, why do we need to check all entries now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is discussed in the RFC, using a representative also implies merging of the LocationSize, AAInfo,... of the other accesses into it: https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336#mustalias-sets-saturation-4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -551,7 +472,7 @@ AliasSet &AliasSetTracker::addPointer(MemoryLocation Loc, | |||
AliasSet &AS = getAliasSetFor(Loc); | |||
AS.Access |= E; | |||
|
|||
if (!AliasAnyAS && (TotalMayAliasSetSize > SaturationThreshold)) { | |||
if (!AliasAnyAS && (TotalAliasSetSize > SaturationThreshold)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear if we should increase the threshold now. I guess we can wait and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do no action at the moment.
Thanks @jdoerfert for your comments. There is compile-time data here: https://llvm-compile-time-tracker.com/compare.php?from=fab25949688d7b038ee047d4e8b44a23803db3bd&to=aa7ccadde93f78148026103a03c282e23acbf0b3&stat=instructions:u I will address your other comments. |
The proposal SGTM! I'll follow up with inline comments. |
AST checks aliasing with MustAlias sets by only checking the representative pointer (getSomePointer). This is only correct if the Size and AATags information of that pointer also includes the Size/AATags of all other pointers in the set. When we add a new pointer to the AliasSet, we do perform this update (see the code in AliasSet::addPointer). However, if a pointer already in the MustAlias set is used with a new size, we currently do not update the representative pointer, resulting in miscompilations. Fix this by adding the missing update. This is targeted fix using the current representation. There are a couple of alternatives: * For MustAlias sets, don't store per-pointer Size/AATags at all. This would make it clear that there is only one set of common Size/AATags for all pointers. * Check against all pointers in the set even for MustAlias. This is what llvm#65731 proposes to do as part of a larger change to AST representation. Fixes llvm#64897.
AST checks aliasing with MustAlias sets by only checking the representative pointer (getSomePointer). This is only correct if the Size and AATags information of that pointer also includes the Size/AATags of all other pointers in the set. When we add a new pointer to the AliasSet, we do perform this update (see the code in AliasSet::addPointer). However, if a pointer already in the MustAlias set is used with a new size, we currently do not update the representative pointer, resulting in miscompilations. Fix this by adding the missing update. This is a targeted fix using the current representation. There are a couple of alternatives: * For MustAlias sets, don't store per-pointer Size/AATags at all. This would make it clear that there is only one set of common Size/AATags for all pointers. * Check against all pointers in the set even for MustAlias. This is what #65731 proposes to do as part of a larger change to AST representation. Fixes #64897.
54371b9
to
7caf0bc
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
93d4b9e
to
9304962
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Just some comments on the test, before it can be pre-committed.)
54a97d1
to
2af6cb3
Compare
2af6cb3
to
56a6a53
Compare
This allows to scan only these sets when we want to check if a memory locations was already registered, but importantly, it also keeps these alias sets referenced.
Apart from updating the method names and comments (which is the reason why I still keep the PR marked as draft), I think I have now addressed all of the comments up to this point. |
@@ -307,7 +173,7 @@ class AliasSetTracker { | |||
BatchAAResults &AA; | |||
ilist<AliasSet> AliasSets; | |||
|
|||
using PointerMapType = DenseMap<AssertingVH<Value>, AliasSet::PointerRec *>; | |||
using PointerMapType = DenseMap<MemoryLocation, AliasSet *>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. Yes, AA treats undef as noalias, but as far as I know AST specifically wants the undefs to be part of the same set, or at least that is how I interpret the comment at
llvm-project/llvm/lib/Analysis/AliasSetTracker.cpp
Lines 346 to 350 in 5f254eb
// If the size changed, we may need to merge several alias sets. | |
// Note that we can *not* return the result of mergeAliasSetsForPointer | |
// due to a quirk of alias analysis behavior. Since alias(undef, undef) | |
// is NoAlias, mergeAliasSetsForPointer(undef, ...) will not find the | |
// the right set for undef, even if it exists. |
More generally I would expect that all MemLocs that have the same pointer will end up in the same alias set, even if AA reports them as NoAlias (as implied by metadata or undef). After all they are also MustAlias, and treating them as such makes things a lot simpler.
Simply place them in the earlier alias set that was known for them, although this entails information loss.
// (This is only known to occur for undef pointer values, which AA treats as | ||
// noalias.) | ||
AS = MapEntry->getForwardedTarget(*this); | ||
AS->Alias = AliasSet::SetMayAlias; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to use SetMayAlias here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, AA has indicated that the new memory location is no-alias with any set, including the set from the map entry. If that set was a must-alias set, it would seem that we have to downgrade to may-alias when adding the new memory location there to be consistent with the no-alias result?
I suppose your reasoning is that any two memory locations with the same pointer value are always must-alias (even if AA indicates they are also no-alias), so we know that the new memory location is also must-alias with the earlier memory location in the set for which map entry was added, and by transitivity with all memory locations in the set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. I doubt this really matters in practice, but I think keeping the MustAlias here should be safe (and very marginally simpler...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. But then I think it is better to fully build in that the we assume a must-alias result for the existing alias set with a memory location using the same pointer value, and use this in mergeAliasSetsForPointer
. I've pushed an updated version, see new commits b7fbb33 and ad4fa84. (This is maybe what you were after with your earlier remark about not calling mergeAliasSetsForPointer
when there is a MapEntry: while I think we must call the method and do merging, we can save some alias queries there.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, your updated version looks very clean!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good toe me.
Integrate this private method with updating the map entry in unified procedure.
Thanks! I will still do a pass over the method names and comments to use "memory location" terminology. @jdoerfert I was hoping to have this landed before the llvm-18 branch is cut (Jan 23, plus account for some days margin). Do you still have remarks? |
Change "pointer(s)" to "memory location(s)" or "element(s)" as appropriate.
I have updated the comments and method names to update the terminology, and I have cleared the draft label now. To me, this is ready to be merged, I don't think there is anything I can still address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for committing. main has a new test since my branch point llvm/test/Analysis/BasicAA/separate_storage-alias-sets.ll |
See fix in commit brunodf-snps@f90637d on branch https://github.com/brunodf-snps/llvm-project/commits/ast-fix-new-test/ |
@brunodf-snps Thanks, committed in 509f634. |
This changes the AliasSetTracker to track memory locations instead of pointers in its alias sets. The motivation for this is outlined in an RFC posted on LLVM discourse: https://discourse.llvm.org/t/rfc-dont-merge-memory-locations-in-aliassettracker/73336 In the data structures of the AST implementation, I made the choice to replace the linked list of `PointerRec` entries (that had to go anyway) with a simple flat vector of `MemoryLocation` objects, but for the `AliasSet` objects referenced from a lookup table, I retained the mechanism of a linked list, reference counting, forwarding, etc. The data structures could be revised in a follow-up change.
This changes the AliasSetTracker to track memory locations instead of pointers in its alias sets. The motivation for this is outlined in an RFC posted on LLVM discourse.
This commit does not yet change method names and code comments to refer to "memory location(s)" instead of "pointer(s)", to keep the changeset focused, and to first gain feedback on the idea. An additional commit can take care of this.
In the data structures of the AST implementation, I made the choice to replace the linked list of
PointerRec
entries (that had to go anyway) with a simple flat vector ofMemoryLocation
objects, but for theAliasSet
objects referenced from a lookup table, I retained the mechanism of a linked list, reference counting, forwarding, etc. The data structures could be revised in a follow-up change.