Fix data race in the umfIpcOpenedCacheDestroy function #1111
Conversation
#ifndef NDEBUG
check_if_tracker_is_empty(p->hTracker, p->pool);
#endif /* NDEBUG */
@ldorau I had to delete this check because TSAN and Valgrind reported a lot of data races when several pools are destroyed concurrently. It looks like we never tested concurrent pool destruction.
I insist on keeping it under an additional ifdef that is not defined in our CI (maybe UMF_DEBUG_MODE?) - it can be very useful during debugging.
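A minimal sketch of what such an opt-in guard could look like. UMF_DEBUG_MODE is only the name floated in this comment, and TRACKER_DEBUG_CHECK, toy_tracker_check and destroy_pool_checks_run are hypothetical names used for illustration; none of them is an existing UMF API.

```cpp
// Hypothetical opt-in switch: the check is compiled in only when the
// build defines UMF_DEBUG_MODE explicitly (CI would not define it).
#ifdef UMF_DEBUG_MODE
#define TRACKER_DEBUG_CHECK(expr) (expr)
#else
#define TRACKER_DEBUG_CHECK(expr) ((void)0)
#endif

// Counts how many times the toy check actually ran.
static int checksRun = 0;

// Stand-in for check_if_tracker_is_empty(); it only bumps the counter.
inline void toy_tracker_check() { ++checksRun; }

// Stand-in for a pool destructor that would call the guarded check.
// Returns the number of checks executed during this call.
int destroy_pool_checks_run() {
    checksRun = 0;
    TRACKER_DEBUG_CHECK(toy_tracker_check());
    return checksRun;
}
```

With the macro undefined the call expands to a no-op, so sanitizer runs in CI never execute the tracker scan; a developer build compiled with -DUMF_DEBUG_MODE would get it back.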
but can't we just use a lock here?
This means that we would need to introduce a lock in the other places where trackers are used. I am not sure that it is a good idea to introduce a lock just for debugging purposes.
why not, if this is the only place?
these checks could help us catch bugs very early
Is this code wrong, or is this only a false positive?
If it is a false positive, then the issue is that when we took critnib from PMDK we did not import the implementation of the "VALGRIND_HG_DRD_DISABLE_CHECKING" macros, which are used in critnib to fix false positives.
> Is this code wrong, or is this only a false positive? If it is a false positive, then the issue is that when we took critnib from PMDK we did not import the implementation of the "VALGRIND_HG_DRD_DISABLE_CHECKING" macros, which are used in critnib to fix false positives.
I am not 100% sure, but I think it is a real issue rather than a false positive. By the way, the same flow with critnib is used by the disjoint pool; see these issues for reference: #1114, #1115.
The test does the following:
- Thread 1 destroys Pool A. As a result, check_if_tracker_is_empty is called and iterates over the records in the memory tracker (which is a critnib map).
- Thread 2 destroys Pool B. The pool deallocates its cached memory blocks, and these blocks are removed from the tracker.
- As a result, Thread 2 removes from the tracker only the records that correspond to Pool B, but Thread 1 touches all entries in the tracker. This is the problem: Thread 1 might read an entry that Thread 2 is in the process of deleting.
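The scenario above can be sketched with a toy tracker. This is an illustration under assumptions, not the UMF code: std::map stands in for the critnib map, and ToyTracker, remove_pool_records, count_pool_records and destroy_two_pools_concurrently are hypothetical names. The sketch takes the same mutex in both the scan and the removal path, which shows why a lock placed only in the debug check would not be enough.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <thread>

// Toy stand-in for the memory tracker: block address -> owning pool id.
struct ToyTracker {
    std::mutex lock;
    std::map<std::uintptr_t, int> records;
};

// Mimics a pool releasing its cached blocks: erases only its own records.
void remove_pool_records(ToyTracker &t, int pool) {
    std::lock_guard<std::mutex> guard(t.lock);
    for (auto it = t.records.begin(); it != t.records.end();) {
        if (it->second == pool) {
            it = t.records.erase(it);
        } else {
            ++it;
        }
    }
}

// Mimics the debug check: walks every record, including other pools'
// records, so it must exclude concurrent removals by those pools.
std::size_t count_pool_records(ToyTracker &t, int pool) {
    std::lock_guard<std::mutex> guard(t.lock);
    std::size_t n = 0;
    for (const auto &rec : t.records) {
        if (rec.second == pool) {
            ++n;
        }
    }
    return n;
}

// Thread 1 destroys pool 0 and then scans the tracker; thread 2 destroys
// pool 1 at the same time. Returns leftover pool-0 records seen by the
// scan plus whatever remains in the tracker after both threads finish.
std::size_t destroy_two_pools_concurrently() {
    ToyTracker t;
    for (std::uintptr_t addr = 0; addr < 200; ++addr) {
        t.records[addr] = static_cast<int>(addr % 2);
    }
    std::size_t leftoverPool0 = 1;
    std::thread t1([&] {
        remove_pool_records(t, 0);
        leftoverPool0 = count_pool_records(t, 0);
    });
    std::thread t2([&] { remove_pool_records(t, 1); });
    t1.join();
    t2.join();
    return leftoverPool0 + t.records.size();
}
```

Dropping the lock_guard from either function reproduces the pattern described above: one thread iterates over nodes that another thread is erasing.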
> why not, if this is the only place?
> these checks could help us catch bugs very early
@bratpiorka I do not want to introduce locks in other places just to support this debugging functionality. I also do not think this debugging check is really needed, because we already check that the tracker is empty when the tracker itself is destroyed (when libumf is unloaded). The removed check is called when a pool is destroyed and verifies that the tracker does not contain records from that pool. Even without the check at pool destruction, we will still catch the same issue when the tracker itself is destroyed.
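A small sketch of this argument, assuming a toy record map (ToyRecords, leaked_at_pool_destroy and leaked_at_unload are hypothetical names, not UMF functions): a record leaked by a pool stays in the tracker, so a whole-tracker check at unload still reports it, only later than a per-pool check would.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Toy tracker contents: block address -> owning pool id.
using ToyRecords = std::map<std::uintptr_t, int>;

// Per-pool check (the one removed in this PR): counts records that
// still belong to the pool being destroyed.
std::size_t leaked_at_pool_destroy(const ToyRecords &records, int pool) {
    std::size_t n = 0;
    for (const auto &rec : records) {
        if (rec.second == pool) {
            ++n;
        }
    }
    return n;
}

// Whole-tracker check at library unload: any remaining record is a leak,
// regardless of which pool owned it.
std::size_t leaked_at_unload(const ToyRecords &records) {
    return records.size();
}
```

The unload-time check runs once, when no pools are being destroyed concurrently in this sketch's model, so it needs no extra locking; the cost is that the leak is reported at teardown rather than at the pool that caused it.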
@vinser52 ok if we check the tracker at libumf unload then we could remove it from here
> @vinser52 ok if we check the tracker at libumf unload then we could remove it from here
Done
constexpr size_t NUM_ALLOCS = 100;
constexpr size_t NUM_POOLS = 10;
void *ptrs[NUM_ALLOCS];
void *openedPtrs[NUM_POOLS][NUM_ALLOCS];
nit: You mix camel case and snake case when naming variables.
Done, now variables in the test use camel case and constants use capital letters.
Description
This PR fixes a data race (found by Coverity) in the umfIpcOpenedCacheDestroy function.
Checklist