Fix data race in the umfIpcOpenedCacheDestroy function #1111
Merged
@@ -593,4 +593,73 @@ TEST_P(umfIpcTest, ConcurrentOpenCloseHandles) {
    EXPECT_EQ(stat.openCount, stat.closeCount);
}

TEST_P(umfIpcTest, ConcurrentDestroyIpcHandlers) {
    constexpr size_t SIZE = 100;
    constexpr size_t NUM_ALLOCS = 100;
    constexpr size_t NUM_POOLS = 10;
    void *ptrs[NUM_ALLOCS];
    void *openedPtrs[NUM_POOLS][NUM_ALLOCS];
nit: You mix camel case and snake case when naming variables.
Done. Variables in the test now use camel case and constants use capital letters.

    std::vector<umf::pool_unique_handle_t> consumerPools;
    umf::pool_unique_handle_t producerPool = makePool();
    ASSERT_NE(producerPool.get(), nullptr);

    for (size_t i = 0; i < NUM_POOLS; ++i) {
        consumerPools.push_back(makePool());
    }

    for (size_t i = 0; i < NUM_ALLOCS; ++i) {
        void *ptr = umfPoolMalloc(producerPool.get(), SIZE);
        ASSERT_NE(ptr, nullptr);
        ptrs[i] = ptr;
    }

    for (size_t i = 0; i < NUM_ALLOCS; ++i) {
        umf_ipc_handle_t ipcHandle = nullptr;
        size_t handleSize = 0;
        umf_result_t ret = umfGetIPCHandle(ptrs[i], &ipcHandle, &handleSize);
        ASSERT_EQ(ret, UMF_RESULT_SUCCESS);

        for (size_t poolId = 0; poolId < NUM_POOLS; poolId++) {
            void *ptr = nullptr;
            umf_ipc_handler_handle_t ipcHandler = nullptr;
            ret =
                umfPoolGetIPCHandler(consumerPools[poolId].get(), &ipcHandler);
            ASSERT_EQ(ret, UMF_RESULT_SUCCESS);
            ASSERT_NE(ipcHandler, nullptr);

            ret = umfOpenIPCHandle(ipcHandler, ipcHandle, &ptr);
            ASSERT_EQ(ret, UMF_RESULT_SUCCESS);
            openedPtrs[poolId][i] = ptr;
        }

        ret = umfPutIPCHandle(ipcHandle);
        ASSERT_EQ(ret, UMF_RESULT_SUCCESS);
    }

    for (size_t poolId = 0; poolId < NUM_POOLS; poolId++) {
        for (size_t i = 0; i < NUM_ALLOCS; ++i) {
            umf_result_t ret = umfCloseIPCHandle(openedPtrs[poolId][i]);
            EXPECT_EQ(ret, UMF_RESULT_SUCCESS);
        }
    }

    for (size_t i = 0; i < NUM_ALLOCS; ++i) {
        umf_result_t ret = umfFree(ptrs[i]);
        EXPECT_EQ(ret, UMF_RESULT_SUCCESS);
    }

    // Destroy pools in parallel to cause IPC cache cleanup in parallel.
    umf_test::syncthreads_barrier syncthreads(NUM_POOLS);
    auto poolDestroyFn = [&consumerPools, &syncthreads](size_t tid) {
        syncthreads();
        consumerPools[tid].reset(nullptr);
    };
    umf_test::parallel_exec(NUM_POOLS, poolDestroyFn);

    producerPool.reset(nullptr);

    EXPECT_EQ(stat.putCount, stat.getCount);
    EXPECT_EQ(stat.openCount, stat.closeCount);
}

#endif /* UMF_TEST_IPC_FIXTURES_HPP */
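The last step of the test lines up one thread per consumer pool behind a barrier and then resets all the pools at once. The implementations of umf_test::syncthreads_barrier and umf_test::parallel_exec are not shown in this diff, so the following is only a rough sketch of that pattern using the standard library. StartBarrier, parallelExec, and destroyAllInParallel are hypothetical names, and plain heap objects stand in for the pools.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for umf_test::syncthreads_barrier: every caller
// blocks until `count` threads have arrived. Single use only.
class StartBarrier {
  public:
    explicit StartBarrier(std::size_t count) : remaining_(count) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (--remaining_ == 0) {
            cv_.notify_all(); // last arrival releases everyone
        } else {
            cv_.wait(lock, [this] { return remaining_ == 0; });
        }
    }

  private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t remaining_;
};

// Hypothetical stand-in for umf_test::parallel_exec: runs fn(tid) on
// `threads` threads and joins them all before returning.
inline void parallelExec(std::size_t threads,
                         const std::function<void(std::size_t)> &fn) {
    assert(fn && "fn must be callable");
    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (std::size_t tid = 0; tid < threads; ++tid) {
        workers.emplace_back(fn, tid);
    }
    for (auto &worker : workers) {
        worker.join();
    }
}

// Releases `count` heap objects concurrently, one per thread, mirroring how
// the test resets each consumer pool handle. Returns the number of slots
// that are empty afterwards.
inline std::size_t destroyAllInParallel(std::size_t count) {
    std::vector<std::unique_ptr<int>> objects;
    for (std::size_t i = 0; i < count; ++i) {
        objects.push_back(std::make_unique<int>(static_cast<int>(i)));
    }

    StartBarrier barrier(count);
    parallelExec(count, [&objects, &barrier](std::size_t tid) {
        barrier.wait();       // line all threads up first
        objects[tid].reset(); // each thread touches only its own slot
    });

    std::size_t empty = 0;
    for (const auto &object : objects) {
        empty += (object == nullptr) ? 1 : 0;
    }
    return empty;
}
```

The barrier matters because thread start-up is staggered: without it, early threads may finish before later ones begin, and the destructors would rarely overlap. In this sketch each thread owns its slot, so the sketch itself is race-free. In the real test, what the pool destructors share is the state inside the library.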
@ldorau I had to delete this check because TSAN and Valgrind reported many data races when several pools are destroyed concurrently. It looks like we never tested concurrent pool destruction.
I insist on keeping it under an additional ifdef that is not defined in our CI (maybe UMF_DEBUG_MODE?) - it can be very useful during debugging.
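As an illustration of the suggestion above, here is a minimal sketch of a per-pool leak check that is compiled in only when a debug macro is defined. UMF_DEBUG_MODE is the name floated in the comment and is an assumption here, not an existing build option. ToyTracker, countPoolRecords, and onPoolDestroy are hypothetical, and a std::map stands in for the real tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical, simplified tracker: maps an allocation address to the id of
// the pool that owns it. The real tracker is not a std::map.
struct ToyTracker {
    std::map<const void *, int> records;
};

// Counts the records that still belong to `poolId`.
inline std::size_t countPoolRecords(const ToyTracker &tracker, int poolId) {
    std::size_t leftovers = 0;
    for (const auto &record : tracker.records) {
        if (record.second == poolId) {
            ++leftovers;
        }
    }
    return leftovers;
}

// Pool teardown hook. The scan is compiled in only when UMF_DEBUG_MODE
// (assumed macro name) is defined, so regular and CI builds never run it.
inline void onPoolDestroy(const ToyTracker &tracker, int poolId) {
#ifdef UMF_DEBUG_MODE
    assert(countPoolRecords(tracker, poolId) == 0 &&
           "pool destroyed while the tracker still holds its records");
#else
    (void)tracker; // check compiled out
    (void)poolId;
#endif
}
```

Note that the guard only keeps the scan out of normal builds. It does not make the scan safe against concurrent tracker updates, so a debug build that destroys pools in parallel would still need some form of synchronization.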
but can't we just use a lock here?
This means that we would need to introduce a lock in the other places where the tracker is used. I am not sure it is a good idea to introduce a lock just for debugging purposes.
why not, if this is the only place?
these checks could help us catch bugs very early
Is this code wrong, or is this only a false positive?
If it is a false positive, then the issue is that when we took critnib from PMDK we did not import the implementation of the "VALGRIND_HG_DRD_DISABLE_CHECKING" macros, which critnib uses to suppress false positives.
I am not 100% sure, but I think it is a real issue rather than a false positive. By the way, the same flow with critnib is used by the disjoint pool; see these issues for reference: #1114, #1115.
The test does the following:
check_if_tracker_is_empty is called and it iterates over the records in the memory tracker (which is a critnib map).
@bratpiorka I do not want to introduce locks in other places just to support this debugging functionality. I also do not think this debugging check is really needed, because we already check whether the tracker is empty when the tracker itself is destroyed (when libumf is unloaded). The removed check runs when a pool is destroyed and verifies that the tracker contains no records from that pool. Even without the check at pool destruction, we will still catch the same issue when the tracker itself is destroyed.
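The argument above can be illustrated with a small model, written as a sketch rather than as the library's actual tracker. ToyTracker and leftoversAtTrackerTeardown are hypothetical names, and a mutex-protected std::map stands in for the critnib-based tracker. Pools only add and remove their own records, and a single emptiness check runs once at teardown, when no other thread is using the tracker.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>

// Hypothetical tracker model: address -> owning pool id.
class ToyTracker {
  public:
    void add(const void *ptr, int poolId) {
        std::lock_guard<std::mutex> lock(mutex_);
        records_[ptr] = poolId;
    }

    void remove(const void *ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t erased = records_.erase(ptr);
        assert(erased == 1 && "removing an untracked pointer");
        (void)erased; // silence unused warning when asserts are disabled
    }

    // The only leak check: meant to run once, at tracker teardown.
    std::size_t leftoverRecords() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return records_.size();
    }

  private:
    mutable std::mutex mutex_;
    std::map<const void *, int> records_;
};

// Simulates two pools that register allocations and are then destroyed
// without any per-pool check. Returns what the teardown check would see.
inline std::size_t leftoversAtTrackerTeardown(bool secondPoolLeaks) {
    ToyTracker tracker;
    int blocks[4] = {0, 0, 0, 0};
    tracker.add(&blocks[0], 1);
    tracker.add(&blocks[1], 1);
    tracker.add(&blocks[2], 2);
    tracker.add(&blocks[3], 2);

    // Pool 1 is destroyed and removes all of its records.
    tracker.remove(&blocks[0]);
    tracker.remove(&blocks[1]);

    // Pool 2 is destroyed and optionally forgets one record.
    tracker.remove(&blocks[2]);
    if (!secondPoolLeaks) {
        tracker.remove(&blocks[3]);
    }

    return tracker.leftoverRecords();
}
```

In this model a record leaked by any pool is still present at teardown, so the final check reports it. The trade-off is that the report comes later and does not point at the moment the offending pool was destroyed.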
@vinser52 OK, if we check the tracker at libumf unload, then we can remove it from here.
Done