Fix data race in the umfIpcOpenedCacheDestroy function #1111

vinser52 · 2025-02-19T14:26:53Z

Description

This PR fixes a data race (found by Coverity) in the umfIpcOpenedCacheDestroy function.

Checklist

Code compiles without errors locally
All tests pass locally
CI workflows execute properly
New tests added, especially if they will fail without my changes

vinser52 · 2025-02-19T23:37:19Z

src/provider/provider_tracking.c

-#ifndef NDEBUG
-    check_if_tracker_is_empty(p->hTracker, p->pool);
-#endif /* NDEBUG */
-


@ldorau I had to remove this check because TSAN and Valgrind reported many data races when several pools are destroyed concurrently. It looks like we never tested concurrent pool destruction.

I insist on keeping it under an additional ifdef that is not defined in our CI (maybe UMF_DEBUG_MODE?); it can be very useful during debugging.

but can't we just use a lock here?

That would mean introducing a lock in the other places where the tracker is used. I am not sure it is a good idea to introduce a lock just for debugging purposes.

why not, if this is the only place?
these checks could help us catch bugs very early
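The suggestion earlier in this thread, keeping the check behind a macro that CI never defines, could look roughly like the sketch below. `UMF_DEBUG_MODE` is only the name floated in the comment, and `count_live_records` and `pool_finalize_check` are hypothetical helpers invented for illustration; none of this is UMF code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts entries that are still marked live. */
size_t count_live_records(const int *live, size_t count) {
    assert(live != NULL || count == 0);
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (live[i]) {
            n++;
        }
    }
    return n;
}

/*
 * Hypothetical finalize hook. The scan is compiled in only when
 * UMF_DEBUG_MODE (an assumed macro name) is defined at build time,
 * so regular and CI builds never execute it.
 */
size_t pool_finalize_check(const int *live, size_t count) {
#ifdef UMF_DEBUG_MODE
    return count_live_records(live, count);
#else
    (void)live;
    (void)count;
    return 0;
#endif
}
```

Guarding the scan this way hides it from CI but does not make it thread-safe; a build with the macro defined would still scan the shared tracker without synchronization.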

Is this code wrong, or is this only a false positive?
If it is a false positive, then the issue is that when we took critnib from PMDK we did not import the implementation of the "VALGRIND_HG_DRD_DISABLE_CHECKING" macros, which critnib uses to fix false positives.

I am not 100% sure, but I think it is a real issue rather than a false positive. Btw, the same flow with critnib is used by the disjoint pool; see these issues for reference: #1114, #1115.

The test does the following:

Thread 1 destroys Pool A; as a result, check_if_tracker_is_empty is called and iterates over the records in the memory tracker (which is a critnib map).

Thread 2 destroys Pool B; the pool deallocates its cached memory blocks, and these blocks are removed from the tracker.

As a result, Thread 2 removes from the tracker only the records that correspond to Pool B, but Thread 1 touches all entries in the tracker. That is the problem: Thread 1 might read an entry that Thread 2 is in the process of deleting.
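To make the cost of the lock-based alternative concrete, here is a minimal sketch using a toy array-backed tracker instead of the real critnib map. Every name in it (`toy_tracker_t`, `toy_tracker_add`, `toy_tracker_remove`, `toy_tracker_count_for_pool`) is hypothetical. The point it illustrates is that a full scan is only safe if the removal path takes the same lock, which is why a lock cannot be confined to the debug check.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 64

/* Hypothetical stand-in for the memory tracker (not a critnib map). */
typedef struct toy_tracker {
    pthread_mutex_t lock;
    const void *pool[TOY_CAPACITY]; /* owning pool of each record */
    bool used[TOY_CAPACITY];        /* whether the slot holds a record */
} toy_tracker_t;

void toy_tracker_init(toy_tracker_t *t) {
    pthread_mutex_init(&t->lock, NULL);
    for (size_t i = 0; i < TOY_CAPACITY; i++) {
        t->used[i] = false;
        t->pool[i] = NULL;
    }
}

/* Adds a record owned by `pool`; returns its slot, or -1 if full. */
int toy_tracker_add(toy_tracker_t *t, const void *pool) {
    int slot = -1;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TOY_CAPACITY; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            t->pool[i] = pool;
            slot = (int)i;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return slot;
}

/* Removal path (what Thread 2 does): must take the same lock as the scan. */
void toy_tracker_remove(toy_tracker_t *t, int slot) {
    assert(slot >= 0 && slot < TOY_CAPACITY);
    pthread_mutex_lock(&t->lock);
    t->used[slot] = false;
    t->pool[slot] = NULL;
    pthread_mutex_unlock(&t->lock);
}

/* Full scan (what Thread 1 does): visits every slot, not only its own pool's. */
size_t toy_tracker_count_for_pool(toy_tracker_t *t, const void *pool) {
    size_t n = 0;
    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < TOY_CAPACITY; i++) {
        if (t->used[i] && t->pool[i] == pool) {
            n++;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

In this toy version every add and remove pays for the mutex, even in builds where the scan is never called. That is the trade-off being debated in this thread.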

@bratpiorka I do not want to introduce locks in other places just to support this debugging functionality. I also do not think this debugging check is really needed, because we already check whether the tracker is empty when the tracker itself is destroyed (when libumf is unloaded). The removed check ran at pool destruction and verified that the tracker contained no records from the corresponding pool. Even without the check at pool destruction, we can still catch the same issue when the tracker itself is destroyed.
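The argument above can be illustrated with a small sketch: a single scan at tracker teardown still reports a record leaked by any pool, and it needs no lock if nothing else touches the tracker at that point (an assumption of this sketch, not something the thread states). `leak_record_t` and `report_leaks_at_teardown` are hypothetical names, not part of UMF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tracker record: which pool owns it and whether it is still live. */
typedef struct leak_record {
    int pool_id;
    int live;
} leak_record_t;

/*
 * Runs once, when the tracker itself is destroyed. Reports every record
 * that is still live, whichever pool owns it, and returns the leak count.
 */
size_t report_leaks_at_teardown(const leak_record_t *records, size_t count) {
    assert(records != NULL || count == 0);
    size_t leaks = 0;
    for (size_t i = 0; i < count; i++) {
        if (records[i].live) {
            fprintf(stderr, "leaked record %zu owned by pool %d\n", i,
                    records[i].pool_id);
            leaks++;
        }
    }
    return leaks;
}
```

The cost is that the leak is reported later than it would be at pool destruction, but the owning pool can still be identified from the record.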

@vinser52 OK, if we check the tracker at libumf unload, then we can remove it from here.

Done

test/supp/drd-umf_test-provider_devdax_memory_ipc.supp

kswiecicki · 2025-02-20T12:56:42Z

test/ipcFixtures.hpp

+    constexpr size_t NUM_ALLOCS = 100;
+    constexpr size_t NUM_POOLS = 10;
+    void *ptrs[NUM_ALLOCS];
+    void *openedPtrs[NUM_POOLS][NUM_ALLOCS];


nit: You mix camel case and snake case when naming variables.

Done. Variables in the test now use camel case, and constants use upper case.

vinser52 requested a review from a team as a code owner February 19, 2025 14:26

vinser52 requested review from ldorau and bratpiorka February 19, 2025 14:27

vinser52 mentioned this pull request Feb 19, 2025

coverity fixes #1106

Merged

vinser52 force-pushed the svinogra_tests branch from 1f0b3e9 to bd2eb9f Compare February 19, 2025 14:50

lplewa approved these changes Feb 19, 2025

vinser52 commented Feb 19, 2025

vinser52 force-pushed the svinogra_tests branch 2 times, most recently from d7ca29e to 4ddf260 Compare February 20, 2025 10:28

bratpiorka reviewed Feb 20, 2025

ldorau approved these changes Feb 20, 2025

kswiecicki approved these changes Feb 20, 2025

vinser52 force-pushed the svinogra_tests branch from 3ec80d0 to 826b496 Compare February 20, 2025 14:22

vinser52 added 2 commits February 20, 2025 15:24

Fix data race in the umfIpcOpenedCacheDestroy function

1380620

Suppress Valgrind errors in jemalloc and tbbmalloc

c925acb

vinser52 force-pushed the svinogra_tests branch from 826b496 to eb640c6 Compare February 20, 2025 14:24

vinser52 requested review from kswiecicki and bratpiorka February 20, 2025 14:32

Remove check_if_tracker_is_empty from trackingFinalize

b860ee1

vinser52 force-pushed the svinogra_tests branch from eb640c6 to b860ee1 Compare February 20, 2025 16:30

bratpiorka approved these changes Feb 20, 2025

lukaszstolarczuk merged commit a9ff7a8 into oneapi-src:main Feb 20, 2025
79 checks passed
