Skip to content

[SYCL][L0] Fix the Pi Platform Cache memory leak #2881

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions sycl/plugins/level_zero/pi_level_zero.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -811,11 +811,11 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms,
// runtime from a global destructor, and such a call could eventually
// access these variables. Therefore, there is no safe time when
// "PiPlatformsCache" and "PiPlatformsCacheMutex" could be deleted.
static auto PiPlatformsCache = new std::vector<pi_platform>;
static auto PiPlatformsCacheMutex = new std::mutex;
static std::vector<pi_platform> PiPlatformsCache;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read the comment above why we used "new". To avoid the leak we need to implement proper tear-down process, similar to how SYCL RT does it now: #2516

Copy link
Contributor Author

@bso-intel bso-intel Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I misunderstood you.
This PR is only fixing the memory leak for the PiPlatformCache and PiPlatformCacheMutex objects.
PiPlatformCache is a vector of pi_platform pointers which are allocated on the heap.
This PR is not intended to fix the leak of pi_platforms that are allocated somewhere else.
If there is an API called piPlatformRelease, I can easily implement a deallocation.
But, there is no such PI API, so SYCL RT never informs to plugins that platform_impl objects are deallocated so the underlying pi_platform can be deallocated, too.
Anyway, I am still working on the memory leak of pi_platform objects, which is not intended to fix in this PR.
This PR is intended only to fix the memory leak of vector itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please read lines 809-813 to see why it is not OK to use static globals for the cache (non-trivial data object): the order of static objects destruction is undefined.

Copy link
Contributor Author

@bso-intel bso-intel Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. then I did not misunderstand you.
Let me explain what is going on with this static variable.
First of all PiPlatformsCache is already a "static" variable.
static auto PiPlatformsCache = new std::vector<pi_platform>;
Secondly, this is not a global variable. It is a local static variable. It seems like misunderstanding.
This "static" pointer variable points to the vector that is allocated on the heap.
As you may know, the heap allocated variable is not automatically deallocated unless we call "delete" for this pointer variable, right?
My PR changes this static pointer variable to a static local variable.
The static local variable persists in the static memory region (which is not on stack or heap) even after the function returns, but it will be automatically deallocated when the whole program process terminates.
This way the static local variable is deallocated when the program terminates.
As I said, this PR is not intended to deallocate the memory allocated for the pi_platform vector itself.
Hope this makes sense.

Copy link
Contributor Author

@bso-intel bso-intel Dec 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me help you understand how my PR fixes the leak in PiPlatformsCache with an example.
Here is a simple code that shows the memory leaks for the local static variable.

#include <vector>

int main() {
  int *a = new int;    // leak
  int *b = new int;    // leak
  int *c = new int;    // leak
  static auto PiPlatformsCache = new std::vector<int*>;   // leak
  PiPlatformsCache->push_back(a);
  PiPlatformsCache->push_back(b);
  PiPlatformsCache->push_back(c);
  return 0;
}

This code allocates 3 integers, a, b, c on the heap, and then create a new vector, vec, on the heap.
Since there was no “delete” for these heap variables, all those heap memory are leaked.

==26717== HEAP SUMMARY:
==26717==     in use at exit: 68 bytes in 5 blocks
==26717==   total heap usage: 8 allocs, 3 frees, 72,796 bytes allocated
==26717== LEAK SUMMARY:
==26717==    definitely lost: 0 bytes in 0 blocks
==26717==    indirectly lost: 0 bytes in 0 blocks
==26717==      possibly lost: 0 bytes in 0 blocks
==26717==    still reachable: 68 bytes in 5 blocks
==26717==         suppressed: 0 bytes in 0 blocks

Now I changed the variable ‘PiPlatformsCache’ to a local static vector.

#include <vector>

int main() {
  int *a = new int;    // leak
  int *b = new int;    // leak
  int *c = new int;     // leak
  static std::vector<int*> PiPlatformsCache;    // NO leak
  PiPlatformsCache.push_back(a);
  PiPlatformsCache.push_back(b);
  PiPlatformsCache.push_back(c);
  return 0;
}

Now, valgrind only detects memory leak for a, b, c.

==26709== HEAP SUMMARY:
==26709==     in use at exit: 12 bytes in 3 blocks
==26709==   total heap usage: 7 allocs, 4 frees, 72,772 bytes allocated
==26709==
==26709== 4 bytes in 1 blocks are definitely lost in loss record 1 of 3
==26709==    at 0x4C305B3: operator new(unsigned long) (vg_replace_malloc.c:342)
==26709==    by 0x10923A: main (simple2.cpp:4)
==26709==
==26709== 4 bytes in 1 blocks are definitely lost in loss record 2 of 3
==26709==    at 0x4C305B3: operator new(unsigned long) (vg_replace_malloc.c:342)
==26709==    by 0x109248: main (simple2.cpp:5)
==26709==
==26709== 4 bytes in 1 blocks are definitely lost in loss record 3 of 3
==26709==    at 0x4C305B3: operator new(unsigned long) (vg_replace_malloc.c:342)
==26709==    by 0x109256: main (simple2.cpp:6)
==26709==
==26709== LEAK SUMMARY:
==26709==    definitely lost: 12 bytes in 3 blocks
==26709==    indirectly lost: 0 bytes in 0 blocks
==26709==      possibly lost: 0 bytes in 0 blocks
==26709==    still reachable: 0 bytes in 0 blocks
==26709==         suppressed: 0 bytes in 0 blocks

Does it makes sense to you now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static local variable persists in the static memory region (which is not on stack or heap) even after the function returns, but it will be automatically deallocated when the whole program process terminates.
This way the static local variable is deallocated when the program terminates.

That's the problem. The order of destruction of such static object (at program termination) is undefined. There might be other globals that depend on this one, but are deallocated later than this one. Allocating with new "resolves" this ordering problem by introducing an informed leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the problem. The order of destruction of such static object (at program termination) is undefined.

Could you tell me where you got this info from?
As far as I know, the static variables have a well-defined construction and destruction order.
The destruction order of static variables is the reverse of the construction order of them.
And, all global variables are constructed before main() starts, whereas static variables are constructed only when the specific scope is reached (i.e., the defining function is called).
I know "global" objects have undefined destruction order, but static variables are not global variables.

static std::mutex PiPlatformsCacheMutex;
static bool PiPlatformCachePopulated = false;

std::lock_guard<std::mutex> Lock(*PiPlatformsCacheMutex);
std::lock_guard<std::mutex> Lock(PiPlatformsCacheMutex);
if (!PiPlatformCachePopulated) {
const char *CommandListCacheSize =
std::getenv("SYCL_PI_LEVEL_ZERO_MAX_COMMAND_LIST_CACHE");
Expand Down Expand Up @@ -867,7 +867,7 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms,

Platform->ZeMaxCommandListCache = CommandListCacheSizeValue;
// Save a copy in the cache for future uses.
PiPlatformsCache->push_back(Platform);
PiPlatformsCache.push_back(Platform);
PiPlatformCachePopulated = true;
}
} catch (const std::bad_alloc &) {
Expand All @@ -879,7 +879,7 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms,

if (Platforms && NumEntries > 0) {
uint32_t I = 0;
for (const pi_platform &CachedPlatform : *PiPlatformsCache) {
for (const pi_platform &CachedPlatform : PiPlatformsCache) {
if (I < NumEntries) {
*Platforms++ = CachedPlatform;
I++;
Expand All @@ -890,7 +890,7 @@ pi_result piPlatformsGet(pi_uint32 NumEntries, pi_platform *Platforms,
}

if (NumPlatforms)
*NumPlatforms = PiPlatformsCache->size();
*NumPlatforms = PiPlatformsCache.size();

return PI_SUCCESS;
}
Expand Down