Skip to content

[SYCL] Add environment variable to disable in-memory program caching #11751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Nov 17, 2023

Conversation

jzc
Copy link
Contributor

@jzc jzc commented Nov 1, 2023

This PR adds an environment variable SYCL_CACHE_IN_MEM to control the in-memory caching of programs. Currently, every program/kernel is saved in the global KernelProgramCache, which means that every program/kernel will not be released until end of the program when the destructor of KernelProgramCache is ran. By enabling this environment variables, caching is not performed and the resources program and kernels use are freed after use.

@jzc jzc requested a review from a team as a code owner November 1, 2023 21:07
@jzc jzc requested a review from cperkinsintel November 1, 2023 21:07
@cperkinsintel
Copy link
Contributor

If a new environment variable is added, the EnvironmentVariables.md file should be updated to document it.

But, also, when I look at llvm/sycl/doc/EnvironmentVariables.md I see there are already several environment variables for controlling cacheing there.
SYCL_CACHE_DIR, SYCL_CACHE_PERSISTENT, SYCL_CACHE_EVICTION_DISABLE, SYCL_CACHE_MAX_SIZE , SYCL_CACHE_THRESHOLD, SYCL_CACHE_MIN_DEVICE_IMAGE_SIZE, SYCL_CACHE_MAX_DEVICE_IMAGE_SIZE

Are we sure we need another? How is this new environment variable different than neutering the SYCL_CACHE_DIR or disabling via SYCL_CACHE_PERSISTENT ?

@jzc
Copy link
Contributor Author

jzc commented Nov 7, 2023

@cperkinsintel Those environment variables control the persistent caching i.e. the storing of compiled code on disk, while this new environment variable intends to give an option to disable the in-memory cache. Even if persistent caching is disabled, programs still go through the in-memory cache.

@jzc jzc temporarily deployed to WindowsCILock November 7, 2023 20:50 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock November 7, 2023 21:32 — with GitHub Actions Inactive
@maarquitos14
Copy link
Contributor

@cperkinsintel Those environment variables control the persistent caching i.e. the storing of compiled code on disk, while this new environment variable intends to give an option to disable the in-memory cache. Even if persistent caching is disabled, programs still go through the in-memory cache.

Should we make it explicit in the name that this refers to in-memory, non-persistent, caching? E.g. SYCL_IN_MEM_CACHE_DISABLE?

@jzc jzc requested a review from a team as a code owner November 14, 2023 18:56
@jzc jzc temporarily deployed to WindowsCILock November 14, 2023 19:02 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock November 14, 2023 20:25 — with GitHub Actions Inactive
@jzc jzc temporarily deployed to WindowsCILock November 14, 2023 21:17 — with GitHub Actions Inactive
Copy link
Contributor

@dm-vodopyanov dm-vodopyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM

// nullptr for the mutex.
auto [Kernel, ArgMask] = BuildF();
return make_tuple(Kernel, nullptr, ArgMask, Program);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this logic be moved to getOrBuild to avoid having to do it before each call to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I think it'd probably be better suited for another PR. There are really two types of uses of getOrBuild, one with kernels and one with programs. The returns types differ between the two case, so it'd be a little awkward to insert this special logic for the program case in there. Alternatively, one could make a getOrBuildProgram and getOrBuildKernel program and then put that special logic in the getOrBuildKernel function, but there would still be a oddity with the return type: when caching, we wrap the value in a BuildResult and return a pointer to that (owned by the cache), but when are not caching, we bypass the BuildResult object, and would only want to return the value. This can still be resolved by further modifying these getOrBuild functions to return only the values we extract from the BuildResult anyways, but I believe this'll most likely create a lot of changes unrelated to the original goal of PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I remember of the code, it is due for a bit of an overhaul anyway. 👍

Copy link
Contributor

@maarquitos14 maarquitos14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@steffenlarsen steffenlarsen merged commit 9322d14 into intel:sycl Nov 17, 2023
AlexeySachkov pushed a commit that referenced this pull request Nov 30, 2023
In #11751, ref counting of kernels objects was changed to be more
accurate in order to allow for in-memory caching to be disabled. When
getting a kernel form the cache, the ref count the kernel handle is now
incremented (when caching is enabled). Thus, a method like
`ProgramManager::getOrCreateKernel` will increment the ref count of the
kernel it gets. However, in `enqueueImpKernel`, when enqueuing a kernel
with a kernel bundle,`ProgramManager::getOrCreateKernel` is called
twice, first indirectly by:


https://github.com/intel/llvm/blob/c43a90f28eebfcdf1bc1d55430485e2834790a60/sycl/source/detail/scheduler/commands.cpp#L2527-L2528

and second directly by:


https://github.com/intel/llvm/blob/c43a90f28eebfcdf1bc1d55430485e2834790a60/sycl/source/detail/scheduler/commands.cpp#L2538-L2548

This means that the ref count of the acquired kernel is incremented
twice, yet the rest of the function will only free once, which leads to
a leak of the kernel. As the second comment and asserts say, the only
need for the second call to `getOrCreateKernel` is to fetch the mutex
associated to the cached kernel retrieved from the first call, so this
PR adjusts `get_kernel` to save this mutex and forgo this extra
`getOrCreateKernel` call and unintentional additional ref count.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants