Skip to content

[SYCL] Optimize kernel name based cache lookup #18081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 53 commits into from
May 19, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
8dde49e
[SYCL] Optimize kernel name based cache lookup
sergey-semenov Apr 17, 2025
e837478
Apply clang-format
sergey-semenov Apr 17, 2025
0c9f44e
Adjust comment
sergey-semenov Apr 17, 2025
6c2e229
Add newline
sergey-semenov Apr 17, 2025
c8d69fb
Typo fix
sergey-semenov Apr 17, 2025
2e834ca
Simplify stored pointer
sergey-semenov Apr 17, 2025
00d5461
Fix test
sergey-semenov Apr 17, 2025
6b19383
Invalidate moved-from wrappers
sergey-semenov Apr 17, 2025
a07593b
Apply clang-format
sergey-semenov Apr 17, 2025
897521c
Misc stylistic changes
sergey-semenov Apr 17, 2025
8ec295a
One more wrapper move fix
sergey-semenov Apr 17, 2025
4b60007
Clang-format
sergey-semenov Apr 17, 2025
a18c986
Update Windows ABI dump
sergey-semenov Apr 17, 2025
0a15e0d
Fix issue with multiple contexts
vinser52 Apr 23, 2025
c32d910
Fix KernelProgramCache::reset() to properly cleanup cached kernels
vinser52 Apr 25, 2025
cac94ab
Properly cleanup fast kernel cache when KernelProgramCache is destructed
vinser52 Apr 25, 2025
994c9f5
Merge branch 'sycl' into kernelnamebasedcacheopt
vinser52 Apr 28, 2025
67e926c
Fix issue in the FastKernelCacheWrapper destructor
vinser52 Apr 28, 2025
8cfaf2c
Fix multiple_source test
sergey-semenov Apr 30, 2025
0eff24c
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov Apr 30, 2025
07bea96
Adjust expected kernel name in the test
sergey-semenov Apr 30, 2025
f85591e
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 6, 2025
7eaa526
Add a struct for the kernel name based cache
sergey-semenov May 6, 2025
a311cb2
Apply comment
sergey-semenov May 6, 2025
9ad056b
Fix warning
sergey-semenov May 6, 2025
2e53abe
Update Windows ABI symbols
sergey-semenov May 6, 2025
1cd087a
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 6, 2025
1310fda
Apply clang-format
sergey-semenov May 6, 2025
650dd6c
Add newline at EOF
sergey-semenov May 6, 2025
4a68d29
Manage the struct lifetime in global handler
sergey-semenov May 7, 2025
0890de6
Store maps instead of pointers
sergey-semenov May 7, 2025
f783c0e
Update Linux ABI dump
sergey-semenov May 7, 2025
65fff8c
Update Windows ABI dump
sergey-semenov May 7, 2025
bf8a365
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 7, 2025
f34185d
Clean up leftover code
sergey-semenov May 8, 2025
a0a7394
Remove a TODO comment
sergey-semenov May 8, 2025
b0b8cfe
Drop an unneeded include
sergey-semenov May 8, 2025
1016a4b
Adjust comment
sergey-semenov May 8, 2025
de5d9b3
Remove unneeded include
sergey-semenov May 8, 2025
8f51de7
Add a mutex to each subcache
sergey-semenov May 8, 2025
812abf3
Use a struct instead of pair to improve readability
sergey-semenov May 8, 2025
8e325d8
Drop unneeded header
sergey-semenov May 8, 2025
5e5be16
Align fast kernel cache naming
sergey-semenov May 9, 2025
9d8fca1
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 9, 2025
4c480b6
Fix subcache eviction + minor renaming
sergey-semenov May 9, 2025
446e0de
Apply clang-format
sergey-semenov May 9, 2025
48f4959
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 14, 2025
7eb9327
Apply comment
sergey-semenov May 14, 2025
6198318
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 15, 2025
bf1911e
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 15, 2025
0805355
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 16, 2025
8bbffdc
Merge branch 'sycl' into kernelnamebasedcacheopt
sergey-semenov May 19, 2025
777639f
Apply clang-format
sergey-semenov May 19, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions sycl/include/sycl/detail/kernel_name_based_cache.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
//==--------------------- kernel_name_based_cache.hpp ----------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#pragma once

#include <sycl/detail/export.hpp>

namespace sycl {
inline namespace _V1 {
namespace detail {

struct KernelNameBasedCacheT;
__SYCL_EXPORT KernelNameBasedCacheT *createKernelNameBasedCache();

// Retrieves a cache pointer unique to a kernel name type that can be used to
// avoid kernel name based lookup in the runtime.
template <typename KernelName>
KernelNameBasedCacheT *getKernelNameBasedCache() {
static KernelNameBasedCacheT *Instance = createKernelNameBasedCache();
return Instance;
}

} // namespace detail
} // namespace _V1
} // namespace sycl
5 changes: 5 additions & 0 deletions sycl/include/sycl/handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
#include <sycl/detail/id_queries_fit_in_int.hpp>
#include <sycl/detail/impl_utils.hpp>
#include <sycl/detail/kernel_desc.hpp>
#include <sycl/detail/kernel_name_based_cache.hpp>
#include <sycl/detail/kernel_name_str_t.hpp>
#include <sycl/detail/reduction_forward.hpp>
#include <sycl/detail/string.hpp>
Expand Down Expand Up @@ -813,6 +814,7 @@ class __SYCL_EXPORT handler {
// later during finalize.
setArgsToAssociatedAccessors();
}
setKernelNameBasedCachePtr(detail::getKernelNameBasedCache<KernelName>());

// If the kernel lambda is callable with a kernel_handler argument, manifest
// the associated kernel handler.
Expand Down Expand Up @@ -3789,6 +3791,9 @@ class __SYCL_EXPORT handler {
sycl::handler &h, size_t size,
const ext::oneapi::experimental::memory_pool &pool);

void setKernelNameBasedCachePtr(
detail::KernelNameBasedCacheT *KernelNameBasedCachePtr);

protected:
/// Registers event dependencies in this command group.
void depends_on(const detail::EventImplPtr &Event);
Expand Down
1 change: 1 addition & 0 deletions sycl/source/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,7 @@ set(SYCL_COMMON_SOURCES
"detail/kernel_compiler/kernel_compiler_opencl.cpp"
"detail/kernel_compiler/kernel_compiler_sycl.cpp"
"detail/kernel_impl.cpp"
"detail/kernel_name_based_cache.cpp"
"detail/kernel_program_cache.cpp"
"detail/memory_manager.cpp"
"detail/pipes.cpp"
Expand Down
6 changes: 5 additions & 1 deletion sycl/source/detail/cg.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,7 @@ class CGExecKernel : public CG {
std::shared_ptr<detail::kernel_bundle_impl> MKernelBundle;
std::vector<ArgDesc> MArgs;
KernelNameStrT MKernelName;
KernelNameBasedCacheT *MKernelNameBasedCachePtr;
std::vector<std::shared_ptr<detail::stream_impl>> MStreams;
std::vector<std::shared_ptr<const void>> MAuxiliaryResources;
/// Used to implement ext_oneapi_graph dynamic_command_group. Stores the list
Expand All @@ -272,6 +273,7 @@ class CGExecKernel : public CG {
std::shared_ptr<detail::kernel_bundle_impl> KernelBundle,
CG::StorageInitHelper CGData, std::vector<ArgDesc> Args,
KernelNameStrT KernelName,
KernelNameBasedCacheT *KernelNameBasedCachePtr,
std::vector<std::shared_ptr<detail::stream_impl>> Streams,
std::vector<std::shared_ptr<const void>> AuxiliaryResources,
CGType Type, ur_kernel_cache_config_t KernelCacheConfig,
Expand All @@ -281,7 +283,9 @@ class CGExecKernel : public CG {
MNDRDesc(std::move(NDRDesc)), MHostKernel(std::move(HKernel)),
MSyclKernel(std::move(SyclKernel)),
MKernelBundle(std::move(KernelBundle)), MArgs(std::move(Args)),
MKernelName(std::move(KernelName)), MStreams(std::move(Streams)),
MKernelName(std::move(KernelName)),
MKernelNameBasedCachePtr(KernelNameBasedCachePtr),
MStreams(std::move(Streams)),
MAuxiliaryResources(std::move(AuxiliaryResources)),
MAlternativeKernels{}, MKernelCacheConfig(std::move(KernelCacheConfig)),
MKernelIsCooperative(KernelIsCooperative),
Expand Down
10 changes: 10 additions & 0 deletions sycl/source/detail/global_handler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
#include <detail/adapter.hpp>
#include <detail/config.hpp>
#include <detail/global_handler.hpp>
#include <detail/kernel_name_based_cache_t.hpp>
#include <detail/platform_impl.hpp>
#include <detail/program_manager/program_manager.hpp>
#include <detail/scheduler/scheduler.hpp>
Expand Down Expand Up @@ -252,6 +253,15 @@ ThreadPool &GlobalHandler::getHostTaskThreadPool() {
return TP;
}

KernelNameBasedCacheT *GlobalHandler::createKernelNameBasedCache() {
static std::vector<std::unique_ptr<KernelNameBasedCacheT>>
&KernelNameBasedCaches = getOrCreate(MKernelNameBasedCaches);
LockGuard LG{MKernelNameBasedCaches.Lock};
return KernelNameBasedCaches
.emplace_back(std::make_unique<KernelNameBasedCacheT>())
.get();
}

void GlobalHandler::releaseDefaultContexts() {
// Release shared-pointers to SYCL objects.
// Note that on Windows the destruction of the default context
Expand Down
5 changes: 4 additions & 1 deletion sycl/source/detail/global_handler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ class Adapter;
class ods_target_list;
class XPTIRegistry;
class ThreadPool;
struct KernelNameBasedCacheT;

using ContextImplPtr = std::shared_ptr<context_impl>;
using AdapterPtr = std::shared_ptr<Adapter>;
Expand Down Expand Up @@ -73,7 +74,7 @@ class GlobalHandler {
ods_target_list &getOneapiDeviceSelectorTargets(const std::string &InitValue);
XPTIRegistry &getXPTIRegistry();
ThreadPool &getHostTaskThreadPool();

KernelNameBasedCacheT *createKernelNameBasedCache();
static void registerStaticVarShutdownHandler();

bool isOkToDefer() const;
Expand Down Expand Up @@ -129,6 +130,8 @@ class GlobalHandler {
InstWithLock<XPTIRegistry> MXPTIRegistry;
// Thread pool for host task and event callbacks execution
InstWithLock<ThreadPool> MHostTaskThreadPool;
InstWithLock<std::vector<std::unique_ptr<KernelNameBasedCacheT>>>
MKernelNameBasedCaches;
};
} // namespace detail
} // namespace _V1
Expand Down
7 changes: 4 additions & 3 deletions sycl/source/detail/graph_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -813,8 +813,8 @@ exec_graph_impl::enqueueNodeDirect(sycl::context Ctx,
CGExec->MLine, CGExec->MColumn);
std::tie(CmdTraceEvent, InstanceID) = emitKernelInstrumentationData(
StreamID, CGExec->MSyclKernel, CodeLoc, CGExec->MIsTopCodeLoc,
CGExec->MKernelName.data(), nullptr, CGExec->MNDRDesc,
CGExec->MKernelBundle, CGExec->MArgs);
CGExec->MKernelName.data(), CGExec->MKernelNameBasedCachePtr, nullptr,
CGExec->MNDRDesc, CGExec->MKernelBundle, CGExec->MArgs);
if (CmdTraceEvent)
sycl::detail::emitInstrumentationGeneral(
StreamID, InstanceID, CmdTraceEvent, xpti::trace_task_begin, nullptr);
Expand Down Expand Up @@ -1502,7 +1502,8 @@ void exec_graph_impl::populateURKernelUpdateStructs(
ur_program_handle_t UrProgram = nullptr;
std::tie(UrKernel, std::ignore, EliminatedArgMask, UrProgram) =
sycl::detail::ProgramManager::getInstance().getOrCreateKernel(
ContextImpl, DeviceImpl, ExecCG.MKernelName);
ContextImpl, DeviceImpl, ExecCG.MKernelName,
ExecCG.MKernelNameBasedCachePtr);
BundleObjs = std::make_pair(UrProgram, UrKernel);
}

Expand Down
3 changes: 3 additions & 0 deletions sycl/source/detail/handler_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,9 @@ class handler_impl {
detail::kernel_param_desc_t (*MKernelParamDescGetter)(int) = nullptr;
bool MKernelIsESIMD = false;
bool MKernelHasSpecialCaptures = true;

// A pointer to a kernel name based cache retrieved on the application side.
KernelNameBasedCacheT *MKernelNameBasedCachePtr;
};

} // namespace detail
Expand Down
22 changes: 22 additions & 0 deletions sycl/source/detail/kernel_name_based_cache.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
//==--------------------- kernel_name_based_cache.cpp ----------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include <detail/global_handler.hpp>
#include <sycl/detail/kernel_name_based_cache.hpp>

namespace sycl {
inline namespace _V1 {
namespace detail {

KernelNameBasedCacheT *createKernelNameBasedCache() {
return GlobalHandler::instance().createKernelNameBasedCache();
}

} // namespace detail
} // namespace _V1
} // namespace sycl
43 changes: 43 additions & 0 deletions sycl/source/detail/kernel_name_based_cache_t.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
//==-------------------- kernel_name_based_cache_t.hpp ---------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#pragma once

#include <detail/kernel_arg_mask.hpp>
#include <sycl/detail/spinlock.hpp>
#include <sycl/detail/ur.hpp>

#include <mutex>

#include <boost/unordered/unordered_flat_map.hpp>

namespace sycl {
inline namespace _V1 {
namespace detail {
using FastKernelCacheKeyT = std::pair<ur_device_handle_t, ur_context_handle_t>;
using FastKernelCacheValT =
std::tuple<ur_kernel_handle_t, std::mutex *, const KernelArgMask *,
ur_program_handle_t>;
using FastKernelSubcacheMapT =
::boost::unordered_flat_map<FastKernelCacheKeyT, FastKernelCacheValT>;

using FastKernelSubcacheMutexT = SpinLock;
using FastKernelSubcacheReadLockT = std::lock_guard<FastKernelSubcacheMutexT>;
using FastKernelSubcacheWriteLockT = std::lock_guard<FastKernelSubcacheMutexT>;

struct FastKernelSubcacheT {
FastKernelSubcacheMapT Map;
FastKernelSubcacheMutexT Mutex;
};

struct KernelNameBasedCacheT {
FastKernelSubcacheT FastKernelSubcache;
};

} // namespace detail
} // namespace _V1
} // namespace sycl
5 changes: 5 additions & 0 deletions sycl/source/detail/kernel_program_cache.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,11 @@ namespace detail {
const AdapterPtr &KernelProgramCache::getAdapter() {
return MParentContext->getAdapter();
}

ur_context_handle_t KernelProgramCache::getURContext() const {
return MParentContext->getHandleRef();
}

} // namespace detail
} // namespace _V1
} // namespace sycl
Loading
Loading