Commit a068b15

[SYCL][XPTI] Report memory allocation info from SYCL runtime (#5172)
1 parent cbcb756 commit a068b15

9 files changed

+397
-23
lines changed

sycl/doc/SYCLInstrumentationUsingXPTI.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -256,3 +256,12 @@ All trace point types in bold provide semantic information about the graph, node
256256
| `wait_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::wait_end` that marks the end of the wait on an `event`</li> <li> **parent**: `nullptr`</li> <li> **event**: The event ID will reflect the ID of the command group object submission that created this event or a new event based on the combination of the string "queue.wait" and the address of the event. </li> <li> **instance**: Unique ID to allow the correlation of the `wait_begin` event with the `wait_end` event. </li> <li> **user_data**: String indicating `queue.wait` and the address of the event as `const char *` </li></div> | **`sycl_device`**, `sym_function_name`, `sym_source_file_name`, `sym_line_no` |
257257
| `barrier_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_begin` that marks the beginning of a barrier while enqueuing a command group object</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |
258258
| `barrier_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_end` that marks the end of the barrier that is encountered during enqueue.</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |
259+
260+
## Level Zero Plugin Stream `"oneapi.level_zero.experimental.mem_alloc"` Notification Signatures
261+
262+
| Trace Point Type | Parameter Description | Metadata |
263+
| :------------------------: | :-------------------- | :------- |
264+
| `mem_alloc_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_alloc_begin` that marks the beginning of the memory allocation process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_alloc_begin` event with the `mem_alloc_end` event. </li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any), allocation size, and guard zone size (if any). </li></div> | None |
265+
| `mem_alloc_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_alloc_end` that marks the end of the memory allocation process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_alloc_begin` event with the `mem_alloc_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `mem_alloc_begin`.</li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any), allocated pointer, allocation size, and guard zone size (if any). </li></div> | None |
266+
| `mem_release_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_release_begin` that marks the beginning of the memory release process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_release_begin` event with the `mem_release_end` event. </li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any) and the released pointer. </li></div> | None |
267+
| `mem_release_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::mem_release_end` that marks the end of the memory release process</li> <li> **parent**: Event ID created for all functions in the `oneapi.level_zero.experimental.mem_alloc` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: Unique ID to allow the correlation of the `mem_release_begin` event with the `mem_release_end` event. This value is guaranteed to be the same value received by the trace event for the corresponding `mem_release_begin`.</li> <li> **user_data**: A pointer to a `mem_alloc_data_t` object that includes the memory object ID (if any) and the released pointer. </li></div> | None |
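
The table describes how the **instance** value ties each `*_begin` notification to its `*_end` counterpart. A minimal sketch of how a subscriber might use that to track live allocations is shown below. `MemAllocData` is a stand-in for `xpti::mem_alloc_data_t` whose field names are assumptions based only on the descriptions above, and `AllocTracker` is a hypothetical helper, not part of XPTI or the SYCL runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Stand-in for xpti::mem_alloc_data_t. The field names are assumptions made
// for this sketch; only the meaning of each field comes from the table above.
struct MemAllocData {
  std::uintptr_t MemObjHandle; // memory object ID, 0 if there is none
  std::uintptr_t AllocPointer; // allocated or released pointer, 0 if unknown
  std::size_t AllocSize;       // allocation size in bytes
  std::size_t GuardZoneSize;   // guard zone size in bytes, 0 if there is none
};

// Hypothetical subscriber-side bookkeeping keyed by the instance ID.
class AllocTracker {
public:
  // mem_alloc_begin: remember the requested size under the instance ID.
  void onAllocBegin(std::uint64_t Instance, const MemAllocData &Data) {
    Pending[Instance] = Data.AllocSize;
  }

  // mem_alloc_end: match the instance ID and record the returned pointer.
  // Returns false if no begin event with this instance ID was seen.
  bool onAllocEnd(std::uint64_t Instance, const MemAllocData &Data) {
    auto It = Pending.find(Instance);
    if (It == Pending.end())
      return false;
    Live[Data.AllocPointer] = It->second;
    LiveBytes += It->second;
    Pending.erase(It);
    return true;
  }

  // mem_release_end: the pointer is gone, so drop it from the live set.
  void onReleaseEnd(const MemAllocData &Data) {
    auto It = Live.find(Data.AllocPointer);
    if (It == Live.end())
      return;
    LiveBytes -= It->second;
    Live.erase(It);
  }

  std::size_t liveBytes() const { return LiveBytes; }

private:
  std::unordered_map<std::uint64_t, std::size_t> Pending; // instance -> size
  std::unordered_map<std::uintptr_t, std::size_t> Live;   // pointer -> size
  std::size_t LiveBytes = 0;
};
```

In a real subscriber these methods would be driven from the registered trace callback, with `user_data` cast to the actual XPTI type.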

sycl/source/detail/device_image_impl.hpp

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717
#include <detail/context_impl.hpp>
1818
#include <detail/device_impl.hpp>
1919
#include <detail/kernel_id_impl.hpp>
20+
#include <detail/mem_alloc_helper.hpp>
2021
#include <detail/plugin.hpp>
2122
#include <detail/program_manager/program_manager.hpp>
2223

@@ -185,11 +186,11 @@ class device_image_impl {
185186
std::lock_guard<std::mutex> Lock{MSpecConstAccessMtx};
186187
if (nullptr == MSpecConstsBuffer && !MSpecConstsBlob.empty()) {
187188
const detail::plugin &Plugin = getSyclObjImpl(MContext)->getPlugin();
188-
Plugin.call<PiApiKind::piMemBufferCreate>(
189-
detail::getSyclObjImpl(MContext)->getHandleRef(),
190-
PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_USE,
191-
MSpecConstsBlob.size(), MSpecConstsBlob.data(), &MSpecConstsBuffer,
192-
nullptr);
189+
memBufferCreateHelper(Plugin,
190+
detail::getSyclObjImpl(MContext)->getHandleRef(),
191+
PI_MEM_FLAGS_ACCESS_RW | PI_MEM_FLAGS_HOST_PTR_USE,
192+
MSpecConstsBlob.size(), MSpecConstsBlob.data(),
193+
&MSpecConstsBuffer, nullptr);
193194
}
194195
return MSpecConstsBuffer;
195196
}

sycl/source/detail/mem_alloc_helper.hpp

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
//==-------- mem_alloc_helper.hpp - SYCL mem alloc helper ------------------==//
2+
//
3+
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4+
// See https://llvm.org/LICENSE.txt for license information.
5+
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6+
//
7+
//===----------------------------------------------------------------------===//
8+
9+
#pragma once
10+
11+
#include <CL/sycl/detail/pi.h>
12+
13+
__SYCL_INLINE_NAMESPACE(cl) {
14+
namespace sycl {
15+
namespace detail {
16+
void memBufferCreateHelper(const plugin &Plugin, pi_context Ctx,
17+
pi_mem_flags Flags, size_t Size, void *HostPtr,
18+
pi_mem *RetMem,
19+
const pi_mem_properties *Props = nullptr);
20+
void memReleaseHelper(const plugin &Plugin, pi_mem Mem);
21+
void memBufferMapHelper(const plugin &Plugin, pi_queue command_queue,
22+
pi_mem buffer, pi_bool blocking_map,
23+
pi_map_flags map_flags, size_t offset, size_t size,
24+
pi_uint32 num_events_in_wait_list,
25+
const pi_event *event_wait_list, pi_event *event,
26+
void **ret_map);
27+
void memUnmapHelper(const plugin &Plugin, pi_queue command_queue, pi_mem memobj,
28+
void *mapped_ptr, pi_uint32 num_events_in_wait_list,
29+
const pi_event *event_wait_list, pi_event *event);
30+
} // namespace detail
31+
} // namespace sycl
32+
} // __SYCL_INLINE_NAMESPACE(cl)
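
Each helper declared in this header is implemented in `memory_manager.cpp` as "emit begin trace, arm a scope guard that emits the end trace, then make the PI call". The sketch below illustrates that shape with standard C++ only. `ScopeExit` is a hypothetical stand-in for `xpti::utils::finally`, and `tracedCall` and `runExample` are illustrative names that do not exist in the runtime.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope guard standing in for xpti::utils::finally.
// It runs the stored callable when the guard goes out of scope.
template <typename F> class ScopeExit {
public:
  explicit ScopeExit(F Fn) : Fn(std::move(Fn)) {}
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  // Destructors are noexcept by default, so the callable must not throw.
  ~ScopeExit() { Fn(); }

private:
  F Fn;
};

// Hypothetical wrapper: records "begin", runs Call, and records "end" even
// when Call throws, mirroring the begin/end trace pairing of the helpers.
template <typename CallT>
void tracedCall(std::vector<std::string> &Log, CallT Call) {
  Log.push_back("begin");
  auto End = [&Log] { Log.push_back("end"); };
  ScopeExit<decltype(End)> Guard{End};
  Call();
}

// Runs a failing call through the wrapper and returns the recorded order.
std::vector<std::string> runExample() {
  std::vector<std::string> Log;
  try {
    tracedCall(Log, [&Log] {
      Log.push_back("call");
      throw std::runtime_error("simulated PI failure");
    });
  } catch (const std::runtime_error &) {
    Log.push_back("caught");
  }
  return Log;
}
```

Because the guard's callable runs inside a destructor, anything it does has to report failure through a return code rather than an exception, which is why the runtime helpers use `call_nocheck` inside the guard.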

sycl/source/detail/memory_manager.cpp

Lines changed: 188 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -9,17 +9,106 @@
99
#include <CL/sycl/detail/memory_manager.hpp>
1010
#include <detail/context_impl.hpp>
1111
#include <detail/event_impl.hpp>
12+
#include <detail/mem_alloc_helper.hpp>
1213
#include <detail/queue_impl.hpp>
1314

1415
#include <algorithm>
1516
#include <cassert>
1617
#include <cstring>
1718
#include <vector>
1819

20+
#ifdef XPTI_ENABLE_INSTRUMENTATION
21+
#include <xpti/xpti_data_types.h>
22+
#include <xpti/xpti_trace_framework.hpp>
23+
#endif
24+
1925
__SYCL_INLINE_NAMESPACE(cl) {
2026
namespace sycl {
2127
namespace detail {
2228

29+
#ifdef XPTI_ENABLE_INSTRUMENTATION
30+
uint8_t GMemAllocStreamID;
31+
xpti::trace_event_data_t *GMemAllocEvent;
32+
#endif
33+
34+
uint64_t emitMemAllocBeginTrace(uintptr_t ObjHandle, size_t AllocSize,
35+
size_t GuardZone) {
36+
(void)ObjHandle;
37+
(void)AllocSize;
38+
(void)GuardZone;
39+
uint64_t CorrelationID = 0;
40+
#ifdef XPTI_ENABLE_INSTRUMENTATION
41+
if (xptiTraceEnabled()) {
42+
xpti::mem_alloc_data_t MemAlloc{ObjHandle, 0 /* alloc ptr */, AllocSize,
43+
GuardZone};
44+
45+
CorrelationID = xptiGetUniqueId();
46+
xptiNotifySubscribers(
47+
GMemAllocStreamID,
48+
static_cast<uint16_t>(xpti::trace_point_type_t::mem_alloc_begin),
49+
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
50+
}
51+
#endif
52+
return CorrelationID;
53+
}
54+
55+
void emitMemAllocEndTrace(uintptr_t ObjHandle, uintptr_t AllocPtr,
56+
size_t AllocSize, size_t GuardZone,
57+
uint64_t CorrelationID) {
58+
(void)ObjHandle;
59+
(void)AllocPtr;
60+
(void)AllocSize;
61+
(void)GuardZone;
62+
(void)CorrelationID;
63+
#ifdef XPTI_ENABLE_INSTRUMENTATION
64+
if (xptiTraceEnabled()) {
65+
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, AllocSize, GuardZone};
66+
67+
xptiNotifySubscribers(
68+
GMemAllocStreamID,
69+
static_cast<uint16_t>(xpti::trace_point_type_t::mem_alloc_end),
70+
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
71+
}
72+
#endif
73+
}
74+
75+
uint64_t emitMemReleaseBeginTrace(uintptr_t ObjHandle, uintptr_t AllocPtr) {
76+
(void)ObjHandle;
77+
(void)AllocPtr;
78+
uint64_t CorrelationID = 0;
79+
#ifdef XPTI_ENABLE_INSTRUMENTATION
80+
if (xptiTraceEnabled()) {
81+
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, 0 /* alloc size */,
82+
0 /* guard zone */};
83+
84+
CorrelationID = xptiGetUniqueId();
85+
xptiNotifySubscribers(
86+
GMemAllocStreamID,
87+
static_cast<uint16_t>(xpti::trace_point_type_t::mem_release_begin),
88+
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
89+
}
90+
#endif
91+
return CorrelationID;
92+
}
93+
94+
void emitMemReleaseEndTrace(uintptr_t ObjHandle, uintptr_t AllocPtr,
95+
uint64_t CorrelationID) {
96+
(void)ObjHandle;
97+
(void)AllocPtr;
98+
(void)CorrelationID;
99+
#ifdef XPTI_ENABLE_INSTRUMENTATION
100+
if (xptiTraceEnabled()) {
101+
xpti::mem_alloc_data_t MemAlloc{ObjHandle, AllocPtr, 0 /* alloc size */,
102+
0 /* guard zone */};
103+
104+
xptiNotifySubscribers(
105+
GMemAllocStreamID,
106+
static_cast<uint16_t>(xpti::trace_point_type_t::mem_release_end),
107+
GMemAllocEvent, nullptr, CorrelationID, &MemAlloc);
108+
}
109+
#endif
110+
}
111+
23112
static void waitForEvents(const std::vector<EventImplPtr> &Events) {
24113
// Assuming all events will be on the same device or
25114
// devices associated with the same Backend.
@@ -34,6 +123,97 @@ static void waitForEvents(const std::vector<EventImplPtr> &Events) {
34123
}
35124
}
36125

126+
void memBufferCreateHelper(const plugin &Plugin, pi_context Ctx,
127+
pi_mem_flags Flags, size_t Size, void *HostPtr,
128+
pi_mem *RetMem, const pi_mem_properties *Props) {
129+
uint64_t CorrID = 0;
130+
// We only want to instrument piMemBufferCreate
131+
{
132+
CorrID =
133+
emitMemAllocBeginTrace(0 /* mem object */, Size, 0 /* guard zone */);
134+
xpti::utils::finally _{[&] {
135+
// C-style cast is required for MSVC
136+
uintptr_t MemObjID = (uintptr_t)(*RetMem);
137+
pi_native_handle Ptr = 0;
138+
// Always use call_nocheck here, because call may throw an exception,
139+
// and this lambda will be called from destructor, which in combination
140+
// rewards us with UB.
141+
Plugin.call_nocheck<PiApiKind::piextMemGetNativeHandle>(*RetMem, &Ptr);
142+
emitMemAllocEndTrace(MemObjID, (uintptr_t)(Ptr), Size, 0 /* guard zone */,
143+
CorrID);
144+
}};
145+
Plugin.call<PiApiKind::piMemBufferCreate>(Ctx, Flags, Size, HostPtr, RetMem,
146+
Props);
147+
}
148+
}
149+
150+
void memReleaseHelper(const plugin &Plugin, pi_mem Mem) {
151+
// FIXME piMemRelease does not guarantee memory release. It only does so if
152+
// the reference counter is 1. However, the SYCL runtime currently calls
153+
// piMemRetain only for OpenCL interop.
154+
uint64_t CorrID = 0;
155+
// C-style cast is required for MSVC
156+
uintptr_t MemObjID = (uintptr_t)(Mem);
157+
uintptr_t Ptr = 0;
158+
// Do not make unnecessary PI calls without instrumentation enabled
159+
if (xptiTraceEnabled()) {
160+
pi_native_handle PtrHandle = 0;
161+
Plugin.call<PiApiKind::piextMemGetNativeHandle>(Mem, &PtrHandle);
162+
Ptr = (uintptr_t)(PtrHandle);
163+
}
164+
// We only want to instrument piMemRelease
165+
{
166+
CorrID = emitMemReleaseBeginTrace(MemObjID, Ptr);
167+
xpti::utils::finally _{
168+
[&] { emitMemReleaseEndTrace(MemObjID, Ptr, CorrID); }};
169+
Plugin.call<PiApiKind::piMemRelease>(Mem);
170+
}
171+
}
172+
173+
void memBufferMapHelper(const plugin &Plugin, pi_queue Queue, pi_mem Buffer,
174+
pi_bool Blocking, pi_map_flags Flags, size_t Offset,
175+
size_t Size, pi_uint32 NumEvents,
176+
const pi_event *WaitList, pi_event *Event,
177+
void **RetMap) {
178+
uint64_t CorrID = 0;
179+
uintptr_t MemObjID = (uintptr_t)(Buffer);
180+
// We only want to instrument piEnqueueMemBufferMap
181+
{
182+
CorrID = emitMemAllocBeginTrace(MemObjID, Size, 0 /* guard zone */);
183+
xpti::utils::finally _{[&] {
184+
emitMemAllocEndTrace(MemObjID, (uintptr_t)(*RetMap), Size,
185+
0 /* guard zone */, CorrID);
186+
}};
187+
Plugin.call<PiApiKind::piEnqueueMemBufferMap>(
188+
Queue, Buffer, Blocking, Flags, Offset, Size, NumEvents, WaitList,
189+
Event, RetMap);
190+
}
191+
}
192+
193+
void memUnmapHelper(const plugin &Plugin, pi_queue Queue, pi_mem Mem,
194+
void *MappedPtr, pi_uint32 NumEvents,
195+
const pi_event *WaitList, pi_event *Event) {
196+
uint64_t CorrID = 0;
197+
uintptr_t MemObjID = (uintptr_t)(Mem);
198+
uintptr_t Ptr = (uintptr_t)(MappedPtr);
199+
// We only want to instrument piEnqueueMemUnmap
200+
{
201+
CorrID = emitMemReleaseBeginTrace(MemObjID, Ptr);
202+
xpti::utils::finally _{[&] {
203+
// There's no way for SYCL to know, when the pointer is freed, so we have
204+
// to explicitly wait for the end of data transfers here in order to
205+
// report correct events.
206+
// Always use call_nocheck here, because call may throw an exception,
207+
// and this lambda will be called from destructor, which in combination
208+
// rewards us with UB.
209+
Plugin.call_nocheck<PiApiKind::piEventsWait>(1, Event);
210+
emitMemReleaseEndTrace(MemObjID, Ptr, CorrID);
211+
}};
212+
Plugin.call<PiApiKind::piEnqueueMemUnmap>(Queue, Mem, MappedPtr, NumEvents,
213+
WaitList, Event);
214+
}
215+
}
216+
37217
void MemoryManager::release(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
38218
void *MemAllocation,
39219
std::vector<EventImplPtr> DepEvents,
@@ -67,7 +247,7 @@ void MemoryManager::releaseMemObj(ContextImplPtr TargetContext,
67247
}
68248

69249
const detail::plugin &Plugin = TargetContext->getPlugin();
70-
Plugin.call<PiApiKind::piMemRelease>(pi::cast<RT::PiMem>(MemAllocation));
250+
memReleaseHelper(Plugin, pi::cast<RT::PiMem>(MemAllocation));
71251
}
72252

73253
void *MemoryManager::allocate(ContextImplPtr TargetContext, SYCLMemObjI *MemObj,
@@ -165,9 +345,8 @@ MemoryManager::allocateBufferObject(ContextImplPtr TargetContext, void *UserPtr,
165345

166346
RT::PiMem NewMem = nullptr;
167347
const detail::plugin &Plugin = TargetContext->getPlugin();
168-
Plugin.call<PiApiKind::piMemBufferCreate>(TargetContext->getHandleRef(),
169-
CreationFlags, Size, UserPtr,
170-
&NewMem, nullptr);
348+
memBufferCreateHelper(Plugin, TargetContext->getHandleRef(), CreationFlags,
349+
Size, UserPtr, &NewMem, nullptr);
171350
return NewMem;
172351
}
173352

@@ -623,10 +802,9 @@ void *MemoryManager::map(SYCLMemObjI *, void *Mem, QueueImplPtr Queue,
623802
void *MappedPtr = nullptr;
624803
const size_t BytesToMap = AccessRange[0] * AccessRange[1] * AccessRange[2];
625804
const detail::plugin &Plugin = Queue->getPlugin();
626-
Plugin.call<PiApiKind::piEnqueueMemBufferMap>(
627-
Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem), CL_FALSE, Flags,
628-
AccessOffset[0], BytesToMap, DepEvents.size(), DepEvents.data(),
629-
&OutEvent, &MappedPtr);
805+
memBufferMapHelper(Plugin, Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem),
806+
CL_FALSE, Flags, AccessOffset[0], BytesToMap,
807+
DepEvents.size(), DepEvents.data(), &OutEvent, &MappedPtr);
630808
return MappedPtr;
631809
}
632810

@@ -639,9 +817,8 @@ void MemoryManager::unmap(SYCLMemObjI *, void *Mem, QueueImplPtr Queue,
639817
// Using the plugin of the Queue.
640818

641819
const detail::plugin &Plugin = Queue->getPlugin();
642-
Plugin.call<PiApiKind::piEnqueueMemUnmap>(
643-
Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem), MappedPtr,
644-
DepEvents.size(), DepEvents.data(), &OutEvent);
820+
memUnmapHelper(Plugin, Queue->getHandleRef(), pi::cast<RT::PiMem>(Mem),
821+
MappedPtr, DepEvents.size(), DepEvents.data(), &OutEvent);
645822
}
646823

647824
void MemoryManager::copy_usm(const void *SrcMem, QueueImplPtr SrcQueue,
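
The `emit*Trace` functions in this file are meant to compile whether or not `XPTI_ENABLE_INSTRUMENTATION` is defined. A minimal sketch of that gating pattern follows. `DEMO_ENABLE_TRACING`, `demoTraceEnabled`, and `emitBeginTrace` are hypothetical names used only for illustration. The point it shows is that the returned correlation ID has to be declared outside the `#ifdef` block so that the `return` statement is valid in both configurations.

```cpp
#include <cstddef>
#include <cstdint>

// DEMO_ENABLE_TRACING is a hypothetical macro playing the role of
// XPTI_ENABLE_INSTRUMENTATION; it is not defined anywhere in the runtime.
#ifdef DEMO_ENABLE_TRACING
static std::uint64_t NextId = 1;
static bool demoTraceEnabled() { return true; }
#endif

// Returns a fresh correlation ID when tracing is compiled in and enabled,
// and 0 otherwise.
std::uint64_t emitBeginTrace(std::size_t AllocSize) {
  // Silences the unused-parameter warning when tracing is compiled out.
  (void)AllocSize;
  // Declared outside the #ifdef so the return below always compiles.
  std::uint64_t CorrelationID = 0;
#ifdef DEMO_ENABLE_TRACING
  if (demoTraceEnabled())
    CorrelationID = NextId++;
#endif
  return CorrelationID;
}
```

With the macro undefined, the function reduces to returning 0, so callers can pass the ID on to the matching end-trace call without any conditional code of their own.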
