Skip to content

Commit 8f9d0d2

Browse files
authored
[SYCL] Add XPTI instrumentation for SYCL buffer (#5161)
Add tracing for buffer events: - buffer creation; - allocation of BE-specific handler for the buffer; - release of BE-specific handler for the buffer; - buffer destruction.
1 parent ba9fb05 commit 8f9d0d2

File tree

14 files changed

+355
-42
lines changed

14 files changed

+355
-42
lines changed

sycl/doc/SYCLInstrumentationUsingXPTI.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -257,6 +257,15 @@ All trace point types in bold provide semantic information about the graph, node
257257
| `barrier_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_begin` that marks the beginning of a barrier while enqueuing a command group object</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |
258258
| `barrier_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_end` that marks the end of the barrier that is encountered during enqueue.</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |
259259

260+
## Buffer management stream `"sycl.experimental.buffer"` Notification Signatures
261+
262+
| Trace Point Type | Parameter Description | Metadata |
263+
| :------------------------: | :-------------------- | :------- |
264+
| `offload_alloc_construct` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_data_t` that marks offload buffer createtion point</li> <li> **parent**: Event ID created for all functions in the `oneapi.experimental.buffer` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed. </li> <li> **user_data**: A pointer to `offload_buffer_data_t` object, that includes user object ID, source code location (file name (if available), function name, line number) where the buffer object is created. </li></div> | None |
265+
| `offload_alloc_associate` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_association_data_t` that provides association between user level buffer object and platform specific memory object</li> <li> **parent**: Event ID created for all functions in the `oneapi.experimental.buffer` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed.</li> <li> **user_data**: A pointer to `offload_buffer_association_data_t` object, that includes user object ID and platform-specific representation for offload buffer. </li></div> | None |
266+
| `offload_alloc_release` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_release_data_t` that provides information about release of platform specific memory object</li> <li> **parent**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed.</li> <li> **user_data**: A pointer to `offload_buffer_association_data_t` object, that includes user object ID and platform-specific representation for offload buffer. </li></div> | None |
267+
| `offload_alloc_construct` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_data_t` that marks offload buffer createtion point</li> <li> **parent**: Event ID created for all functions in the `oneapi.experimental.buffer` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed. </li> <li> **user_data**: A pointer to `offload_buffer_data_t` object, that includes user object ID. </li></div>| None |
268+
260269
## Level Zero Plugin Stream `"oneapi.level_zero.experimental.mem_alloc"` Notification Signatures
261270

262271
| Trace Point Type | Parameter Description | Metadata |

sycl/include/CL/sycl/buffer.hpp

100755100644
Lines changed: 71 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -60,8 +60,8 @@ class buffer {
6060
template <class Container>
6161
using EnableIfContiguous =
6262
detail::void_t<detail::enable_if_t<std::is_convertible<
63-
detail::remove_pointer_t<decltype(
64-
std::declval<Container>().data())> (*)[],
63+
detail::remove_pointer_t<
64+
decltype(std::declval<Container>().data())> (*)[],
6565
const T (*)[]>::value>,
6666
decltype(std::declval<Container>().size())>;
6767
template <class It>
@@ -73,157 +73,187 @@ class buffer {
7373
std::is_same<ItA, ItB>::value && !std::is_const<ItA>::value, ItA>;
7474

7575
buffer(const range<dimensions> &bufferRange,
76-
const property_list &propList = {})
76+
const property_list &propList = {},
77+
const detail::code_location CodeLoc = detail::code_location::current())
7778
: Range(bufferRange) {
7879
impl = std::make_shared<detail::buffer_impl>(
7980
size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)), propList,
8081
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
82+
impl->constructorNotification(CodeLoc, (void *)impl.get());
8183
}
8284

8385
buffer(const range<dimensions> &bufferRange, AllocatorT allocator,
84-
const property_list &propList = {})
86+
const property_list &propList = {},
87+
const detail::code_location CodeLoc = detail::code_location::current())
8588
: Range(bufferRange) {
8689
impl = std::make_shared<detail::buffer_impl>(
8790
size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)), propList,
8891
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
8992
allocator));
93+
impl->constructorNotification(CodeLoc, (void *)impl.get());
9094
}
9195

9296
buffer(T *hostData, const range<dimensions> &bufferRange,
93-
const property_list &propList = {})
97+
const property_list &propList = {},
98+
const detail::code_location CodeLoc = detail::code_location::current())
9499
: Range(bufferRange) {
95100
impl = std::make_shared<detail::buffer_impl>(
96101
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
97102
propList,
98103
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
104+
impl->constructorNotification(CodeLoc, (void *)impl.get());
99105
}
100106

101107
buffer(T *hostData, const range<dimensions> &bufferRange,
102-
AllocatorT allocator, const property_list &propList = {})
108+
AllocatorT allocator, const property_list &propList = {},
109+
const detail::code_location CodeLoc = detail::code_location::current())
103110
: Range(bufferRange) {
104111
impl = std::make_shared<detail::buffer_impl>(
105112
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
106113
propList,
107114
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
108115
allocator));
116+
impl->constructorNotification(CodeLoc, (void *)impl.get());
109117
}
110118

111119
template <typename _T = T>
112120
buffer(EnableIfSameNonConstIterators<T, _T> const *hostData,
113121
const range<dimensions> &bufferRange,
114-
const property_list &propList = {})
122+
const property_list &propList = {},
123+
const detail::code_location CodeLoc = detail::code_location::current())
115124
: Range(bufferRange) {
116125
impl = std::make_shared<detail::buffer_impl>(
117126
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
118127
propList,
119128
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
129+
impl->constructorNotification(CodeLoc, (void *)impl.get());
120130
}
121131

122132
template <typename _T = T>
123133
buffer(EnableIfSameNonConstIterators<T, _T> const *hostData,
124134
const range<dimensions> &bufferRange, AllocatorT allocator,
125-
const property_list &propList = {})
135+
const property_list &propList = {},
136+
const detail::code_location CodeLoc = detail::code_location::current())
126137
: Range(bufferRange) {
127138
impl = std::make_shared<detail::buffer_impl>(
128139
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
129140
propList,
130141
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
131142
allocator));
143+
impl->constructorNotification(CodeLoc, (void *)impl.get());
132144
}
133145

134146
buffer(const std::shared_ptr<T> &hostData,
135147
const range<dimensions> &bufferRange, AllocatorT allocator,
136-
const property_list &propList = {})
148+
const property_list &propList = {},
149+
const detail::code_location CodeLoc = detail::code_location::current())
137150
: Range(bufferRange) {
138151
impl = std::make_shared<detail::buffer_impl>(
139152
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
140153
propList,
141154
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
142155
allocator));
156+
impl->constructorNotification(CodeLoc, (void *)impl.get());
143157
}
144158

145159
buffer(const std::shared_ptr<T[]> &hostData,
146160
const range<dimensions> &bufferRange, AllocatorT allocator,
147-
const property_list &propList = {})
161+
const property_list &propList = {},
162+
const detail::code_location CodeLoc = detail::code_location::current())
148163
: Range(bufferRange) {
149164
impl = std::make_shared<detail::buffer_impl>(
150165
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
151166
propList,
152167
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
153168
allocator));
169+
impl->constructorNotification(CodeLoc, (void *)impl.get());
154170
}
155171

156172
buffer(const std::shared_ptr<T> &hostData,
157173
const range<dimensions> &bufferRange,
158-
const property_list &propList = {})
174+
const property_list &propList = {},
175+
const detail::code_location CodeLoc = detail::code_location::current())
159176
: Range(bufferRange) {
160177
impl = std::make_shared<detail::buffer_impl>(
161178
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
162179
propList,
163180
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
181+
impl->constructorNotification(CodeLoc, (void *)impl.get());
164182
}
165183

166184
buffer(const std::shared_ptr<T[]> &hostData,
167185
const range<dimensions> &bufferRange,
168-
const property_list &propList = {})
186+
const property_list &propList = {},
187+
const detail::code_location CodeLoc = detail::code_location::current())
169188
: Range(bufferRange) {
170189
impl = std::make_shared<detail::buffer_impl>(
171190
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
172191
propList,
173192
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
193+
impl->constructorNotification(CodeLoc, (void *)impl.get());
174194
}
175195

176196
template <class InputIterator, int N = dimensions,
177197
typename = EnableIfOneDimension<N>,
178198
typename = EnableIfItInputIterator<InputIterator>>
179199
buffer(InputIterator first, InputIterator last, AllocatorT allocator,
180-
const property_list &propList = {})
200+
const property_list &propList = {},
201+
const detail::code_location CodeLoc = detail::code_location::current())
181202
: Range(range<1>(std::distance(first, last))) {
182203
impl = std::make_shared<detail::buffer_impl>(
183204
first, last, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
184205
propList,
185206
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
186207
allocator));
208+
impl->constructorNotification(CodeLoc, (void *)impl.get());
187209
}
188210

189211
template <class InputIterator, int N = dimensions,
190212
typename = EnableIfOneDimension<N>,
191213
typename = EnableIfItInputIterator<InputIterator>>
192214
buffer(InputIterator first, InputIterator last,
193-
const property_list &propList = {})
215+
const property_list &propList = {},
216+
const detail::code_location CodeLoc = detail::code_location::current())
194217
: Range(range<1>(std::distance(first, last))) {
195218
impl = std::make_shared<detail::buffer_impl>(
196219
first, last, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
197220
propList,
198221
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
222+
impl->constructorNotification(CodeLoc, (void *)impl.get());
199223
}
200224

201225
// This constructor is a prototype for a future SYCL specification
202226
template <class Container, int N = dimensions,
203227
typename = EnableIfOneDimension<N>,
204228
typename = EnableIfContiguous<Container>>
205229
buffer(Container &container, AllocatorT allocator,
206-
const property_list &propList = {})
230+
const property_list &propList = {},
231+
const detail::code_location CodeLoc = detail::code_location::current())
207232
: Range(range<1>(container.size())) {
208233
impl = std::make_shared<detail::buffer_impl>(
209234
container.data(), size() * sizeof(T),
210235
detail::getNextPowerOfTwo(sizeof(T)), propList,
211236
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
212237
allocator));
238+
impl->constructorNotification(CodeLoc, (void *)impl.get());
213239
}
214240

215241
// This constructor is a prototype for a future SYCL specification
216242
template <class Container, int N = dimensions,
217243
typename = EnableIfOneDimension<N>,
218244
typename = EnableIfContiguous<Container>>
219-
buffer(Container &container, const property_list &propList = {})
220-
: buffer(container, {}, propList) {}
245+
buffer(Container &container, const property_list &propList = {},
246+
const detail::code_location CodeLoc = detail::code_location::current())
247+
: buffer(container, {}, propList, CodeLoc) {}
221248

222249
buffer(buffer<T, dimensions, AllocatorT> &b, const id<dimensions> &baseIndex,
223-
const range<dimensions> &subRange)
250+
const range<dimensions> &subRange,
251+
const detail::code_location CodeLoc = detail::code_location::current())
224252
: impl(b.impl), Range(subRange),
225253
OffsetInBytes(getOffsetInBytes<T>(baseIndex, b.Range)),
226254
IsSubBuffer(true) {
255+
impl->constructorNotification(CodeLoc, (void *)impl.get());
256+
227257
if (b.is_sub_buffer())
228258
throw cl::sycl::invalid_object_error(
229259
"Cannot create sub buffer from sub buffer.", PI_INVALID_VALUE);
@@ -239,7 +269,8 @@ class buffer {
239269
#ifdef __SYCL_INTERNAL_API
240270
template <int N = dimensions, typename = EnableIfOneDimension<N>>
241271
buffer(cl_mem MemObject, const context &SyclContext,
242-
event AvailableEvent = {})
272+
event AvailableEvent = {},
273+
const detail::code_location CodeLoc = detail::code_location::current())
243274
: Range{0} {
244275

245276
size_t BufSize = detail::SYCLMemObjT::getBufSizeForContext(
@@ -250,12 +281,23 @@ class buffer {
250281
detail::pi::cast<pi_native_handle>(MemObject), SyclContext, BufSize,
251282
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(),
252283
AvailableEvent);
284+
impl->constructorNotification(CodeLoc, (void *)impl.get());
253285
}
254286
#endif
255287

256-
buffer(const buffer &rhs) = default;
288+
buffer(const buffer &rhs,
289+
const detail::code_location CodeLoc = detail::code_location::current())
290+
: impl(rhs.impl), Range(rhs.Range), OffsetInBytes(rhs.OffsetInBytes),
291+
IsSubBuffer(rhs.IsSubBuffer) {
292+
impl->constructorNotification(CodeLoc, (void *)impl.get());
293+
}
257294

258-
buffer(buffer &&rhs) = default;
295+
buffer(buffer &&rhs,
296+
const detail::code_location CodeLoc = detail::code_location::current())
297+
: impl(std::move(rhs.impl)), Range(rhs.Range),
298+
OffsetInBytes(rhs.OffsetInBytes), IsSubBuffer(rhs.IsSubBuffer) {
299+
impl->constructorNotification(CodeLoc, (void *)impl.get());
300+
}
259301

260302
buffer &operator=(const buffer &rhs) = default;
261303

@@ -424,7 +466,8 @@ class buffer {
424466
// Interop constructor
425467
template <int N = dimensions, typename = EnableIfOneDimension<N>>
426468
buffer(pi_native_handle MemObject, const context &SyclContext,
427-
event AvailableEvent = {})
469+
event AvailableEvent = {},
470+
const detail::code_location CodeLoc = detail::code_location::current())
428471
: Range{0} {
429472

430473
size_t BufSize = detail::SYCLMemObjT::getBufSizeForContext(
@@ -435,14 +478,18 @@ class buffer {
435478
MemObject, SyclContext, BufSize,
436479
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(),
437480
AvailableEvent);
481+
impl->constructorNotification(CodeLoc, (void *)impl.get());
438482
}
439483

440484
// Reinterpret contructor
441485
buffer(std::shared_ptr<detail::buffer_impl> Impl,
442486
range<dimensions> reinterpretRange, size_t reinterpretOffset,
443-
bool isSubBuffer)
487+
bool isSubBuffer,
488+
const detail::code_location CodeLoc = detail::code_location::current())
444489
: impl(Impl), Range(reinterpretRange), OffsetInBytes(reinterpretOffset),
445-
IsSubBuffer(isSubBuffer){};
490+
IsSubBuffer(isSubBuffer) {
491+
impl->constructorNotification(CodeLoc, (void *)impl.get());
492+
}
446493

447494
template <typename Type, int N>
448495
size_t getOffsetInBytes(const id<N> &offset, const range<N> &range) {

sycl/include/CL/sycl/detail/buffer_impl.hpp

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -155,6 +155,9 @@ class __SYCL_EXPORT buffer_impl final : public SYCLMemObjT {
155155

156156
void *allocateMem(ContextImplPtr Context, bool InitFromUserData,
157157
void *HostPtr, RT::PiEvent &OutEventToWait) override;
158+
void constructorNotification(const detail::code_location &CodeLoc,
159+
void *UserObj);
160+
void destructorNotification(void *UserObj);
158161

159162
MemObjType getType() const override { return MemObjType::Buffer; }
160163

@@ -163,6 +166,7 @@ class __SYCL_EXPORT buffer_impl final : public SYCLMemObjT {
163166
BaseT::updateHostMemory();
164167
} catch (...) {
165168
}
169+
destructorNotification(this);
166170
}
167171

168172
void resize(size_t size) { BaseT::MSizeInBytes = size; }

sycl/source/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -156,6 +156,7 @@ set(SYCL_SOURCES
156156
"detail/sycl_mem_obj_t.cpp"
157157
"detail/usm/usm_impl.cpp"
158158
"detail/util.cpp"
159+
"detail/xpti_registry.cpp"
159160
"accessor.cpp"
160161
"context.cpp"
161162
"device.cpp"

0 commit comments

Comments
 (0)