Skip to content

[SYCL][L0] Implementation of ext_oneapi_prod API #13555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 41 commits into from
May 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
0ed4727
Preliminary implementation of ext_oneapi_prod API
lbushi25 Apr 24, 2024
64b05b3
Fix incomplete type compilation error
lbushi25 Apr 25, 2024
82b0f3b
Update symbols
lbushi25 Apr 25, 2024
f7a8b81
Add Linux symbols
lbushi25 Apr 25, 2024
564710b
Throw if queue flushing called while graph is recording
lbushi25 Apr 25, 2024
0bd1486
Add uniot test
lbushi25 Apr 26, 2024
5b893f4
Add unit test
lbushi25 Apr 26, 2024
5824a3d
Add unit test
lbushi25 Apr 26, 2024
7ba9aef
Add unit test
lbushi25 Apr 26, 2024
de11c48
Modify CMake files
lbushi25 Apr 26, 2024
4945716
Add E2E test for oneapi_prod extension
lbushi25 Apr 29, 2024
752d0da
Move extension to supported and update UR tag to point to fork
lbushi25 Apr 29, 2024
7a273eb
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 Apr 29, 2024
04f51e9
Update CMakeLists.txt
lbushi25 Apr 29, 2024
50231c3
Update sycl/doc/extensions/supported/sycl_ext_oneapi_prod.asciidoc
lbushi25 Apr 29, 2024
39b8b5e
Delete sycl/doc/extensions/proposed/sycl_ext_oneapi_prod.asciidoc
lbushi25 Apr 29, 2024
8b9be33
Update sycl_ext_oneapi_prod.asciidoc
lbushi25 Apr 29, 2024
0a98e8e
Update sycl_ext_oneapi_prod.asciidoc
lbushi25 Apr 29, 2024
9c79a77
Update queue_impl.hpp
lbushi25 Apr 29, 2024
97c1968
Update CMakeLists.txt
lbushi25 Apr 30, 2024
2cdc9b3
Update CMakeLists.txt
lbushi25 Apr 30, 2024
80cfbf3
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 Apr 30, 2024
4cda8a8
Update sycl/unittests/Extensions/OneAPIProd.cpp
lbushi25 May 2, 2024
f89fde0
Verify exception code and flush all underlying queues
lbushi25 May 2, 2024
e09659d
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 May 2, 2024
ede6359
Updat extension location
lbushi25 May 2, 2024
06ec39d
Delete sycl/doc/extensions/proposed/sycl_ext_oneapi_prod.asciidoc
lbushi25 May 2, 2024
7b3f037
Add extension to proposed
lbushi25 May 2, 2024
3b8f838
Add extension to proposed
lbushi25 May 2, 2024
fbfe848
Add extension to proposed
lbushi25 May 2, 2024
bdcd475
Add extension to proposed
lbushi25 May 2, 2024
8d8bbb3
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 May 2, 2024
d0c7860
Delete sycl/doc/extensions/proposed/sycl_ext_oneapi_prod.asciidoc
lbushi25 May 2, 2024
94b535f
Update CMakeLists.txt
lbushi25 May 2, 2024
4c55051
Update sycl_ext_oneapi_prod.asciidoc
lbushi25 May 2, 2024
4377c8c
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 May 2, 2024
28e031a
[UR] Bump L0 tag to e25ada9c
kbenzie May 15, 2024
d200b7c
Merge remote-tracking branch 'origin/sycl' into ext_oneapi_prod
kbenzie May 15, 2024
d2e286b
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 May 15, 2024
24eb52a
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 May 16, 2024
c274a1c
Merge branch 'sycl' into ext_oneapi_prod
lbushi25 May 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -36,17 +36,13 @@ https://github.com/intel/llvm/issues

== Dependencies

This extension is written against the SYCL 2020 revision 5 specification. All
This extension is written against the SYCL 2020 revision 8 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

== Status

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. *Shipping software products should
not rely on APIs defined in this specification.*
This extension is implemented and fully supported by DPC++.

== Overview

Expand Down
5 changes: 5 additions & 0 deletions sycl/include/sycl/queue.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -2590,6 +2590,11 @@ class __SYCL_EXPORT queue : public detail::OwnerLessBase<queue> {
CodeLoc);
}

/// Provides a hint to the runtime that previously issued commands to this
/// queue should begin executing once their prerequisites have been satisfied.
///
void ext_oneapi_prod();

/// Returns whether the queue is in order or OoO
///
/// Equivalent to has_property<property::queue::in_order>()
Expand Down
14 changes: 14 additions & 0 deletions sycl/source/detail/queue_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -388,6 +388,20 @@ class queue_impl {
template <typename Param>
typename Param::return_type get_backend_info() const;

/// Provides a hint to the backend to execute previously issued commands on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not a guarantee then it is useless users like OpenMP will be assuming the job is submitted and inserting barriers to add more dependent tasks.

Copy link
Contributor Author

@lbushi25 lbushi25 Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. The spec of the extension simply says that eventual execution is not guaranteed by calling ext_oneapi_prod(). Whether an app uses or not this API should not affect the correctness of the program. Barriers or higher level calls like wait should still be used to sync with commands. This is merely an encouragement to increase performance. Perhaps @gmlueck can help interpret the spec and its intended usage if I'm misunderstanding!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbushi25's interpretation of the spec is correct.

@RaviNarayanaswamy, I don't understand your concern with barriers and dependent tasks. How does that relate to whether the commands submitted prior to the "prod" call are guaranteed to start executing?

The thing that "prod" does not guarantee is that the commands are necessarily running at the same time as the host thread. For example, it would be invalid for the host thread to spin reading host memory waiting until the kernel wrote some value there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck when we call fflush we want it to submit the previous commands before returning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaviNarayanaswamy But I don't think there is a need for that. The queue class itself is an abstraction that allows you to submit commands and get an event back for synchronization. From your end you can think of the kernel as submitted. The flush functionality in this PR is concerned with encouraging execution of the kernel on the device itself, a lower level than what SYCL exposes..

Copy link
Contributor

@RaviNarayanaswamy RaviNarayanaswamy Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lbushi25 If so OpenMP which calls MKL which submits the kernel and returns to OpenMP. Now OpenMP needs to make sure the command is submitted, we cannot use SYCL events to order the scheduling in OpenMP.
If this is not for supporting the above JIRA and just ignore my comments.

Copy link
Contributor Author

@lbushi25 lbushi25 May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RaviNarayanaswamy Based on internal discussion, it seems like the prod API is indeed sufficient for your use-case. Can you confirm?
If so, please leave a review/approval.

/// this queue. Overrides normal batching behaviour. Note that this is merely
/// a hint and not a guarantee.
void flush() {
if (MGraph.lock()) {
throw sycl::exception(make_error_code(errc::invalid),
"flush cannot be called for a queue which is "
"recording to a command graph.");
}
for (const auto &queue : MQueues) {
getPlugin()->call<PiApiKind::piQueueFlush>(queue);
}
}

using SubmitPostProcessF = std::function<void(bool, bool, event &)>;

/// Submits a command group function object to the queue, in order to be
Expand Down
1 change: 1 addition & 0 deletions sycl/source/feature_test.hpp.in
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ inline namespace _V1 {
#define SYCL_EXT_ONEAPI_PRIVATE_ALLOCA 1
#define SYCL_EXT_ONEAPI_FORWARD_PROGRESS 1
#define SYCL_EXT_ONEAPI_FREE_FUNCTION_KERNELS 1
#define SYCL_EXT_ONEAPI_PROD 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, defining this macro indicates to the user that feature is supported. But it is not supported yet because you have to bring UR changes.
Also once feature is supported we should move the extension document to the supported directory.
So may be it makes sense to merge changes to UR first and then combine these changes with UR tag update + move the document to supported. Or probably we shouldn't define the macro for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait until the UR changes are merged and then merge this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature is a hint and implementation may decide to ignore it completely. Therefore, I think that it is fine to say that this is supported, even if we have some bugs on UR side


#ifndef __has_include
#define __has_include(x) 0
Expand Down
2 changes: 2 additions & 0 deletions sycl/source/queue.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,8 @@ backend queue::get_backend() const noexcept { return getImplBackend(impl); }

bool queue::ext_oneapi_empty() const { return impl->ext_oneapi_empty(); }

void queue::ext_oneapi_prod() { impl->flush(); }

pi_native_handle queue::getNative(int32_t &NativeHandleDesc) const {
return impl->getNative(NativeHandleDesc);
}
Expand Down
85 changes: 85 additions & 0 deletions sycl/test-e2e/QueueFlushing/queue_flushing.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
// RUN: %{build} -o %t.out
// RUN: %t.out

#include <stdlib.h>
#include <sycl/sycl.hpp>
#include <vector>

using namespace sycl;
using namespace std;

// This test checks the ext_oneapi_prod extension. This extension introduces
// only one new function, queue::ext_oneapi_prod(), which serves as a hint to
// the compiler to flush the queue. Since it is simply a hint, we cannot really
// test what the backend is doing but we can at least make sure that code
// involving this function compiles and runs successfully in various contexts.
int main() {
sycl::queue q;

// Test on an empty queue multiple times.
q.ext_oneapi_prod();
q.ext_oneapi_prod();

// Test on a queue after we've submitted a kernel in various contexts.
q.single_task([]() {});
q.ext_oneapi_prod();

q.parallel_for(range<1>{}, [=](auto &idx) {});
q.ext_oneapi_prod();
q.wait();

// Test that the result of an in-progress addition kernel is not affected by
// calling ext_oneapi_prod.
srand(time(0));
constexpr int N = 16;
int A[N];
int B[N];
int add[N];
int mult[N];
for (int i = 0; i < N; ++i) {
A[i] = rand();
B[i] = rand();
}
{
buffer<int> bufA{A, N};
buffer<int> bufB{B, N};
buffer<int> bufadd{add, N};

q.submit([&](handler &cgh) {
accessor accA{bufA, cgh};
accessor accB{bufB, cgh};
accessor accadd{bufadd, cgh};
cgh.parallel_for(N, [=](id<1> i) { accadd[i] = accA[i] + accB[i]; });
});
q.ext_oneapi_prod();
}
for (int i = 0; i < N; ++i) {
assert(add[i] == (A[i] + B[i]));
}

// Test that the result of an in-progress addition and multiplication kernel
// is not affected by calling ext_oneapi_prod
{
buffer<int> bufA{A, N};
buffer<int> bufB{B, N};
buffer<int> bufadd{add, N};
buffer<int> bufmult{mult, N};
q.submit([&](handler &cgh) {
accessor accA{bufA, cgh};
accessor accB{bufB, cgh};
accessor accadd{bufadd, cgh};
cgh.parallel_for(N, [=](id<1> i) { accadd[i] = accA[i] + accB[i]; });
});

q.submit([&](handler &cgh) {
accessor accA{bufA, cgh};
accessor accB{bufB, cgh};
accessor accmult{bufmult, cgh};
cgh.parallel_for(N, [=](id<1> i) { accmult[i] = accA[i] * accB[i]; });
});
q.ext_oneapi_prod();
}
for (int i = 0; i < N; ++i) {
assert(add[i] == (A[i] + B[i]) && mult[i] == (A[i] * B[i]));
}
}
1 change: 1 addition & 0 deletions sycl/test/abi/sycl_symbols_linux.dump
Original file line number Diff line number Diff line change
Expand Up @@ -3747,6 +3747,7 @@ _ZNK4sycl3_V15queue12has_propertyINS0_8property5queue16enable_profilingEEEbv
_ZNK4sycl3_V15queue12has_propertyINS0_8property5queue4cuda18use_default_streamEEEbv
_ZNK4sycl3_V15queue12has_propertyINS0_8property5queue8in_orderEEEbv
_ZNK4sycl3_V15queue16ext_oneapi_emptyEv
_ZN4sycl3_V15queue15ext_oneapi_prodEv
_ZNK4sycl3_V15queue16get_backend_infoINS0_4info6device15backend_versionEEENS0_6detail20is_backend_info_descIT_E11return_typeEv
_ZNK4sycl3_V15queue16get_backend_infoINS0_4info6device7versionEEENS0_6detail20is_backend_info_descIT_E11return_typeEv
_ZNK4sycl3_V15queue16get_backend_infoINS0_4info8platform7versionEEENS0_6detail20is_backend_info_descIT_E11return_typeEv
Expand Down
1 change: 1 addition & 0 deletions sycl/test/abi/sycl_symbols_windows.dump
Original file line number Diff line number Diff line change
Expand Up @@ -4085,6 +4085,7 @@
?ext_oneapi_copy@handler@_V1@sycl@@QEAAXUimage_mem_handle@experimental@oneapi@ext@23@0AEBUimage_descriptor@56723@@Z
?ext_oneapi_copy@handler@_V1@sycl@@QEAAXUimage_mem_handle@experimental@oneapi@ext@23@PEAXAEBUimage_descriptor@56723@@Z
?ext_oneapi_copy@handler@_V1@sycl@@QEAAXUimage_mem_handle@experimental@oneapi@ext@23@V?$range@$02@23@AEBUimage_descriptor@56723@PEAX111@Z
?ext_oneapi_prod@queue@_V1@sycl@@QEAAXXZ
?ext_oneapi_copy@queue@_V1@sycl@@QEAA?AVevent@23@PEAX0AEBUimage_descriptor@experimental@oneapi@ext@23@_KAEBUcode_location@detail@23@@Z
?ext_oneapi_copy@queue@_V1@sycl@@QEAA?AVevent@23@PEAX0AEBUimage_descriptor@experimental@oneapi@ext@23@_KAEBV?$vector@Vevent@_V1@sycl@@V?$allocator@Vevent@_V1@sycl@@@std@@@std@@AEBUcode_location@detail@23@@Z
?ext_oneapi_copy@queue@_V1@sycl@@QEAA?AVevent@23@PEAX0AEBUimage_descriptor@experimental@oneapi@ext@23@_KV423@AEBUcode_location@detail@23@@Z
Expand Down
1 change: 1 addition & 0 deletions sycl/unittests/Extensions/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ add_sycl_unittest(ExtensionsTests OBJECT
OneAPISubGroupMask.cpp
USMP2P.cpp
CompositeDevice.cpp
OneAPIProd.cpp
)

add_subdirectory(CommandGraph)
42 changes: 42 additions & 0 deletions sycl/unittests/Extensions/OneAPIProd.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
//==-------- OneAPIProd.cpp --- sycl_ext_oneapi_prod unit tests ------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include <sycl/sycl.hpp>

#include <helpers/PiMock.hpp>

#include <gtest/gtest.h>

using namespace sycl;

static bool QueueFlushed = false;

static pi_result redefinedQueueFlush(pi_queue Queue) {
QueueFlushed = true;
return PI_SUCCESS;
}

TEST(OneAPIProdTest, PiQueueFlush) {
sycl::unittest::PiMock Mock(backend::ext_oneapi_level_zero);
sycl::platform Plt = Mock.getPlatform();
Mock.redefine<detail::PiApiKind::piQueueFlush>(redefinedQueueFlush);
context Ctx{Plt};
queue Queue{Ctx, default_selector_v};
Queue.ext_oneapi_prod();
EXPECT_TRUE(QueueFlushed);
sycl::ext::oneapi::experimental::command_graph Graph(Ctx, Queue.get_device());
Graph.begin_recording(Queue);
try {
Queue.ext_oneapi_prod(); // flushing while graph is recording is not allowed
FAIL() << "Expected exception when calling ext_oneapi_prod() during graph "
"recording not seen.";
} catch (exception &ex) {
EXPECT_EQ(ex.code(), make_error_code(errc::invalid));
}
Graph.end_recording(Queue);
}