Skip to content

[SYCL] Adds support for atomic fence capabilities device queries #8586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
435d760
Adds preliminar support for atomic_fence_order_capabilities.
maarquitos14 Mar 1, 2023
d126996
Adds support for atomic fence capabilities device queries.
maarquitos14 Mar 8, 2023
fbb2998
Adds tests for atomic fence capabilities device queries.
maarquitos14 Mar 8, 2023
8e07de6
Updateds atomic fence capabilities unittest.
maarquitos14 Mar 9, 2023
8327cb0
Updates ABI test.
maarquitos14 Mar 9, 2023
46e315a
Reverts involuntary change.
maarquitos14 Mar 9, 2023
4bad01d
Updates atomic fence capabilities unittest.
maarquitos14 Mar 9, 2023
3a24574
Updates PI version.
maarquitos14 Mar 9, 2023
eeb02e9
Fixes compilation error in CUDA.
maarquitos14 Mar 9, 2023
8829169
Update atomic fence capabilities unittest header.
maarquitos14 Mar 9, 2023
fc36ecc
Merge remote-tracking branch 'intel/origin/sycl' into maronas/atomic_…
maarquitos14 Mar 10, 2023
0ce9beb
Fixes clang-format issue.
maarquitos14 Mar 10, 2023
60132b7
Fixes clang-format issues.
maarquitos14 Mar 10, 2023
36192ee
SYCL should always return memory_scope::work_item.
maarquitos14 Mar 10, 2023
c1cf809
Addressing code review concerncs.
maarquitos14 Mar 13, 2023
bf12a92
Reverts unrelated changes.
maarquitos14 Mar 16, 2023
9548073
Addresses code review comments.
maarquitos14 Mar 16, 2023
4c59edb
Reverts unrelated clang-format changes.
maarquitos14 Mar 17, 2023
3ab7991
Reverts unrelated clang-format changes.
maarquitos14 Mar 17, 2023
f19eb4c
Reverts unrelated clang-format changes.
maarquitos14 Mar 17, 2023
9e7947c
Sets return values correctly in piDeviceGetInfo for atomic fence capa…
maarquitos14 Mar 17, 2023
806e054
Addresses code review comments.
maarquitos14 Mar 22, 2023
c55f52a
Addresses code review comments.
maarquitos14 Mar 22, 2023
b01fef5
Merge remote-tracking branch 'intel/origin/sycl' into maronas/atomic_…
maarquitos14 Mar 22, 2023
30bc569
Merge remote-tracking branch 'intel/origin/sycl' into maronas/atomic_…
maarquitos14 Mar 23, 2023
1571410
Addresses code review comments.
maarquitos14 Mar 23, 2023
cda1cd3
Merge remote-tracking branch 'intel/origin/sycl' into maronas/atomic_…
maarquitos14 Mar 23, 2023
45dd762
Addresses code review comments.
maarquitos14 Mar 23, 2023
d4a5d37
Fixes compilation error in AtomicMemoryOrderCapabilities unittest.
maarquitos14 Mar 23, 2023
918b923
Fixes compilation error in AtomicFenceCapabilities unittest.
maarquitos14 Mar 23, 2023
6490a36
Updates Windows ABI.
maarquitos14 Mar 23, 2023
924b9b7
Removes file pushed accidentally.
maarquitos14 Mar 23, 2023
879c096
Adds ur2pi value conversion.
maarquitos14 Mar 23, 2023
aea9486
Merges branches with common code.
maarquitos14 Mar 23, 2023
2522df7
Merge remote-tracking branch 'intel/origin/sycl' into maronas/atomic_…
maarquitos14 Mar 24, 2023
8d37dfb
Fixes merge issue.
maarquitos14 Mar 24, 2023
2e1d848
Addresses code review comments.
maarquitos14 Mar 24, 2023
38b415f
Adds context query for atomic_fence_capabilities.
maarquitos14 Mar 27, 2023
20f5e18
Updates Windows ABI.
maarquitos14 Mar 27, 2023
bef2e36
Returns minimum mandated capabilities for atomic capabilities in HIP.
maarquitos14 Mar 27, 2023
626b231
Merge remote-tracking branch 'intel/origin/sycl' into maronas/atomic_…
maarquitos14 Mar 27, 2023
cc6166e
Raise errors when querying context for atomic capabilities.
maarquitos14 Mar 27, 2023
ffc9b92
Fixes compilation error in HIP.
maarquitos14 Mar 28, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 26 additions & 9 deletions sycl/include/sycl/detail/pi.h
Original file line number Diff line number Diff line change
Expand Up @@ -77,9 +77,11 @@
// 12.22 Add piGetDeviceAndHostTimer to query device wall-clock timestamp
// 12.23 Added new piextEnqueueDeviceGlobalVariableWrite and
// piextEnqueueDeviceGlobalVariableRead functions.
// 12.24 Added PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbenzie: please take this change to Unified Runtime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES for piDeviceGetInfo.

#define _PI_H_VERSION_MAJOR 12
#define _PI_H_VERSION_MINOR 23
#define _PI_H_VERSION_MINOR 24

#define _PI_STRING_HELPER(a) #a
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
Expand Down Expand Up @@ -186,7 +188,7 @@ typedef enum : pi_uint64 {
PI_DEVICE_TYPE_CPU = (1 << 1), ///< A PI device that is the host processor.
PI_DEVICE_TYPE_GPU = (1 << 2), ///< A PI device that is a GPU.
PI_DEVICE_TYPE_ACC = (1 << 3), ///< A PI device that is a
///< dedicated accelerator.
///< dedicated accelerator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't change the layout of those comments, unless clang format complains at it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My local clang-format overreacted. I will revert the changes.

PI_DEVICE_TYPE_CUSTOM = (1 << 4) ///< A PI device that is a custom device.
} _pi_device_type;

Expand Down Expand Up @@ -313,6 +315,8 @@ typedef enum {
PI_DEVICE_INFO_ATOMIC_64 = 0x10110,
PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES = 0x10111,
PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES = 0x11000,
PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES = 0x10114,
PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES = 0x10115,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you rename these and use contiguous values from 0x1FFFF just below

Suggested change
PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES = 0x10114,
PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES = 0x10115,
PI_EXT_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES = 0x10114,
PI_EXT_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES = 0x10115,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file already has PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES and PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES. Should I change those to add EXT too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's up to you if you want to fix unrelated misses.

PI_DEVICE_INFO_GPU_HW_THREADS_PER_EU = 0x10112,
PI_DEVICE_INFO_BACKEND_VERSION = 0x10113,
// Return whether bfloat16 math functions are supported by device
Expand Down Expand Up @@ -561,6 +565,19 @@ constexpr pi_memory_scope_capabilities PI_MEMORY_SCOPE_WORK_GROUP = 0x04;
constexpr pi_memory_scope_capabilities PI_MEMORY_SCOPE_DEVICE = 0x08;
constexpr pi_memory_scope_capabilities PI_MEMORY_SCOPE_SYSTEM = 0x10;

// CL equivalents are only available for OpenCL version 3.0
#define PI_DEVICE_ATOMIC_FENCE_CAPABILITIES 0x1064
using pi_device_atomic_capabilities = pi_bitfield;
constexpr pi_device_atomic_capabilities PI_DEVICE_ATOMIC_ORDER_RELAXED = 0x01;
constexpr pi_device_atomic_capabilities PI_DEVICE_ATOMIC_ORDER_ACQ_REL = 0x02;
constexpr pi_device_atomic_capabilities PI_DEVICE_ATOMIC_ORDER_SEQ_CST = 0x04;
constexpr pi_device_atomic_capabilities PI_DEVICE_ATOMIC_SCOPE_WORK_ITEM = 0x08;
constexpr pi_device_atomic_capabilities PI_DEVICE_ATOMIC_SCOPE_WORK_GROUP =
0x10;
constexpr pi_device_atomic_capabilities PI_DEVICE_ATOMIC_SCOPE_DEVICE = 0x20;
constexpr pi_device_atomic_capabilities PI_DEVICE_ATOMIC_SCOPE_ALL_DEVICES =
0x40;

typedef enum {
PI_PROFILING_INFO_COMMAND_QUEUED = 0x1280,
PI_PROFILING_INFO_COMMAND_SUBMIT = 0x1281,
Expand Down Expand Up @@ -611,13 +628,13 @@ using pi_queue_properties = pi_bitfield;
constexpr pi_queue_properties PI_QUEUE_FLAGS = -1;
constexpr pi_queue_properties PI_QUEUE_COMPUTE_INDEX = -2;
// clang-format off
constexpr pi_queue_properties PI_QUEUE_FLAG_OUT_OF_ORDER_EXEC_MODE_ENABLE = (1 << 0);
constexpr pi_queue_properties PI_QUEUE_FLAG_PROFILING_ENABLE = (1 << 1);
constexpr pi_queue_properties PI_QUEUE_FLAG_ON_DEVICE = (1 << 2);
constexpr pi_queue_properties PI_QUEUE_FLAG_ON_DEVICE_DEFAULT = (1 << 3);
constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_DISCARD_EVENTS = (1 << 4);
constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_PRIORITY_LOW = (1 << 5);
constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_PRIORITY_HIGH = (1 << 6);
constexpr pi_queue_properties PI_QUEUE_FLAG_OUT_OF_ORDER_EXEC_MODE_ENABLE = (1 << 0);
constexpr pi_queue_properties PI_QUEUE_FLAG_PROFILING_ENABLE = (1 << 1);
constexpr pi_queue_properties PI_QUEUE_FLAG_ON_DEVICE = (1 << 2);
constexpr pi_queue_properties PI_QUEUE_FLAG_ON_DEVICE_DEFAULT = (1 << 3);
constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_DISCARD_EVENTS = (1 << 4);
constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_PRIORITY_LOW = (1 << 5);
constexpr pi_queue_properties PI_EXT_ONEAPI_QUEUE_FLAG_PRIORITY_HIGH = (1 << 6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the whitespace change necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My local clang-format overreacted. I will revert the changes.

// clang-format on

using pi_result = _pi_result;
Expand Down
6 changes: 6 additions & 0 deletions sycl/include/sycl/info/device_traits.def
Original file line number Diff line number Diff line change
Expand Up @@ -112,9 +112,15 @@ __SYCL_PARAM_TRAITS_SPEC(device, host_unified_memory, bool,
__SYCL_PARAM_TRAITS_SPEC(device, atomic_memory_order_capabilities,
std::vector<sycl::memory_order>,
PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES)
__SYCL_PARAM_TRAITS_SPEC(device, atomic_fence_order_capabilities,
std::vector<sycl::memory_order>,
PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES)
__SYCL_PARAM_TRAITS_SPEC(device, atomic_memory_scope_capabilities,
std::vector<sycl::memory_scope>,
PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES)
__SYCL_PARAM_TRAITS_SPEC(device, atomic_fence_scope_capabilities,
std::vector<sycl::memory_scope>,
PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES)
__SYCL_PARAM_TRAITS_SPEC(device, profiling_timer_resolution, size_t,
PI_DEVICE_INFO_PROFILING_TIMER_RESOLUTION)
__SYCL_PARAM_TRAITS_SPEC(device, is_endian_little, bool,
Expand Down
3 changes: 3 additions & 0 deletions sycl/include/sycl/info/info_desc.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,9 @@ namespace device {
// atomic_fence_order_capabilities, atomic_fence_scope_capabilities, aspects,
// il_version.

struct atomic_fence_order_capabilities;
struct atomic_fence_scope_capabilities;

#define __SYCL_PARAM_TRAITS_DEPRECATED(Desc, Message) \
struct __SYCL2020_DEPRECATED(Message) Desc;
#include <sycl/info/device_traits_deprecated.def>
Expand Down
4 changes: 4 additions & 0 deletions sycl/plugins/cuda/pi_cuda.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1315,6 +1315,10 @@ pi_result cuda_piDeviceGetInfo(pi_device device, pi_device_info param_name,
return getInfo(param_value_size, param_value, param_value_size_ret,
capabilities);
}
case PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES:
case PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES:
// There is no way to query this in the backend
return PI_ERROR_INVALID_ARG_VALUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this error reported to the end user? Should we better use set a plugin specific error and have SYCL RT use piPluginGetLastError to retrieve/report it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jchlanda @npmiller could you please take a look at the changes I did in HIP/CUDA to address this comment?

case PI_EXT_ONEAPI_DEVICE_INFO_BFLOAT16_MATH_FUNCTIONS: {
int major = 0;
sycl::detail::pi::assertion(
Expand Down
2 changes: 2 additions & 0 deletions sycl/plugins/esimd_emulator/pi_esimd_emulator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -805,6 +805,8 @@ pi_result piDeviceGetInfo(pi_device Device, pi_device_info ParamName,
CASE_PI_UNSUPPORTED(PI_DEVICE_INFO_IMAGE_SRGB)
CASE_PI_UNSUPPORTED(PI_DEVICE_INFO_ATOMIC_64)
CASE_PI_UNSUPPORTED(PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES)
CASE_PI_UNSUPPORTED(PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES)
CASE_PI_UNSUPPORTED(PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES)
CASE_PI_UNSUPPORTED(PI_EXT_ONEAPI_DEVICE_INFO_MAX_GLOBAL_WORK_GROUPS)
CASE_PI_UNSUPPORTED(PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_1D)
CASE_PI_UNSUPPORTED(PI_EXT_ONEAPI_DEVICE_INFO_MAX_WORK_GROUPS_2D)
Expand Down
15 changes: 9 additions & 6 deletions sycl/plugins/hip/pi_hip.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -529,9 +529,9 @@ hipStream_t _pi_queue::get_next_transfer_stream() {
_pi_event::_pi_event(pi_command_type type, pi_context context, pi_queue queue,
hipStream_t stream, pi_uint32 stream_token)
: commandType_{type}, refCount_{1}, hasBeenWaitedOn_{false},
isRecorded_{false}, isStarted_{false},
streamToken_{stream_token}, evEnd_{nullptr}, evStart_{nullptr},
evQueued_{nullptr}, queue_{queue}, stream_{stream}, context_{context} {
isRecorded_{false}, isStarted_{false}, streamToken_{stream_token},
evEnd_{nullptr}, evStart_{nullptr}, evQueued_{nullptr}, queue_{queue},
stream_{stream}, context_{context} {

assert(type != PI_COMMAND_TYPE_USER);

Expand Down Expand Up @@ -685,8 +685,8 @@ pi_result enqueueEventWait(pi_queue queue, pi_event event) {
}

_pi_program::_pi_program(pi_context ctxt)
: module_{nullptr}, binary_{},
binarySizeInBytes_{0}, refCount_{1}, context_{ctxt} {
: module_{nullptr}, binary_{}, binarySizeInBytes_{0}, refCount_{1},
context_{ctxt} {
hip_piContextRetain(context_);
}

Expand Down Expand Up @@ -1865,6 +1865,9 @@ pi_result hip_piDeviceGetInfo(pi_device device, pi_device_info param_name,
}
// TODO: Investigate if this information is available on HIP.
case PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES:
// There is no way to query this in the backend
case PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES:
case PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES:
case PI_DEVICE_INFO_DEVICE_ID:
case PI_DEVICE_INFO_PCI_ADDRESS:
case PI_DEVICE_INFO_GPU_EU_COUNT:
Expand Down Expand Up @@ -5324,7 +5327,7 @@ pi_result hip_piextEnqueueDeviceGlobalVariableRead(
// Windows: dynamically loaded plugins might have been unloaded already
// when this is called. Sycl RT holds onto the PI plugin so it can be
// called safely. But this is not transitive. If the PI plugin in turn
// dynamically loaded a different DLL, that may have been unloaded.
// dynamically loaded a different DLL, that may have been unloaded.
// TODO: add a global variable lifetime management code here (see
// pi_level_zero.cpp for reference) Currently this is just a NOOP.
pi_result hip_piTearDown(void *PluginParameter) {
Expand Down
93 changes: 93 additions & 0 deletions sycl/plugins/opencl/pi_opencl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -285,6 +285,99 @@ pi_result piDeviceGetInfo(pi_device device, pi_device_info paramName,
case PI_DEVICE_INFO_ATOMIC_MEMORY_ORDER_CAPABILITIES:
case PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES:
return PI_ERROR_INVALID_VALUE;
case PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES: {
// Initialize result to minimum mandated capabilities according to
// SYCL2020 4.6.3.2
pi_memory_order_capabilities result =
PI_MEMORY_ORDER_RELAXED | PI_MEMORY_ORDER_ACQUIRE |
PI_MEMORY_ORDER_RELEASE | PI_MEMORY_ORDER_ACQ_REL;

OCLV::OpenCLVersion devVer;

cl_device_id deviceID = cast<cl_device_id>(device);
cl_int ret_err = getDeviceVersion(deviceID, devVer);
if (ret_err != CL_SUCCESS)
return static_cast<pi_result>(ret_err);

pi_device_atomic_capabilities devCapabilities = 0;
if (devVer >= OCLV::V3_0) {
ret_err = clGetDeviceInfo(deviceID, PI_DEVICE_ATOMIC_FENCE_CAPABILITIES,
sizeof(pi_device_atomic_capabilities),
&devCapabilities, nullptr);
if (ret_err != CL_SUCCESS)
return static_cast<pi_result>(ret_err);
assert(devCapabilities && PI_DEVICE_ATOMIC_ORDER_RELAXED &&
"Violates minimum mandated guarantee");
assert(devCapabilities && PI_DEVICE_ATOMIC_ORDER_ACQ_REL &&
"Violates minimum mandated guarantee");

if (devCapabilities && PI_DEVICE_ATOMIC_ORDER_SEQ_CST) {
result |= PI_MEMORY_ORDER_SEQ_CST;
}

} else {
// This info is only available in OpenCL version >= 3.0
// Just return minimum mandated capabilities for older versions.
// OpenCL 1.x minimum mandated capabilities are RELAXED | ACQ_REL, we
// already initialized using these.
if (devVer >= OCLV::V2_0) {
// OpenCL 2.x minimum mandated capabilities are RELAXED | ACQ_REL |
// SEQ_CST
result |= PI_MEMORY_ORDER_SEQ_CST;
}
}
std::memcpy(paramValue, &result, sizeof(result));
return PI_SUCCESS;
}
case PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES: {
// Initialize result to minimum mandated capabilities according to
// SYCL2020 4.6.3.2.
pi_memory_scope_capabilities result = PI_MEMORY_SCOPE_WORK_ITEM |
PI_MEMORY_SCOPE_SUB_GROUP |
PI_MEMORY_SCOPE_WORK_GROUP;

OCLV::OpenCLVersion devVer;

cl_device_id deviceID = cast<cl_device_id>(device);
cl_int ret_err = getDeviceVersion(deviceID, devVer);
if (ret_err != CL_SUCCESS)
return static_cast<pi_result>(ret_err);

pi_device_atomic_capabilities devCapabilities = 0;
if (devVer >= OCLV::V3_0) {
ret_err = clGetDeviceInfo(deviceID, PI_DEVICE_ATOMIC_FENCE_CAPABILITIES,
sizeof(pi_device_atomic_capabilities),
&devCapabilities, nullptr);
if (ret_err != CL_SUCCESS)
return static_cast<pi_result>(ret_err);
assert(devCapabilities && PI_DEVICE_ATOMIC_SCOPE_WORK_GROUP &&
"Violates minimum mandated guarantee");

// Because scopes are hierarchical, wider scopes support all narrower
// scopes. SUB_GROUP and WORK_ITEM was already included in the
// initialization, since WORK_GROUP is mandated minimum capality.
if (devCapabilities && PI_DEVICE_ATOMIC_SCOPE_DEVICE) {
result |= PI_MEMORY_SCOPE_DEVICE;
}

if (devCapabilities && PI_DEVICE_ATOMIC_SCOPE_ALL_DEVICES) {
result |= PI_MEMORY_SCOPE_SYSTEM;
}

} else {
// This info is only available in OpenCL version >= 3.0
// Just return minimum mandated capabilities for older versions.
// OpenCL 1.x minimum mandated capabilities are WORK_GROUP, we
// already initialized using it.
if (devVer >= OCLV::V2_0) {
// OpenCL 2.x minimum mandated capabilities are WORK_GROUP | DEVICE |
// ALL_DEVICES
result |= PI_MEMORY_SCOPE_DEVICE | PI_MEMORY_SCOPE_SYSTEM;
}
}
std::memcpy(paramValue, &result, sizeof(result));
return PI_SUCCESS;
}
case PI_DEVICE_INFO_ATOMIC_64: {
cl_int ret_err = CL_SUCCESS;
cl_bool result = CL_FALSE;
Expand Down
4 changes: 4 additions & 0 deletions sycl/plugins/unified_runtime/pi2ur.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -475,6 +475,10 @@ inline pi_result piDeviceGetInfo(pi_device Device, pi_device_info ParamName,
(ur_device_info_t)UR_DEVICE_INFO_BFLOAT16},
{PI_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES,
(ur_device_info_t)UR_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES},
{PI_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES,
(ur_device_info_t)UR_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES},
{PI_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES,
(ur_device_info_t)UR_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES},
};

auto InfoType = InfoMapping.find(ParamName);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===-----------------------------------------------------------------===//
#include <sycl/detail/pi.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UR should not include PI

Suggested change
#include <sycl/detail/pi.h>


#include <algorithm>
#include <climits>
Expand Down Expand Up @@ -1164,6 +1165,26 @@ UR_APIEXPORT ur_result_t UR_APICALL urDeviceGetInfo(
// bfloat16 math functions are not yet supported on Intel GPUs.
return ReturnValue(bool{false});
}
case UR_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES: {
// There are no explicit restrictions in L0 programming guide, so assume all
// are supported
pi_memory_order_capabilities result =
PI_MEMORY_ORDER_RELAXED | PI_MEMORY_ORDER_ACQUIRE |
PI_MEMORY_ORDER_RELEASE | PI_MEMORY_ORDER_ACQ_REL |
PI_MEMORY_ORDER_SEQ_CST;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should write this in UR (extended as needed) and add a conversion in piDeviceGetInfo


return ReturnValue(result);
}
case UR_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES: {
// There are no explicit restrictions in L0 programming guide, so assume all
// are supported
pi_memory_scope_capabilities result =
PI_MEMORY_SCOPE_WORK_ITEM | PI_MEMORY_SCOPE_SUB_GROUP |
PI_MEMORY_SCOPE_WORK_GROUP | PI_MEMORY_SCOPE_DEVICE |
PI_MEMORY_SCOPE_SYSTEM;

return ReturnValue(result);
}

// TODO: Implement.
case UR_DEVICE_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES:
Expand Down
4 changes: 4 additions & 0 deletions sycl/plugins/unified_runtime/ur/ur.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,10 @@ const int UR_EXT_DEVICE_INFO_FREE_MEMORY = UR_EXT_DEVICE_INFO_END - 13;
// const int ZER_EXT_DEVICE_INFO_DEVICE_ID = UR_EXT_DEVICE_INFO_END - 14;
// const int ZER_EXT_DEVICE_INFO_IMAGE_MAX_ARRAY_SIZE =
// UR_DEVICE_INFO_IMAGE_MAX_ARRAY_SIZE;
const int UR_DEVICE_INFO_ATOMIC_FENCE_ORDER_CAPABILITIES =
UR_EXT_DEVICE_INFO_END - 16;
const int UR_DEVICE_INFO_ATOMIC_FENCE_SCOPE_CAPABILITIES =
UR_EXT_DEVICE_INFO_END - 17;
Comment on lines +46 to +49
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbenzie: please take this change into UR


const ur_device_info_t UR_EXT_DEVICE_INFO_OPENCL_C_VERSION =
(ur_device_info_t)0x103D;
Expand Down
40 changes: 40 additions & 0 deletions sycl/source/detail/device_info.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -275,6 +275,19 @@ struct get_device_info_impl<std::vector<memory_order>,
}
};

// Specialization for atomic_fence_order_capabilities, PI returns a bitfield
template <>
struct get_device_info_impl<std::vector<memory_order>,
info::device::atomic_fence_order_capabilities> {
static std::vector<memory_order> get(RT::PiDevice dev, const plugin &Plugin) {
pi_memory_order_capabilities result;
Plugin.call_nocheck<PiApiKind::piDeviceGetInfo>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why nocheck? Where do we perform error handling?

dev, PiInfoCode<info::device::atomic_fence_order_capabilities>::value,
sizeof(pi_memory_order_capabilities), &result, nullptr);
return readMemoryOrderBitfield(result);
}
};

// Specialization for atomic_memory_scope_capabilities, PI returns a bitfield
template <>
struct get_device_info_impl<std::vector<memory_scope>,
Expand All @@ -288,6 +301,19 @@ struct get_device_info_impl<std::vector<memory_scope>,
}
};

// Specialization for atomic_fence_scope_capabilities, PI returns a bitfield
template <>
struct get_device_info_impl<std::vector<memory_scope>,
info::device::atomic_fence_scope_capabilities> {
static std::vector<memory_scope> get(RT::PiDevice dev, const plugin &Plugin) {
pi_memory_scope_capabilities result;
Plugin.call_nocheck<PiApiKind::piDeviceGetInfo>(
dev, PiInfoCode<info::device::atomic_fence_scope_capabilities>::value,
sizeof(pi_memory_scope_capabilities), &result, nullptr);
return readMemoryScopeBitfield(result);
}
};

// Specialization for bf16 math functions
template <>
struct get_device_info_impl<bool,
Expand Down Expand Up @@ -1005,13 +1031,27 @@ get_device_info_host<info::device::atomic_memory_order_capabilities>() {
memory_order::acq_rel, memory_order::seq_cst};
}

template <>
inline std::vector<memory_order>
get_device_info_host<info::device::atomic_fence_order_capabilities>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder when we will be able to stop specializing that for host device, because it doesn't really exists anymore

return {memory_order::relaxed, memory_order::acquire, memory_order::release,
memory_order::acq_rel};
}

template <>
inline std::vector<memory_scope>
get_device_info_host<info::device::atomic_memory_scope_capabilities>() {
return {memory_scope::work_item, memory_scope::sub_group,
memory_scope::work_group, memory_scope::device, memory_scope::system};
}

template <>
inline std::vector<memory_scope>
get_device_info_host<info::device::atomic_fence_scope_capabilities>() {
return {memory_scope::work_item, memory_scope::sub_group,
memory_scope::work_group, memory_scope::device, memory_scope::system};
}

template <>
inline bool
get_device_info_host<info::device::ext_oneapi_bfloat16_math_functions>() {
Expand Down
Loading