[SYCL] Add prototype of ExtendedAtomics features #1826


Merged: 13 commits, Jun 26, 2020
2 changes: 1 addition & 1 deletion sycl/doc/extensions/README.md
@@ -13,7 +13,7 @@ DPC++ extensions status:
| [SYCL_INTEL_deduction_guides](deduction_guides/SYCL_INTEL_deduction_guides.asciidoc) | Supported | |
| [SYCL_INTEL_device_specific_kernel_queries](DeviceSpecificKernelQueries/SYCL_INTEL_device_specific_kernel_queries.asciidoc) | Proposal | |
| [SYCL_INTEL_enqueue_barrier](EnqueueBarrier/enqueue_barrier.asciidoc) | Supported(OpenCL, Level Zero) | |
-| [SYCL_INTEL_extended_atomics](ExtendedAtomics/SYCL_INTEL_extended_atomics.asciidoc) | Proposal | |
+| [SYCL_INTEL_extended_atomics](ExtendedAtomics/SYCL_INTEL_extended_atomics.asciidoc) | Partially supported(OpenCL: CPU, GPU) | Not supported: pointer types |
| [SYCL_INTEL_group_algorithms](GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc) | Supported(OpenCL) | |
| [SYCL_INTEL_group_mask](./GroupMask/SYCL_INTEL_group_mask.asciidoc) | Proposal | |
| [FPGA selector](IntelFPGA/FPGASelector.md) | Supported | |
1 change: 1 addition & 0 deletions sycl/include/CL/__spirv/spirv_ops.hpp
@@ -155,6 +155,7 @@ extern SYCL_EXTERNAL TempRetT __spirv_ImageSampleExplicitLod(SampledType,
macro(__attribute__((opencl_local)), Arg)

__SPIRV_ATOMICS(__SPIRV_ATOMIC_FLOAT, float)
__SPIRV_ATOMICS(__SPIRV_ATOMIC_FLOAT, double)
Contributor:
Just a question: is the `half` type expected to be supported eventually, too?

Contributor (Author):

Good question. OpenCL only defines atomics for 32- and 64-bit types, which is what I've started with here.

C++ allows an atomic_ref to be constructed from any TriviallyCopyable type, and I think eventually that's where we'd like to end up. Whether the atomic_ref for a given type is implemented on top of native instructions, a compare-exchange loop, a global lock, etc., would then be implementation-defined and device-specific.

We haven't yet defined exactly what sort of device queries we'll want to support for testing atomic functionality. If you have any ideas for what the feature sets should look like, or if there are any queries that would be useful for implementing reductions, please let me know!
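The compare-exchange-loop fallback mentioned in the reply can be sketched on the host with `std::atomic`. This is an illustrative analog, not DPC++ API: `atomic_fetch_apply` is a hypothetical name, and the bit round-trip through a same-sized unsigned integer mirrors the `bit_cast` pattern used by the floating-point overloads in `spirv.hpp`.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Apply an arbitrary float -> float operation atomically by treating the
// storage as a 32-bit integer and retrying with compare_exchange_weak.
// Returns the value observed immediately before the successful update.
template <typename Op>
float atomic_fetch_apply(std::atomic<std::uint32_t> &Bits, Op op) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "bit reinterpretation requires equal sizes");
  std::uint32_t Expected = Bits.load(std::memory_order_relaxed);
  std::uint32_t Desired;
  float Old, New;
  do {
    std::memcpy(&Old, &Expected, sizeof(float)); // bit_cast to float
    New = op(Old);                               // compute the updated value
    std::memcpy(&Desired, &New, sizeof(float));  // bit_cast back to integer
  } while (!Bits.compare_exchange_weak(Expected, Desired,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed));
  return Old;
}
```

On failure, `compare_exchange_weak` reloads `Expected`, so the loop recomputes the operation against the freshest value; this is the generic fallback a device could use when no native atomic instruction exists for the type.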

__SPIRV_ATOMICS(__SPIRV_ATOMIC_SIGNED, int)
__SPIRV_ATOMICS(__SPIRV_ATOMIC_SIGNED, long)
__SPIRV_ATOMICS(__SPIRV_ATOMIC_SIGNED, long long)
1 change: 1 addition & 0 deletions sycl/include/CL/sycl.hpp
@@ -23,6 +23,7 @@
#include <CL/sycl/handler.hpp>
#include <CL/sycl/id.hpp>
#include <CL/sycl/image.hpp>
#include <CL/sycl/intel/atomic.hpp>
#include <CL/sycl/intel/builtins.hpp>
#include <CL/sycl/intel/function_pointer.hpp>
#include <CL/sycl/intel/group_algorithm.hpp>
7 changes: 7 additions & 0 deletions sycl/include/CL/sycl/detail/defines.hpp
@@ -46,3 +46,10 @@
#warning "No assumptions will be emitted due to no __builtin_assume available"
#endif
#endif

// inline constexpr is a C++17 feature
#if __cplusplus >= 201703L
#define __SYCL_INLINE_CONSTEXPR inline constexpr
#else
#define __SYCL_INLINE_CONSTEXPR static constexpr
#endif
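The distinction the new macro papers over can be shown with a standalone sketch (illustrative only; `MY_INLINE_CONSTEXPR` and `kAnswer` are made-up names, not part of the patch). In C++17, `inline constexpr` defines a single variable shared by every translation unit; before C++17, `static constexpr` at namespace scope gives each translation unit its own internal-linkage copy with the same value but distinct addresses.

```cpp
// Mirror of the __SYCL_INLINE_CONSTEXPR fallback in defines.hpp:
// pick `inline constexpr` when the compiler supports C++17 inline
// variables, and fall back to `static constexpr` otherwise.
#if __cplusplus >= 201703L
#define MY_INLINE_CONSTEXPR inline constexpr
#else
#define MY_INLINE_CONSTEXPR static constexpr
#endif

// Either expansion yields a usable namespace-scope constant.
MY_INLINE_CONSTEXPR int kAnswer = 42;
```

Both expansions behave identically for single-translation-unit use; the C++17 form exists to avoid duplicate copies of the constants (such as `memory_order_relaxed` below) in every translation unit that includes the header.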
223 changes: 222 additions & 1 deletion sycl/include/CL/sycl/detail/spirv.hpp
@@ -12,6 +12,7 @@
#include <CL/__spirv/spirv_vars.hpp>
#include <CL/sycl/detail/generic_type_traits.hpp>
#include <CL/sycl/detail/type_traits.hpp>
#include <CL/sycl/intel/atomic_enums.hpp>

#ifdef __SYCL_DEVICE_ONLY__
__SYCL_INLINE_NAMESPACE(cl) {
@@ -28,7 +29,7 @@ template <int Dimensions> struct group_scope<group<Dimensions>> {
static constexpr __spv::Scope::Flag value = __spv::Scope::Flag::Workgroup;
};

-template <> struct group_scope<intel::sub_group> {
+template <> struct group_scope<::cl::sycl::intel::sub_group> {
static constexpr __spv::Scope::Flag value = __spv::Scope::Flag::Subgroup;
};

@@ -69,6 +70,226 @@ T GroupBroadcast(T x, id<Dimensions> local_id) {
return __spirv_GroupBroadcast(group_scope<Group>::value, ocl_x, ocl_id);
}

// SYCL defines a single happens-before relation, so the memory semantics
// should always apply to all address spaces.
// Although consume is unsupported, forwarding it to acquire is valid.
static inline constexpr __spv::MemorySemanticsMask::Flag
getMemorySemanticsMask(intel::memory_order Order) {
__spv::MemorySemanticsMask::Flag SpvOrder = __spv::MemorySemanticsMask::None;
switch (Order) {
case intel::memory_order::relaxed:
SpvOrder = __spv::MemorySemanticsMask::None;
break;
case intel::memory_order::__consume_unsupported:
case intel::memory_order::acquire:
SpvOrder = __spv::MemorySemanticsMask::Acquire;
break;
case intel::memory_order::release:
SpvOrder = __spv::MemorySemanticsMask::Release;
break;
case intel::memory_order::acq_rel:
SpvOrder = __spv::MemorySemanticsMask::AcquireRelease;
break;
case intel::memory_order::seq_cst:
SpvOrder = __spv::MemorySemanticsMask::SequentiallyConsistent;
break;
}
return static_cast<__spv::MemorySemanticsMask::Flag>(
SpvOrder | __spv::MemorySemanticsMask::SubgroupMemory |
__spv::MemorySemanticsMask::WorkgroupMemory |
__spv::MemorySemanticsMask::CrossWorkgroupMemory);
}

static inline constexpr __spv::Scope::Flag getScope(intel::memory_scope Scope) {
switch (Scope) {
case intel::memory_scope::work_item:
return __spv::Scope::Invocation;
case intel::memory_scope::sub_group:
return __spv::Scope::Subgroup;
case intel::memory_scope::work_group:
return __spv::Scope::Workgroup;
case intel::memory_scope::device:
return __spv::Scope::Device;
case intel::memory_scope::system:
return __spv::Scope::CrossDevice;
}
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicCompareExchange(multi_ptr<T, AddressSpace> MPtr,
intel::memory_scope Scope, intel::memory_order Success,
intel::memory_order Failure, T Desired, T Expected) {
auto SPIRVSuccess = getMemorySemanticsMask(Success);
auto SPIRVFailure = getMemorySemanticsMask(Failure);
auto SPIRVScope = getScope(Scope);
auto *Ptr = MPtr.get();
return __spirv_AtomicCompareExchange(Ptr, SPIRVScope, SPIRVSuccess,
SPIRVFailure, Desired, Expected);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_floating_point<T>::value, T>
AtomicCompareExchange(multi_ptr<T, AddressSpace> MPtr,
intel::memory_scope Scope, intel::memory_order Success,
intel::memory_order Failure, T Desired, T Expected) {
using I = detail::make_unsinged_integer_t<T>;
auto SPIRVSuccess = getMemorySemanticsMask(Success);
auto SPIRVFailure = getMemorySemanticsMask(Failure);
auto SPIRVScope = getScope(Scope);
auto *PtrInt =
reinterpret_cast<typename multi_ptr<I, AddressSpace>::pointer_t>(
MPtr.get());
I DesiredInt = detail::bit_cast<I>(Desired);
I ExpectedInt = detail::bit_cast<I>(Expected);
I ResultInt = __spirv_AtomicCompareExchange(
PtrInt, SPIRVScope, SPIRVSuccess, SPIRVFailure, DesiredInt, ExpectedInt);
return detail::bit_cast<T>(ResultInt);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicLoad(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicLoad(Ptr, SPIRVScope, SPIRVOrder);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_floating_point<T>::value, T>
AtomicLoad(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order) {
using I = detail::make_unsinged_integer_t<T>;
auto *PtrInt =
reinterpret_cast<typename multi_ptr<I, AddressSpace>::pointer_t>(
MPtr.get());
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
I ResultInt = __spirv_AtomicLoad(PtrInt, SPIRVScope, SPIRVOrder);
return detail::bit_cast<T>(ResultInt);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value>
AtomicStore(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
__spirv_AtomicStore(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_floating_point<T>::value>
AtomicStore(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
using I = detail::make_unsinged_integer_t<T>;
auto *PtrInt =
reinterpret_cast<typename multi_ptr<I, AddressSpace>::pointer_t>(
MPtr.get());
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
I ValueInt = detail::bit_cast<I>(Value);
__spirv_AtomicStore(PtrInt, SPIRVScope, SPIRVOrder, ValueInt);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicExchange(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicExchange(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_floating_point<T>::value, T>
AtomicExchange(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
using I = detail::make_unsinged_integer_t<T>;
auto *PtrInt =
reinterpret_cast<typename multi_ptr<I, AddressSpace>::pointer_t>(
MPtr.get());
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
I ValueInt = detail::bit_cast<I>(Value);
I ResultInt =
__spirv_AtomicExchange(PtrInt, SPIRVScope, SPIRVOrder, ValueInt);
return detail::bit_cast<T>(ResultInt);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicIAdd(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicIAdd(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicISub(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicISub(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicAnd(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicAnd(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicOr(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicOr(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicXor(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicXor(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicMin(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicMin(Ptr, SPIRVScope, SPIRVOrder, Value);
}

template <typename T, access::address_space AddressSpace>
inline typename detail::enable_if_t<std::is_integral<T>::value, T>
AtomicMax(multi_ptr<T, AddressSpace> MPtr, intel::memory_scope Scope,
intel::memory_order Order, T Value) {
auto *Ptr = MPtr.get();
auto SPIRVOrder = getMemorySemanticsMask(Order);
auto SPIRVScope = getScope(Scope);
return __spirv_AtomicMax(Ptr, SPIRVScope, SPIRVOrder, Value);
}

} // namespace spirv
} // namespace detail
} // namespace sycl
13 changes: 13 additions & 0 deletions sycl/include/CL/sycl/intel/atomic.hpp
@@ -0,0 +1,13 @@
//==---------------- atomic.hpp - SYCL_INTEL_extended_atomics --------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

#include <CL/sycl/intel/atomic_enums.hpp>
#include <CL/sycl/intel/atomic_fence.hpp>
#include <CL/sycl/intel/atomic_ref.hpp>
103 changes: 103 additions & 0 deletions sycl/include/CL/sycl/intel/atomic_enums.hpp
@@ -0,0 +1,103 @@
//==---------------- atomic_enums.hpp - SYCL_INTEL_extended_atomics enums --==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

#include <CL/__spirv/spirv_ops.hpp>
#include <CL/sycl/access/access.hpp>
#include <CL/sycl/detail/defines.hpp>
#include <CL/sycl/detail/helpers.hpp>

#ifndef __SYCL_DEVICE_ONLY__
#include <atomic>
#endif
#include <type_traits>

__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace intel {

enum class memory_order : int {
relaxed,
acquire,
__consume_unsupported, // helps optimizer when mapping to std::memory_order
release,
acq_rel,
seq_cst
};
__SYCL_INLINE_CONSTEXPR memory_order memory_order_relaxed =
memory_order::relaxed;
__SYCL_INLINE_CONSTEXPR memory_order memory_order_acquire =
memory_order::acquire;
__SYCL_INLINE_CONSTEXPR memory_order memory_order_release =
memory_order::release;
__SYCL_INLINE_CONSTEXPR memory_order memory_order_acq_rel =
memory_order::acq_rel;
__SYCL_INLINE_CONSTEXPR memory_order memory_order_seq_cst =
memory_order::seq_cst;

enum class memory_scope : int {
work_item,
sub_group,
work_group,
device,
system
};
__SYCL_INLINE_CONSTEXPR memory_scope memory_scope_work_item =
memory_scope::work_item;
__SYCL_INLINE_CONSTEXPR memory_scope memory_scope_sub_group =
memory_scope::sub_group;
__SYCL_INLINE_CONSTEXPR memory_scope memory_scope_work_group =
memory_scope::work_group;
__SYCL_INLINE_CONSTEXPR memory_scope memory_scope_device = memory_scope::device;
__SYCL_INLINE_CONSTEXPR memory_scope memory_scope_system = memory_scope::system;

#ifndef __SYCL_DEVICE_ONLY__
namespace detail {
// A switch statement is not allowed in a constexpr function before C++14,
// so the #else branch uses nested ternary conditions for C++11 compatibility.
#if __cplusplus >= 201402L
static inline constexpr std::memory_order
getStdMemoryOrder(::cl::sycl::intel::memory_order order) {
switch (order) {
case memory_order::relaxed:
return std::memory_order_relaxed;
case memory_order::__consume_unsupported:
return std::memory_order_consume;
case memory_order::acquire:
return std::memory_order_acquire;
case memory_order::release:
return std::memory_order_release;
case memory_order::acq_rel:
return std::memory_order_acq_rel;
case memory_order::seq_cst:
return std::memory_order_seq_cst;
}
}
#else
static inline constexpr std::memory_order
getStdMemoryOrder(::cl::sycl::intel::memory_order order) {
return (order == memory_order::relaxed)
? std::memory_order_relaxed
: (order == memory_order::__consume_unsupported)
? std::memory_order_consume
: (order == memory_order::acquire)
? std::memory_order_acquire
: (order == memory_order::release)
? std::memory_order_release
: (order == memory_order::acq_rel)
? std::memory_order_acq_rel
: std::memory_order_seq_cst;
}
#endif // __cplusplus
} // namespace detail
#endif // __SYCL_DEVICE_ONLY__

} // namespace intel
} // namespace sycl
} // __SYCL_INLINE_NAMESPACE(cl)
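The C++11 nested-ternary branch above is equivalent to the C++14 switch-based version; the equivalence can be checked with a standalone sketch (a local copy of the enum from `atomic_enums.hpp`, since this analog does not include the real header).

```cpp
#include <atomic>

// Local copy of intel::memory_order from atomic_enums.hpp, with the
// same enumerator order as the patch.
enum class memory_order : int {
  relaxed,
  acquire,
  __consume_unsupported,
  release,
  acq_rel,
  seq_cst
};

// The C++11-compatible nested-ternary mapping from the #else branch,
// reproduced for verification against std::memory_order.
constexpr std::memory_order getStdMemoryOrderTernary(memory_order order) {
  return (order == memory_order::relaxed)
             ? std::memory_order_relaxed
             : (order == memory_order::__consume_unsupported)
                   ? std::memory_order_consume
                   : (order == memory_order::acquire)
                         ? std::memory_order_acquire
                         : (order == memory_order::release)
                               ? std::memory_order_release
                               : (order == memory_order::acq_rel)
                                     ? std::memory_order_acq_rel
                                     : std::memory_order_seq_cst;
}
```

Each enumerator maps to the corresponding `std::memory_order` value, matching the case-by-case switch in the C++14 branch.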