Commit 5231fe4

[SYCL][CUDA] bfloat16 in oneapi namespace and also supporting CUDA (#5393)
There is a bug in the verify_logic function in the bfloat16_type.cpp test (the C accessor is not written to); I'm not sure why this had not already caused a failure. With the bug fixed, the test passes for the CUDA backend with this patch. I've added a draft test file that extends coverage to the unary minus operator: intel/llvm-test-suite#889. Note that the unary neg intrinsic added here, which unary minus uses, will be pulled down from upstream via e.g. https://reviews.llvm.org/D117887.
1 parent 67b0b41 commit 5231fe4

4 files changed: +43 -22 lines changed
sycl/doc/extensions/experimental/sycl_ext_intel_bf16_conversion.asciidoc renamed to sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc

Lines changed: 16 additions & 15 deletions
@@ -1,4 +1,4 @@
-= SYCL_INTEL_bf16_conversion
+= sycl_ext_oneapi_bfloat16
 
 :source-highlighter: coderay
 :coderay-linenums-mode: table
@@ -24,15 +24,15 @@
 
 IMPORTANT: This specification is a draft.
 
-Copyright (c) 2021 Intel Corporation. All rights reserved.
+Copyright (c) 2021-2022 Intel Corporation. All rights reserved.
 
 NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
 trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
 used by permission by Khronos.
 
 == Dependencies
 
-This extension is written against the SYCL 2020 specification, Revision 3.
+This extension is written against the SYCL 2020 specification, Revision 4.
 
 == Status
 
@@ -48,7 +48,7 @@ products.
 
 == Version
 
-Revision: 3
+Revision: 4
 
 == Introduction
 
@@ -57,7 +57,7 @@ floating-point type(`float`) to `bfloat16` type and vice versa. The extension
 doesn't add support for `bfloat16` type as such, instead it uses 16-bit integer
 type(`uint16_t`) as a storage for `bfloat16` values.
 
-The purpose of conversion from float to bfloat16 is to reduce ammount of memory
+The purpose of conversion from float to bfloat16 is to reduce the amount of memory
 required to store floating-point numbers. Computations are expected to be done with
 32-bit floating-point values.
 
@@ -73,7 +73,7 @@ command (e.g. from `parallel_for`).
 This extension provides a feature-test macro as described in the core SYCL
 specification section 6.3.3 "Feature test macros". Therefore, an implementation
 supporting this extension must predefine the macro
-`SYCL_EXT_INTEL_BF16_CONVERSION` to one of the values defined in the table
+`SYCL_EXT_ONEAPI_BFLOAT16` to one of the values defined in the table
 below. Applications can test for the existence of this macro to determine if
 the implementation supports this feature, or applications can test the macro’s
 value to determine which of the extension’s APIs the implementation supports.
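The renamed feature-test macro can be probed at compile time. The following is a minimal sketch, not code from this patch: the helper names `bfloat16_extension_version` and `has_bfloat16_extension` are hypothetical, and the SYCL header include is left out so the snippet also compiles without a SYCL toolchain (in which case the macro is simply undefined). In a real program the macro would presumably become visible after including `<sycl/sycl.hpp>`.

```cpp
#include <cassert>

// Hypothetical helper (not part of the extension): returns the value of the
// feature-test macro, or 0 when the implementation does not define it.
constexpr int bfloat16_extension_version() {
#ifdef SYCL_EXT_ONEAPI_BFLOAT16
  return SYCL_EXT_ONEAPI_BFLOAT16;
#else
  return 0;
#endif
}

// Hypothetical helper: true only when the oneapi bfloat16 extension is
// advertised with a version of at least 1.
constexpr bool has_bfloat16_extension() {
  return bfloat16_extension_version() >= 1;
}
```

Code that still tests the old `SYCL_EXT_INTEL_BF16_CONVERSION` name would see the macro as undefined after this change, so callers need to switch to the new name.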
@@ -91,19 +91,19 @@ the implementation supports this feature, or applications can test the macro’s
 namespace sycl {
 enum class aspect {
   ...
-  ext_intel_bf16_conversion
+  ext_oneapi_bfloat16
 }
 }
 ----
 
-If a SYCL device has the `ext_intel_bf16_conversion` aspect, then it natively
+If a SYCL device has the `ext_oneapi_bfloat16` aspect, then it natively
 supports conversion of values of `float` type to `bfloat16` and back.
 
 If the device doesn't have the aspect, objects of `bfloat16` class must not be
 used in the device code.
 
-**NOTE**: The `ext_intel_bf16_conversion` aspect is not yet supported. The
-`bfloat16` class is currently supported only on Xe HP GPU.
+**NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The
+`bfloat16` class is currently supported only on Xe HP GPU and Nvidia A100 GPU.
 
 == New `bfloat16` class
 
@@ -115,7 +115,7 @@ mode.
 ----
 namespace sycl {
 namespace ext {
-namespace intel {
+namespace oneapi {
 namespace experimental {
 
 class bfloat16 {
@@ -171,7 +171,7 @@ public:
 };
 
 } // namespace experimental
-} // namespace intel
+} // namespace oneapi
 } // namespace ext
 } // namespace sycl
 ----
@@ -277,9 +277,9 @@ OP is `==, !=, <, >, <=, >=`
 [source]
 ----
 #include <sycl/sycl.hpp>
-#include <sycl/ext/intel/experimental/bfloat16.hpp>
+#include <sycl/ext/oneapi/experimental/bfloat16.hpp>
 
-using sycl::ext::intel::experimental::bfloat16;
+using sycl::ext::oneapi::experimental::bfloat16;
 
 bfloat16 operator+(const bfloat16 &lhs, const bfloat16 &rhs) {
   return static_cast<float>(lhs) + static_cast<float>(rhs);
@@ -304,7 +304,7 @@ int main (int argc, char *argv[]) {
   sycl::queue deviceQueue{dev};
   sycl::buffer<float, 1> buf {data, sycl::range<1> {3}};
 
-  if (dev.has(sycl::aspect::ext_intel_bf16_conversion)) {
+  if (dev.has(sycl::aspect::ext_oneapi_bfloat16)) {
     deviceQueue.submit ([&] (sycl::handler& cgh) {
       auto numbers = buf.get_access<sycl::access::mode::read_write> (cgh);
       cgh.single_task<class simple_kernel> ([=] () {
@@ -332,4 +332,5 @@ None.
 Add operator overloadings +
 Apply code review suggestions
 |3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor
+|4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi
 |========================================

sycl/include/CL/sycl/feature_test.hpp.in

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ namespace sycl {
 #define SYCL_EXT_ONEAPI_SUB_GROUP 1
 #define SYCL_EXT_ONEAPI_PROPERTIES 1
 #define SYCL_EXT_ONEAPI_NATIVE_MATH 1
-#define SYCL_EXT_INTEL_BF16_CONVERSION 1
+#define SYCL_EXT_ONEAPI_BFLOAT16 1
 #define SYCL_EXT_INTEL_DATAFLOW_PIPES 1
 #ifdef __clang__
 #if __has_extension(sycl_extended_atomics)

sycl/include/sycl/ext/intel/experimental/bfloat16.hpp renamed to sycl/include/sycl/ext/oneapi/experimental/bfloat16.hpp

Lines changed: 24 additions & 4 deletions
@@ -14,10 +14,10 @@
 __SYCL_INLINE_NAMESPACE(cl) {
 namespace sycl {
 namespace ext {
-namespace intel {
+namespace oneapi {
 namespace experimental {
 
-class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {
+class bfloat16 {
   using storage_t = uint16_t;
   storage_t value;
 
@@ -29,15 +29,26 @@ class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {
   // Explicit conversion functions
   static storage_t from_float(const float &a) {
 #if defined(__SYCL_DEVICE_ONLY__)
+#if defined(__NVPTX__)
+    return __nvvm_f2bf16_rn(a);
+#else
     return __spirv_ConvertFToBF16INTEL(a);
+#endif
 #else
     throw exception{errc::feature_not_supported,
                     "Bfloat16 conversion is not supported on host device"};
 #endif
   }
   static float to_float(const storage_t &a) {
 #if defined(__SYCL_DEVICE_ONLY__)
+#if defined(__NVPTX__)
+    uint32_t y = a;
+    y = y << 16;
+    float *res = reinterpret_cast<float *>(&y);
+    return *res;
+#else
     return __spirv_ConvertBF16ToFINTEL(a);
+#endif
 #else
     throw exception{errc::feature_not_supported,
                     "Bfloat16 conversion is not supported on host device"};
@@ -70,7 +81,16 @@ class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {
 
   // Unary minus operator overloading
   friend bfloat16 operator-(bfloat16 &lhs) {
-    return bfloat16{-to_float(lhs.value)};
+#if defined(__SYCL_DEVICE_ONLY__)
+#if defined(__NVPTX__)
+    return from_bits(__nvvm_neg_bf16(lhs.value));
+#else
+    return bfloat16{-__spirv_ConvertBF16ToFINTEL(lhs.value)};
+#endif
+#else
+    throw exception{errc::feature_not_supported,
+                    "Bfloat16 unary minus is not supported on host device"};
+#endif
   }
 
   // Increment and decrement operators overloading
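For readers unfamiliar with the bit layout, the effect of unary minus on the stored `uint16_t` can be illustrated on the host. This is a sketch under an assumption: that negating a bfloat16 value amounts to toggling bit 15 (the sign bit), as it does for IEEE-754 style formats. It is not a statement of what `__nvvm_neg_bf16` or the SPIR-V path actually emits, and the function name `negate_bf16_bits` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side illustration: negate a bfloat16 held as raw bits
// by toggling bit 15, the sign bit. Exponent and mantissa bits are untouched.
inline std::uint16_t negate_bf16_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits ^ 0x8000u);
}
```

For example, `0x3F80` is the bit pattern of 1.0 and `0xBF80` is the pattern of -1.0, so `negate_bf16_bits(0x3F80)` yields `0xBF80`, and applying the function twice returns the original bits.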
@@ -143,7 +163,7 @@ class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {
 };
 
 } // namespace experimental
-} // namespace intel
+} // namespace oneapi
 } // namespace ext
 
 } // namespace sycl
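The NVPTX branch of `to_float` in this file widens the 16 stored bits into the upper half of a 32-bit float pattern. A host-side sketch of the same idea, plus a matching narrowing step, is given below. Assumptions are labeled: the function names are hypothetical, `std::memcpy` is swapped in for the patch's `reinterpret_cast` to sidestep strict-aliasing concerns, the narrowing uses round-to-nearest-even on the guess that this is what the `_rn` suffix of `__nvvm_f2bf16_rn` denotes, and the NaN handling is an illustrative choice rather than the intrinsic's documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Widen bfloat16 bits to float: the 16 stored bits become the upper half of
// the 32-bit pattern and the lower 16 bits are zero.
inline float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float result;
  std::memcpy(&result, &wide, sizeof(result)); // bit copy, no aliasing issue
  return result;
}

// Narrow float to bfloat16 bits with round-to-nearest-even (assumed mode).
inline std::uint16_t float_to_bf16_bits(float value) {
  std::uint32_t wide;
  std::memcpy(&wide, &value, sizeof(wide));
  if ((wide & 0x7FFFFFFFu) > 0x7F800000u) {
    // NaN input: keep it a NaN by setting a mantissa bit in the upper half
    // (illustrative choice, not taken from the patch).
    return static_cast<std::uint16_t>((wide >> 16) | 0x0040u);
  }
  // Add 0x7FFF, plus 1 when the kept low bit is odd, so exact ties round
  // to the even neighbor; then drop the lower 16 bits.
  std::uint32_t rounding_bias = 0x7FFFu + ((wide >> 16) & 1u);
  return static_cast<std::uint16_t>((wide + rounding_bias) >> 16);
}
```

Values that fit in bfloat16's 8 bits of precision, such as 1.0f or -2.5f, survive a round trip through both functions unchanged.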

sycl/test/extensions/bfloat16.cpp

Lines changed: 2 additions & 2 deletions
@@ -2,10 +2,10 @@
 
 // UNSUPPORTED: cuda || hip_amd
 
-#include <sycl/ext/intel/experimental/bfloat16.hpp>
+#include <sycl/ext/oneapi/experimental/bfloat16.hpp>
 #include <sycl/sycl.hpp>
 
-using sycl::ext::intel::experimental::bfloat16;
+using sycl::ext::oneapi::experimental::bfloat16;
 
 SYCL_EXTERNAL uint16_t some_bf16_intrinsic(uint16_t x, uint16_t y);
 SYCL_EXTERNAL void foo(long x, sycl::half y);
