Commit 2365dc8

[SYCL][DOC] Update dot_acc extension spec
Update the sycl_ext_oneapi_dot_accumulate extension spec to:

* Use the latest specification template.
* Document the "packed" APIs. These were previously shown in the "sample header" section, but there was no description.
* Deprecate the old "packed" overloads and introduce new overloads with the more descriptive name `doc_acc_4x8packed`. This new name is consistent with the OpenCL C naming style for similar functions.
1 parent 2ddb42c commit 2365dc8

1 file changed: 101 additions and 73 deletions
@@ -1,74 +1,99 @@
1-
= SYCL_INTEL_dot_accumulate
1+
= sycl_ext_oneapi_dot_accumulate
2+
23
:source-highlighter: coderay
34
:coderay-linenums-mode: table
5+
6+
// This section needs to be after the document title.
47
:doctype: book
8+
:toc2:
9+
:toc: left
510
:encoding: utf-8
611
:lang: en
12+
:dpcpp: pass:[DPC++]
713

8-
:blank: pass:[ +]
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
918

10-
// Set the default source code type in this document to C,
11-
// for syntax highlighting purposes.
12-
:language: c
1319

14-
// This is what is needed for C++, since docbook uses c++
15-
// and everything else uses cpp. This doesn't work when
16-
// source blocks are in table cells, though, so don't use
17-
// C++ unless it is required.
18-
//:language: {basebackend@docbook:c++:cpp}
20+
== Notice
21+
22+
[%hardbreaks]
23+
Copyright (C) 2020-2023 Intel Corporation. All rights reserved.
1924

20-
== Introduction
25+
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
26+
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
27+
permission by Khronos.
2128

22-
IMPORTANT: This specification is a draft.
2329

24-
NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos.
30+
== Contact
2531

26-
NOTE: This document is better viewed when rendered as html with asciidoctor. GitHub does not render image icons.
32+
To report problems with this extension, please open a new issue at:
2733

28-
== Name Strings
34+
https://github.com/intel/llvm/issues
2935

30-
`SYCL_ONEAPI_dot_accumulate`
3136

32-
This is a placeholder name.
37+
== Dependencies
3338

34-
== Notice
39+
This extension is written against the SYCL 2020 revision 7 specification. All
40+
references below to the "core SYCL specification" or to section numbers in the
41+
SYCL specification refer to that revision.
3542

36-
Copyright (c) 2020 Intel Corporation. All rights reserved.
3743

3844
== Status
3945

40-
Working Draft
46+
This extension is implemented and fully supported by {dpcpp}.
4147

42-
This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension.
4348

44-
Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products.
49+
== Overview
4550

46-
== Version
51+
This extension adds new SYCL built-in functions that may simplify development
52+
and provide access to specialized hardware instructions when a SYCL kernel needs
53+
to perform a dot product of two vectors followed by a scalar accumulation.
4754

48-
Built On: {docdate} +
49-
Revision: 3
5055

51-
== Contact
56+
== Specification
5257

53-
Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com)
58+
=== Feature test macro
5459

55-
== Dependencies
60+
This extension provides a feature-test macro as described in the core SYCL
61+
specification. An implementation supporting this extension must predefine the
62+
macro `SYCL_EXT_ONEAPI_DOT_ACCUMULATE` to one of the values defined in the table
63+
below. Applications can test for the existence of this macro to determine if
64+
the implementation supports this feature, or applications can test the macro's
65+
value to determine which of the extension's features the implementation
66+
supports.
5667

57-
This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6.
68+
[%header,cols="1,5"]
69+
|===
70+
|Value
71+
|Description
5872

59-
== Overview
73+
|1
74+
|Initial version of this extension.
75+
|===
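As a rough illustration of the feature-test macro described above, the sketch below checks for `SYCL_EXT_ONEAPI_DOT_ACCUMULATE` at compile time. The names `has_dot_accumulate`, `dot_accumulate_version`, and `has_initial_version` are hypothetical helpers invented for this example. No SYCL header is included here, so with a non-SYCL compiler the macro is simply undefined and the fallback branch is taken.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the extension. In a SYCL program the
// macro would be predefined by an implementation that supports the
// extension; this sketch only inspects whether it is visible.
#if defined(SYCL_EXT_ONEAPI_DOT_ACCUMULATE)
constexpr bool has_dot_accumulate = true;
constexpr long dot_accumulate_version = SYCL_EXT_ONEAPI_DOT_ACCUMULATE;
#else
constexpr bool has_dot_accumulate = false;
constexpr long dot_accumulate_version = 0;
#endif

// Value 1 is the only version listed in the table above.
constexpr bool has_initial_version() {
  return has_dot_accumulate && dot_accumulate_version >= 1;
}
```

An application could use a check like `has_initial_version()` to choose between the extension functions and its own software fallback.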
6076

61-
This extension adds new SYCL built-in functions that may simplify development and provide access specialized hardware instructions when a SYCL kernel needs to perform a dot product of two vectors followed by a scalar accumulation.
77+
=== New functions to compute the dot product of vectors
6278

63-
== Enabling the extension
79+
This extension adds the following free functions:
6480

65-
The extension is always enabled. The dot product functionality may be emulated in software or executed using hardware when suitable instructions are available.
81+
[source,c++]
82+
----
83+
namespace sycl::ext::oneapi {
6684
67-
== Modifications of SYCL 1.2.1 specification
85+
int32_t dot_acc(vec<int8_t,4> a, vec<int8_t,4> b, int32_t c);
86+
int32_t dot_acc(vec<int8_t,4> a, vec<uint8_t,4> b, int32_t c);
87+
int32_t dot_acc(vec<uint8_t,4> a, vec<int8_t,4> b, int32_t c);
88+
int32_t dot_acc(vec<uint8_t,4> a, vec<uint8_t,4> b, int32_t c);
6889
69-
=== Add to Section 4.13.6 - Geometric Functions
90+
int32_t doc_acc_4x8packed(int32_t a, int32_t b, int32_t c);
91+
int32_t doc_acc_4x8packed(int32_t a, uint32_t b, int32_t c);
92+
int32_t doc_acc_4x8packed(uint32_t a, int32_t b, int32_t c);
93+
int32_t doc_acc_4x8packed(uint32_t a, uint32_t b, int32_t c);
7094
71-
Additionally, the following additional functions are available in the namespace `sycl::intel` on the host and device.
95+
} // namespace sycl::ext::oneapi
96+
----
7297

7398
[cols="4a,4",options="header"]
7499
|====
@@ -95,56 +120,59 @@ int32_t dot_acc(vec<uint8_t,4> a,
95120
{blank}
96121
The value that is returned is equivalent to +
97122
{blank}
98-
*dot*(_a_, _b_) + _c_
123+
`dot(a, b) + c`
124+
125+
|[source,c]
126+
----
127+
int32_t doc_acc_4x8packed(int32_t a,
128+
int32_t b,
129+
int32_t c);
130+
int32_t doc_acc_4x8packed(int32_t a,
131+
uint32_t b,
132+
int32_t c);
133+
int32_t doc_acc_4x8packed(uint32_t a,
134+
int32_t b,
135+
int32_t c);
136+
int32_t doc_acc_4x8packed(uint32_t a,
137+
uint32_t b,
138+
int32_t c);
139+
----
140+
141+
|Performs a four-component integer dot product accumulate operation, where
142+
`a` and `b` are 32-bit integers that each represent a vector of four 8-bit elements.
143+
When the type of `a` or `b` is `int32_t`, it is interpreted as `vec<int8_t,4>`.
144+
When the type of `a` or `b` is `uint32_t`, it is interpreted as
145+
`vec<uint8_t,4>`. In each case, the least significant byte is element 0, and
146+
the most significant byte is element 3.
99147

100148
|====
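The packed semantics described in the table can be modeled on the host in plain C++. The following is a minimal reference sketch, not the extension's implementation: `elem` and `ref_packed_dot_acc` are hypothetical names, and the sketch assumes the accumulated sum fits in `int32_t`, since the text above does not address overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model; these names are not part of the extension.

// Element i of a signed packed operand: byte i (0 = least significant),
// interpreted as an int8_t value.
constexpr int32_t elem(int32_t packed, int i) {
  const uint32_t byte = (static_cast<uint32_t>(packed) >> (8 * i)) & 0xFFu;
  // Map 0x80..0xFF to -128..-1 explicitly.
  return byte < 128u ? static_cast<int32_t>(byte)
                     : static_cast<int32_t>(byte) - 256;
}

// Element i of an unsigned packed operand: byte i as a uint8_t value.
constexpr int32_t elem(uint32_t packed, int i) {
  return static_cast<int32_t>((packed >> (8 * i)) & 0xFFu);
}

// dot(a, b) + c over four 8-bit elements. A and B are expected to be
// int32_t or uint32_t, which selects the signed or unsigned interpretation.
// Assumes the sum does not overflow int32_t.
template <typename A, typename B>
constexpr int32_t ref_packed_dot_acc(A a, B b, int32_t c) {
  int32_t sum = c;
  for (int i = 0; i < 4; ++i) {
    sum += elem(a, i) * elem(b, i);
  }
  return sum;
}
```

For example, the operand `0x04030201` unpacks to elements 1, 2, 3, 4 (element 0 first), so `ref_packed_dot_acc(int32_t{0x04030201}, int32_t{0x01010101}, 5)` evaluates to 15. The same bit pattern can give different results depending on the operand type: `0x80` in element 0 is -128 when passed as `int32_t` and 128 when passed as `uint32_t`.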
101149

102-
== Sample Header
150+
=== Deprecated functions
151+
152+
The following functions are deprecated. They have the same effect as the
153+
`doc_acc_4x8packed` overloads described above.
103154

104155
[source,c++]
105156
----
106-
namespace cl {
107-
namespace sycl {
108-
namespace ext {
109-
namespace oneapi {
110-
111-
int32_t dot_acc(vec<int8_t,4> a, vec<int8_t,4> b, int32_t c);
112-
int32_t dot_acc(vec<int8_t,4> a, vec<uint8_t,4> b, int32_t c);
113-
int32_t dot_acc(vec<uint8_t,4> a, vec<int8_t,4> b, int32_t c);
114-
int32_t dot_acc(vec<uint8_t,4> a, vec<uint8_t,4> b, int32_t c);
157+
namespace sycl::ext::oneapi {
115158
116159
int32_t dot_acc(int32_t a, int32_t b, int32_t c);
117160
int32_t dot_acc(int32_t a, uint32_t b, int32_t c);
118161
int32_t dot_acc(uint32_t a, int32_t b, int32_t c);
119162
int32_t dot_acc(uint32_t a, uint32_t b, int32_t c);
120163
121-
} // oneapi
122-
} // ext
123-
} // sycl
124-
} // cl
164+
} // namespace sycl::ext::oneapi
125165
----
126166

167+
127168
== Issues
128169

129-
None.
130-
131-
== Revision History
132-
133-
[cols="5,15,15,70"]
134-
[grid="rows"]
135-
[options="header"]
136-
|========================================
137-
|Rev|Date|Author|Changes
138-
|1|2019-12-13|Ben Ashbaugh|*Initial draft*
139-
|2|2019-12-18|Ben Ashbaugh|Switched to standard C++ fixed width types.
140-
|3|2020-10-26|Rajiv Deodhar|Added int32 types.
141-
|========================================
142-
143-
//************************************************************************
144-
//Other formatting suggestions:
145-
//
146-
//* Use *bold* text for host APIs, or [source] syntax highlighting.
147-
//* Use `mono` text for device APIs, or [source] syntax highlighting.
148-
//* Use `mono` text for extension names, types, or enum values.
149-
//* Use _italics_ for parameters.
150-
//************************************************************************
170+
* The overloads that take two unsigned vectors have a signed `c` and return a
171+
signed result. This is inconsistent with the SPIR-V primitives and the
172+
OpenCL C APIs, both of which return an unsigned value in this case and expect
173+
an unsigned `c`. I think we could implement the APIs as they are using the
174+
SPIR-V primitives, but the extra unsigned-to-signed conversions might
175+
generate less efficient code (I haven't checked). Is there a compelling
176+
reason to keep these APIs as they are now? If not, we could deprecate them
177+
and introduce overloads that take an unsigned `c` and return an unsigned
178+
value.
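The trade-off raised in this issue can be made concrete with a small host-side sketch. Here `unsigned_primitive` is a hypothetical stand-in for a primitive that takes an unsigned accumulator and returns an unsigned result; it is not a real SPIR-V or OpenCL C function. `signed_wrapper` shows the extra conversions that a signed-`c`, signed-result overload would need on top of it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an unsigned dot-accumulate primitive:
// unsigned packed inputs, unsigned accumulator, unsigned result.
constexpr uint32_t unsigned_primitive(uint32_t a, uint32_t b, uint32_t c) {
  uint32_t sum = c;
  for (int i = 0; i < 4; ++i) {
    // Multiply byte i of a by byte i of b (0 = least significant byte).
    sum += ((a >> (8 * i)) & 0xFFu) * ((b >> (8 * i)) & 0xFFu);
  }
  return sum;  // unsigned arithmetic wraps modulo 2^32
}

// Shape of the overload discussed above: unsigned inputs, signed c,
// signed result. The two casts are the extra conversions in question.
// Note: converting an out-of-range uint32_t to int32_t is
// implementation-defined before C++20 and modular from C++20 on.
constexpr int32_t signed_wrapper(uint32_t a, uint32_t b, int32_t c) {
  return static_cast<int32_t>(
      unsigned_primitive(a, b, static_cast<uint32_t>(c)));
}
```

On a two's-complement target these casts do not change the bit pattern, so whether they cost anything in generated code would still need to be measured, as the issue notes.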
