intel · JackAKirk · Oct 5, 2022 · Oct 18, 2022 · Oct 31, 2022 · Oct 31, 2022
@@ -0,0 +1,95 @@
+# `sycl_ext_oneapi_matrix` extension constraints specific to the `ext_oneapi_cuda` backend
+:source-highlighter: coderay
+:coderay-linenums-mode: table
+:dpcpp: pass:[DPC++]
+
+// This section needs to be after the document title.
+:doctype: book
+:toc2:
+:toc: left
+:encoding: utf-8
+:lang: en
+
+:blank: pass:[ +]
+
+// Set the default source code type in this document to C++,
+// for syntax highlighting purposes.  This is needed because
+// docbook uses c++ and html5 uses cpp.
+:language: {basebackend@docbook:c++:cpp}
+
+
+== Notice
+
+Copyright (c) 2022-2022 Intel Corporation.  All rights reserved.
+
+NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
+trademarks of The Khronos Group Inc.  OpenCL(TM) is a trademark of Apple Inc.
+used by permission by Khronos.
+
+This extension is written against the SYCL 2020 revision 6 specification.  All
+references below to the "core SYCL specification" or to section numbers in the
+SYCL specification refer to that revision.
+
+
+**_NOTE:_** This document describes the current design and API for the `ext_oneapi_cuda`-specific features of the matrix
+extension to {dpcpp}. This is an initial experimental version, intended for trying out functionality
+and performance, and **future versions of this API may change in ways that are incompatible with this experimental version**.
+
+## Introduction
+The `ext_oneapi_cuda` backend supports `joint_matrix`, `joint_matrix_load`, `joint_matrix_store`, `joint_matrix_mad` and `joint_matrix_fill` as defined in the `sycl_ext_oneapi_matrix` extension. This document lists the complete set of `joint_matrix` types and shapes that are valid in the `ext_oneapi_cuda` backend.
+It also describes the constraints that apply specifically to the `ext_oneapi_cuda` backend and that may not apply to the `sycl_ext_oneapi_matrix` extension in general.
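
On the device, these operations are performed collectively by the work-items of a sub-group. The arithmetic they perform can nevertheless be modelled on the host, which may be useful as a reference when validating results. The following is a minimal sketch under stated assumptions: `Tile`, `fill`, `load`, `mad` and `store` are hypothetical names that only mimic `joint_matrix`, `joint_matrix_fill`, `joint_matrix_load`, `joint_matrix_mad` and `joint_matrix_store`; every element is held as `float`; and a row-major layout is assumed, with `stride` giving the distance in elements between consecutive rows of the source or destination buffer. It is not how the `ext_oneapi_cuda` backend implements these functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a joint_matrix tile: rows x cols elements,
// stored densely in row-major order. Not part of sycl_ext_oneapi_matrix.
struct Tile {
  std::size_t rows, cols;
  std::vector<float> data;
  Tile(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Mimics joint_matrix_fill: every element takes the same value.
void fill(Tile &t, float v) {
  for (float &x : t.data) x = v;
}

// Mimics a row-major joint_matrix_load: row i of the tile starts at
// src[i * stride], so stride is the leading dimension of the source buffer.
void load(Tile &t, const float *src, std::size_t stride) {
  for (std::size_t i = 0; i < t.rows; ++i)
    for (std::size_t j = 0; j < t.cols; ++j)
      t.at(i, j) = src[i * stride + j];
}

// Mimics joint_matrix_mad: D = A * B + C, where A is M x K, B is K x N,
// and C and D are M x N.
Tile mad(const Tile &a, const Tile &b, const Tile &c) {
  assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
  Tile d(a.rows, b.cols);
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.cols; ++j) {
      float acc = c.at(i, j);
      for (std::size_t k = 0; k < a.cols; ++k)
        acc += a.at(i, k) * b.at(k, j);
      d.at(i, j) = acc;
    }
  return d;
}

// Mimics a row-major joint_matrix_store into a buffer whose leading
// dimension is stride.
void store(const Tile &t, float *dst, std::size_t stride) {
  for (std::size_t i = 0; i < t.rows; ++i)
    for (std::size_t j = 0; j < t.cols; ++j)
      dst[i * stride + j] = t.at(i, j);
}
```

With M = 8, N = 8, K = 4 (the `double` shape from the table below), filling A with 1, B with 2 and C with 3 gives 4 × (1 × 2) + 3 = 11 in every element of D.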
+
+### Valid `joint_matrix` types and shapes
+
+The following table gives the complete set of matrix data types and shapes supported by the `ext_oneapi_cuda` backend. Tm is the element type of a "multiplicand" `joint_matrix`, i.e. one with `use::a` or `use::b`. Tc is the element type of an "accumulator" `joint_matrix`, i.e. one with `use::accumulator`.
+
+--
+[.center,options="header"]
+|======================
+|Tm (`use::a` or `use::b`) |Tc (`use::accumulator`) |M |N |K | Minimum Compute Capability
+.3+|half  .3+|float
+|16 |16 |16| sm_70
+|8 |32 |16| sm_70
+|32 |8 |16| sm_70
+.3+|half  .3+|half
+|16 |16 |16| sm_70
+|8 |32 |16| sm_70
+|32 |8 |16| sm_70
+.3+|int8_t  .3+|int32_t
+|16 |16 |16| sm_72
+|8 |32 |16| sm_72
+|32 |8 |16| sm_72
+.3+|uint8_t  .3+|int32_t
+|16 |16 |16| sm_72
+|8 |32 |16| sm_72
+|32 |8 |16| sm_72
+|precision::tf32  |float |16 |16 |8| sm_80
+.3+|bfloat16  .3+|float
+|16 |16 |16 |sm_80
+|8 |32 |16 |sm_80
+|32 |8 |16 |sm_80
+|double  |double |8 |8 |4 |sm_80
+|======================
+--
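
The table above can also be expressed as data. The sketch below is a hypothetical host-side helper, not part of `sycl_ext_oneapi_matrix`, that checks whether a given Tm, Tc, M, N, K combination is listed and whether a device's compute capability meets the listed minimum. The `Elem` enumeration, the encoding of `sm_XX` as the integer `XX`, and the assumption that any higher compute capability also supports a combination are illustrative choices.

```cpp
#include <array>

// Hypothetical element-type tags; not part of sycl_ext_oneapi_matrix.
enum class Elem { Half, Float, Int8, Uint8, Int32, Tf32, Bfloat16, Double };

struct Combo {
  Elem tm;      // element type for use::a / use::b
  Elem tc;      // element type for use::accumulator
  int m, n, k;  // shape triple
  int min_sm;   // minimum compute capability, e.g. 70 for sm_70
};

// One entry per row of the table of valid types and shapes.
constexpr std::array<Combo, 17> kSupported{{
    {Elem::Half, Elem::Float, 16, 16, 16, 70},
    {Elem::Half, Elem::Float, 8, 32, 16, 70},
    {Elem::Half, Elem::Float, 32, 8, 16, 70},
    {Elem::Half, Elem::Half, 16, 16, 16, 70},
    {Elem::Half, Elem::Half, 8, 32, 16, 70},
    {Elem::Half, Elem::Half, 32, 8, 16, 70},
    {Elem::Int8, Elem::Int32, 16, 16, 16, 72},
    {Elem::Int8, Elem::Int32, 8, 32, 16, 72},
    {Elem::Int8, Elem::Int32, 32, 8, 16, 72},
    {Elem::Uint8, Elem::Int32, 16, 16, 16, 72},
    {Elem::Uint8, Elem::Int32, 8, 32, 16, 72},
    {Elem::Uint8, Elem::Int32, 32, 8, 16, 72},
    {Elem::Tf32, Elem::Float, 16, 16, 8, 80},
    {Elem::Bfloat16, Elem::Float, 16, 16, 16, 80},
    {Elem::Bfloat16, Elem::Float, 8, 32, 16, 80},
    {Elem::Bfloat16, Elem::Float, 32, 8, 16, 80},
    {Elem::Double, Elem::Double, 8, 8, 4, 80},
}};

// Returns true if (tm, tc, m, n, k) is listed and sm (e.g. 75 for sm_75)
// is at least the listed minimum compute capability.
constexpr bool is_supported(Elem tm, Elem tc, int m, int n, int k, int sm) {
  for (const Combo &c : kSupported)
    if (c.tm == tm && c.tc == tc && c.m == m && c.n == n && c.k == k)
      return sm >= c.min_sm;
  return false;
}
```

Because each Tm, Tc, M, N, K combination appears at most once in the table, the lookup can return on the first match.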
+
+The M, N, K triples from the table above define the complete set of matrix shapes that can be constructed, as follows:
+
+--
+[.center,options="header"]
+|======================
+|`use` |NumRows |NumCols
+|`use::a` |M |K
+|`use::b` |K |N
+|`use::accumulator` |M |N
+|======================
+--
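
The shape of each `joint_matrix` follows directly from its `use`. A minimal, hypothetical helper that returns the `{NumRows, NumCols}` pair for one M, N, K triple might look as follows; the `Use` enumeration only mirrors the extension's `use` values and is an assumption made for illustration.

```cpp
#include <utility>

// Hypothetical tag mirroring use::a, use::b and use::accumulator.
enum class Use { A, B, Accumulator };

// Returns {NumRows, NumCols} of a joint_matrix with the given use,
// for one M, N, K triple from the table of valid shapes.
constexpr std::pair<int, int> shape_for(Use use, int m, int n, int k) {
  switch (use) {
    case Use::A: return {m, k};
    case Use::B: return {k, n};
    case Use::Accumulator: return {m, n};
  }
  return {0, 0};  // not reached for valid Use values
}
```

Because `use::a` has K columns and `use::b` has K rows, the inner dimensions always agree, and their product has the M x N shape of the accumulator.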
+
+### Additional constraints in the `ext_oneapi_cuda` backend
+
+IMPORTANT: The `stride` argument to `joint_matrix_load` and `joint_matrix_store` must be a multiple of 8 when `T` is `half`, and a multiple of 4 when `T` is `float`, where `T` is the element type of the `joint_matrix`. In both cases the stride therefore spans a multiple of 16 bytes.
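
The stride rule lends itself to a host-side check before a kernel is launched. The following hypothetical helper covers only the two element types the rule names; `StrideElem` and `stride_is_valid` are illustrative names, and the stride is assumed to be measured in elements, as the multiples above imply.

```cpp
#include <cstddef>

// Hypothetical tag for the two element types covered by the stride rule.
enum class StrideElem { Half, Float };

// The stride (in elements) must be a multiple of 8 for half and of 4 for
// float; both correspond to 16 bytes (8 x 2-byte half, 4 x 4-byte float).
constexpr bool stride_is_valid(StrideElem t, std::size_t stride) {
  const std::size_t multiple = (t == StrideElem::Half) ? 8 : 4;
  return stride % multiple == 0;
}
```

For example, a stride of 12 is acceptable for `float` but not for `half`.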
+
+## Revision History
+
+[frame="none",options="header"]
+|======================
+|Rev |Date       |Author     |Changes
+|1   |2022-10-05 |Jack Kirk |Initial public working draft.
+|======================