
[SYCL][CUDA] Draft PR for discussing matrix ext impl issues #6657


Closed
wants to merge 13 commits

Conversation

JackAKirk
Contributor

This is a move towards the future-looking joint_matrix, joint_matrix_load, and joint_matrix_store APIs. The aim is to make the CUDA and Intel implementations of the joint_matrix extension use matching interfaces, whilst enabling all functionality of both backends.
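As a rough sketch of the target usage (the enum names follow this PR's diff; the 16x16 shapes, the sub-group `sg`, and the pointers `pA`/`pB`/`pC` are placeholders for illustration, not part of the PR):

```c++
#include <sycl/sycl.hpp>
using namespace sycl::ext::oneapi::experimental::matrix;

// Inside a kernel body, with `sg` a sycl::sub_group and pA/pB/pC multi_ptrs
// into device memory (all assumptions of this sketch):
joint_matrix<sycl::half, matrix_use::a, 16, 16, layout::row_major> mA;
joint_matrix<sycl::half, matrix_use::b, 16, 16, layout::row_major> mB;
joint_matrix<float, matrix_use::accumulator, 16, 16> mC;

joint_matrix_load(sg, mA, pA, 16);
joint_matrix_load(sg, mB, pB, 16);
joint_matrix_load(sg, mC, pC, 16, layout::row_major); // accumulator layout given at load
mC = joint_matrix_mad(sg, mA, mB, mC);
joint_matrix_store(sg, mC, pC, 16, layout::row_major); // and again at store
```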

Signed-off-by: JackAKirk [email protected]

Signed-off-by: JackAKirk <[email protected]>
This is a move towards the future looking joint_matrix, joint_matrix_load, joint_matrix_store APIs.
Signed-off-by: JackAKirk <[email protected]>
@@ -16,25 +16,28 @@ namespace oneapi {
namespace experimental {
namespace matrix {

-enum class matrix_use { a, b, accumulator };
+enum class matrix_use { a, b, accumulator, unnecessary };
Contributor

@dkhaldi Aug 29, 2022


#5835 does not have the most up-to-date changes.
Also, the PR for the doc with use is ready: https://github.com/intel/llvm/pull/6659/files.
In the final version:

  • there is no "unnecessary" in use
  • packed_a and packed_b will be replaced by packed
  • we are calling "none" "unused"; but if you prefer "none", we can call it that instead.
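Putting those three bullets together, the final enums described in #6659 would look roughly like this (a sketch of the agreed shape, not the code in this diff):

```c++
namespace sycl::ext::oneapi::experimental::matrix {

enum class use { a, b, accumulator };                       // no "unnecessary" value
enum class layout { row_major, col_major, packed, unused }; // "packed" replaces packed_a/packed_b;
                                                            // "unused" is the proposed name for "none"
} // namespace sycl::ext::oneapi::experimental::matrix
```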


-enum class matrix_layout { row_major, col_major, packed_a, packed_b };
+enum class layout { row_major, col_major, packed_a, packed_b, none };
Contributor


Also, if we are calling it "layout", we should remove "matrix" from "matrix_use" as well, right?

Contributor Author

@JackAKirk Aug 29, 2022


I agree. It could be worth having a final conversation about the semantics of layout/use in each backend (and for each API: it looks like in both backends there is a distinction between "layout" and "memory layout") before deciding on the naming. I wanted to focus on making sure we are on the same page about the interfaces first, though.

@JackAKirk
Contributor Author

JackAKirk commented Aug 29, 2022

Updated usage of joint_matrix can be seen in the changes here: intel/llvm-test-suite#1183.

Also updated the impl functions used in the CUDA backend (some of these functions may also be used in the HIP AMD case when that is implemented, since the interfaces will match).

Signed-off-by: JackAKirk <[email protected]>
JackAKirk and others added 8 commits September 1, 2022 10:19
This is for illustrative purposes: to show the advantage of the proposed change in the joint_matrix_mad interface.

Signed-off-by: JackAKirk <[email protected]>
@JackAKirk changed the title from "[SYCL][CUDA] Layout accumulator is specified at load/store." to "[SYCL][CUDA] Separate matrix extension interfaces from impls" on Oct 10, 2022
@JackAKirk
Contributor Author

JackAKirk commented Oct 11, 2022

@dkhaldi @yubingex007-a11y @gmlueck

matrix-unified.hpp contains the agreed interfaces for joint_matrix_load, joint_matrix_store, and joint_matrix_mad. These functions call backend implementations depending on the compiler-defined macros __NVPTX__ and __SPIR__ (and later we can also add AMD macros). I've added the backend implementations for CUDA in the matrix-tensor-cores.hpp file.
This is just a draft aimed at finding any technical issues with the unified approach, but when #6957 is merged I will pull in those changes and update the macro usage.
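A sketch of that dispatch pattern (__NVPTX__ and __SPIR__ are the real predefined macros; the detail:: helper names are hypothetical, used only to illustrate the forwarding):

```c++
// In matrix-unified.hpp: one user-facing entry point per operation,
// forwarding to whichever backend the device compiler is targeting.
template <typename Group, typename T, size_t Rows, size_t Cols,
          typename MultiPtrT>
void joint_matrix_load(Group sg,
                       joint_matrix<T, matrix_use::accumulator, Rows, Cols> &res,
                       MultiPtrT src, size_t stride, layout mem_layout) {
#if defined(__NVPTX__)
  // CUDA implementation lives in matrix-tensor-cores.hpp
  detail::joint_matrix_load_cuda(sg, res, src, stride, mem_layout);
#elif defined(__SPIR__)
  // Intel implementation
  detail::joint_matrix_load_intel(sg, res, src, stride, mem_layout);
#endif
}
```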

The main implementation issue that I think we will face is the redefinition of partial specializations of the joint_matrix struct in the AMX/CUDA backends. These backends use completely different definitions of joint_matrix but have overlapping template parameters. Ideally, I think we can select the correct definitions depending on the backend, unless you can see another solution?
Here you can see that I have also separated the unified joint_matrix struct in joint-matrix.hpp from the CUDA backend's partial specializations of joint_matrix in joint-matrix-cuda-impl.hpp.
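One conceivable way to keep the overlapping partial specializations from colliding is to macro-guard each backend's definitions; this is a sketch of that possibility, not something this PR has settled on:

```c++
// joint-matrix.hpp: the single, unified primary template.
template <typename T, matrix_use Use, size_t Rows, size_t Cols,
          layout Layout = layout::none>
struct joint_matrix;

// joint-matrix-cuda-impl.hpp: CUDA partial specializations, only compiled
// when targeting NVPTX, so the AMX backend can define its own specializations
// for the same template parameters without redefinition errors.
#if defined(__NVPTX__)
template <size_t Rows, size_t Cols>
struct joint_matrix<sycl::half, matrix_use::a, Rows, Cols, layout::row_major> {
  // storage sized for the Tensor Core fragment (illustrative only)
};
#endif
```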

Do you think that we could use the driver to select the correct partial specializations, in a similar manner to how https://github.com/intel/llvm/pull/6524/files#diff-f8c64e36dfe3828a6f816c4550e78bb0305769ace1be53207e86ac9a3280ac9e selects the correct bfloat16 native library?

Also, you might want to check that you can call the Intel implementations from matrix-unified.hpp, replacing the CUDA joint_matrix partial specializations with the Intel ones, in order to confirm sooner rather than later that there are no other technical issues we need to consider when we unify.

Tests using the unified interface in the cuda backend: intel/llvm-test-suite#1183

@JackAKirk JackAKirk closed this Oct 14, 2022
@JackAKirk changed the title from "[SYCL][CUDA] Separate matrix extension interfaces from impls" to "[SYCL][CUDA] Draft PR for discussing matrix ext impl issues" on Oct 17, 2022