This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL][Matrix] Add a more general query example #1492

Draft · wants to merge 6 commits into base: intel

Conversation


@dkhaldi dkhaldi commented Jan 5, 2023

This PR adds a more general query example that shows dispatching the same joint_matrix kernel on different devices using the query interface.

@JackAKirk

This example seems to have a routine that can work with different sub-group joint_matrix sizes. It is then called using the defaults specified for different architectures.

My initial question is whether it is a good idea to have a default for a given architecture or not. I'm not saying it isn't a good idea; I'm not sure of the answer right now, and I'm trying to frame the decision in a simple way.

I think it would definitely be a good idea if we can pick a default value that is good (preferably the best choice) for a large proportion of the problems people are likely to use joint_matrix for (or at least a much larger proportion than for any other value). Then it would make sense to me to provide some kind of encouragement/guidance for programmers to use the default value, and I think this would be a nice feature.

I think it would be a less good idea if we can't pick a default value that is very good for a large proportion of problems.
If there has to be a default value anyway, then maybe it would make sense to pick the value with the most overlap across architectures: I think there is one value that overlaps XMX and AMX for all types? Perhaps that could be a good choice. Unfortunately there doesn't seem to be a value that overlaps Tensor Cores and both of those two, although I think there is one that overlaps Tensor Cores and AMX (m16n16k16, I think). Perhaps that could be a good default for Tensor Cores.

}

if (is_this_xmx8_device(q)) {
  using myparams2 = tpu_params<tpu::xmx8, int8_t, int8_t, int>;

The variables TM, TN, and TK aren't defined here. I assume that is just a typo?

@dkhaldi (Author)

You are right, this is a typo. I only ran the code on SPR (AMX) and PVC (XMX16).
Since I did not run it on ATS-M (which has XMX8), I did not catch the typo. I will run it on ATS-M and fix it.

@dkhaldi

dkhaldi commented Jan 12, 2023

> This example seems to have a routine that can work with different sub group joint_matrix sizes. Then it is called using the defaults specified for different architectures.
>
> My initial question is whether it is a good idea to have a default for a given arch or not. I'm not saying it isn't a good idea. I'm not sure of the answer right now, I'm trying to frame the decision in a simple way.

The default in this case is the maximum supported size, which is a good choice for many problems.
I can create a second example where the user actually queries the different supported sizes and looks for the ones they prefer (for instance, if the user's problem is relatively small and they are interested in M = 4 rather than the maximum); this is also possible with the query.

@JackAKirk

JackAKirk commented Jan 12, 2023

> The default in this case is the maximum supported which is a good choice for many problems.

👍 That's a very good point: there's a big range of sizes (in terms of total number of matrix elements) in XMX and AMX. Picking a large default makes sense to me. But then what about the aspect ratio: should the default be the largest possible size irrespective of aspect ratio (e.g. for bfloat16, m16n16k32 for AMX and m8n16k16 for XMX), or is there a preferable aspect ratio, e.g. square matrices with A, B, and C all the same size, in which case perhaps m16n16k16 and m8n8k8 would be preferable?
Then, do you want to take portability into account? In that case perhaps m8n8k8 would be best for both AMX and XMX?
I don't know the answers to these questions, but someone probably does.

By the way, does AMX not have mixed-precision float? Or is it just missing from the table?
I think a default for Nvidia is easier to pick based on the constraints discussed above (since the matrix size, in terms of number of elements, isn't variable). A default that would make sense for mixed-precision float, mixed-precision int, and bfloat16 would be m16n16k16, since this would at least overlap with an admissible AMX combination (although admittedly not the most sensible choice for AMX's default, following my thoughts above).
