This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL][Matrix] Add a more general query example #1492

Draft · wants to merge 6 commits into base: intel

Conversation


@dkhaldi dkhaldi commented Jan 5, 2023

This PR adds a more general query example that shows dispatching the same joint_matrix kernel on different devices using the query interface.

@JackAKirk

This example seems to have a routine that can work with different sub-group joint_matrix sizes. It is then called using the defaults specified for different architectures.

My initial question is whether it is a good idea to have a default for a given architecture or not. I'm not saying it isn't a good idea; I'm not sure of the answer right now, and I'm trying to frame the decision in a simple way.

I think it would definitely be a good idea if we can pick a default value that is good (preferably the best choice) for a large proportion of the problems people are likely to use joint_matrix for (or at least a much larger proportion than for any other value). Then it would make sense to me to provide some kind of encouragement/guidance for programmers to use the default value, and I think this would be a nice feature.

I think it would be a less good idea if we can't pick a default value that is very good for a large proportion of problems.
If there has to be a default value anyway, then maybe it would make sense to pick the value with the most overlap across architectures: I think there is one value that overlaps XMX and AMX for all types? Perhaps that could be a good choice. Unfortunately there doesn't seem to be a value that overlaps Tensor Cores and both of those two, although I think there is one that overlaps Tensor Cores and AMX (m16n16k16, I think). Perhaps that could be a good default for Tensor Cores.

}

if (is_this_xmx8_device(q)) {
  using myparams2 = tpu_params<tpu::xmx8, int8_t, int8_t, int>;

The variables TM, TN, and TK aren't defined here. I assume that is just a typo?

@dkhaldi (Author)

You are right, this is a typo. I only ran the code on SPR (AMX) and PVC (XMX16).
Since I did not run it on ATS-M (which has XMX8), I did not catch the typo. I will run it on ATS-M and fix it.

@dkhaldi

dkhaldi commented Jan 12, 2023

> This example seems to have a routine that can work with different sub group joint_matrix sizes. Then it is called using the defaults specified for different architectures.
>
> My initial question is whether it is a good idea to have a default for a given arch or not. I'm not saying it isn't a good idea. I'm not sure of the answer right now, I'm trying to frame the decision in a simple way.

The default in this case is the maximum supported size, which is a good choice for many problems.
I can create a second example where the user actually queries the different supported sizes and looks for the ones they prefer (for instance, if the user's problem is relatively small and they are interested in M = 4 rather than the maximum); this is also possible with the query.

@JackAKirk

JackAKirk commented Jan 12, 2023

> The default in this case is the maximum supported which is a good choice for many problems.

👍 That's a very good point: there's a big range of sizes (in terms of total number of matrix elements) in XMX and AMX. Picking a large default makes sense to me. But then what about the aspect ratio: should the default be the largest possible size irrespective of aspect ratio (e.g. for bfloat16, m16n16k32 for AMX and m8n16k16 for XMX), or is there a preferable aspect ratio, e.g. square matrices with A, B, and C all the same size, in which case perhaps m16n16k16 and m8n8k8 would be preferable?
Then, do you want to take portability into account? In that case perhaps m8n8k8 would be best for both AMX and XMX?
I don't know the answers to these questions, but someone probably does.

By the way, does AMX not have mixed-precision float? Or is it just missing from the table?
I think a default for Nvidia is easier to pick based on the constraints discussed above (since the matrix size, in terms of number of elements, isn't variable). A default that would make sense for mixed-precision float, mixed-precision int, and bfloat16 would be m16n16k16, since this would at least overlap with an admissible AMX combination (although admittedly not the most sensible choice for AMX's default, following my thoughts above).
