[SYCL][CUDA] joint_matrix required changes following #11215 #11563
Conversation
Commits (each signed-off-by: JackAKirk <[email protected]>):
- Added new supported mma builtins where C/D types differ.
- and test new cases.
- const variables.
- from tf32 device code check test.
@@ -79,15 +79,6 @@ int main() {
      sg, sub_c, accC.template get_multi_ptr<access::decorated::yes>(),
      N, layout::row_major);

  // Round a, b to tf32
I think this file/directory needs an entry in the CODEOWNERS file. Please merge a separate PR with that change. Once done, I expect no review from the SYCL RT team will be required here.
I've done this here: #11582
LGTM
}

-    joint_matrix_mad(sg, sub_c, sub_a, sub_b, sub_c);
+    joint_matrix_mad(sg, sub_d, sub_a, sub_b, sub_c);
How many times will this actually run? You want the return matrix to be `sub_c` so it can be input again, right?
Just once. I changed it to run once so that `sub_d` can be a different joint_matrix (with a different type from `sub_c`), in order to test the new cases I added to the support matrix / joint_matrix_tensorcores_sm70.cpp.
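For context, here is a minimal sketch of the newly allowed pattern, assuming the sycl_ext_oneapi_matrix extension on an sm_70+ CUDA device; the 16x16x16 shape, USM device-pointer arguments, and kernel scaffolding are illustrative assumptions, not the PR's actual test code:

```cpp
#include <sycl/sycl.hpp>
using namespace sycl;
using namespace sycl::ext::oneapi::experimental::matrix;

constexpr int M = 16, N = 16, K = 16;

// A, B, C hold fp16; D holds fp32: the C/D element types differ, which the
// updated joint_matrix_mad interface permits.
void mad_mixed(queue &q, half *A, half *B, half *C, float *D) {
  q.parallel_for(nd_range<2>({1, 32}, {1, 32}), [=](nd_item<2> it) {
     auto sg = it.get_sub_group();
     joint_matrix<sub_group, half, use::a, M, K, layout::row_major> sub_a;
     joint_matrix<sub_group, half, use::b, K, N, layout::row_major> sub_b;
     joint_matrix<sub_group, half, use::accumulator, M, N> sub_c;
     joint_matrix<sub_group, float, use::accumulator, M, N> sub_d;

     // Helper to get decorated global-space pointers for the loads/stores.
     auto cast = [](auto *p) {
       return address_space_cast<access::address_space::global_space,
                                 access::decorated::no>(p);
     };
     joint_matrix_load(sg, sub_a, cast(A), K);
     joint_matrix_load(sg, sub_b, cast(B), N);
     joint_matrix_load(sg, sub_c, cast(C), N, layout::row_major);

     // D = A * B + C, executed once; D's type (fp32) differs from C's (fp16).
     joint_matrix_mad(sg, sub_d, sub_a, sub_b, sub_c);

     joint_matrix_store(sg, sub_d, cast(D), N, layout::row_major);
   }).wait();
}
```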
|16 |16 |16
|8 |32 |16
|32 |8 |16
.3+| `matrix_type::fp16` .3+| `matrix_type::fp16` .3+| `matrix_type::fp32`
BTW, where are the bfloat16 combinations?
They are in the same table at the bottom, between tf32 and fp64.
Signed-off-by: JackAKirk <[email protected]>
LGTM
// Round a, b to tf32
for (auto i = 0; i < 4; ++i)
  get_wi_data(sg, sub_a)[i] =
      round_to_tf32(get_wi_data(sg, sub_a)[i]);
I've just realized that, to be consistent, round_to_tf32 should also have a device code check. I removed it here since it didn't have one, but it would be better to add it back via

auto round_lambda = [](auto &x) { x = round_to_tf32(x); };
joint_matrix_apply(sg, sub_a, round_lambda);

and add a check that it calls the correct nvvm builtin.
Done this
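For illustration, a hedged sketch of what such a device code check could look like; the intrinsic name `llvm.nvvm.f2tf32.rna` is my assumption for the tf32 rounding conversion, not something confirmed by this PR:

```cpp
// Apply round_to_tf32 across the work-item's elements of sub_a, and check
// that the expected NVVM conversion intrinsic is emitted (assumed name).
// CHECK: call i32 @llvm.nvvm.f2tf32.rna(float
auto round_lambda = [](auto &x) { x = round_to_tf32(x); };
joint_matrix_apply(sg, sub_a, round_lambda);
```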
Signed-off-by: JackAKirk <[email protected]>
FE change LGTM
spec changes OK
matrix-nvptx-tf32-test.cpp changes LGTM
I think it should be "the types of C/D matrices differ".
I don't see any changes that require runtime-reviewers approval. @JackAKirk, should I merge this in?
Yes please!
As discussed in #11215, this patch:
- joint_matrix_cuda (this change requires an upstream llvm patch: https://reviews.llvm.org/rGb781c7ab574f)
- get_wi_data()

I also added back the cases that the change in the joint_matrix_mad interface allows: namely, when the types of the C/D matrices differ. I correspondingly updated the tests to cover the newly supported cases, and updated the support matrix for CUDA in the spec doc with the newly supported combinations.
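To illustrate the interface change the description refers to, here is a minimal migration sketch, assuming the joint_matrix_apply API introduced in #11215; the scaling operation and variable names are illustrative only:

```cpp
// Old element-wise access pattern (removed along with get_wi_data()):
//   for (auto i = 0; i < get_wi_data(sg, sub_c).length(); ++i)
//     get_wi_data(sg, sub_c)[i] *= alpha;

// Replacement: apply the operation to each work-item element of the matrix.
joint_matrix_apply(sg, sub_c, [=](auto &x) { x *= alpha; });
```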