Skip to content

[SYCL][CUDA] joint_matrix required changes following #11215 #11563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

JackAKirk
Copy link
Contributor

@JackAKirk JackAKirk commented Oct 17, 2023

As discussed in #11215 this patch:

I also added back the cases that the change in the joint_matrix_mad interface allows: namely when the type of C/D matrices differ. I correspondingly updated the tests, to test the new cases that are supported.

I also updated the support matrix for cuda in the spec doc for the newly supported combinations.

Added new supported mma builtins where C/D types differ.

Signed-off-by: JackAKirk <[email protected]>
 and test new cases.

Signed-off-by: JackAKirk <[email protected]>
const variables.

Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 17, 2023 15:55 — with GitHub Actions Inactive
@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 17, 2023 16:26 — with GitHub Actions Inactive
@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 17, 2023 16:44 — with GitHub Actions Inactive
@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 17, 2023 17:19 — with GitHub Actions Inactive
from tf32 device code check test.

Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 18, 2023 08:46 — with GitHub Actions Inactive
@JackAKirk JackAKirk marked this pull request as ready for review October 18, 2023 13:31
@@ -79,15 +79,6 @@ int main() {
sg, sub_c, accC.template get_multi_ptr<access::decorated::yes>(),
N, layout::row_major);

// Round a, b to tf32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file/directory needs an entry in CODEOWNERS file. Please merge a separate PR with that change. Once done, I expect not review from SYCL RT team will be required here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this here: #11582

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

joint_matrix_mad(sg, sub_c, sub_a, sub_b, sub_c);
joint_matrix_mad(sg, sub_d, sub_a, sub_b, sub_c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many times this will actually run?
You want the return matrix to be "sub_c" so it can be input again, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just once, I changed it to run once so that sub_d can be a different joint_matrix (with a different type to sub_c), in order to test the new cases I added to the support matrix/joint_matrix_tensorcores_sm70.cpp.

|16 |16 |16
|8 |32 |16
|32 |8 |16
.3+| `matrix_type::fp16` .3+| `matrix_type::fp16` .3+| `matrix_type::fp32`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, where are the bfloat16 combinations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are in the same table at the bottom, between tf32 and fp64.

Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk requested a review from a team as a code owner October 19, 2023 08:57
@JackAKirk JackAKirk requested a review from npmiller October 19, 2023 08:57
Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 19, 2023 09:04 — with GitHub Actions Inactive
@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 19, 2023 09:34 — with GitHub Actions Inactive
Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// Round a, b to tf32
for (auto i = 0; i < 4; ++i)
get_wi_data(sg, sub_a)[i] =
round_to_tf32(get_wi_data(sg, sub_a)[i]);
Copy link
Contributor Author

@JackAKirk JackAKirk Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just realized that to be consistent round_to_tf32 should also have a device code check. I removed it here since it didn't have one, but it will be better to add it back via

                auto round_lambda = [](auto &x) { x = round_to_tf32(x); };
                joint_matrix_apply(sg, sub_a, round_lambda);

and add a check that it calls the correct nvvm builtin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done this

@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 19, 2023 16:47 — with GitHub Actions Inactive
Copy link
Contributor

@smanna12 smanna12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FE change LGTM

@JackAKirk JackAKirk temporarily deployed to WindowsCILock October 19, 2023 17:36 — with GitHub Actions Inactive
Copy link
Contributor

@gmlueck gmlueck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec changes OK

Copy link
Contributor

@npmiller npmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matrix-nvptx-tf32-test.cpp changes LGTM

@aelovikov-intel
Copy link
Contributor

the type of C/D matrices differs

I think it should be "the types of C/D matrices differ".

@aelovikov-intel
Copy link
Contributor

I don't see any changes that require runtime-reviewers approval. @JackAKirk , should I merge this in?

@JackAKirk
Copy link
Contributor Author

I don't see any changes that require runtime-reviewers approval. @JackAKirk , should I merge this in?

Yes please!

@aelovikov-intel aelovikov-intel merged commit 8cae553 into intel:sycl Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants