Skip to content
This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL][CUDA] Test cases for bfloat16 math/elem wise joint_matrix #975

Merged
merged 24 commits into from
Jun 30, 2022

Conversation

JackAKirk
Copy link

@JackAKirk JackAKirk commented Apr 5, 2022

requires intel/llvm#5964

bfloat16_builtins.cpp covers the bfloat16 scalar math function cases introduced by intel/llvm#5964, using the tests from #897 (that cover all "storage type" uint16_t impl cases).

elem_wise_all_ops_cuda.cpp covers the portable elem wise ops using wi_data. Since CUDA does not support joint_matrix_store for certain data types that are only used in a/b type matrices, such as bfloat16 and int8, it is necessary to perform a joint_matrix_mad operation and then call joint_matrix_store on the accumulator matrix in order the reach the host code check.
Intel backend devices could still use this test in the future provided that a backend check is introduced. Ideally both backends could eventually use the same test code.

Signed-off-by: jack.kirk [email protected]

@JackAKirk
Copy link
Author

/verify with intel/llvm#5964

@JackAKirk
Copy link
Author

/verify with intel/llvm#5964

@JackAKirk
Copy link
Author

I have decided to mark this and the associated intel/llvm#5964 as draft: I've have been working on the follow up to this: implementing vector of bfloat16 cases: the vector of bfloat16 cases could motivate a change in the joint_matrix bfloat16 implementation. As such I want to explore this further before merging the bfloat16 scalar implementation.

@JackAKirk JackAKirk marked this pull request as draft April 7, 2022 16:58
@steffenlarsen
Copy link

I have decided to mark this and the associated intel/llvm#5964 as draft: I've have been working on the follow up to this: implementing vector of bfloat16 cases: the vector of bfloat16 cases could motivate a change in the joint_matrix bfloat16 implementation. As such I want to explore this further before merging the bfloat16 scalar implementation.

Sounds good! Keep me posted. 😄

@JackAKirk JackAKirk marked this pull request as ready for review April 14, 2022 16:32
@JackAKirk JackAKirk marked this pull request as draft April 15, 2022 09:08
@JackAKirk JackAKirk marked this pull request as ready for review April 15, 2022 12:08
@JackAKirk
Copy link
Author

/verify with intel/llvm#5964

@JackAKirk
Copy link
Author

/verify with intel/llvm#5964

JackAKirk added 2 commits May 31, 2022 10:37
Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk changed the title [SYCL][CUDA] Test cases for bfloat16 math/joint_matrix [SYCL][CUDA] Test cases for bfloat16 math/elem wise joint_matrix May 31, 2022
@JackAKirk
Copy link
Author

I can use the ext_oneapi_bfloat16 aspect in this PR now that intel/llvm#5720 has been merged so that there is not a cuda specific compute capability check blocking future generalization of the tests to other backends. I will update this shortly.

I've done this now. Note that I only made this change to the bfloat16_builtins.cpp test file. I've left element_wise_wi_marray.cpp as it is because it is only expected to be supported by cuda in the future.

dkhaldi
dkhaldi previously approved these changes Jun 28, 2022
Copy link

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JackAKirk
Copy link
Author

/verify with intel/llvm#5964

@JackAKirk
Copy link
Author

/verify with intel/llvm#5964

@steffenlarsen
Copy link

@JackAKirk - Looks like CI choked for some reason. Failures in verification seem unrelated. Would you mind pushing a merge-commit to retrigger CI, just to make sure it was a one-time hiccup.

@JackAKirk JackAKirk closed this Jun 30, 2022
@JackAKirk JackAKirk reopened this Jun 30, 2022
@JackAKirk
Copy link
Author

@JackAKirk - Looks like CI choked for some reason. Failures in verification seem unrelated. Would you mind pushing a merge-commit to retrigger CI, just to make sure it was a one-time hiccup.

Sure: it seems that you can retrigger CI by closing and reopening the PR: seems a bit easier than merging the main branch.

@JackAKirk
Copy link
Author

/verify with intel/llvm#5964

@steffenlarsen
Copy link

Failures are unrelated. Merging this.

@steffenlarsen steffenlarsen merged commit a94982c into intel:intel Jun 30, 2022
@steffenlarsen
Copy link

@JackAKirk - The tests seem to be failing in intel/llvm CI (see for 7131271000 and 7131035571). Could you please address these?

@JackAKirk
Copy link
Author

@JackAKirk - The tests seem to be failing in intel/llvm CI (see for 7131271000 and 7131035571). Could you please address these?

Looking into it now.

@JackAKirk
Copy link
Author

@JackAKirk - The tests seem to be failing in intel/llvm CI (see for 7131271000 and 7131035571). Could you please address these?

I still don't understand these failure or why they are only occurring on the CI for new PRs. I've explicitly added headers and used extension namespaces here #1072. Since these tests are passing for me locally I can't say for sure that this will fix it. I could also mark them XFAIL temporarily. It would be useful to see if #1072 fixes it though.

@pvchupin
Copy link

pvchupin commented Jul 1, 2022

@JackAKirk, @steffenlarsen, I couldn't reproduce on other cuda machine, but when ran on the runner got a bunch of issues:

Failed Tests (11):
  SYCL :: Assert/assert_in_kernels.cpp
  SYCL :: Assert/assert_in_multiple_tus.cpp
  SYCL :: Assert/assert_in_multiple_tus_one_ndebug.cpp
  SYCL :: Assert/assert_in_one_kernel.cpp
  SYCL :: Assert/assert_in_simultaneous_kernels.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
  SYCL :: BFloat16/bfloat16_builtins.cpp
  SYCL :: Matrix/element_wise_all_ops_cuda.cpp
  SYCL :: Matrix/element_wise_wi_marray.cpp
  SYCL :: Matrix/joint_matrix_tensorcore.cpp

It somewhat overlaps. Tried #1072 but it doesn't help.

@JackAKirk
Copy link
Author

@JackAKirk, @steffenlarsen, I couldn't reproduce on other cuda machine, but when ran on the runner got a bunch of issues:

Failed Tests (11):
  SYCL :: Assert/assert_in_kernels.cpp
  SYCL :: Assert/assert_in_multiple_tus.cpp
  SYCL :: Assert/assert_in_multiple_tus_one_ndebug.cpp
  SYCL :: Assert/assert_in_one_kernel.cpp
  SYCL :: Assert/assert_in_simultaneous_kernels.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
  SYCL :: BFloat16/bfloat16_builtins.cpp
  SYCL :: Matrix/element_wise_all_ops_cuda.cpp
  SYCL :: Matrix/element_wise_wi_marray.cpp
  SYCL :: Matrix/joint_matrix_tensorcore.cpp

It somewhat overlaps. Tried #1072 but it doesn't help.

@JackAKirk, @steffenlarsen, I couldn't reproduce on other cuda machine, but when ran on the runner got a bunch of issues:

Failed Tests (11):
  SYCL :: Assert/assert_in_kernels.cpp
  SYCL :: Assert/assert_in_multiple_tus.cpp
  SYCL :: Assert/assert_in_multiple_tus_one_ndebug.cpp
  SYCL :: Assert/assert_in_one_kernel.cpp
  SYCL :: Assert/assert_in_simultaneous_kernels.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus.cpp
  SYCL :: Assert/assert_in_simultaneously_multiple_tus_one_ndebug.cpp
  SYCL :: BFloat16/bfloat16_builtins.cpp
  SYCL :: Matrix/element_wise_all_ops_cuda.cpp
  SYCL :: Matrix/element_wise_wi_marray.cpp
  SYCL :: Matrix/joint_matrix_tensorcore.cpp

It somewhat overlaps. Tried #1072 but it doesn't help.

@pvchupin The errors reported by the CI in the bfloat16 tests are all due to experimental headers not being included correctly: intel/llvm#6386 might lead to different behaviour since the tests will have to include the experimental/builtins.hpp header explicitly instead of relying on sycl.hpp: I think I should have made this change already since apparently no C++17 usage should be included in sycl.hpp: although this did not lead to failing tests previously.

@pvchupin
Copy link

pvchupin commented Jul 1, 2022

@JackAKirk, I think it is OK to have C++17 usage in the sycl.hpp if it's guarded and excluded properly, so that C++14 host compilers don't complain. It's ok to have new/experimental features missing in C++14 mode.

To summarize:

  • C++17 is SYCL2020 default and we'd like to be free in writing C++17 code
  • Some customers lagging with move from C++14 forward, so for them we guard C++17 features so that nothing existing breaks on their side.

Not sure if that changes any of the patches. I'm testing these now and will merge if everything works (just to stop confusion in other PRs).

pvchupin pushed a commit to intel/llvm that referenced this pull request Jul 1, 2022
…6386)

C++17 usage of if constexpr etc was added to experimental/builtins.hpp as requested in #5964, but I did not remove this header from sycl.hpp since there were no failing tests and I didn't notice it was included in sycl.hpp. Apparently sycl.hpp should not include any usage of C++17. This may be related to some of the failing tests that appear only on the CI: intel/llvm-test-suite#975 (comment). Necessary changes to the tests are added here : intel/llvm-test-suite#1072

Signed-off-by: JackAKirk [email protected]
pvchupin pushed a commit that referenced this pull request Jul 1, 2022
Explicitly including the extension headers since tests are complaining about missing extension functions/classes in : #975 (comment).

Signed-off-by: JackAKirk [email protected]
@JackAKirk
Copy link
Author

@JackAKirk, I think it is OK to have C++17 usage in the sycl.hpp if it's guarded and excluded properly, so that C++14 host compilers don't complain. It's ok to have new/experimental features missing in C++14 mode.

To summarize:

* C++17 is SYCL2020 default and we'd like to be free in writing C++17 code

* Some customers lagging with move from C++14 forward, so for them we guard C++17 features so that nothing existing breaks on their side.

Not sure if that changes any of the patches. I'm testing these now and will merge if everything works (just to stop confusion in other PRs).

I see, thanks for clarifying that.

myler pushed a commit to myler/llvm-test-suite that referenced this pull request Mar 22, 2023
…el#975)

requires intel/llvm#5964

bfloat16_builtins.cpp covers the bfloat16 scalar math function cases introduced by intel/llvm#5964, using the tests from intel#897 (that cover all "storage type" uint16_t impl cases).

elem_wise_all_ops_cuda.cpp covers the portable elem wise ops using `wi_data`. Since CUDA does not support `joint_matrix_store` for certain data types that are only used in a/b type matrices, such as bfloat16 and int8, it is necessary to perform a `joint_matrix_mad` operation and then call `joint_matrix_store` on the accumulator matrix in order the reach the host code check.
Intel backend devices could still use this test in the future provided that a backend check is introduced. Ideally both backends could eventually use the same test code.

Signed-off-by: jack.kirk <[email protected]>
myler pushed a commit to myler/llvm-test-suite that referenced this pull request Mar 22, 2023
Explicitly including the extension headers since tests are complaining about missing extension functions/classes in : intel#975 (comment).

Signed-off-by: JackAKirk [email protected]
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…el/llvm-test-suite#975)

requires intel#5964

bfloat16_builtins.cpp covers the bfloat16 scalar math function cases introduced by intel#5964, using the tests from intel/llvm-test-suite#897 (that cover all "storage type" uint16_t impl cases).

elem_wise_all_ops_cuda.cpp covers the portable elem wise ops using `wi_data`. Since CUDA does not support `joint_matrix_store` for certain data types that are only used in a/b type matrices, such as bfloat16 and int8, it is necessary to perform a `joint_matrix_mad` operation and then call `joint_matrix_store` on the accumulator matrix in order the reach the host code check.
Intel backend devices could still use this test in the future provided that a backend check is introduced. Ideally both backends could eventually use the same test code.

Signed-off-by: jack.kirk <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants