[SYCL][CUDA] Test cases for bfloat16 math/elem wise joint_matrix #975
Signed-off-by: jack.kirk <[email protected]>
/verify with intel/llvm#5964
I have decided to mark this and the associated intel/llvm#5964 as draft. I've been working on the follow-up: implementing the vector-of-bfloat16 cases, which could motivate a change in the joint_matrix bfloat16 implementation. As such, I want to explore this further before merging the bfloat16 scalar implementation.
Sounds good! Keep me posted. 😄
/verify with intel/llvm#5964
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
I've done this now. Note that I only made this change to the bfloat16_builtins.cpp test file. I've left element_wise_wi_marray.cpp as it is, because it is only expected to be supported by CUDA in the future.
LGTM
Signed-off-by: JackAKirk <[email protected]>
/verify with intel/llvm#5964
@JackAKirk - Looks like CI choked for some reason. Failures in verification seem unrelated. Would you mind pushing a merge commit to retrigger CI, just to make sure it was a one-time hiccup?
Sure: it seems that you can retrigger CI by closing and reopening the PR, which is a bit easier than merging the main branch.
/verify with intel/llvm#5964
Failures are unrelated. Merging this.
@JackAKirk - The tests seem to be failing in intel/llvm CI (see runs 7131271000 and 7131035571). Could you please address these?
Looking into it now.
I still don't understand these failures or why they only occur on the CI for new PRs. I've explicitly added headers and used extension namespaces in #1072. Since these tests pass for me locally, I can't say for sure that this will fix it. I could also mark them XFAIL temporarily. It would be useful to see whether #1072 fixes it, though.
@JackAKirk, @steffenlarsen, I couldn't reproduce on another CUDA machine, but when I ran on the runner I got a bunch of issues:
It somewhat overlaps. I tried #1072 but it doesn't help.
@pvchupin The errors reported by the CI in the bfloat16 tests are all due to experimental headers not being included correctly. intel/llvm#6386 might lead to different behaviour, since the tests will have to include the experimental/builtins.hpp header explicitly instead of relying on sycl.hpp. I think I should have made this change already, since apparently no C++17 usage should be included in sycl.hpp, although this did not lead to failing tests previously.
@JackAKirk, I think it is OK to have C++17 usage in the sycl.hpp if it's guarded and excluded properly, so that C++14 host compilers don't complain. It's ok to have new/experimental features missing in C++14 mode. To summarize:
Not sure if that changes any of the patches. I'm testing these now and will merge if everything works (just to stop confusion in other PRs).
…6386) C++17 usage of if constexpr etc was added to experimental/builtins.hpp as requested in #5964, but I did not remove this header from sycl.hpp since there were no failing tests and I didn't notice it was included in sycl.hpp. Apparently sycl.hpp should not include any usage of C++17. This may be related to some of the failing tests that appear only on the CI: intel/llvm-test-suite#975 (comment). Necessary changes to the tests are added here : intel/llvm-test-suite#1072 Signed-off-by: JackAKirk [email protected]
Explicitly including the extension headers since tests are complaining about missing extension functions/classes in : #975 (comment). Signed-off-by: JackAKirk [email protected]
I see, thanks for clarifying that.
requires intel/llvm#5964

bfloat16_builtins.cpp covers the bfloat16 scalar math function cases introduced by intel/llvm#5964, using the tests from #897 (which cover all "storage type" uint16_t impl cases).

elem_wise_all_ops_cuda.cpp covers the portable element-wise ops using `wi_data`. Since CUDA does not support `joint_matrix_store` for certain data types that are only used in a/b type matrices, such as bfloat16 and int8, it is necessary to perform a `joint_matrix_mad` operation and then call `joint_matrix_store` on the accumulator matrix in order to reach the host code check. Intel backend devices could still use this test in the future provided that a backend check is introduced. Ideally both backends could eventually use the same test code.

Signed-off-by: jack.kirk [email protected]