[SYCL][CUDA] Test cases for bfloat16 math/elem wise joint_matrix #975
Signed-off-by: jack.kirk <[email protected]>
/verify with intel/llvm#5964
I have decided to mark this and the associated intel/llvm#5964 as draft. I've been working on the follow-up: implementing the vector-of-bfloat16 cases, which could motivate a change in the joint_matrix bfloat16 implementation. As such, I want to explore this further before merging the bfloat16 scalar implementation.
Sounds good! Keep me posted. 😄
/verify with intel/llvm#5964
Signed-off-by: JackAKirk <[email protected]>
Signed-off-by: JackAKirk <[email protected]>
I've done this now. Note that I only made this change to the bfloat16_builtins.cpp test file. I've left element_wise_wi_marray.cpp as it is, because it is only expected to be supported by CUDA in the future.
LGTM
Signed-off-by: JackAKirk <[email protected]>
/verify with intel/llvm#5964
@JackAKirk - Looks like CI choked for some reason. Failures in verification seem unrelated. Would you mind pushing a merge commit to retrigger CI, just to make sure it was a one-time hiccup?
Sure: it seems that you can retrigger CI by closing and reopening the PR, which is a bit easier than merging the main branch.
/verify with intel/llvm#5964
Failures are unrelated. Merging this.
@JackAKirk - The tests seem to be failing in intel/llvm CI (see runs 7131271000 and 7131035571). Could you please address these?
Looking into it now.
I still don't understand these failures or why they only occur on the CI for new PRs. I've explicitly added headers and used extension namespaces in #1072. Since these tests pass for me locally, I can't say for sure that this will fix it. I could also mark them XFAIL temporarily. It would be useful to see whether #1072 fixes it, though.
@JackAKirk, @steffenlarsen, I couldn't reproduce on another CUDA machine, but when I ran on the runner I got a bunch of issues:
It somewhat overlaps. I tried #1072 but it doesn't help.
@pvchupin The errors reported by the CI in the bfloat16 tests are all due to experimental headers not being included correctly. intel/llvm#6386 might lead to different behaviour, since the tests will have to include the experimental/builtins.hpp header explicitly instead of relying on sycl.hpp. I think I should have made this change already, since apparently no C++17 usage should be included in sycl.hpp, although this did not lead to failing tests previously.
@JackAKirk, I think it is OK to have C++17 usage in the sycl.hpp if it's guarded and excluded properly, so that C++14 host compilers don't complain. It's ok to have new/experimental features missing in C++14 mode. To summarize:
Not sure if that changes any of the patches. I'm testing these now and will merge if everything works (just to stop confusion in other PRs).
…6386) C++17 usage of if constexpr etc was added to experimental/builtins.hpp as requested in #5964, but I did not remove this header from sycl.hpp since there were no failing tests and I didn't notice it was included in sycl.hpp. Apparently sycl.hpp should not include any usage of C++17. This may be related to some of the failing tests that appear only on the CI: intel/llvm-test-suite#975 (comment). Necessary changes to the tests are added here : intel/llvm-test-suite#1072 Signed-off-by: JackAKirk [email protected]
Explicitly including the extension headers since tests are complaining about missing extension functions/classes in : #975 (comment). Signed-off-by: JackAKirk [email protected]
I see, thanks for clarifying that.
requires intel/llvm#5964

bfloat16_builtins.cpp covers the bfloat16 scalar math function cases introduced by intel/llvm#5964, using the tests from #897 (which cover all "storage type" uint16_t impl cases).

elem_wise_all_ops_cuda.cpp covers the portable element-wise ops using `wi_data`. Since CUDA does not support `joint_matrix_store` for certain data types that are only used in a/b type matrices, such as bfloat16 and int8, it is necessary to perform a `joint_matrix_mad` operation and then call `joint_matrix_store` on the accumulator matrix in order to reach the host code check. Intel backend devices could still use this test in the future provided that a backend check is introduced. Ideally both backends could eventually use the same test code.

Signed-off-by: jack.kirk [email protected]