Skip to content

[SYCL][HIP] Add basic HIP atomics #8003

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 31, 2023
Merged

[SYCL][HIP] Add basic HIP atomics #8003

merged 10 commits into from
Jan 31, 2023

Conversation

hdelan
Copy link
Contributor

@hdelan hdelan commented Jan 13, 2023

Adding support for basic atomic operations for HIP AMD backend.

@hdelan
Copy link
Contributor Author

hdelan commented Jan 13, 2023

/verify with intel/llvm-test-suite#1510

@hdelan hdelan temporarily deployed to aws January 18, 2023 10:16 — with GitHub Actions Inactive
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff! 🚀

@hdelan hdelan temporarily deployed to aws January 18, 2023 10:51 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws January 18, 2023 12:50 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws January 18, 2023 15:19 — with GitHub Actions Inactive
Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments needed, but otherwise FE changes LGTM.

@hdelan hdelan temporarily deployed to aws January 19, 2023 10:24 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws January 19, 2023 12:25 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws January 19, 2023 12:56 — with GitHub Actions Inactive
Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang changes look ok to me.

@hdelan
Copy link
Contributor Author

hdelan commented Jan 23, 2023

/verify with intel/llvm-test-suite#1510

@zjin-lcf
Copy link
Contributor

Faster than hipcc ...

SYCL
./main 10000
Atomic min for data type U64 | Average execution time: 0.000025 (s)
Atomic min for data type S64 | Average execution time: 0.000021 (s)
Atomic min for data type F64 | Average execution time: 0.000022 (s)
Atomic max for data type U64 | Average execution time: 0.000052 (s)
Atomic max for data type S64 | Average execution time: 0.000054 (s)
Atomic max for data type F64 | Average execution time: 0.000052 (s)
Atomic add for data type U64 | Average execution time: 3.235254 (s)
Atomic add for data type S64 | Average execution time: 3.215109 (s)
Atomic add for data type F64 | Average execution time: 3.245661 (s)
atomic max results: 262144 -9223372036854775808 262144.000000
FAIL

=================================================
HIP
./main 10000
Atomic min for data type U64 | Average execution time: 0.000053 (s)
Atomic min for data type S64 | Average execution time: 0.000029 (s)
Atomic min for data type F64 | Average execution time: 0.000029 (s)
Atomic max for data type U64 | Average execution time: 0.000280 (s)
Atomic max for data type S64 | Average execution time: 0.000292 (s)
Atomic max for data type F64 | Average execution time: 0.000280 (s)
Atomic add for data type U64 | Average execution time: 4.652767 (s)
Atomic add for data type S64 | Average execution time: 4.517521 (s)
Atomic add for data type F64 | Average execution time: 4.601057 (s)
PASS

@hdelan
Copy link
Contributor Author

hdelan commented Jan 25, 2023

@zjin-lcf wow nice. Note that AtomicFMin and AtomicFMax are not included in this PR, which is why the tests are failing.

@hdelan
Copy link
Contributor Author

hdelan commented Jan 25, 2023

@bader the LLVM test suite failures are all XPASSes, so I think this can be merged

@zjin-lcf
Copy link
Contributor

@zjin-lcf wow nice. Note that AtomicFMin and AtomicFMax are not included in this PR, which is why the tests are failing.

Is there something missing in ROCm for not including FMin/FMax ?

@hdelan
Copy link
Contributor Author

hdelan commented Jan 25, 2023

No it is just lower priority than the FAdd etc. I will be adding FMin, FMax as well as some other atomics as soon as I have finished some other stuff I am working on

@bader
Copy link
Contributor

bader commented Jan 25, 2023

@bader the LLVM test suite failures are all XPASSes, so I think this can be merged

I see these failures:

Failed Tests (4):
SYCL :: Reduction/reduction_big_data.cpp
SYCL :: Reduction/reduction_nd_ext_double.cpp
SYCL :: Reduction/reduction_usm.cpp
SYCL :: Reduction/reduction_usm_dw.cpp

lld: error: undefined hidden symbol: __spirv_AtomicFMaxEXT(float AS1*, __spv::Scope::Flag, __spv::MemorySemanticsMask::Flag, float)

referenced by lto.tmp:(typeinfo name for sycl::_V1::detail::reduction::MainKrn<main::A1, (sycl::_V1::detail::reduction::strategy)4>)
referenced by lto.tmp:(typeinfo name for sycl::_V1::detail::reduction::MainKrn<main::A1, (sycl::_V1::detail::reduction::strategy)4>)
referenced by lto.tmp:(_ZTSN4sycl3_V16detail9reduction7MainKrnIZ4mainE2A1LNS2_8strategyE4EJEEE_with_offset)
referenced 1 more times
llvm-foreach:
clang-16: error: amdgcn-link command failed with exit code 1 (use -v to see invocation)

error: command failed with exit status: 1

These seem to be related to the change.

NOTE: Unfortunately, "/verify with" doesn't cover AMDGPU, so there is no safe way to test both changes (compiler + test) in CI.

@hdelan
Copy link
Contributor Author

hdelan commented Jan 26, 2023

@bader that should be fixed now. @zjin-lcf remaining atomics are all added, so FMax, FMin etc should be working

@hdelan hdelan temporarily deployed to aws January 26, 2023 13:31 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to aws January 26, 2023 14:02 — with GitHub Actions Inactive
@zjin-lcf
Copy link
Contributor

I tried to run atomic tests. Can you please run them again ? Thanks.

https://github.com/zjin-lcf/HeCBench/tree/master/atomicCAS-sycl

@hdelan
Copy link
Contributor Author

hdelan commented Jan 26, 2023

I tried to run atomic tests. Can you please run them again ? Thanks.

https://github.com/zjin-lcf/HeCBench/tree/master/atomicCAS-sycl
@zjin-lcf

$ sycl-ls
[ext_oneapi_hip:gpu:0] AMD HIP BACKEND, AMD Radeon VII 0.0 [HIP 40421.43]
$ ./main 10000
Atomic min for data type U64 | Average execution time: 0.000072 (s)
Atomic min for data type S64 | Average execution time: 0.000087 (s)
Atomic min for data type F64 | Average execution time: 0.000061 (s)
Atomic max for data type U64 | Average execution time: 0.000218 (s)
Atomic max for data type S64 | Average execution time: 0.000226 (s)
Atomic max for data type F64 | Average execution time: 0.000250 (s)
Atomic add for data type U64 | Average execution time: 2.484193 (s)
Atomic add for data type S64 | Average execution time: 2.413692 (s)
Atomic add for data type F64 | Average execution time: 2.370539 (s)
PASS

@zjin-lcf
Copy link
Contributor

@hdelan
I pushed some changes to the sycl benchmark to fix some issues, and observed the same result (PASS) on a MI200 GPU. Congratulations!

I tried to run the benchmark
https://github.com/zjin-lcf/HeCBench/tree/master/sheath-hip
https://github.com/zjin-lcf/HeCBench/tree/master/sheath-sycl

The option -munsafe-fp-atomics is passed to the hip compiler and the option -DSYCL_USE_NATIVE_FP_ATOMICS to the sycl compiler. However, I think that the sycl compiler does not generate fast fp atomics due to the significant difference in execution time on a MI200 GPU. Do I pass the right compiler option to the sycl compiler ? Thanks again.

@bader
Copy link
Contributor

bader commented Jan 26, 2023

so I think this can be merged

One more thing. We need to merge these changes together with test changes, but the test PR is not reviewed yet.

@hdelan
Copy link
Contributor Author

hdelan commented Jan 26, 2023

@hdelan I pushed some changes to the sycl benchmark to fix some issues, and observed the same result (PASS) on a MI200 GPU. Congratulations!

I tried to run the benchmark https://github.com/zjin-lcf/HeCBench/tree/master/sheath-hip https://github.com/zjin-lcf/HeCBench/tree/master/sheath-sycl

The option -munsafe-fp-atomics is passed to the hip compiler and the option -DSYCL_USE_NATIVE_FP_ATOMICS to the sycl compiler. However, I think that the sycl compiler does not generate fast fp atomics due to the significant difference in execution time on a MI200 GPU. Do I pass the right compiler option to the sycl compiler ? Thanks again.

@zjin-lcf So the fast gfx90a+ atomics are not part of this PR. They will be enabled in a future PR, will keep you posted

Also SYCL_USE_NATIVE_FP_ATOMICS is now enabled by default for the AMD HIP backend, so you no longer need to pass this to the compiler

@zjin-lcf
Copy link
Contributor

zjin-lcf commented Jan 27, 2023

I see. I am also puzzled by the issue (#7252).
The atomic add works with "shared" USM, but other atomics (e.g. atomic sub, atomic xor) don't.

@hdelan
Copy link
Contributor Author

hdelan commented Jan 30, 2023

@bader Can we get the test suite changes reviewed?

@hdelan
Copy link
Contributor Author

hdelan commented Jan 31, 2023

@bader the test suite changes have been reviewed so I think this is safe to merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants