Skip to content

[SYCL][HIP] Change atomic sub to atomic add (-val) #11251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 6, 2023

Conversation

hdelan
Copy link
Contributor

@hdelan hdelan commented Sep 21, 2023

This PR changes the use of atomic_sub to atomic_add with a negated val. Doing so extends the number of cases in which the atomic operation will work, since support for atomic_add is more common than atomic_sub, especially at the PCIe level.

@hdelan hdelan requested a review from a team as a code owner September 21, 2023 13:33
@hdelan hdelan requested a review from jchlanda September 21, 2023 13:33
@hdelan
Copy link
Contributor Author

hdelan commented Sep 21, 2023

Related to #11218 as well as #7252

@hdelan hdelan temporarily deployed to WindowsCILock September 21, 2023 14:30 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to WindowsCILock September 21, 2023 15:19 — with GitHub Actions Inactive
Copy link
Contributor

@jchlanda jchlanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making this change allows atomic_ref::fetch_sub to work in a wider variety of cases

Does that mean that there is still a subset of devices for which the bug exists.

Also, if I remember it right, it's the first commit message (not the PR info) that gets added to intel/llvm, so I would reword it to be more like your PR's title and the explanation for the change.

@hdelan
Copy link
Contributor Author

hdelan commented Sep 22, 2023

First comment again:

When using HIP atomics with malloc_shared allocation, ROCm may use PCIe level atomics in order to optimize operations. More information can be found here https://rocm.docs.amd.com/en/latest/understand/More-about-how-ROCm-uses-PCIe-Atomics.html .

PCIe support for atomic_sub is not as common as PCIe support for atomic_add, so making this change allows atomic_ref::fetch_sub to work in a wider variety of cases, most notably when using atomics on part of a malloc_shared allocation

@hdelan
Copy link
Contributor Author

hdelan commented Sep 22, 2023

Does that mean that there is still a subset of devices for which the bug exists.

@jchlanda Atomic add is guaranteed to work for PCIe 3. And that dates from 2010 I think. See here https://www.intel.sg/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf

@hdelan
Copy link
Contributor Author

hdelan commented Oct 3, 2023

Ping @intel/llvm-gatekeepers

@aelovikov-intel
Copy link
Contributor

Testing is old and the change is AMD GPU specific. We seem to have it broken currently, and I'm not sure if anybody is looking into it. I wouldn't want to merge until CI is fixed.

@jinz2014
Copy link
Contributor

jinz2014 commented Oct 4, 2023

@hdelan I have a question. Is double-precision floating-point subtract implemented elsewhere ?

@hdelan
Copy link
Contributor Author

hdelan commented Oct 6, 2023

Hi @jinz2014 that's a good question. __spirv_AtomicFAddEXT is an extension for floating point add in SPIRV, which we need to implement in libclc for fp32 and fp64. However __spirv_AtomicFSubEXT is actually not a SPIR-V extension and is not used in the SYCL headers. So the symbol was unused and floating. So I will remove it now. Good catch, thanks

@hdelan hdelan temporarily deployed to WindowsCILock October 6, 2023 11:28 — with GitHub Actions Inactive
@hdelan hdelan temporarily deployed to WindowsCILock October 6, 2023 11:50 — with GitHub Actions Inactive
@jinz2014
Copy link
Contributor

jinz2014 commented Oct 6, 2023

No problem. I didn't catch it. So fp32 and fp64 are implemented in another file in libclc.

@hdelan
Copy link
Contributor Author

hdelan commented Oct 6, 2023

No problem. I didn't catch it. So fp32 and fp64 are implemented in another file in libclc.

Well they aren't actually implemented at all for floating point. If floating point sub is needed then you need to rely on fetch_add(-val); in SYCL, since SYCL only uses atomic fAdd not atomic fSub

@hdelan
Copy link
Contributor Author

hdelan commented Oct 6, 2023

@aelovikov-intel it looks like CI is working again? Can this be merged?

@aelovikov-intel
Copy link
Contributor

Quick question first. Any chance a-b and a+(-b) behave differently under IEEE-754? Like some weird rounding/signed zeroes/nans/etc.?

@hdelan
Copy link
Contributor Author

hdelan commented Oct 6, 2023

So it's only different for unsigned types where a + (-b) will cause overflow for -b. But this is OK because unsigned overflow is not UB so this is guaranteed to always work. Note that HIP uses + (-val) for its atomics as well, presumably due to the better PCIe support for atomic add https://github.com/ROCm-Developer-Tools/clr/blob/develop/hipamd/include/hip/amd_detail/amd_hip_atomic.h#L305

@aelovikov-intel aelovikov-intel merged commit c8a6f0d into intel:sycl Oct 6, 2023
@jinz2014
Copy link
Contributor

jinz2014 commented Oct 6, 2023

The link gives me some light of the HIP implementation. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants