Skip to content

[libclc][cuda] CTS fix: CUDA backend uses "success" atomic order for cas. #12502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 26, 2024

Conversation

JackAKirk
Copy link
Contributor

CTS fix: CUDA backend uses "success" atomic order for cas.

There was a bug in the cas impl for nvptx in libclc that lead to CTS test failures for the CUDA backend.
This fixes the bug in a simple way by simply replacing the cases where the failure order differs from the success order (when failure order is either release or acquire), so that the failure order matches the success order (acq_rel). This is safe even if the cas performs the failure operation, because acq_rel can be used for both acquire (load) and release (store) atomic ops in ptx. I think that this is the only valid way to implement cas for nvptx, because the cas operation only takes one order argument.
Now the sycl cts passes for acq_rel atomics for the cuda backend.

This is safe because acq_rel can be used for acquire and release in ptx.
This fixes a bug in a simple way and now the sycl cts passes for acq_rel
atomics for the cuda backend.

Signed-off-by: JackAKirk <[email protected]>
@JackAKirk JackAKirk requested a review from a team as a code owner January 25, 2024 20:47
Copy link
Contributor

@ldrumm ldrumm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Do you have any references to NVIDIA docs that might back up your assertion of safety?

@JackAKirk
Copy link
Contributor Author

JackAKirk commented Jan 26, 2024

Looks good. Do you have any references to NVIDIA docs that might back up your assertion of safety?

From https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#id70

  • "acquire operation: A memory operation with .acquire or .acq_rel qualifier."
  • "release operation: A memory operation with .release or .acq_rel qualifier."

For the case that failure is stronger than success. Justifying still using success order:

  • From SYCL spec on definition of memory consistency model behavior:

"The SYCL memory consistency model is based upon the memory consistency model of the C++ core language. Where SYCL offers extensions to classes and functions that may affect memory consistency, the default behavior when these extensions are not used always matches the behavior of standard C++."

  • and then a page later:

"
sycl::memory_order::acquire;
sycl::memory_order::release;
sycl::memory_order::acq_rel;
sycl::memory_order::seq_cst.
The meanings of these values are identical to those defined in the C++ core language.
"

Then looking at C++ definition:
https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange

"If failure is stronger than success or(until C++17) is one of std::memory_order_release and std::memory_order_acq_rel, the behavior is undefined."

Therefore the mapping that I described in the commit message, from the C++/SYCL interface (assuming default scope from atomic_ref constructor to satisfy the clause "the default behavior when these extensions are not used always matches the behavior of standard C++") to ptx interfaces:
compare_exchange_strong( T& expected, T desired, std::memory_order success, std::memory_order failure )

to atom{.sem}{.scope}{.space}.cas

Is I think the most sensible mapping that still satisfies the sycl function definition.

@steffenlarsen steffenlarsen merged commit eaff1cf into intel:sycl Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants