[libclc][cuda] CTS fix: CUDA backend uses "success" atomic order for cas. (#12502)

JackAKirk · web-flow · commit eaff1cffa547 · 2024-01-26T15:04:51.000+01:00

There was a bug in the cas implementation for nvptx in libclc that led
to CTS test failures for the CUDA backend.
This fixes the bug by making the failure order match the success order
(`acq_rel`) in the cases where the two differ (when the failure order is
either `release` or `acquire`). This is safe even if the cas takes the
failure path, because `acq_rel` can be used for both acquire (load) and
release (store) atomic ops in PTX. I think that this is the only valid
way to implement cas for nvptx, because the PTX cas operation only takes
one order argument.
With this change the SYCL CTS passes for `acq_rel` atomics on the CUDA
backend.
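
To illustrate why combining the two masks can go wrong, here is a minimal
sketch in C. It is one plausible reading of the diff, not the libclc code
itself. The enum values are the SPIR-V memory-order bits; `order_before`,
`order_after` and `is_known_order` are hypothetical helpers, and the sketch
assumes the `switch` only has cases for single order values, as `case None:`
in the diff suggests.

```c
/* SPIR-V memory-order bits (the lowest 5 bits of the semantics mask). */
enum MemorySemanticsMask {
  None = 0x0,
  Acquire = 0x2,
  Release = 0x4,
  AcquireRelease = 0x8,
  SequentiallyConsistent = 0x10
};

/* Old computation: OR the success and failure masks together. */
static unsigned int order_before(unsigned int semantics1,
                                 unsigned int semantics2) {
  return (semantics1 | semantics2) & 0x1F;
}

/* New computation: only the success mask selects the order. */
static unsigned int order_after(unsigned int semantics1,
                                unsigned int semantics2) {
  (void)semantics2; /* the failure order is deliberately ignored */
  return semantics1 & 0x1F;
}

/* Hypothetical stand-in for the switch: returns 1 only when `order` is
   exactly one of the known order values (an assumption about the cases). */
static int is_known_order(unsigned int order) {
  switch (order) {
  case None:
  case Acquire:
  case Release:
  case AcquireRelease:
  case SequentiallyConsistent:
    return 1;
  default:
    return 0;
  }
}
```

With success order `AcquireRelease` (0x8) and failure order `Acquire` (0x2),
the old computation yields 0xA, which is not a single order value, while the
new one yields 0x8.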

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
diff --git a/libclc/ptx-nvidiacl/libspirv/atomic/atomic_cmpxchg.cl b/libclc/ptx-nvidiacl/libspirv/atomic/atomic_cmpxchg.cl
@@ -86,7 +86,7 @@ SemanticsMask4FlagES##SUBSTITUTION2##_##TYPE_MANGLED##TYPE_MANGLED(            \
       enum MemorySemanticsMask semantics2, TYPE cmp, TYPE value) {             \
     /* Semantics mask may include memory order, storage class and other info   \
 Memory order is stored in the lowest 5 bits */                                 \
-    unsigned int order = (semantics1 | semantics2) & 0x1F;                     \
+    unsigned int order = semantics1 & 0x1F;                                    \
     switch (order) {                                                           \
     case None:                                                                 \
       __CLC_NVVM_ATOMIC_CAS_IMPL_ORDER(TYPE, TYPE_NV, TYPE_MANGLED_NV, OP,     \