-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 #134111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2070,8 +2070,8 @@ defm INT_PTX_ATOMIC_UMIN_32 : F_ATOMIC_2_AS<I32RT, atomic_load_umin_i32, "min.u3 | |
defm INT_PTX_ATOMIC_UMIN_64 : F_ATOMIC_2_AS<I64RT, atomic_load_umin_i64, "min.u64", [hasSM<32>]>; | ||
|
||
// atom_inc atom_dec | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't have a PTX test to prove that the final lowering to PTX will be the same, but this looks good enough. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it makes sense to test the auto-upgrade rules and test the lowering of the current syntax but not to maintain lowering tests using out-of-date syntax. |
||
defm INT_PTX_ATOM_INC_32 : F_ATOMIC_2_AS<I32RT, int_nvvm_atomic_load_inc_32, "inc.u32">; | ||
defm INT_PTX_ATOM_DEC_32 : F_ATOMIC_2_AS<I32RT, int_nvvm_atomic_load_dec_32, "dec.u32">; | ||
defm INT_PTX_ATOM_INC_32 : F_ATOMIC_2_AS<I32RT, atomic_load_uinc_wrap_i32, "inc.u32">; | ||
defm INT_PTX_ATOM_DEC_32 : F_ATOMIC_2_AS<I32RT, atomic_load_udec_wrap_i32, "dec.u32">; | ||
|
||
// atom_and | ||
defm INT_PTX_ATOM_AND_32 : F_ATOMIC_2_AS<I32RT, atomic_load_and_i32, "and.b32">; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intrinsic does not mention anything about ordering, but I suppose seq_cst is the safest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I do wonder about this as well. This is consistent with how the rest of the
atomicrmw
instructions are lowered, but I'm not sure it's correct. We're lowering seq_cst atomicrmw without a syncscope to the PTX atom instruction which has implied.relaxed
ordering and.gpu
scope.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For historical reasons, frontends generate
seq_cst
and "system scope", and we (LLVM and NVVM) lower that torelaxed+gpu scope
. We should definetly prioritize fixing these bugs by lowering these properly (there are also a few other related bugs that we need to fix).However, these fixes will break all the frontends, so we need to give some thought to how to approach this to minimize churn.
Therefore, i don't know if this PR is the right place to make this fix (this has been broken for a long time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we are lowering cmpxchg with seq_cst correctly by using fence.sc and atom.acq_rel.cas. @gonzalobg - will this also break frontends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sounds like there is a larger issue to address around the scope and semantics of atomics in NVPTX. This change maintains consistency with all other
atomicrmw
instructions and I think the larger bug can be addressed separately.