Skip to content

[NVPTX] Make GlobalUniqueCallSite a member of NVPTXISelLowering #130212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

AlexMaclean
Copy link
Member

@AlexMaclean AlexMaclean commented Mar 7, 2025

This change moves GlobalUniqueCallSite into NVPTXISelLowering. In processes where multiple compilations occur, this makes call site enumeration local to individual compilation, which ensures that call site numbers are consistently sequential within each compilation and is independent of other compilations happening in parallel.

@AlexMaclean AlexMaclean requested review from Artem-B and kalxr March 7, 2025 00:52
@AlexMaclean AlexMaclean self-assigned this Mar 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 7, 2025

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Moving GlobalUniqueCallSite into NVPTXISelLowering ensures that in processes where multiple compilations occur, race conditions do not impact the generated PTX.


Full diff: https://github.com/llvm/llvm-project/pull/130212.diff

2 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+2-4)
  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.h (+2)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 3e755c25fd91a..b62c15ddb97d3 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -73,8 +73,6 @@
 
 using namespace llvm;
 
-static std::atomic<unsigned> GlobalUniqueCallSite;
-
 static cl::opt<bool> sched4reg(
     "nvptx-sched4reg",
     cl::desc("NVPTX Specific: schedule for register pressue"), cl::init(false));
@@ -500,7 +498,7 @@ static SDValue MaybeBitcast(SelectionDAG &DAG, SDLoc DL, EVT VT,
 // NVPTXTargetLowering Constructor.
 NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
                                          const NVPTXSubtarget &STI)
-    : TargetLowering(TM), nvTM(&TM), STI(STI) {
+    : TargetLowering(TM), nvTM(&TM), STI(STI), GlobalUniqueCallSite(0) {
   // always lower memset, memcpy, and memmove intrinsics to load/store
   // instructions, rather
   // then generating calls to memset, mempcy or memmove.
@@ -1474,7 +1472,7 @@ SDValue NVPTXTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
   unsigned FirstVAArg = CLI.NumFixedArgs; // position of the first variadic
   unsigned VAOffset = 0;                  // current offset in the param array
 
-  unsigned UniqueCallSite = GlobalUniqueCallSite.fetch_add(1);
+  const unsigned UniqueCallSite = GlobalUniqueCallSite++;
   SDValue TempChain = Chain;
   Chain = DAG.getCALLSEQ_START(Chain, UniqueCallSite, 0, dl);
   SDValue InGlue = Chain.getValue(1);
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index f41c569a65544..ff0241886223b 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -273,6 +273,8 @@ class NVPTXTargetLowering : public TargetLowering {
 
 private:
   const NVPTXSubtarget &STI; // cache the subtarget here
+  mutable unsigned GlobalUniqueCallSite;
+
   SDValue getParamSymbol(SelectionDAG &DAG, int idx, EVT) const;
 
   SDValue LowerADDRSPACECAST(SDValue Op, SelectionDAG &DAG) const;

@Artem-B
Copy link
Member

Artem-B commented Mar 7, 2025

Can you elaborate on the problem you're trying to solve? What kind of race conditions do you see while updating an atomic variable?

Moving the counter into NVPTXTargetLowering, instead of relying on a global counter, is fine in principle, I just want to understand what's going on.

@AlexMaclean
Copy link
Member Author

Can you elaborate on the problem you're trying to solve? What kind of race conditions do you see while updating an atomic variable?

Moving the counter into NVPTXTargetLowering, instead of relying on a global counter, is fine in principle, I just want to understand what's going on.

The problem is that if a process is doing multiple distinct compilations of different modules they will share GlobalUniqueCallSite. As a result, depending on the order in which these compilations occur, the emitted PTX will be different. With this change the emitted PTX of one compilation will not be changed by whatever compilations were run before. Of course this won't be an issue for llc but we have encountered this internally.

Even setting that aside, I think it is probably a little bit cleaner/preferable to use a class member over a global variable.

@Artem-B
Copy link
Member

Artem-B commented Mar 7, 2025

OK, so it's not exactly a race condition, but rather compilation stability in multi-threaded compilation.

I'd rephrase the patch description along the lines of "make call site enumeration local to individual compilation, which ensures that call site numbers are consistently sequential within each compilation and is independent of other compilations happening in parallel".

Copy link
Member

@Artem-B Artem-B left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo patch description.

@AlexMaclean AlexMaclean merged commit 1b01f05 into llvm:main Mar 7, 2025
13 checks passed
jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
…#130212)

This change moves GlobalUniqueCallSite into NVPTXISelLowering. In
processes where multiple compilations occur, this makes call site
enumeration local to individual compilation, which ensures that call
site numbers are consistently sequential within each compilation and is
independent of other compilations happening in parallel.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants