-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[NVPTX] Make GlobalUniqueCallSite a member of NVPTXISelLowering #130212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Make GlobalUniqueCallSite a member of NVPTXISelLowering #130212
Conversation
@llvm/pr-subscribers-backend-nvptx Author: Alex MacLean (AlexMaclean) ChangesMoving GlobalUniqueCallSite into NVPTXISelLowering ensures that in processes where multiple compilations occur, race conditions do not impact the generated PTX. Full diff: https://github.com/llvm/llvm-project/pull/130212.diff 2 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 3e755c25fd91a..b62c15ddb97d3 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -73,8 +73,6 @@
using namespace llvm;
-static std::atomic<unsigned> GlobalUniqueCallSite;
-
static cl::opt<bool> sched4reg(
"nvptx-sched4reg",
cl::desc("NVPTX Specific: schedule for register pressue"), cl::init(false));
@@ -500,7 +498,7 @@ static SDValue MaybeBitcast(SelectionDAG &DAG, SDLoc DL, EVT VT,
// NVPTXTargetLowering Constructor.
NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
const NVPTXSubtarget &STI)
- : TargetLowering(TM), nvTM(&TM), STI(STI) {
+ : TargetLowering(TM), nvTM(&TM), STI(STI), GlobalUniqueCallSite(0) {
// always lower memset, memcpy, and memmove intrinsics to load/store
// instructions, rather
// then generating calls to memset, mempcy or memmove.
@@ -1474,7 +1472,7 @@ SDValue NVPTXTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
unsigned FirstVAArg = CLI.NumFixedArgs; // position of the first variadic
unsigned VAOffset = 0; // current offset in the param array
- unsigned UniqueCallSite = GlobalUniqueCallSite.fetch_add(1);
+ const unsigned UniqueCallSite = GlobalUniqueCallSite++;
SDValue TempChain = Chain;
Chain = DAG.getCALLSEQ_START(Chain, UniqueCallSite, 0, dl);
SDValue InGlue = Chain.getValue(1);
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
index f41c569a65544..ff0241886223b 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.h
@@ -273,6 +273,8 @@ class NVPTXTargetLowering : public TargetLowering {
private:
const NVPTXSubtarget &STI; // cache the subtarget here
+ mutable unsigned GlobalUniqueCallSite;
+
SDValue getParamSymbol(SelectionDAG &DAG, int idx, EVT) const;
SDValue LowerADDRSPACECAST(SDValue Op, SelectionDAG &DAG) const;
|
Can you elaborate on the problem you're trying to solve? What kind of race conditions do you see while updating an atomic variable? Moving the counter into NVPTXTargetLowering, instead of relying on a global counter, is fine in principle, I just want to understand what's going on. |
The problem is that if a process is doing multiple distinct compilations of different modules they will share GlobalUniqueCallSite. As a result, depending on the order in which these compilations occur, the emitted PTX will be different. With this change the emitted PTX of one compilation will not be changed by whatever compilations were run before. Of course this won't be an issue for llc but we have encountered this internally. Even setting that aside, I think it is probably a little bit cleaner/preferable to use a class member over a global variable. |
OK, so it's not exactly a race condition, but rather compilation stability in multi-threaded compilation. I'd rephrase the patch description along the lines of "make call site enumeration local to individual compilation, which ensures that call site numbers are consistently sequential within each compilation and is independent of other compilations happening in parallel". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo patch description.
…#130212) This change moves GlobalUniqueCallSite into NVPTXISelLowering. In processes where multiple compilations occur, this makes call site enumeration local to individual compilation, which ensures that call site numbers are consistently sequential within each compilation and is independent of other compilations happening in parallel.
This change moves GlobalUniqueCallSite into NVPTXISelLowering. In processes where multiple compilations occur, this makes call site enumeration local to individual compilation, which ensures that call site numbers are consistently sequential within each compilation and is independent of other compilations happening in parallel.