[PowerPC][AIX] Enable aix-small-local-dynamic-tls target attribute #86641

Merged: 5 commits into llvm:main on Apr 12, 2024
Conversation

@orcguru orcguru commented Mar 26, 2024

Following the aix-small-local-exec-tls target attribute, this patch adds a target attribute for an AIX-specific option in llc that informs the compiler that it can use a faster access sequence for the local-dynamic TLS model (formally named aix-small-local-dynamic-tls) when TLS variables are less than ~32KB in size.

The patch either produces an addi/la with a displacement off of the module handle (the return value from .__tls_get_mod) when only the address is calculated, or it produces an addi/la followed by a load/store when the address is calculated and then used for further accesses.
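
The size policy behind the "~32KB" wording can be sketched as a standalone predicate. This is only an illustration under stated assumptions, not the code in the patch: the limit 32751 is taken from `AIXSmallTlsPolicySizeLimit` in PPCISelLowering.cpp, while `TlsVarInfo`, `kSmallTlsPolicySizeLimit`, and `useSmallLocalDynamicSequence` are hypothetical names that stand in for the `GlobalValue`/`Type` queries the real lowering performs.

```cpp
#include <cassert>
#include <cstdint>

// Value taken from the patch (AIXSmallTlsPolicySizeLimit); the name is ours.
constexpr std::uint64_t kSmallTlsPolicySizeLimit = 32751;

// Hypothetical stand-in for the type queries made on a TLS variable.
struct TlsVarInfo {
  bool IsSized;            // the type has a known size
  bool IsEmpty;            // the type is empty
  std::uint64_t AllocSize; // allocation size in bytes
};

// Returns true when the faster sequence (offset encoded as an immediate off
// the module handle) would be chosen. Unsized or empty types are treated as
// being over the limit, as described in the patch comments.
constexpr bool useSmallLocalDynamicSequence(bool HasAttr, const TlsVarInfo &V) {
  return HasAttr && V.IsSized && !V.IsEmpty &&
         V.AllocSize <= kSmallTlsPolicySizeLimit;
}
```

Under this sketch, a 4-byte variable qualifies when the attribute is set, while a variable of 32752 bytes falls back to the regular TOC-based local-dynamic sequence.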

@llvmbot
Member

llvmbot commented Mar 26, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Felix (Ting Wang) (orcguru)

Patch is 160.98 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/86641.diff

10 Files Affected:

  • (modified) llvm/lib/Target/PowerPC/MCTargetDesc/PPCXCOFFObjectWriter.cpp (+4)
  • (modified) llvm/lib/Target/PowerPC/PPC.td (+9)
  • (modified) llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp (+13-7)
  • (modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+27-4)
  • (modified) llvm/lib/Target/PowerPC/PPCMCInstLower.cpp (+8-5)
  • (modified) llvm/lib/Target/PowerPC/PPCSubtarget.cpp (+13-13)
  • (modified) llvm/test/CodeGen/PowerPC/aix-small-local-dynamic-tls-largeaccess.ll (+71-111)
  • (modified) llvm/test/CodeGen/PowerPC/aix-small-local-dynamic-tls-types.ll (+1356-245)
  • (modified) llvm/test/CodeGen/PowerPC/check-aix-small-local-exec-tls-opt-IRattribute.ll (+17-8)
  • (modified) llvm/test/CodeGen/PowerPC/check-aix-small-local-exec-tls-opt.ll (+11-3)
diff --git a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCXCOFFObjectWriter.cpp b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCXCOFFObjectWriter.cpp
index f4998e9b9dcba8..714ce64a39391e 100644
--- a/llvm/lib/Target/PowerPC/MCTargetDesc/PPCXCOFFObjectWriter.cpp
+++ b/llvm/lib/Target/PowerPC/MCTargetDesc/PPCXCOFFObjectWriter.cpp
@@ -71,6 +71,8 @@ std::pair<uint8_t, uint8_t> PPCXCOFFObjectWriter::getRelocTypeAndSignSize(
       return {XCOFF::RelocationType::R_TOCL, SignAndSizeForHalf16};
     case MCSymbolRefExpr::VK_PPC_AIX_TLSLE:
       return {XCOFF::RelocationType::R_TLS_LE, SignAndSizeForHalf16};
+    case MCSymbolRefExpr::VK_PPC_AIX_TLSLD:
+      return {XCOFF::RelocationType::R_TLS_LD, SignAndSizeForHalf16};
     }
   } break;
   case PPC::fixup_ppc_half16ds:
@@ -86,6 +88,8 @@ std::pair<uint8_t, uint8_t> PPCXCOFFObjectWriter::getRelocTypeAndSignSize(
       return {XCOFF::RelocationType::R_TOCL, 15};
     case MCSymbolRefExpr::VK_PPC_AIX_TLSLE:
       return {XCOFF::RelocationType::R_TLS_LE, 15};
+    case MCSymbolRefExpr::VK_PPC_AIX_TLSLD:
+      return {XCOFF::RelocationType::R_TLS_LD, 15};
     }
   } break;
   case PPC::fixup_ppc_br24:
diff --git a/llvm/lib/Target/PowerPC/PPC.td b/llvm/lib/Target/PowerPC/PPC.td
index 535616d33a8032..12d6b868f28545 100644
--- a/llvm/lib/Target/PowerPC/PPC.td
+++ b/llvm/lib/Target/PowerPC/PPC.td
@@ -329,6 +329,15 @@ def FeatureAIXLocalExecTLS :
                    "Produce a TOC-free local-exec TLS sequence for this function "
                    "for 64-bit AIX">;
 
+// Specifies that local-dynamic TLS accesses in any function with this target
+// attribute should use the optimized sequence (where the offset is an immediate
+// off module-handle for which the linker might add fix-up code if the
+// immediate is too large).
+def FeatureAIXLocalDynamicTLS :
+  SubtargetFeature<"aix-small-local-dynamic-tls", "HasAIXSmallLocalDynamicTLS",
+                   "true", "Produce a faster local-dynamic TLS sequence for this "
+                   " function for 64-bit AIX">;
+
 def FeaturePredictableSelectIsExpensive :
   SubtargetFeature<"predictable-select-expensive",
                    "PredictableSelectIsExpensive",
diff --git a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index 16942c6893a16d..7716aa4dc70f5f 100644
--- a/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -803,7 +803,8 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
   MCInst TmpInst;
   const bool IsPPC64 = Subtarget->isPPC64();
   const bool IsAIX = Subtarget->isAIXABI();
-  const bool HasAIXSmallLocalExecTLS = Subtarget->hasAIXSmallLocalExecTLS();
+  const bool HasAIXSmallLocalTLS = Subtarget->hasAIXSmallLocalExecTLS() ||
+                                   Subtarget->hasAIXSmallLocalDynamicTLS();
   const Module *M = MF->getFunction().getParent();
   PICLevel::Level PL = M->getPICLevel();
 
@@ -1612,11 +1613,11 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
   case PPC::LFD:
   case PPC::STFD:
   case PPC::ADDI8: {
-    // A faster non-TOC-based local-exec sequence is represented by `addi`
-    // or a load/store instruction (that directly loads or stores off of the
-    // thread pointer) with an immediate operand having the MO_TPREL_FLAG.
+    // A faster non-TOC-based local-[exec|dynamic] sequence is represented by
+    // `addi` or a load/store instruction (that directly loads or stores off of
+    // the thread pointer) with an immediate operand having the MO_TPREL_FLAG.
     // Such instructions do not otherwise arise.
-    if (!HasAIXSmallLocalExecTLS)
+    if (!HasAIXSmallLocalTLS)
       break;
     bool IsMIADDI8 = MI->getOpcode() == PPC::ADDI8;
     unsigned OpNum = IsMIADDI8 ? 2 : 1;
@@ -1624,7 +1625,7 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
     unsigned Flag = MO.getTargetFlags();
     if (Flag == PPCII::MO_TPREL_FLAG ||
         Flag == PPCII::MO_GOT_TPREL_PCREL_FLAG ||
-        Flag == PPCII::MO_TPREL_PCREL_FLAG) {
+        Flag == PPCII::MO_TPREL_PCREL_FLAG || Flag == PPCII::MO_TLSLD_FLAG) {
       LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
 
       const MCExpr *Expr = getAdjustedLocalExecExpr(MO, MO.getOffset());
@@ -1672,7 +1673,12 @@ const MCExpr *PPCAsmPrinter::getAdjustedLocalExecExpr(const MachineOperand &MO,
 
   assert(MO.isGlobal() && "Only expecting a global MachineOperand here!");
   const GlobalValue *GValue = MO.getGlobal();
-  assert(TM.getTLSModel(GValue) == TLSModel::LocalExec &&
+  // TODO: handle aix-small-local-dynamic-tls non-zero offset case.
+  TLSModel::Model Model = TM.getTLSModel(GValue);
+  if (Model == TLSModel::LocalDynamic) {
+    return nullptr;
+  }
+  assert(Model == TLSModel::LocalExec &&
          "Only local-exec accesses are handled!");
 
   bool IsGlobalADeclaration = GValue->isDeclarationForLinker();
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index cce0efad39c75b..20725614ea5539 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -149,10 +149,10 @@ static SDValue widenVec(SelectionDAG &DAG, SDValue Vec, const SDLoc &dl);
 
 static const char AIXSSPCanaryWordName[] = "__ssp_canary_word";
 
-// A faster local-exec TLS access sequence (enabled with the
-// -maix-small-local-exec-tls option) can be produced for TLS variables;
-// consistent with the IBM XL compiler, we apply a max size of slightly under
-// 32KB.
+// A faster local-[exec|dynamic] TLS access sequence (enabled with the
+// -maix-small-local-[exec|dynamic]-tls option) can be produced for TLS
+// variables; consistent with the IBM XL compiler, we apply a max size of
+// slightly under 32KB.
 constexpr uint64_t AIXSmallTlsPolicySizeLimit = 32751;
 
 // FIXME: Remove this once the bug has been fixed!
@@ -3368,6 +3368,7 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
   EVT PtrVT = getPointerTy(DAG.getDataLayout());
   bool Is64Bit = Subtarget.isPPC64();
   bool HasAIXSmallLocalExecTLS = Subtarget.hasAIXSmallLocalExecTLS();
+  bool HasAIXSmallLocalDynamicTLS = Subtarget.hasAIXSmallLocalDynamicTLS();
   TLSModel::Model Model = getTargetMachine().getTLSModel(GV);
   bool IsTLSLocalExecModel = Model == TLSModel::LocalExec;
 
@@ -3419,6 +3420,12 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
   }
 
   if (Model == TLSModel::LocalDynamic) {
+    // We do not implement the 32-bit version of the faster access sequence
+    // for local-dynamic that is controlled by -maix-small-local-dynamic-tls.
+    if (!Is64Bit && HasAIXSmallLocalDynamicTLS)
+      report_fatal_error("The small-local-dynamic TLS access sequence is "
+                         "currently only supported on AIX (64-bit mode).");
+
     // For local-dynamic on AIX, we need to generate one TOC entry for each
     // variable offset, and a single module-handle TOC entry for the entire
     // file.
@@ -3439,6 +3446,22 @@ SDValue PPCTargetLowering::LowerGlobalTLSAddressAIX(SDValue Op,
     SDValue ModuleHandle =
         DAG.getNode(PPCISD::TLSLD_AIX, dl, PtrVT, ModuleHandleTOC);
 
+    // With the -maix-small-local-dynamic-tls option, produce a faster access
+    // sequence for local-dynamic TLS variables where the offset from the
+    // module-handle is encoded as an immediate operand.
+    //
+    // We only utilize the faster local-dynamic access sequence when the TLS
+    // variable has a size within the policy limit. We treat types that are
+    // not sized or are empty as being over the policy size limit.
+    if (HasAIXSmallLocalDynamicTLS) {
+      Type *GVType = GV->getValueType();
+      if (GVType->isSized() && !GVType->isEmptyTy() &&
+          GV->getParent()->getDataLayout().getTypeAllocSize(GVType) <=
+              AIXSmallTlsPolicySizeLimit)
+        return DAG.getNode(PPCISD::Lo, dl, PtrVT, VariableOffsetTGA,
+                           ModuleHandle);
+    }
+
     return DAG.getNode(ISD::ADD, dl, PtrVT, ModuleHandle, VariableOffset);
   }
 
diff --git a/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp b/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp
index 9a3ca5a7829362..c05bb37e58bf60 100644
--- a/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp
+++ b/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp
@@ -96,15 +96,18 @@ static MCOperand GetSymbolRef(const MachineOperand &MO, const MCSymbol *Symbol,
     RefKind = MCSymbolRefExpr::VK_PPC_GOT_TLSLD_PCREL;
   else if (MO.getTargetFlags() == PPCII::MO_GOT_TPREL_PCREL_FLAG)
     RefKind = MCSymbolRefExpr::VK_PPC_GOT_TPREL_PCREL;
-  else if (MO.getTargetFlags() == PPCII::MO_TPREL_FLAG) {
+  else if (MO.getTargetFlags() == PPCII::MO_TPREL_FLAG ||
+           MO.getTargetFlags() == PPCII::MO_TLSLD_FLAG) {
     assert(MO.isGlobal() && "Only expecting a global MachineOperand here!");
     TLSModel::Model Model = TM.getTLSModel(MO.getGlobal());
-    // For the local-exec TLS model, we may generate the offset from the TLS
-    // base as an immediate operand (instead of using a TOC entry).
-    // Set the relocation type in case the result is used for purposes other
-    // than a TOC reference. In TOC reference cases, this result is discarded.
+    // For the local-[exec|dynamic] TLS model, we may generate the offset from
+    // the TLS base as an immediate operand (instead of using a TOC entry). Set
+    // the relocation type in case the result is used for purposes other than a
+    // TOC reference. In TOC reference cases, this result is discarded.
     if (Model == TLSModel::LocalExec)
       RefKind = MCSymbolRefExpr::VK_PPC_AIX_TLSLE;
+    else if (Model == TLSModel::LocalDynamic)
+      RefKind = MCSymbolRefExpr::VK_PPC_AIX_TLSLD;
   }
 
   const MachineInstr *MI = MO.getParent();
diff --git a/llvm/lib/Target/PowerPC/PPCSubtarget.cpp b/llvm/lib/Target/PowerPC/PPCSubtarget.cpp
index 653d9bda99192a..d1722555f1fcb3 100644
--- a/llvm/lib/Target/PowerPC/PPCSubtarget.cpp
+++ b/llvm/lib/Target/PowerPC/PPCSubtarget.cpp
@@ -124,22 +124,22 @@ void PPCSubtarget::initSubtargetFeatures(StringRef CPU, StringRef TuneCPU,
   // Determine endianness.
   IsLittleEndian = TM.isLittleEndian();
 
-  if (HasAIXSmallLocalExecTLS) {
+  if (HasAIXSmallLocalExecTLS || HasAIXSmallLocalDynamicTLS) {
     if (!TargetTriple.isOSAIX() || !IsPPC64)
-      report_fatal_error(
-          "The aix-small-local-exec-tls attribute is only supported on AIX in "
-          "64-bit mode.\n",
-          false);
-    // The aix-small-local-exec-tls attribute should only be used with
+      report_fatal_error("The aix-small-local-[exec|dynamic]-tls attribute is "
+                         "only supported on AIX in "
+                         "64-bit mode.\n",
+                         false);
+    // The aix-small-local-[exec|dynamic]-tls attribute should only be used with
     // -data-sections, as having data sections turned off with this option
-    // is not ideal for performance. Moreover, the small-local-exec-tls region
-    // is a limited resource, and should not be used for variables that may
-    // be replaced.
+    // is not ideal for performance. Moreover, the
+    // small-local-[exec|dynamic]-tls region is a limited resource, and should
+    // not be used for variables that may be replaced.
     if (!TM.getDataSections())
-      report_fatal_error(
-          "The aix-small-local-exec-tls attribute can only be specified with "
-          "-data-sections.\n",
-          false);
+      report_fatal_error("The aix-small-local-[exec|dynamic]-tls attribute can "
+                         "only be specified with "
+                         "-data-sections.\n",
+                         false);
   }
 }
 
diff --git a/llvm/test/CodeGen/PowerPC/aix-small-local-dynamic-tls-largeaccess.ll b/llvm/test/CodeGen/PowerPC/aix-small-local-dynamic-tls-largeaccess.ll
index eb16bae67150e3..7db1048c258cd0 100644
--- a/llvm/test/CodeGen/PowerPC/aix-small-local-dynamic-tls-largeaccess.ll
+++ b/llvm/test/CodeGen/PowerPC/aix-small-local-dynamic-tls-largeaccess.ll
@@ -1,9 +1,9 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
 ; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
-; RUN:      -mtriple powerpc64-ibm-aix-xcoff < %s \
+; RUN:      -mattr=+aix-small-local-dynamic-tls -mtriple powerpc64-ibm-aix-xcoff < %s \
 ; RUN:      | FileCheck %s --check-prefix=SMALL-LOCAL-DYNAMIC-SMALLCM64
 ; RUN: llc  -verify-machineinstrs -mcpu=pwr7 -ppc-asm-full-reg-names \
-; RUN:      -mtriple powerpc64-ibm-aix-xcoff --code-model=large \
+; RUN:      -mattr=+aix-small-local-dynamic-tls -mtriple powerpc64-ibm-aix-xcoff --code-model=large \
 ; RUN:      < %s | FileCheck %s \
 ; RUN:      --check-prefix=SMALL-LOCAL-DYNAMIC-LARGECM64
 
@@ -39,27 +39,23 @@ define signext i32 @test1() {
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stdu r1, -48(r1)
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r3, L..C0(r2) # target-flags(ppc-tlsldm) @"_$TLSML"
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    std r0, 64(r1)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r6, L..C1(r2) # target-flags(ppc-tlsld) @ElementIntTLS2
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r7, L..C2(r2) # target-flags(ppc-tlsld) @ElementIntTLS3
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r8, L..C3(r2) # target-flags(ppc-tlsld) @ElementIntTLS4
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r9, L..C4(r2) # target-flags(ppc-tlsld) @ElementIntTLS5
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r6, 4
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    bla .__tls_get_mod[PR]
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r5, L..C5(r2) # target-flags(ppc-tlsld) @ElementIntTLSv1
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r4, 1
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    add r6, r3, r6
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    add r7, r3, r7
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    add r8, r3, r8
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    add r9, r3, r9
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stwux r4, r3, r5
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r4, 4
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r4, 24(r3)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r3, 2
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r3, 320(r6)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r3, 3
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r3, 324(r7)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r3, 88
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r4, 328(r8)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r3, 332(r9)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r5, 1
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    la r4, ElementIntTLSv1[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r5, ElementIntTLSv1[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    la r5, ElementIntTLS2[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r6, 24(r4)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r4, 2
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r4, 320(r5)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    la r4, ElementIntTLS3[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r5, 3
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r5, 324(r4)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    la r4, ElementIntTLS4[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    la r3, ElementIntTLS5[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r6, 328(r4)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r4, 88
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    stw r4, 332(r3)
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r3, 102
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    addi r1, r1, 48
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r0, 16(r1)
@@ -71,34 +67,25 @@ define signext i32 @test1() {
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    mflr r0
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stdu r1, -48(r1)
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    addis r3, L..C0@u(r2)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    addis r6, L..C1@u(r2)
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    std r0, 64(r1)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    addis r7, L..C2@u(r2)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r6, 4
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    ld r3, L..C0@l(r3)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    addis r8, L..C3@u(r2)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    addis r9, L..C4@u(r2)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    ld r7, L..C2@l(r7)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    ld r8, L..C3@l(r8)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    ld r9, L..C4@l(r9)
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    bla .__tls_get_mod[PR]
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    ld r5, L..C1@l(r6)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    addis r6, L..C5@u(r2)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r4, 1
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    ld r6, L..C5@l(r6)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    add r7, r3, r7
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    add r8, r3, r8
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    add r9, r3, r9
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    add r6, r3, r6
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stwux r4, r3, r5
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r4, 4
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r4, 24(r3)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r3, 2
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r3, 320(r6)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r3, 3
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r3, 324(r7)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r3, 88
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r4, 328(r8)
-; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r3, 332(r9)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r5, 1
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    la r4, ElementIntTLSv1[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r5, ElementIntTLSv1[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    la r5, ElementIntTLS2[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r6, 24(r4)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r4, 2
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r4, 320(r5)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    la r4, ElementIntTLS3[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r5, 3
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r5, 324(r4)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    la r4, ElementIntTLS4[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    la r3, ElementIntTLS5[TL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r6, 328(r4)
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r4, 88
+; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    stw r4, 332(r3)
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    li r3, 102
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    addi r1, r1, 48
 ; SMALL-LOCAL-DYNAMIC-LARGECM64-NEXT:    ld r0, 16(r1)
@@ -144,31 +131,26 @@ define i64 @test2() {
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r3, L..C0(r2) # target-flags(ppc-tlsldm) @"_$TLSML"
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    std r0, 64(r1)
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    bla .__tls_get_mod[PR]
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r4, L..C6(r2) # target-flags(ppc-tlsld) @ElementLongTLS6
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    mr r6, r3
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r3, 212
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    add r4, r6, r4
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    std r3, 424(r4)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r3, L..C7(r2) # target-flags(ppc-tlsld) @ElementLongTLS2
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    la r3, ElementLongTLS6[UL]@ld(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r4, 212
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    std r4, 424(r3)
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    la r3, ElementLongTLS2[TL]@ld(r6)
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r4, 203
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    add r3, r6, r3
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    std r4, 1200(r3)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r3, L..C8(r2) # target-flags(ppc-tlsgdm) @MyTLSGDVar
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r4, L..C9(r2) # target-flags(ppc-tlsgd) @MyTLSGDVar
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r3, L..C1(r2) # target-flags(ppc-tlsgdm) @MyTLSGDVar
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r4, L..C2(r2) # target-flags(ppc-tlsgd) @MyTLSGDVar
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    bla .__tls_get_addr[PR]
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    li r4, 44
 ; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    std r4, 440(r3)
-; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT:    ld r3, L..C10(r2) # target-flags(ppc-tlsld) @ElementLongTLS3
+; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEX...
[truncated]
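
The constraints that the PPCSubtarget.cpp hunk above places on the attribute can be summarized in a small standalone sketch. It assumes nothing beyond what the diff shows: either small-local TLS attribute requires AIX, 64-bit mode, and -data-sections. The struct `SmallTlsConfig` and the function `validateSmallLocalTls` are hypothetical names, and the real code calls `report_fatal_error` rather than returning a string.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of the state consulted by the checks in the patch.
struct SmallTlsConfig {
  bool IsAIX;
  bool IsPPC64;
  bool DataSections;
  bool HasSmallLocalExecTLS;
  bool HasSmallLocalDynamicTLS;
};

// Returns an empty string when the configuration is accepted, otherwise the
// diagnostic text (the patch reports a fatal error instead of returning).
std::string validateSmallLocalTls(const SmallTlsConfig &C) {
  // Nothing to check unless one of the two attributes is enabled.
  if (!C.HasSmallLocalExecTLS && !C.HasSmallLocalDynamicTLS)
    return "";
  if (!C.IsAIX || !C.IsPPC64)
    return "The aix-small-local-[exec|dynamic]-tls attribute is only "
           "supported on AIX in 64-bit mode.";
  if (!C.DataSections)
    return "The aix-small-local-[exec|dynamic]-tls attribute can only be "
           "specified with -data-sections.";
  return "";
}
```

Sharing one check between the local-exec and local-dynamic attributes matches the patch's choice to merge the two conditions into a single `if`.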

@orcguru
Author

orcguru commented Mar 26, 2024

There will be some interactions between this patch and #84132. I will work on that part, and will update later.

@orcguru
Author

orcguru commented Mar 28, 2024

> There will be some interactions between this patch and #84132. I will work on that part, and will update later.

I'm not able to show the interworking part directly in this patch, so I created a private branch and show the change there:
orcguru@a173d24

Contributor

@amy-kwan amy-kwan left a comment

Some initial comments for now. Will a subsequent front end patch also come?

@bzEq
Collaborator

bzEq commented Apr 2, 2024

Overall lgtm, the title is confusing, could you adjust it?

@orcguru orcguru changed the title from "[PowerPC][AIX] Enable aix-small-local-dynamic-tls target attribute without folding opt" to "[PowerPC][AIX] Enable aix-small-local-dynamic-tls target attribute" on Apr 3, 2024
@orcguru
Author

orcguru commented Apr 9, 2024

> Some initial comments for now. Will a subsequent front end patch also come?

Hi Amy, I will follow the pattern of your commits (b1922e5 and then 2a50921). The next patch will enable the functionality through the front end by adding the "-maix-small-local-dynamic-tls" clang option. After that, I will refactor some existing logic to handle non-zero offsets for TLS local-dynamic.

Let me know if you have other thoughts. Thank you!

@orcguru
Author

orcguru commented Apr 9, 2024

> Overall lgtm, the title is confusing, could you adjust it?

Updated. Thank you!

@orcguru orcguru requested a review from amy-kwan April 9, 2024 02:49
Collaborator

@bzEq bzEq left a comment

LG.

Contributor

@amy-kwan amy-kwan left a comment

LGTM.

@orcguru orcguru merged commit 09d51a8 into llvm:main Apr 12, 2024
@orcguru orcguru deleted the aix_small_tls_local_dynamic_p1_v3 branch March 14, 2025 00:38