
[SystemZ] Don't lower float/double ATOMIC_[LOAD|STORE] to [LOAD|STORE] #75879

Merged: 6 commits into llvm:main, Mar 18, 2024

Conversation

JonPsson1
Contributor

Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE nodes to regular LOAD/STORE nodes, make them legal and do this transformation in Select() instead.

These nodes should not be exposed to DAGCombiner as they were with the non-atomic opcodes, and the general intent is that atomic operations should have the ATOMIC opcodes.

The AtomicExpand pass no longer casts float/double atomic loads/stores to integer, but fp128 still is cast, since 128-bit accesses need special handling.
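
As a user-level illustration (a hypothetical example, not taken from the patch), this is the kind of access affected; with this change it should be selected as a direct FP load rather than an integer load followed by a GPR-to-FPR move:

#include <atomic>

// Hypothetical example: an atomic float load that no longer needs to be
// routed through an integer load and a register move.
std::atomic<float> F{0.0f};

float get() {
  return F.load(std::memory_order_seq_cst);
}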

@JonPsson1 JonPsson1 self-assigned this Dec 19, 2023

github-actions bot commented Dec 19, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 09bc6abba6e226ad5e9d18d4365690d6f04de21a c10f882e6787762e949766dc54b4ce70114c4356 -- llvm/include/llvm/CodeGen/SelectionDAGNodes.h llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h
View the diff from clang-format here.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 20375a0f92..f92a563c98 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -845,10 +845,18 @@ void SDNode::print_details(raw_ostream &OS, const SelectionDAG *G) const {
       if (A->getOpcode() == ISD::ATOMIC_LOAD) {
         bool doExt = true;
         switch (A->getExtensionType()) {
-        default: doExt = false; break;
-        case ISD::EXTLOAD:  OS << ", anyext"; break;
-        case ISD::SEXTLOAD: OS << ", sext"; break;
-        case ISD::ZEXTLOAD: OS << ", zext"; break;
+        default:
+          doExt = false;
+          break;
+        case ISD::EXTLOAD:
+          OS << ", anyext";
+          break;
+        case ISD::SEXTLOAD:
+          OS << ", sext";
+          break;
+        case ISD::ZEXTLOAD:
+          OS << ", zext";
+          break;
         }
         if (doExt)
           OS << " from " << A->getMemoryVT();
diff --git a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
index deaf3dcaeb..39a7a417dd 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
@@ -1765,8 +1765,8 @@ void SystemZDAGToDAGISel::Select(SDNode *Node) {
     // ok since we know all store instructions <= 8 bytes are atomic, and the
     // 16 byte case is already handled during lowering.
     StoreSDNode *St = cast<StoreSDNode>(CurDAG->getTruncStore(
-         AtomOp->getChain(), SDLoc(AtomOp), AtomOp->getVal(),
-         AtomOp->getBasePtr(), AtomOp->getMemoryVT(), AtomOp->getMemOperand()));
+        AtomOp->getChain(), SDLoc(AtomOp), AtomOp->getVal(),
+        AtomOp->getBasePtr(), AtomOp->getMemoryVT(), AtomOp->getMemOperand()));
     assert(St->getMemOperand()->isAtomic() && "Broken MMO.");
     SDNode *Chain = St;
     // We have to enforce sequential consistency by performing a

Member

@uweigand uweigand left a comment

Looks generally good to me, but see a couple of questions inline with the test case.

; CHECK: srlg [[R]], [[R]], 32
; CHECK: st [[R]], 0(%r2)
; CHECK: ste %f0, 0(%r2)
; CHECK: bcr 15, %r0
Member

Was the bcr 15 missing before, or was just the check missing? I see the check is also missing in the double version of this test. These tests would probably best be fully auto-generated.

Also, why do the long double versions of the tests use library calls? Shouldn't they use lpq/stpq plus GPR/FPR moves? In addition, for the long double (and also the i128) versions, we really should test both z10 and z13.

Contributor Author

Yes, the CHECK for bcr 15 was missing before - sorry, I should have mentioned that.

For the long double tests, see #75630 (comment). I think the way the tests were written, the output would actually be the same for z10/z13...

@jyknight
Member

I don't know if this patch also works, but why not translate atomic_load_8/etc patterns directly in tablegen to instructions, and skip the generic LOAD entirely, like e.g. X86 does:

def : Pat<(i8  (atomic_load_8 addr:$src)),  (MOV8rm addr:$src)>;
def : Pat<(i16 (atomic_load_16 addr:$src)), (MOV16rm addr:$src)>;
def : Pat<(i32 (atomic_load_32 addr:$src)), (MOV32rm addr:$src)>;
def : Pat<(i64 (atomic_load_64 addr:$src)), (MOV64rm addr:$src)>;
[...]
def : Pat<(atomic_store_8 (i8 imm:$src), addr:$dst),
          (MOV8mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_16 (i16 imm:$src), addr:$dst),
          (MOV16mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_32 (i32 imm:$src), addr:$dst),
          (MOV32mi addr:$dst, imm:$src)>;
def : Pat<(atomic_store_64 (i64immSExt32:$src), addr:$dst),
          (MOV64mi32 addr:$dst, i64immSExt32:$src)>;

You only need to deal with the trailing fence after the store; for that, you could have a Custom action which, if the store is SequentiallyConsistent, converts it to a Release atomic_store plus the fence, and otherwise passes it along.
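
A rough sketch of that Custom action (a sketch only: the helper name is invented, the MachineFunction::getMachineMemOperand overload that overrides orderings is assumed, and the chain/value/pointer operand order follows the AtomicSDNode accessors shown in this patch):

// Sketch: rewrite a seq_cst atomic store into a Release store (which the
// plain tablegen patterns can then match) followed by a barrier.
// MEMBARRIER is what lowerATOMIC_FENCE emits in this backend.
static SDValue lowerSeqCstAtomicStore(SDValue Op, SelectionDAG &DAG) {
  auto *Node = cast<AtomicSDNode>(Op.getNode());
  if (Node->getSuccessOrdering() != AtomicOrdering::SequentiallyConsistent)
    return Op; // Weaker orderings need no trailing serialization.
  SDLoc DL(Op);
  // Rebuild the MMO so the store itself is only Release.
  MachineMemOperand *RelMMO = DAG.getMachineFunction().getMachineMemOperand(
      Node->getMemOperand(), AtomicOrdering::Release,
      Node->getMemOperand()->getFailureOrdering());
  // ATOMIC_STORE operands are (chain, value, pointer).
  SDValue Ops[] = {Node->getChain(), Node->getVal(), Node->getBasePtr()};
  SDValue RelStore = DAG.getAtomic(ISD::ATOMIC_STORE, DL, Node->getMemoryVT(),
                                   DAG.getVTList(MVT::Other), Ops, RelMMO);
  // Trailing fence for the seq_cst case.
  return DAG.getNode(ISD::MEMBARRIER, DL, MVT::Other, RelStore);
}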

@JonPsson1
Contributor Author

I don't know if this patch also works, but why not translate atomic_load_8/etc patterns directly in tablegen to instructions, and skip the generic LOAD entirely, like e.g. X86 does:

On SystemZ, the memory instructions are only atomic if naturally aligned, so the non-atomic ISD opcode would still be needed for unaligned accesses.
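
(As a minimal sketch of the condition involved, using the standard MemSDNode accessors:)

// Minimal sketch: the natural-alignment condition under which a SystemZ
// memory instruction's access is atomic.
static bool isNaturallyAligned(const MemSDNode *M) {
  return M->getAlign().value() >=
         M->getMemoryVT().getStoreSize().getFixedValue();
}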

I however now see a problem with this patch: the reg/mem folding does not seem to be working as I had expected, which I need to look into.

@JonPsson1
Contributor Author

@arsenm suggested getting rid of ATOMIC_LOAD, whereas I thought the general idea was to clean up the LOAD/STORE node handling by freeing those nodes from possibly being atomic, so I wonder if there is a consensus on which direction to follow...?

The reason I am asking is that it doesn't seem quite trivial after all to make sure all the reg/mem folding works for both LOADs and ATOMIC_LOADs in the SystemZ backend.

@jyknight
Member

I don't really know what arsenm or @preames has/had in mind, but I think there is value in all the backends working in a similar way here, rather than SystemZ doing something different.

The reason I'm skeptical about merging the two operations is that there is a significant difference between LOAD and ATOMIC_LOAD beyond just the memory-ordering: a LOAD (even in some cases when it's volatile!) can be split (both by SelectionDAG expansions, and potentially within the hardware instruction that was selected). An ATOMIC_LOAD cannot be. Given the difference, I'm not sure merging the two into one opcode actually makes things simpler.

@JonPsson1
Contributor Author

I am not sure which I think would be best: it seems natural to simply have "LOAD", as it has the MMO with the atomic bits. On the other hand, it kind of feels safer to have the ATOMIC_LOAD opcode to keep it separate from all the optimizations on LOADs, even though guarding those with a simple isAtomic() check might be easy enough.

I guess the next step might be to rework the SystemZ implementation with this patch; even after that, ATOMIC_LOAD could be eliminated at some future point if needed.

Strictly speaking though, from a SystemZ backend perspective it would be simpler to just have the LOAD opcode, since the same instruction is to be selected. Not sure how much work that would be, but we would have to be sure that the MMO is properly preserved everywhere, that optimizations are guarded with an isAtomic() check, and that backends like PowerPC, which select atomic loads with ATOMIC_LOAD, learn to do that by checking the MMO of a LOAD instead.
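
For illustration, the isAtomic() guard mentioned above could be as small as the following sketch (hypothetical placement inside any generic LOAD optimization):

// Hypothetical guard: if LOAD and ATOMIC_LOAD were merged, each LOAD
// optimization would have to bail out on atomic MMOs like this.
if (auto *LD = dyn_cast<LoadSDNode>(N))
  if (LD->getMemOperand()->isAtomic())
    return SDValue(); // Splitting or narrowing an atomic access is unsafe.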

@preames
Collaborator

preames commented Dec 21, 2023

I don't really know what arsenm or @preames has/had in mind,

I was only getting the code back into the state it was in before I'd added complexity which never became worthwhile. I have no strong opinions on how SystemZ should or should not implement this. Having atomic loads represented by a normal load with the appropriate MMO isn't a bad idea, it just requires a lot more work through generic DAG than I had time to ever finish. If SystemZ is lowering late enough to avoid those problems (in practice..), then so be it.

@llvmbot llvmbot added the llvm:SelectionDAG label Jan 3, 2024
@llvmbot
Member

llvmbot commented Jan 3, 2024

@llvm/pr-subscribers-backend-powerpc

@llvm/pr-subscribers-llvm-selectiondag

Author: Jonas Paulsson (JonPsson1)

Changes

Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE nodes to regular LOAD/STORE nodes, make them legal and do this transformation in Select() instead.

These nodes should not be exposed to DAGCombiner as they were with the non-atomic opcodes, and the general intent is that atomic operations should have the ATOMIC opcodes.

The AtomicExpand pass no longer casts float/double atomic loads/stores to integer, but fp128 still is cast, since 128-bit accesses need special handling.


Patch is 45.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75879.diff

12 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/SelectionDAGNodes.h (+11)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp (+13)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+10)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp (+12)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp (+66)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+95-45)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.h (+5-3)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrFP.td (+4-4)
  • (modified) llvm/lib/Target/SystemZ/SystemZOperators.td (+4)
  • (modified) llvm/test/CodeGen/SystemZ/atomic-load-06.ll (+1-3)
  • (added) llvm/test/CodeGen/SystemZ/atomic-memofolds.ll (+723)
  • (modified) llvm/test/CodeGen/SystemZ/atomic-store-06.ll (+1-4)
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index 5c44538fe69974..59777020446eec 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -553,6 +553,7 @@ BEGIN_TWO_BYTE_PACK()
 
   class LoadSDNodeBitfields {
     friend class LoadSDNode;
+    friend class AtomicSDNode;
     friend class VPLoadSDNode;
     friend class VPStridedLoadSDNode;
     friend class MaskedLoadSDNode;
@@ -1462,6 +1463,16 @@ class AtomicSDNode : public MemSDNode {
             MMO->isAtomic()) && "then why are we using an AtomicSDNode?");
   }
 
+  void setExtensionType(ISD::LoadExtType ETy) {
+    assert(getOpcode() == ISD::ATOMIC_LOAD && "Only used for atomic loads.");
+    LoadSDNodeBits.ExtTy = ETy;
+  }
+
+  ISD::LoadExtType getExtensionType() const {
+    assert(getOpcode() == ISD::ATOMIC_LOAD && "Only used for atomic loads.");
+    return static_cast<ISD::LoadExtType>(LoadSDNodeBits.ExtTy);
+  }
+
   const SDValue &getBasePtr() const {
     return getOpcode() == ISD::ATOMIC_STORE ? getOperand(2) : getOperand(1);
   }
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 3d21bd22e6ef5d..8702e554864761 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -338,6 +338,19 @@ SDValue DAGTypeLegalizer::PromoteIntRes_Atomic0(AtomicSDNode *N) {
                               N->getMemoryVT(), ResVT,
                               N->getChain(), N->getBasePtr(),
                               N->getMemOperand());
+  if (N->getOpcode() == ISD::ATOMIC_LOAD) {
+    ISD::LoadExtType ETy = cast<AtomicSDNode>(N)->getExtensionType();
+    if (ETy == ISD::NON_EXTLOAD) {
+      if (TLI.getExtendForAtomicOps() == ISD::SIGN_EXTEND)
+        ETy = ISD::SEXTLOAD;
+      else if (TLI.getExtendForAtomicOps() == ISD::ZERO_EXTEND)
+        ETy = ISD::ZEXTLOAD;
+      else
+        ETy = ISD::EXTLOAD;
+    }
+    cast<AtomicSDNode>(Res)->setExtensionType(ETy);
+  }
+
   // Legalize the chain result - switch anything that used the old chain to
   // use the new one.
   ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 81facf92e55ae9..0849b5d38a6e7e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4017,6 +4017,9 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
     if (Op.getResNo() == 0) {
       if (TLI->getExtendForAtomicOps() == ISD::ZERO_EXTEND)
         Known.Zero.setBitsFrom(MemBits);
+      else if (Op->getOpcode() == ISD::ATOMIC_LOAD &&
+               cast<AtomicSDNode>(Op)->getExtensionType() == ISD::ZEXTLOAD)
+        Known.Zero.setBitsFrom(MemBits);
     }
     break;
   }
@@ -4828,6 +4831,13 @@ unsigned SelectionDAG::ComputeNumSignBits(SDValue Op, const APInt &DemandedElts,
         return VTBits - Tmp + 1;
       if (TLI->getExtendForAtomicOps() == ISD::ZERO_EXTEND)
         return VTBits - Tmp;
+      if (Op->getOpcode() == ISD::ATOMIC_LOAD) {
+        ISD::LoadExtType ETy = cast<AtomicSDNode>(Op)->getExtensionType();
+        if (ETy == ISD::SEXTLOAD)
+          return VTBits - Tmp + 1;
+        if (ETy == ISD::ZEXTLOAD)
+          return VTBits - Tmp;
+      }
     }
     break;
   }
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
index 78cc60084068a5..a878c2a99f521c 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -825,6 +825,18 @@ void SDNode::print_details(raw_ostream &OS, const SelectionDAG *G) const {
   } else if (const MemSDNode *M = dyn_cast<MemSDNode>(this)) {
     OS << "<";
     printMemOperand(OS, *M->getMemOperand(), G);
+    if (auto *A = dyn_cast<AtomicSDNode>(M))
+      if (A->getOpcode() == ISD::ATOMIC_LOAD) {
+        bool doExt = true;
+        switch (A->getExtensionType()) {
+        default: doExt = false; break;
+        case ISD::EXTLOAD:  OS << ", anyext"; break;
+        case ISD::SEXTLOAD: OS << ", sext"; break;
+        case ISD::ZEXTLOAD: OS << ", zext"; break;
+        }
+        if (doExt)
+          OS << " from " << A->getMemoryVT();
+      }
     OS << ">";
   } else if (const BlockAddressSDNode *BA =
                dyn_cast<BlockAddressSDNode>(this)) {
diff --git a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
index c7d8591c5bdf6f..25c0f10687117b 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
@@ -347,6 +347,9 @@ class SystemZDAGToDAGISel : public SelectionDAGISel {
   // Try to expand a boolean SELECT_CCMASK using an IPM sequence.
   SDValue expandSelectBoolean(SDNode *Node);
 
+  // Convert ATOMIC_LOADs to LOADs to facilitate instruction selection.
+  void convertATOMIC_LOADs(SDNode *Node, unsigned Depth = 0);
+
 public:
   static char ID;
 
@@ -1513,6 +1516,10 @@ bool SystemZDAGToDAGISel::storeLoadIsAligned(SDNode *N) const {
   MachineMemOperand *MMO = MemAccess->getMemOperand();
   assert(MMO && "Expected a memory operand.");
 
+  // These instructions are not atomic.
+  if (MMO->isAtomic())
+    return false;
+
   // The memory access must have a proper alignment and no index register.
   if (MemAccess->getAlign().value() < StoreSize ||
       !MemAccess->getOffset().isUndef())
@@ -1545,6 +1552,37 @@ bool SystemZDAGToDAGISel::storeLoadIsAligned(SDNode *N) const {
   return true;
 }
 
+// This is a hack to convert ATOMIC_LOADs to LOADs in the last minute just
+// before instruction selection begins. It would have been easier if
+// ATOMIC_LOAD nodes would instead always be built by SelectionDAGBuilder as
+// LOADs with an atomic MMO and properly handled as such in DAGCombiner, but
+// until that changes they need to remain as ATOMIC_LOADs until all
+// DAGCombining is done.  Convert Node or any of its operands from
+// ATOMIC_LOAD to LOAD.
+void SystemZDAGToDAGISel::convertATOMIC_LOADs(SDNode *Node, unsigned Depth) {
+  if (Depth > 1) // Chain operands are also followed so this seems enough.
+    return;
+  if (Node->getOpcode() == ISD::ATOMIC_LOAD) {
+    auto *ALoad = cast<AtomicSDNode>(Node);
+    // It seems necessary to morph the node as it is not yet being selected.
+    LoadSDNode *Ld = cast<LoadSDNode>(CurDAG->MorphNodeTo(
+        ALoad, ISD::LOAD, CurDAG->getVTList(ALoad->getValueType(0), MVT::Other),
+        {ALoad->getChain(), ALoad->getBasePtr()}));
+    // Sanity check the morph.  The extension type for an extending load
+    // should have been set prior to instruction selection and remain in the
+    // morphed node.
+    assert(((SDNode *)Ld) == ((SDNode *)ALoad) && "Bad CSE on atomic load.");
+    assert(Ld->getMemOperand()->isAtomic() && "Broken MMO.");
+    ISD::LoadExtType ETy = Ld->getExtensionType();
+    bool IsNonExt = Ld->getMemoryVT().getSizeInBits() ==
+                    Ld->getValueType(0).getSizeInBits();
+    assert(IsNonExt == (ETy == ISD::NON_EXTLOAD) && "Bad extension type.");
+    return;
+  }
+  for (SDValue Op : Node->ops())
+    convertATOMIC_LOADs(Op.getNode(), ++Depth);
+}
+
 void SystemZDAGToDAGISel::Select(SDNode *Node) {
   // If we have a custom node, we already have selected!
   if (Node->isMachineOpcode()) {
@@ -1553,6 +1591,9 @@ void SystemZDAGToDAGISel::Select(SDNode *Node) {
     return;
   }
 
+  // Prepare any ATOMIC_LOAD to be selected as a LOAD with an atomic MMO.
+  convertATOMIC_LOADs(Node);
+
   unsigned Opcode = Node->getOpcode();
   switch (Opcode) {
   case ISD::OR:
@@ -1744,6 +1785,31 @@ void SystemZDAGToDAGISel::Select(SDNode *Node) {
     }
     break;
   }
+
+  case ISD::ATOMIC_STORE: {
+    auto *AtomOp = cast<AtomicSDNode>(Node);
+    // Store FP values directly without first moving to a GPR.
+    EVT SVT = AtomOp->getMemoryVT();
+    SDValue StoredVal = AtomOp->getVal();
+    if (SVT.isInteger() && StoredVal->getOpcode() == ISD::BITCAST &&
+        StoredVal->getOperand(0).getValueType().isFloatingPoint()) {
+      StoredVal = StoredVal->getOperand(0);
+      SVT = StoredVal.getValueType();
+    }
+    StoreSDNode *St = cast<StoreSDNode>(CurDAG->getTruncStore(
+        AtomOp->getChain(), SDLoc(AtomOp), StoredVal, AtomOp->getBasePtr(), SVT,
+        AtomOp->getMemOperand()));
+    assert(St->getMemOperand()->isAtomic() && "Broken MMO.");
+    SDNode *Chain = St;
+    // We have to enforce sequential consistency by performing a
+    // serialization operation after the store.
+    if (AtomOp->getSuccessOrdering() == AtomicOrdering::SequentiallyConsistent)
+      Chain = CurDAG->getMachineNode(SystemZ::Serialize, SDLoc(AtomOp),
+                                     MVT::Other, SDValue(Chain, 0));
+    ReplaceNode(Node, Chain);
+    SelectCode(St);
+    return;
+  }
   }
 
   SelectCode(Node);
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 559f2ca476d709..768f23310fcea5 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -194,11 +194,6 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
       setOperationAction(ISD::UADDO_CARRY, VT, Custom);
       setOperationAction(ISD::USUBO_CARRY, VT, Custom);
 
-      // Lower ATOMIC_LOAD and ATOMIC_STORE into normal volatile loads and
-      // stores, putting a serialization instruction after the stores.
-      setOperationAction(ISD::ATOMIC_LOAD,  VT, Custom);
-      setOperationAction(ISD::ATOMIC_STORE, VT, Custom);
-
       // Lower ATOMIC_LOAD_SUB into ATOMIC_LOAD_ADD if LAA and LAAG are
       // available, or if the operand is constant.
       setOperationAction(ISD::ATOMIC_LOAD_SUB, VT, Custom);
@@ -693,7 +688,8 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::GET_ROUNDING, MVT::i32, Custom);
 
   // Codes for which we want to perform some z-specific combinations.
-  setTargetDAGCombine({ISD::ZERO_EXTEND,
+  setTargetDAGCombine({ISD::BITCAST,
+                       ISD::ZERO_EXTEND,
                        ISD::SIGN_EXTEND,
                        ISD::SIGN_EXTEND_INREG,
                        ISD::LOAD,
@@ -913,6 +909,22 @@ bool SystemZTargetLowering::hasInlineStackProbe(const MachineFunction &MF) const
   return false;
 }
 
+TargetLowering::AtomicExpansionKind
+SystemZTargetLowering::shouldCastAtomicLoadInIR(LoadInst *LI) const {
+  // Lower fp128 the same way as i128.
+  if (LI->getType()->isFP128Ty())
+    return AtomicExpansionKind::CastToInteger;
+  return AtomicExpansionKind::None;
+}
+
+TargetLowering::AtomicExpansionKind
+SystemZTargetLowering::shouldCastAtomicStoreInIR(StoreInst *SI) const {
+  // Lower fp128 the same way as i128.
+  if (SI->getValueOperand()->getType()->isFP128Ty())
+    return AtomicExpansionKind::CastToInteger;
+  return AtomicExpansionKind::None;
+}
+
 TargetLowering::AtomicExpansionKind
 SystemZTargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
   // Don't expand subword operations as they require special treatment.
@@ -4503,40 +4515,14 @@ SDValue SystemZTargetLowering::lowerATOMIC_FENCE(SDValue Op,
   return DAG.getNode(ISD::MEMBARRIER, DL, MVT::Other, Op.getOperand(0));
 }
 
-// Op is an atomic load.  Lower it into a normal volatile load.
-SDValue SystemZTargetLowering::lowerATOMIC_LOAD(SDValue Op,
-                                                SelectionDAG &DAG) const {
+SDValue SystemZTargetLowering::lowerATOMIC_I128_LDST(SDValue Op,
+                                                     SelectionDAG &DAG) const {
   auto *Node = cast<AtomicSDNode>(Op.getNode());
-  if (Node->getMemoryVT() == MVT::i128) {
-    // Use same code to handle both legal and non-legal i128 types.
-    SmallVector<SDValue, 2> Results;
-    LowerOperationWrapper(Node, Results, DAG);
-    return DAG.getMergeValues(Results, SDLoc(Op));
-  }
-  return DAG.getExtLoad(ISD::EXTLOAD, SDLoc(Op), Op.getValueType(),
-                        Node->getChain(), Node->getBasePtr(),
-                        Node->getMemoryVT(), Node->getMemOperand());
-}
-
-// Op is an atomic store.  Lower it into a normal volatile store.
-SDValue SystemZTargetLowering::lowerATOMIC_STORE(SDValue Op,
-                                                 SelectionDAG &DAG) const {
-  auto *Node = cast<AtomicSDNode>(Op.getNode());
-  if (Node->getMemoryVT() == MVT::i128) {
-    // Use same code to handle both legal and non-legal i128 types.
-    SmallVector<SDValue, 1> Results;
-    LowerOperationWrapper(Node, Results, DAG);
-    return DAG.getMergeValues(Results, SDLoc(Op));
-  }
-  SDValue Chain = DAG.getTruncStore(Node->getChain(), SDLoc(Op), Node->getVal(),
-                                    Node->getBasePtr(), Node->getMemoryVT(),
-                                    Node->getMemOperand());
-  // We have to enforce sequential consistency by performing a
-  // serialization operation after the store.
-  if (Node->getSuccessOrdering() == AtomicOrdering::SequentiallyConsistent)
-    Chain = SDValue(DAG.getMachineNode(SystemZ::Serialize, SDLoc(Op),
-                                       MVT::Other, Chain), 0);
-  return Chain;
+  assert(Node->getMemoryVT() == MVT::i128 && "Only custom lowering i128.");
+  // Use same code to handle both legal and non-legal i128 types.
+  SmallVector<SDValue, 2> Results;
+  LowerOperationWrapper(Node, Results, DAG);
+  return DAG.getMergeValues(Results, SDLoc(Op));
 }
 
 // Prepare for a Compare And Swap for a subword operation. This needs to be
@@ -5659,9 +5645,13 @@ static SDValue tryBuildVectorShuffle(SelectionDAG &DAG,
   return GS.getNode(DAG, SDLoc(BVN));
 }
 
-bool SystemZTargetLowering::isVectorElementLoad(SDValue Op) const {
+bool SystemZTargetLowering::isVectorElementLoad(SDValue Op, EVT VecVT) const {
   if (Op.getOpcode() == ISD::LOAD && cast<LoadSDNode>(Op)->isUnindexed())
     return true;
+  if (auto *AL = dyn_cast<AtomicSDNode>(Op))
+    if (AL->getOpcode() == ISD::ATOMIC_LOAD && SDValue(AL, 0).hasOneUse() &&
+        AL->getMemoryVT() == VecVT.getScalarType())
+      return true;
   if (Subtarget.hasVectorEnhancements2() && Op.getOpcode() == SystemZISD::LRV)
     return true;
   return false;
@@ -5699,13 +5689,13 @@ SystemZTargetLowering::buildVector(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
   //   we would need 2 instructions to replicate it: VLVGP followed by VREPx.
   //   This is only a win if the single defined element is used more than once.
   //   In other cases we're better off using a single VLVGx.
-  if (Single.getNode() && (Count > 1 || isVectorElementLoad(Single)))
+  if (Single.getNode() && (Count > 1 || isVectorElementLoad(Single, VT)))
     return DAG.getNode(SystemZISD::REPLICATE, DL, VT, Single);
 
   // If all elements are loads, use VLREP/VLEs (below).
   bool AllLoads = true;
   for (auto Elem : Elems)
-    if (!isVectorElementLoad(Elem)) {
+    if (!isVectorElementLoad(Elem, VT)) {
       AllLoads = false;
       break;
     }
@@ -5777,7 +5767,7 @@ SystemZTargetLowering::buildVector(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
     std::map<const SDNode*, unsigned> UseCounts;
     SDNode *LoadMaxUses = nullptr;
     for (unsigned I = 0; I < NumElements; ++I)
-      if (isVectorElementLoad(Elems[I])) {
+      if (isVectorElementLoad(Elems[I], VT)) {
         SDNode *Ld = Elems[I].getNode();
         UseCounts[Ld]++;
         if (LoadMaxUses == nullptr || UseCounts[LoadMaxUses] < UseCounts[Ld])
@@ -6139,9 +6129,8 @@ SDValue SystemZTargetLowering::LowerOperation(SDValue Op,
   case ISD::ATOMIC_SWAP:
     return lowerATOMIC_LOAD_OP(Op, DAG, SystemZISD::ATOMIC_SWAPW);
   case ISD::ATOMIC_STORE:
-    return lowerATOMIC_STORE(Op, DAG);
   case ISD::ATOMIC_LOAD:
-    return lowerATOMIC_LOAD(Op, DAG);
+    return lowerATOMIC_I128_LDST(Op, DAG);
   case ISD::ATOMIC_LOAD_ADD:
     return lowerATOMIC_LOAD_OP(Op, DAG, SystemZISD::ATOMIC_LOADW_ADD);
   case ISD::ATOMIC_LOAD_SUB:
@@ -6588,6 +6577,52 @@ SDValue SystemZTargetLowering::combineTruncateExtract(
   return SDValue();
 }
 
+// Replace ALoad with a new ATOMIC_LOAD with a result that is extended to VT
+// per ETy.
+static SDValue extendAtomicLoad(AtomicSDNode *ALoad, EVT VT, SelectionDAG &DAG,
+                                ISD::LoadExtType ETy) {
+  if (VT.getSizeInBits() > 64)
+    return SDValue();
+  EVT OrigVT = ALoad->getValueType(0);
+  assert(OrigVT.getSizeInBits() < VT.getSizeInBits() && "VT should be wider.");
+  EVT MemoryVT = ALoad->getMemoryVT();
+  auto *NewALoad = dyn_cast<AtomicSDNode>(DAG.getAtomic(
+      ISD::ATOMIC_LOAD, SDLoc(ALoad), MemoryVT, VT, ALoad->getChain(),
+      ALoad->getBasePtr(), ALoad->getMemOperand()));
+  NewALoad->setExtensionType(ETy);
+  DAG.ReplaceAllUsesOfValueWith(
+      SDValue(ALoad, 0),
+      DAG.getNode(ISD::TRUNCATE, SDLoc(ALoad), OrigVT, SDValue(NewALoad, 0)));
+  // Update the chain uses.
+  DAG.ReplaceAllUsesOfValueWith(SDValue(ALoad, 1), SDValue(NewALoad, 1));
+  return SDValue(NewALoad, 0);
+}
+
+SDValue SystemZTargetLowering::combineBITCAST(SDNode *N,
+                                              DAGCombinerInfo &DCI) const {
+  SelectionDAG &DAG = DCI.DAG;
+  SDValue N0 = N->getOperand(0);
+  EVT InVT = N0.getValueType();
+  EVT ResVT = N->getValueType(0);
+  // Handle atomic loads to load float/double values directly and not via a
+  // GPR. Do it before legalization to help in treating the ATOMIC_LOAD the
+  // same way as a LOAD, and e.g. emit a REPLICATE.
+  if (auto *ALoad = dyn_cast<AtomicSDNode>(N0))
+    if (ALoad->getOpcode() == ISD::ATOMIC_LOAD && InVT.getSizeInBits() <= 64 &&
+        ALoad->getExtensionType() == ISD::NON_EXTLOAD &&
+        SDValue(ALoad, 0).hasOneUse() && InVT.isInteger() &&
+        ResVT.isFloatingPoint()) {
+      SDValue Res = DAG.getAtomic(ISD::ATOMIC_LOAD, SDLoc(N), ResVT, ResVT,
+                                  ALoad->getChain(), ALoad->getBasePtr(),
+                                  ALoad->getMemOperand());
+      // Update the chain uses.
+      DAG.ReplaceAllUsesOfValueWith(SDValue(ALoad, 1), Res.getValue(1));
+      return Res;
+    }
+
+  return SDValue();
+}
+
 SDValue SystemZTargetLowering::combineZERO_EXTEND(
     SDNode *N, DAGCombinerInfo &DCI) const {
   // Convert (zext (select_ccmask C1, C2)) into (select_ccmask C1', C2')
@@ -6612,6 +6647,13 @@ SDValue SystemZTargetLowering::combineZERO_EXTEND(
       return NewSelect;
     }
   }
+
+  // Fold into ATOMIC_LOAD unless it is already sign extending.
+  if (auto *ALoad = dyn_cast<AtomicSDNode>(N0))
+    if (ALoad->getOpcode() == ISD::ATOMIC_LOAD &&
+        ALoad->getExtensionType() != ISD::SEXTLOAD)
+      return extendAtomicLoad(ALoad, VT, DAG, ISD::ZEXTLOAD);
+
   return SDValue();
 }
 
@@ -6663,6 +6705,13 @@ SDValue SystemZTargetLowering::combineSIGN_EXTEND(
       }
     }
   }
+
+  // Fold into ATOMIC_LOAD unless it is already zero extending.
+  if (auto *ALoad = dyn_cast<AtomicSDNode>(N0))
+    if (ALoad->getOpcode() == ISD::ATOMIC_LOAD &&
+        ALoad->getExtensionType() != ISD::ZEXTLOAD)
+      return extendAtomicLoad(ALoad, VT, DAG, ISD::SEXTLOAD);
+
   return SDValue();
 }
 
@@ -7639,6 +7688,7 @@ SDValue SystemZTargetLowering::PerformDAGCombine(SDNode *N,
                                                  DAGCombinerInfo &DCI) const {
   switch(N->getOpcode()) {
   default: break;
+  case ISD::BITCAST:            return combineBITCAST(N, DCI);
   case ISD::ZERO_EXTEND:        return combineZERO_EXTEND(N, DCI);
   case ISD::SIGN_EXTEND:        return combineSIGN_EXTEND(N, DCI);
   case ISD::SIGN_EXTEND_INREG:  return combineSIGN_EXTEND_INREG(N, DCI);
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.h b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
index baf4ba41654879..9c442268dbb111 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.h
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
@@ -474,6 +474,8 @@ class SystemZTargetLowering : public TargetLowering {
     return VT != MVT::f64;
   }
   bool hasInlineStackProbe(const MachineFunction &MF) const override;
+  AtomicExpansionKind shouldCastAtomicLoadInIR(LoadInst *LI) const override;
+  AtomicExpansionKind shouldCastAtomicStoreInIR(StoreInst *SI) const override...
[truncated]

@JonPsson1
Contributor Author

Patch improved

  • ATOMIC_LOADs are now converted to LOADs before they are selected by means of morphing so that reg/mem patterns work.

  • The AtomicExpand pass is now supposed to only cast atomic loads/stores to integer for FP128. However, it seems Clang always does this for all FP types, and AtomicExpand does not remove the casts. The bitcasts are handled in the backend instead.

    • ATOMIC_LOAD bitcasts handled in SystemZ combineBITCAST()
    • ATOMIC_STORE bitcasts handled in SystemZ Select()
  • Until ATOMIC_LOAD is removed and replaced by LOAD + MMO handling, ATOMIC_LOAD is extended to handle an extension type, just like an ordinary LOAD.

    • When an ATOMIC_LOAD result is widened to a legal type, the proper extension attribute is added (or kept if already present).
    • computeKnownBits() and ComputeNumSignBits() extended to handle ATOMIC_LOAD.
    • SExt and ZExt handled in SystemZ combineZERO_EXTEND() and combineSIGN_EXTEND().
  • This patch now keeps ATOMIC_LOADs unexposed to general DAG combining whereas before they were all regular LOADs after legalization.

    • Some SystemZ specific lowering can still be done on ATOMIC_LOADs, like emission of REPLICATE (VLREP).
    • There are some optimizations involving LOADs in SystemZISelLowering that do not handle ATOMIC_LOADs as of now (e.g. compares), but it would not be too much work, given that the ATOMIC_LOAD nodes have the right extension type and MemoryVT (also for float) set.
  • Another problem is that the SystemZ instruction selection still sees all the ATOMIC_LOADs as LOADs. To remedy this, I first tried to make the selection of ATOMIC_LOADs explicit by means of PatFrags handling both LOAD and ATOMIC_LOAD, but that became quite a task given that most of the instructions are atomic if properly aligned.
    ExplicitAtomLdSt.patch.tar.gz
    I then simplified this task with the hack of morphing ATOMIC_LOADs into LOADs in convertATOMIC_LOADs(). This is less effort at the moment, and it also has the benefit that SystemZ isel is based on LOADs and their MMOs, so it will already work if the ATOMIC_LOAD opcode is removed.

  • Not sure if there are cases of non-atomic SystemZ instructions being used for atomic memory accesses. I am a bit suspicious about MAEB and friends, which are RXF-type instructions. Currently clang uses an MADB with this program, while GCC does not:


#include <math.h>
#include <float.h>
#include <atomic>
std::atomic<double> Y(0.0);
double fun(double *S) {
  double x = 2.0,z = 10.0;
  double y = Y.load(std::memory_order_seq_cst);
  return fma(x, y, z);
}

I am also not sure if SI-type instructions are atomic (POP mentions SIY, so it would be a fair guess that they are). I have not looked through all the instruction types, but this would be good to do. So far I have marked MAEB to only match a load if it is non-atomic.

@@ -1545,6 +1552,37 @@ bool SystemZDAGToDAGISel::storeLoadIsAligned(SDNode *N) const {
return true;
}

// This is a hack to convert ATOMIC_LOADs to LOADs in the last minute just
Contributor

It's also not difficult to use PatFrags between atomic and non-atomic loads instead of hacking this up in C++

Contributor Author

Well, with all the different extending load operators it is not quite simple, I'd say, but of course better. My concern is that if the ATOMIC_LOAD opcode goes away, those changes would have to be reverted again in the .td files. So if that might happen in some foreseeable future, maybe this hack would be acceptable? As an alternative, maybe the switch to PatFrags could be done as a follow-up patch so that it could more easily be reverted if needed..?

Contributor

I still think getting rid of ATOMIC_LOAD is the correct way to do this, and anything else is working around an aversion to completing the work to get there

@JonPsson1
Contributor Author

Patch updated:

  • Don't return false for atomic memops in storeLoadIsAligned().
  • Treat ATOMIC_LOADs the same as regular LOADs in isVectorElementLoad().
  • The TernaryRXF instructions are atomic - remove the nonatomic_ld operator.

Contributor

@arsenm arsenm left a comment

The generic bits lgtm. The SystemZ handling is a hack, and we'd still be better off getting rid of ATOMIC_LOAD and following how GlobalISel does this.

@JonPsson1
Contributor Author

The generic bits lgtm. The SystemZ handling is a hack, and we'd still be better off getting rid of ATOMIC_LOAD and following how GlobalISel does this.

Thanks for the review! Before committing, I think I will try to get rid of the hack in SystemZ and use PatFrags there instead. I will also make a note somewhere that ATOMIC_LOAD might be removed at some point.

@JonPsson1
Contributor Author

Patch updated to remove the ugly hack of morphing atomic_loads into regular loads, by using PatFrags instead.

Since SystemZ has atomic FP loads, this required one more common-code change: removing the requirement in SDTAtomicLoad that the result must be integer. This in turn confused TableGen in the PowerPC and VE backends, which I have kind of fixed by inserting an explicit 'iAny' as needed. It would probably be better to be specific and use e.g. 'i32' instead, but as I don't know much about those targets, I think I might as well ask for some help here instead of guessing.
(@arsenm @MaskRay @kaz7 @jam7 @stefanp-ibm @amy-kwan )

@uweigand: Tests are passing and the SPEC build is NFC.
We could do something similar with the atomic_stores, like a z_store, but it seems the handling of stores is quite alright as it is, or?

@arsenm
Contributor

arsenm commented Feb 29, 2024

We could do something similar with the atomic_stores, like a z_store, but it seems the handling of stores is quite alright as it is, or?

Stores should be similarly handled, you can also use PatFrags

@JonPsson1
Contributor Author

We could do something similar with the atomic_stores, like a z_store, but it seems the handling of stores is quite alright as it is, or?

Stores should be similarly handled, you can also use PatFrags

Ideally, yes. This is, however, much less of a hack than the handling of loads was before, since the store gets selected after its operands and therefore has everything ready.

@JonPsson1
Contributor Author

Patch updated: PPC integer types in patterns specified properly.

Waiting on #83446 which will remove the FP/Int casting that Clang is currently doing.

@@ -920,6 +916,23 @@ bool SystemZTargetLowering::hasInlineStackProbe(const MachineFunction &MF) const
return false;
}

// FIXME: Clang emits these casts always regardless of these hooks.
TargetLowering::AtomicExpansionKind
SystemZTargetLowering::shouldCastAtomicLoadInIR(LoadInst *LI) const {
Contributor

Looking at this again, I don't think this target hook should exist. Fundamentally AtomicExpandPass exists as a hack because SelectionDAG cannot handle emitting control flow for the cmpxchg blocks. Bitcasting types to match instructions is best done as part of regular legalization. cc @shiltian

@JonPsson1
Contributor Author

Patch updated.

@uweigand Now that the Clang FE does not insert the fp<->int bitcasts, the handling of the resulting bitcasts has been removed from the SystemZ backend (there doesn't seem to be any value at all in having it around, judging by SPEC). @arsenm approved the common-code changes earlier, so now it's the SystemZ changes that also need review.

@kaz7 This patch shouldn't affect the VE backend at all, but since SDTAtomicLoad has been changed to also allow FP loads, I had to insert the equivalent MVT specifier in the VE patterns for atomic loads. I used 'iAny' for now, but this could easily be changed.

Member

@uweigand uweigand left a comment

See inline comments, but generally this LGTM now. As Matt suggested, I guess we should also have a z_store PatFrag so we keep the ATOMIC_STORE during SelectionDAG. But this can also be looked at separately, I think.

@JonPsson1
Contributor Author

Removed check for extending atomic fp load nodes.

Given that this patch adds extending atomic loads as a common-code feature, shouldn't now also common code attempt to match them in DAGCombine? Why does this need to be platform-specific? But I guess this can be looked at in a separate patch.

OK, I will do a follow-up on moving that into common code.
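
A sketch of what that follow-up might look like in generic code (placement in DAGCombiner's zero-extend combine is assumed; N0, VT, and DAG as usual there; mirrors extendAtomicLoad() from this patch):

// Assumed sketch, not part of this patch: fold (zext (atomic_load)) into a
// zero-extending ATOMIC_LOAD, as the SystemZ combine above does.
if (auto *ALoad = dyn_cast<AtomicSDNode>(N0))
  if (ALoad->getOpcode() == ISD::ATOMIC_LOAD &&
      ALoad->getExtensionType() == ISD::NON_EXTLOAD &&
      SDValue(ALoad, 0).hasOneUse()) {
    SDValue New = DAG.getAtomic(ISD::ATOMIC_LOAD, SDLoc(ALoad),
                                ALoad->getMemoryVT(), VT, ALoad->getChain(),
                                ALoad->getBasePtr(), ALoad->getMemOperand());
    cast<AtomicSDNode>(New.getNode())->setExtensionType(ISD::ZEXTLOAD);
    DAG.ReplaceAllUsesOfValueWith(SDValue(ALoad, 1), New.getValue(1));
    return New;
  }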

@JonPsson1 JonPsson1 merged commit 8b8e1ad into llvm:main Mar 18, 2024
chencha3 pushed a commit to chencha3/llvm-project that referenced this pull request Mar 23, 2024
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- AtomicExpand pass no longer casts float/double atomic load/stores to integer
  (FP128 is still casted).
@JonPsson1 JonPsson1 deleted the AtomicLdSt branch March 26, 2024 16:27