Commit ed6bde9

[X86] Use fence(seq_cst) in IdempotentRMWIntoFencedLoad (#126521)
This extends the optimization to cases where the subtarget lacks MFENCE (`!hasMFence`) or the atomic has `SyncScope::SingleThread`, by emitting a generic `fence seq_cst` instead of calling `llvm.x86.sse2.mfence` directly.
1 parent 1a31bb3 commit ed6bde9

3 files changed: 665 additions, 129 deletions

llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 3 additions & 14 deletions
@@ -31905,21 +31905,10 @@ X86TargetLowering::lowerIdempotentRMWIntoFencedLoad(AtomicRMWInst *AI) const {
   // otherwise, we might be able to be more aggressive on relaxed idempotent
   // rmw. In practice, they do not look useful, so we don't try to be
   // especially clever.
-  if (SSID == SyncScope::SingleThread)
-    // FIXME: we could just insert an ISD::MEMBARRIER here, except we are at
-    // the IR level, so we must wrap it in an intrinsic.
-    return nullptr;
-
-  if (!Subtarget.hasMFence())
-    // FIXME: it might make sense to use a locked operation here but on a
-    // different cache-line to prevent cache-line bouncing. In practice it
-    // is probably a small win, and x86 processors without mfence are rare
-    // enough that we do not bother.
-    return nullptr;
 
-  Function *MFence =
-      llvm::Intrinsic::getOrInsertDeclaration(M, Intrinsic::x86_sse2_mfence);
-  Builder.CreateCall(MFence, {});
+  // Use `fence seq_cst` over `llvm.x86.sse2.mfence` here to get the correct
+  // lowering for SSID == SyncScope::SingleThread and !hasMFence
+  Builder.CreateFence(AtomicOrdering::SequentiallyConsistent, SSID);
 
   // Finally we can emit the atomic load.
   LoadInst *Loaded = Builder.CreateAlignedLoad(
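
For readers less familiar with the transformation, here is a minimal source-level sketch of the shapes involved, written with standard C++ atomics rather than LLVM IR. The function names are hypothetical and exist only for illustration. The sketch does not show what the pass emits, and it does not claim that these functions are interchangeable under the C++ memory model in every program. It only illustrates an idempotent RMW (`fetch_or` with 0), a fence followed by a load, and the single-thread-scope analogue, where `std::atomic_signal_fence` plays the role of a `fence` with `SyncScope::SingleThread`.

```cpp
#include <atomic>
#include <cstdint>

// Idempotent RMW: OR-ing with 0 never changes the stored value, so the
// operation is used only for its ordering and for the value it returns.
std::uint32_t read_via_idempotent_rmw(std::atomic<std::uint32_t> &v) {
    return v.fetch_or(0, std::memory_order_seq_cst);
}

// Source-level analogue of the fenced-load form: a sequentially
// consistent fence followed by an atomic load (no store is performed).
std::uint32_t read_via_fenced_load(std::atomic<std::uint32_t> &v) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return v.load(std::memory_order_seq_cst);
}

// Single-thread-scope analogue: atomic_signal_fence only orders against
// a signal handler running on the same thread, so compilers typically
// emit no hardware fence instruction for it.
std::uint32_t read_via_signal_fenced_load(std::atomic<std::uint32_t> &v) {
    std::atomic_signal_fence(std::memory_order_seq_cst);
    return v.load(std::memory_order_seq_cst);
}
```

All three functions return the current value and leave it unchanged. They differ only in which ordering primitive accompanies the read, which is the part this commit hands over to the generic fence lowering instead of hard-coding an MFENCE intrinsic call.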
