Commit ab1d73e

[ARM] Don't reserve R12 on Thumb1 as an emergency spill slot.
The current implementation of ThumbRegisterInfo::saveScavengerRegister is bad for two reasons: one, it's buggy, and two, it blocks using R12 for other optimizations. So this patch gets rid of it, and adds the necessary support for using an ordinary emergency spill slot on Thumb1.

(Specifically, I think saveScavengerRegister was broken by r305625, and nobody noticed for two years because the codepath is almost never used. The new code will also probably not be used much, but it now has better tests, and if we fail to emit a necessary emergency spill slot we get a reasonable error message instead of a miscompile.)

A rough outline of the changes in the patch:

1. Gets rid of ThumbRegisterInfo::saveScavengerRegister.
2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an emergency spill slot for Thumb1.
3. Implements useFPForScavengingIndex, so the emergency spill slot isn't placed at a negative offset from FP on Thumb1.
4. Modifies the heuristics for allocating an emergency spill slot to support Thumb1. This includes fixing ExtraCSSpill so we don't try to use "lr" as a substitute for allocating an emergency spill slot.
5. Allocates a base pointer in more cases, so the emergency spill slot is always accessible.
6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the right offset in the new cases where we're forcing a base pointer.
7. Ensures we never generate a load or store with an offset outside of its frame object. This makes the heuristics more straightforward.
8. Changes Thumb1 prologue and epilogue emission so it never uses register scavenging.

Some of the changes to the emergency spill slot heuristics in determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow the compiler to avoid allocating an emergency spill slot in cases where it isn't necessary. The rest of the changes should only affect Thumb1.

Differential Revision: https://reviews.llvm.org/D63677

llvm-svn: 364490
1 parent d7999cb commit ab1d73e

13 files changed, +626 -211 lines changed

llvm/lib/Target/ARM/ARMBaseRegisterInfo.cpp

Lines changed: 20 additions & 14 deletions
@@ -370,29 +370,35 @@ bool ARMBaseRegisterInfo::hasBasePointer(const MachineFunction &MF) const {
   const ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
   const ARMFrameLowering *TFI = getFrameLowering(MF);

-  // When outgoing call frames are so large that we adjust the stack pointer
-  // around the call, we can no longer use the stack pointer to reach the
-  // emergency spill slot.
+  // If we have stack realignment and VLAs, we have no pointer to use to
+  // access the stack. If we have stack realignment, and a large call frame,
+  // we have no place to allocate the emergency spill slot.
   if (needsStackRealignment(MF) && !TFI->hasReservedCallFrame(MF))
     return true;

   // Thumb has trouble with negative offsets from the FP. Thumb2 has a limited
   // negative range for ldr/str (255), and thumb1 is positive offsets only.
+  //
   // It's going to be better to use the SP or Base Pointer instead. When there
   // are variable sized objects, we can't reference off of the SP, so we
   // reserve a Base Pointer.
-  if (AFI->isThumbFunction() && MFI.hasVarSizedObjects()) {
-    // Conservatively estimate whether the negative offset from the frame
-    // pointer will be sufficient to reach. If a function has a smallish
-    // frame, it's less likely to have lots of spills and callee saved
-    // space, so it's all more likely to be within range of the frame pointer.
-    // If it's wrong, the scavenger will still enable access to work, it just
-    // won't be optimal.
-    if (AFI->isThumb2Function() && MFI.getLocalFrameSize() < 128)
-      return false;
+  //
+  // For Thumb2, estimate whether a negative offset from the frame pointer
+  // will be sufficient to reach the whole stack frame. If a function has a
+  // smallish frame, it's less likely to have lots of spills and callee saved
+  // space, so it's all more likely to be within range of the frame pointer.
+  // If it's wrong, the scavenger will still enable access to work, it just
+  // won't be optimal. (We should always be able to reach the emergency
+  // spill slot from the frame pointer.)
+  if (AFI->isThumb2Function() && MFI.hasVarSizedObjects() &&
+      MFI.getLocalFrameSize() >= 128)
+    return true;
+  // For Thumb1, if sp moves, nothing is in range, so force a base pointer.
+  // This is necessary for correctness in cases where we need an emergency
+  // spill slot. (In Thumb1, we can't use a negative offset from the frame
+  // pointer.)
+  if (AFI->isThumb1OnlyFunction() && !TFI->hasReservedCallFrame(MF))
     return true;
-  }
-
   return false;
 }
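The order of the checks in the patched hasBasePointer can be sketched as a standalone predicate. This is only an illustration of the three conditions visible in the hunk above: `Mode`, `FrameFacts`, and `wantsBasePointer` are hypothetical names, not LLVM APIs, and anything the real function does before line 370 is not modeled.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the queries used in the hunk above.
// None of these names exist in LLVM.
enum class Mode { ARM, Thumb1, Thumb2 };

struct FrameFacts {
  Mode mode;
  bool needsStackRealignment; // needsStackRealignment(MF)
  bool hasReservedCallFrame;  // TFI->hasReservedCallFrame(MF)
  bool hasVarSizedObjects;    // MFI.hasVarSizedObjects()
  unsigned localFrameSize;    // MFI.getLocalFrameSize(), in bytes
};

// Mirrors only the three checks shown in the hunk, in the same order.
inline bool wantsBasePointer(const FrameFacts &f) {
  // Realignment plus a moving sp: no stable pointer for the spill slot.
  if (f.needsStackRealignment && !f.hasReservedCallFrame)
    return true;
  // Thumb2: VLAs and a frame too large to assume fp can reach all of it.
  if (f.mode == Mode::Thumb2 && f.hasVarSizedObjects &&
      f.localFrameSize >= 128)
    return true;
  // Thumb1: if sp moves, nothing is in range, so force a base pointer.
  if (f.mode == Mode::Thumb1 && !f.hasReservedCallFrame)
    return true;
  return false;
}
```

Compared with the removed code, a Thumb1 function now gets a base pointer whenever its call frame is not reserved, not only when it has variable-sized objects.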

llvm/lib/Target/ARM/ARMFrameLowering.cpp

Lines changed: 81 additions & 17 deletions
@@ -344,6 +344,10 @@ static void emitAligningInstructions(MachineFunction &MF, ARMFunctionInfo *AFI,
 /// as assignCalleeSavedSpillSlots() hasn't run at this point. Instead we use
 /// this to produce a conservative estimate that we check in an assert() later.
 static int getMaxFPOffset(const Function &F, const ARMFunctionInfo &AFI) {
+  // For Thumb1, push.w isn't available, so the first push will always push
+  // r7 and lr onto the stack first.
+  if (AFI.isThumb1OnlyFunction())
+    return -AFI.getArgRegsSaveSize() - (2 * 4);
   // This is a conservative estimation: Assume the frame pointer being r7 and
   // pc("r15") up to r8 getting spilled before (= 8 registers).
   return -AFI.getArgRegsSaveSize() - (8 * 4);
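As a worked instance of the arithmetic in this hunk, the estimate reduces to the following sketch. The function name `maxFPOffset` and its plain-integer parameters are hypothetical stand-ins for the ARMFunctionInfo queries.

```cpp
#include <cassert>

// Hypothetical stand-in for getMaxFPOffset(): a conservative (most negative)
// estimate of the fp-relative offset, in bytes.
inline int maxFPOffset(bool isThumb1Only, int argRegsSaveSize) {
  // Thumb1: only r7 and lr are assumed to be pushed first (2 registers).
  // Otherwise: assume 8 registers are spilled before the frame pointer.
  const int assumedSavedRegs = isThumb1Only ? 2 : 8;
  return -argRegsSaveSize - assumedSavedRegs * 4;
}
```

With no argument-register save area, the Thumb1 estimate is -8 bytes where the generic estimate is -32 bytes.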
@@ -954,8 +958,12 @@ ARMFrameLowering::ResolveFrameIndexReference(const MachineFunction &MF,
     }
   }
   // Use the base pointer if we have one.
-  if (RegInfo->hasBasePointer(MF))
+  // FIXME: Maybe prefer sp on Thumb1 if it's legal and the offset is cheaper?
+  // That can happen if we forced a base pointer for a large call frame.
+  if (RegInfo->hasBasePointer(MF)) {
     FrameReg = RegInfo->getBaseRegister();
+    Offset -= SPAdj;
+  }
   return Offset;
 }

@@ -1775,13 +1783,59 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
   }
   EstimatedStackSize += 16; // For possible paddings.

-  unsigned EstimatedRSStackSizeLimit = estimateRSStackSizeLimit(MF, this);
+  unsigned EstimatedRSStackSizeLimit, EstimatedRSFixedSizeLimit;
+  if (AFI->isThumb1OnlyFunction()) {
+    // For Thumb1, don't bother to iterate over the function. The only
+    // instruction that requires an emergency spill slot is a store to a
+    // frame index.
+    //
+    // tSTRspi, which is used for sp-relative accesses, has an 8-bit unsigned
+    // immediate. tSTRi, which is used for bp- and fp-relative accesses, has
+    // a 5-bit unsigned immediate.
+    //
+    // We could try to check if the function actually contains a tSTRspi
+    // that might need the spill slot, but it's not really important.
+    // Functions with VLAs or extremely large call frames are rare, and
+    // if a function is allocating more than 1KB of stack, an extra 4-byte
+    // slot probably isn't relevant.
+    if (RegInfo->hasBasePointer(MF))
+      EstimatedRSStackSizeLimit = (1U << 5) * 4;
+    else
+      EstimatedRSStackSizeLimit = (1U << 8) * 4;
+    EstimatedRSFixedSizeLimit = (1U << 5) * 4;
+  } else {
+    EstimatedRSStackSizeLimit = estimateRSStackSizeLimit(MF, this);
+    EstimatedRSFixedSizeLimit = EstimatedRSStackSizeLimit;
+  }
+  // Final estimate of whether sp or bp-relative accesses might require
+  // scavenging.
+  bool HasLargeStack = EstimatedStackSize > EstimatedRSStackSizeLimit;
+
+  // If the stack pointer moves and we don't have a base pointer, the
+  // estimate logic doesn't work. The actual offsets might be larger when
+  // we're constructing a call frame, or we might need to use negative
+  // offsets from fp.
+  bool HasMovingSP = MFI.hasVarSizedObjects() ||
+    (MFI.adjustsStack() && !canSimplifyCallFramePseudos(MF));
+  bool HasBPOrFixedSP = RegInfo->hasBasePointer(MF) || !HasMovingSP;
+
+  // If we have a frame pointer, we assume arguments will be accessed
+  // relative to the frame pointer. Check whether fp-relative accesses to
+  // arguments require scavenging.
+  //
+  // We could do slightly better on Thumb1; in some cases, an sp-relative
+  // offset would be legal even though an fp-relative offset is not.
   int MaxFPOffset = getMaxFPOffset(MF.getFunction(), *AFI);
-  bool BigFrameOffsets = EstimatedStackSize >= EstimatedRSStackSizeLimit ||
-    MFI.hasVarSizedObjects() ||
-    (MFI.adjustsStack() && !canSimplifyCallFramePseudos(MF)) ||
-    // For large argument stacks fp relative addressed may overflow.
-    (HasFP && (MaxFixedOffset - MaxFPOffset) >= (int)EstimatedRSStackSizeLimit);
+  bool HasLargeArgumentList =
+    HasFP && (MaxFixedOffset - MaxFPOffset) > (int)EstimatedRSFixedSizeLimit;
+
+  bool BigFrameOffsets = HasLargeStack || !HasBPOrFixedSP ||
+                         HasLargeArgumentList;
+  LLVM_DEBUG(dbgs() << "EstimatedLimit: " << EstimatedRSStackSizeLimit
+                    << "; EstimatedStack" << EstimatedStackSize
+                    << "; EstimatedFPStack" << MaxFixedOffset - MaxFPOffset
+                    << "; BigFrameOffsets: " << BigFrameOffsets
+                    << "\n");
   if (BigFrameOffsets ||
       !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF)) {
     AFI->setHasStackFrame(true);
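The Thumb1 branch of this heuristic can be worked through in isolation. The sketch below assumes a Thumb1-only function and replaces every LLVM query with a plain field; `Thumb1Limits`, `FrameEstimate`, `thumb1Limits`, and `bigFrameOffsets` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Hypothetical container for the two limits computed in the Thumb1 branch.
struct Thumb1Limits {
  unsigned stackLimit; // EstimatedRSStackSizeLimit
  unsigned fixedLimit; // EstimatedRSFixedSizeLimit
};

inline Thumb1Limits thumb1Limits(bool hasBasePointer) {
  const unsigned bpOrFpRange = (1U << 5) * 4; // tSTRi: 5-bit imm, 128 bytes
  const unsigned spRange = (1U << 8) * 4;     // tSTRspi: 8-bit imm, 1024 bytes
  return {hasBasePointer ? bpOrFpRange : spRange, bpOrFpRange};
}

// Hypothetical snapshot of the values the heuristic reads.
struct FrameEstimate {
  unsigned estimatedStackSize;
  bool hasBasePointer;
  bool hasVarSizedObjects;
  bool adjustsStackAroundCalls; // adjustsStack() && !canSimplifyCallFramePseudos()
  bool hasFP;
  int maxFixedOffset;
  int maxFPOffset;
};

inline bool bigFrameOffsets(const FrameEstimate &e) {
  const Thumb1Limits lim = thumb1Limits(e.hasBasePointer);
  const bool hasLargeStack = e.estimatedStackSize > lim.stackLimit;
  const bool hasMovingSP = e.hasVarSizedObjects || e.adjustsStackAroundCalls;
  const bool hasBPOrFixedSP = e.hasBasePointer || !hasMovingSP;
  const bool hasLargeArgumentList =
      e.hasFP &&
      (e.maxFixedOffset - e.maxFPOffset) > static_cast<int>(lim.fixedLimit);
  return hasLargeStack || !hasBPOrFixedSP || hasLargeArgumentList;
}
```

Note that both size comparisons are strict (`>`), whereas the removed code used `>=`, so a frame whose estimate equals the limit no longer counts as large.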
@@ -1806,8 +1860,17 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
     CS1Spilled = true;
   }

-  // This is true when we inserted a spill for an unused register that can now
-  // be used for register scavenging.
+  // This is true when we inserted a spill for a callee-save GPR which is
+  // not otherwise used by the function. This guarantees it is possible
+  // to scavenge a register to hold the address of a stack slot. On Thumb1,
+  // the register must be a valid operand to tSTRi, i.e. r4-r7. For other
+  // subtargets, this is any GPR, i.e. r4-r11 or lr.
+  //
+  // If we don't insert a spill, we instead allocate an emergency spill
+  // slot, which can be used by scavenging to spill an arbitrary register.
+  //
+  // We currently don't try to figure out whether any specific instruction
+  // requires scavenging an additional register.
   bool ExtraCSSpill = false;

   if (AFI->isThumb1OnlyFunction()) {
@@ -1916,7 +1979,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
           NumGPRSpills++;
           CS1Spilled = true;
           assert(!MRI.isReserved(Reg) && "Should not be reserved");
-          if (!MRI.isPhysRegUsed(Reg))
+          if (Reg != ARM::LR && !MRI.isPhysRegUsed(Reg))
             ExtraCSSpill = true;
           UnspilledCS1GPRs.erase(llvm::find(UnspilledCS1GPRs, Reg));
           if (Reg == ARM::LR)
@@ -1941,7 +2004,8 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
         UnspilledCS1GPRs.erase(LRPos);

       ForceLRSpill = false;
-      if (!MRI.isReserved(ARM::LR) && !MRI.isPhysRegUsed(ARM::LR))
+      if (!MRI.isReserved(ARM::LR) && !MRI.isPhysRegUsed(ARM::LR) &&
+          !AFI->isThumb1OnlyFunction())
         ExtraCSSpill = true;
     }

@@ -1963,7 +2027,8 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
             SavedRegs.set(Reg);
             LLVM_DEBUG(dbgs() << "Spilling " << printReg(Reg, TRI)
                               << " to make up alignment\n");
-            if (!MRI.isReserved(Reg) && !MRI.isPhysRegUsed(Reg))
+            if (!MRI.isReserved(Reg) && !MRI.isPhysRegUsed(Reg) &&
+                !(Reg == ARM::LR && AFI->isThumb1OnlyFunction()))
               ExtraCSSpill = true;
             break;
           }
@@ -1992,8 +2057,7 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
       unsigned Reg = UnspilledCS1GPRs.back();
       UnspilledCS1GPRs.pop_back();
       if (!MRI.isReserved(Reg) &&
-          (!AFI->isThumb1OnlyFunction() || isARMLowRegister(Reg) ||
-           Reg == ARM::LR)) {
+          (!AFI->isThumb1OnlyFunction() || isARMLowRegister(Reg))) {
         Extras.push_back(Reg);
         NumExtras--;
       }
@@ -2016,10 +2080,10 @@ void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF,
           ExtraCSSpill = true;
       }
     }
-    if (!ExtraCSSpill && !AFI->isThumb1OnlyFunction()) {
-      // note: Thumb1 functions spill to R12, not the stack. Reserve a slot
-      // closest to SP or frame pointer.
+    if (!ExtraCSSpill) {
+      // Reserve a slot closest to SP or frame pointer.
       assert(RS && "Register scavenging not provided");
+      LLVM_DEBUG(dbgs() << "Reserving emergency spill slot\n");
       const TargetRegisterClass &RC = ARM::GPRRegClass;
       unsigned Size = TRI->getSpillSize(RC);
       unsigned Align = TRI->getSpillAlignment(RC);

llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp

Lines changed: 15 additions & 8 deletions
@@ -1150,15 +1150,22 @@ bool ARMDAGToDAGISel::SelectThumbAddrModeSP(SDValue N,
       if (isScaledConstantInRange(N.getOperand(1), /*Scale=*/4, 0, 256, RHSC)) {
         Base = N.getOperand(0);
         int FI = cast<FrameIndexSDNode>(Base)->getIndex();
-        // For LHS+RHS to result in an offset that's a multiple of 4 the object
-        // indexed by the LHS must be 4-byte aligned.
+        // Make sure the offset is inside the object, or we might fail to
+        // allocate an emergency spill slot. (An out-of-range access is UB, but
+        // it could show up anyway.)
         MachineFrameInfo &MFI = MF->getFrameInfo();
-        if (MFI.getObjectAlignment(FI) < 4)
-          MFI.setObjectAlignment(FI, 4);
-        Base = CurDAG->getTargetFrameIndex(
-            FI, TLI->getPointerTy(CurDAG->getDataLayout()));
-        OffImm = CurDAG->getTargetConstant(RHSC, SDLoc(N), MVT::i32);
-        return true;
+        if (RHSC * 4 < MFI.getObjectSize(FI)) {
+          // For LHS+RHS to result in an offset that's a multiple of 4 the object
+          // indexed by the LHS must be 4-byte aligned.
+          if (!MFI.isFixedObjectIndex(FI) && MFI.getObjectAlignment(FI) < 4)
+            MFI.setObjectAlignment(FI, 4);
+          if (MFI.getObjectAlignment(FI) >= 4) {
+            Base = CurDAG->getTargetFrameIndex(
+                FI, TLI->getPointerTy(CurDAG->getDataLayout()));
+            OffImm = CurDAG->getTargetConstant(RHSC, SDLoc(N), MVT::i32);
+            return true;
+          }
+        }
       }
     }
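The new guard in SelectThumbAddrModeSP combines an in-object bounds check with an alignment check. A minimal model of that decision, with a hypothetical `FrameObject` struct standing in for the MachineFrameInfo queries and a hypothetical function name, might look like this:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one frame object; not an LLVM type.
struct FrameObject {
  std::int64_t size;  // MFI.getObjectSize(FI), in bytes
  unsigned alignment; // MFI.getObjectAlignment(FI), in bytes
  bool isFixed;       // MFI.isFixedObjectIndex(FI)
};

// Returns true if a frame-index + immediate form may be selected.
// scaledOffset plays the role of RHSC (byte offset divided by 4).
// May raise the alignment of a non-fixed object, as the hunk does.
inline bool canUseSPImmediate(FrameObject &obj, int scaledOffset) {
  // Same range as isScaledConstantInRange(..., /*Scale=*/4, 0, 256, RHSC).
  if (scaledOffset < 0 || scaledOffset >= 256)
    return false;
  // The access must start inside the object.
  if (static_cast<std::int64_t>(scaledOffset) * 4 >= obj.size)
    return false;
  // Only non-fixed objects can have their alignment increased.
  if (!obj.isFixed && obj.alignment < 4)
    obj.alignment = 4;
  return obj.alignment >= 4;
}
```

A fixed object with alignment below 4 is rejected rather than realigned, which matches the added `isFixedObjectIndex` condition.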

llvm/lib/Target/ARM/ARMISelLowering.cpp

Lines changed: 2 additions & 1 deletion
@@ -3652,7 +3652,8 @@ void ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo, SelectionDAG &DAG,
   // argument passed via stack.
   int FrameIndex = StoreByValRegs(CCInfo, DAG, dl, Chain, nullptr,
                                   CCInfo.getInRegsParamsCount(),
-                                  CCInfo.getNextStackOffset(), 4);
+                                  CCInfo.getNextStackOffset(),
+                                  std::max(4U, TotalArgRegsSaveSize));
   AFI->setVarArgsFrameIndex(FrameIndex);
 }
