-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Transforms] LoopIdiomRecognize recognize strlen and wcslen #108985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Henry Jiang (mustartt) ChangesThis PR continues the effort made in https://discourse.llvm.org/t/rfc-strlen-loop-idiom-recognition-folding/55848 and https://reviews.llvm.org/D83392 and https://reviews.llvm.org/D88460 to extend base = str;
while (*str)
++str; and transforming the str = base + strlen(base)
len = str - base Patch is 32.59 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108985.diff 7 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h b/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
index 0c6406d8618518..241a3fc1093607 100644
--- a/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
+++ b/llvm/include/llvm/Transforms/Scalar/LoopIdiomRecognize.h
@@ -34,6 +34,12 @@ struct DisableLIRP {
/// When true, Memcpy is disabled.
static bool Memcpy;
+
+ /// When true, Strlen is disabled.
+ static bool Strlen;
+
+ /// When true, Wcslen is disabled.
+ static bool Wcslen;
};
/// Performs Loop Idiom Recognize Pass.
diff --git a/llvm/include/llvm/Transforms/Utils/BuildLibCalls.h b/llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
index 1979c4af770b02..dfe3b35bb596e1 100644
--- a/llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
+++ b/llvm/include/llvm/Transforms/Utils/BuildLibCalls.h
@@ -93,6 +93,12 @@ namespace llvm {
Value *emitStrLen(Value *Ptr, IRBuilderBase &B, const DataLayout &DL,
const TargetLibraryInfo *TLI);
+ /// Emit a call to the wcslen function to the builder, for the specified
+ /// pointer. Ptr is required to be some pointer type, and the return value has
+ /// 'size_t' type.
+ Value *emitWcsLen(Value *Ptr, IRBuilderBase &B, const DataLayout &DL,
+ const TargetLibraryInfo *TLI);
+
/// Emit a call to the strdup function to the builder, for the specified
/// pointer. Ptr is required to be some pointer type, and the return value has
/// 'i8*' type.
diff --git a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
index 578d087e470e1e..c3cf24686d0934 100644
--- a/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
@@ -33,6 +33,7 @@
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
+#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
@@ -97,6 +98,7 @@ using namespace llvm;
STATISTIC(NumMemSet, "Number of memset's formed from loop stores");
STATISTIC(NumMemCpy, "Number of memcpy's formed from loop load+stores");
STATISTIC(NumMemMove, "Number of memmove's formed from loop load+stores");
+STATISTIC(NumStrLen, "Number of strlen's formed from loop loads");
STATISTIC(
NumShiftUntilBitTest,
"Number of uncountable loops recognized as 'shift until bitttest' idiom");
@@ -126,6 +128,22 @@ static cl::opt<bool, true>
cl::location(DisableLIRP::Memcpy), cl::init(false),
cl::ReallyHidden);
+bool DisableLIRP::Strlen;
+static cl::opt<bool, true>
+ DisableLIRPStrlen("disable-" DEBUG_TYPE "-strlen",
+ cl::desc("Proceed with loop idiom recognize pass, but do "
+ "not convert loop(s) to strlen."),
+ cl::location(DisableLIRP::Strlen), cl::init(false),
+ cl::ReallyHidden);
+
+bool DisableLIRP::Wcslen;
+static cl::opt<bool, true>
+ DisableLIRPWcslen("disable-" DEBUG_TYPE "-wcslen",
+ cl::desc("Proceed with loop idiom recognize pass, but do "
+ "not convert loop(s) to wcslen."),
+ cl::location(DisableLIRP::Wcslen), cl::init(false),
+ cl::ReallyHidden);
+
static cl::opt<bool> UseLIRCodeSizeHeurs(
"use-lir-code-size-heurs",
cl::desc("Use loop idiom recognition code size heuristics when compiling"
@@ -246,6 +264,7 @@ class LoopIdiomRecognize {
bool recognizeShiftUntilBitTest();
bool recognizeShiftUntilZero();
+ bool recognizeAndInsertStrLen();
/// @}
};
@@ -1489,7 +1508,7 @@ bool LoopIdiomRecognize::runOnNoncountableLoop() {
return recognizePopcount() || recognizeAndInsertFFS() ||
recognizeShiftUntilBitTest() || recognizeShiftUntilZero() ||
- recognizeShiftUntilLessThan();
+ recognizeShiftUntilLessThan() || recognizeAndInsertStrLen();
}
/// Check if the given conditional branch is based on the comparison between
@@ -1507,9 +1526,11 @@ static Value *matchCondition(BranchInst *BI, BasicBlock *LoopEntry,
if (!Cond)
return nullptr;
- ConstantInt *CmpZero = dyn_cast<ConstantInt>(Cond->getOperand(1));
- if (!CmpZero || !CmpZero->isZero())
- return nullptr;
+ if (!isa<ConstantPointerNull>(Cond->getOperand(1))) {
+ ConstantInt *CmpZero = dyn_cast<ConstantInt>(Cond->getOperand(1));
+ if (!CmpZero || !CmpZero->isZero())
+ return nullptr;
+ }
BasicBlock *TrueSucc = BI->getSuccessor(0);
BasicBlock *FalseSucc = BI->getSuccessor(1);
@@ -1524,6 +1545,232 @@ static Value *matchCondition(BranchInst *BI, BasicBlock *LoopEntry,
return nullptr;
}
+/// Recognizes a strlen idiom by checking for loops that increment
+/// a char pointer and then subtract with the base pointer.
+///
+/// If detected, transforms the relevant code to a strlen function
+/// call, and returns true; otherwise, returns false.
+///
+/// The core idiom we are trying to detect is:
+/// \code
+/// start = str;
+/// do {
+/// str++;
+/// } while(*str != '\0');
+/// \endcode
+///
+/// The transformed output is similar to below c-code:
+/// \code
+/// str = start + strlen(start)
+/// len = str - start
+/// \endcode
+///
+/// Later the pointer subtraction will be folded by InstCombine
+bool LoopIdiomRecognize::recognizeAndInsertStrLen() {
+ if (DisableLIRPStrlen)
+ return false;
+
+ // Give up if the loop has multiple blocks or multiple backedges.
+ if (CurLoop->getNumBackEdges() != 1 || CurLoop->getNumBlocks() != 1)
+ return false;
+
+ // It should have a preheader containing nothing but an unconditional branch.
+ auto *Preheader = CurLoop->getLoopPreheader();
+ if (!Preheader || &Preheader->front() != Preheader->getTerminator())
+ return false;
+
+ auto *EntryBI = dyn_cast<BranchInst>(Preheader->getTerminator());
+ if (!EntryBI || EntryBI->isConditional())
+ return false;
+
+ // The loop exit must be conditioned on an icmp with 0.
+ // The icmp operand has to be a load on some SSA reg that increments
+ // by 1 in the loop.
+ BasicBlock *LoopBody = *CurLoop->block_begin();
+ BranchInst *LoopTerm = dyn_cast<BranchInst>(LoopBody->getTerminator());
+ Value *LoopCond = matchCondition(LoopTerm, LoopBody);
+
+ if (!LoopCond)
+ return false;
+
+ auto *LoopLoad = dyn_cast<LoadInst>(LoopCond);
+ if (!LoopLoad || LoopLoad->getPointerAddressSpace() != 0)
+ return false;
+
+ Type *OperandType = LoopLoad->getType();
+ if (!OperandType || !OperandType->isIntegerTy())
+ return false;
+
+ // See if the pointer expression is an AddRec with step 1 ({n,+,1}) on
+ // the loop, indicating strlen calculation.
+ auto *IncPtr = LoopLoad->getPointerOperand();
+ const SCEVAddRecExpr *LoadEv = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(IncPtr));
+
+ if (!LoadEv || LoadEv->getLoop() != CurLoop || !LoadEv->isAffine())
+ return false;
+
+ const SCEVConstant *Step =
+ dyn_cast<SCEVConstant>(LoadEv->getStepRecurrence(*SE));
+ if (!Step)
+ return false;
+
+ unsigned int StepSize = 0;
+ if (ConstantInt *CI = dyn_cast<ConstantInt>(Step->getValue()))
+ StepSize = CI->getZExtValue();
+
+ unsigned OpWidth = OperandType->getIntegerBitWidth();
+ unsigned WcharSize = TLI->getWCharSize(*LoopLoad->getModule());
+ if (OpWidth != StepSize * 8)
+ return false;
+ if (OpWidth != 8 && OpWidth != 16 && OpWidth != 32)
+ return false;
+ if (OpWidth >= 16)
+ if (OpWidth != WcharSize * 8 || DisableLIRPWcslen)
+ return false;
+
+ // Scan every instruction in the loop to ensure there are no side effects.
+ for (auto &I : *LoopBody)
+ if (I.mayHaveSideEffects())
+ return false;
+
+ auto *LoopExitBB = CurLoop->getExitBlock();
+ if (!LoopExitBB)
+ return false;
+
+ // Check that the loop exit block is valid:
+ // It needs to have exactly one LCSSA Phi which is an AddRec.
+ PHINode *LCSSAPhi = nullptr;
+ for (PHINode &PN : LoopExitBB->phis()) {
+ if (!LCSSAPhi && PN.getNumIncomingValues() == 1)
+ LCSSAPhi = &PN;
+ else
+ return false;
+ }
+
+ if (!LCSSAPhi || !SE->isSCEVable(LCSSAPhi->getType()))
+ return false;
+
+ // This matched the pointer version of the idiom
+ if (LCSSAPhi->getIncomingValueForBlock(LoopBody) !=
+ LoopLoad->getPointerOperand())
+ return false;
+
+ const SCEVAddRecExpr *LCSSAEv =
+ dyn_cast<SCEVAddRecExpr>(SE->getSCEV(LCSSAPhi->getIncomingValue(0)));
+
+ if (!LCSSAEv || !dyn_cast<SCEVUnknown>(SE->getPointerBase(LCSSAEv)) ||
+ !LCSSAEv->isAffine())
+ return false;
+
+ // We can now expand the base of the str
+ IRBuilder<> Builder(Preheader->getTerminator());
+
+ auto LoopPhiRange = LoopBody->phis();
+ if (!hasNItems(LoopPhiRange, 1))
+ return false;
+ auto *LoopPhi = &*LoopPhiRange.begin();
+ Value *PreVal = LoopPhi->getIncomingValueForBlock(Preheader);
+ if (!PreVal)
+ return false;
+
+ Value *Expanded = nullptr;
+ Type *ExpandedType = nullptr;
+ if (auto *GEP = dyn_cast<GetElementPtrInst>(LoopLoad->getPointerOperand())) {
+ if (GEP->getPointerOperand() != LoopPhi)
+ return false;
+ GetElementPtrInst *NewGEP = GetElementPtrInst::Create(
+ LoopLoad->getType(), PreVal, SmallVector<Value *, 4>(GEP->indices()),
+ "newgep", Preheader->getTerminator());
+ Expanded = NewGEP;
+ ExpandedType = LoopLoad->getType();
+ } else if (LoopLoad->getPointerOperand() == LoopPhi) {
+ Expanded = PreVal;
+ ExpandedType = LoopLoad->getType();
+ }
+ if (!Expanded)
+ return false;
+
+ // Ensure that the GEP has the correct index if the pointer was modified.
+ // This can happen when the pointer in the user code, outside the loop,
+ // walks past a certain pre-checked index of the string.
+ if (auto *GEP = dyn_cast<GEPOperator>(Expanded)) {
+ if (GEP->getNumOperands() != 2)
+ return false;
+
+ ConstantInt *I0 = dyn_cast<ConstantInt>(GEP->getOperand(1));
+ if (!I0)
+ return false;
+
+ int64_t Index = I0->getSExtValue(); // GEP index
+ auto *SAdd = dyn_cast<SCEVAddExpr>(LoadEv->getStart());
+ if (!SAdd || SAdd->getNumOperands() != 2)
+ return false;
+
+ auto *SAdd0 = dyn_cast<SCEVConstant>(SAdd->getOperand(0));
+ if (!SAdd0)
+ return false;
+
+ ConstantInt *CInt = SAdd0->getValue(); // SCEV index
+ assert(CInt && "Expecting CInt to be valid.");
+ int64_t Offset = CInt->getSExtValue();
+
+ // Update the index based on the Offset
+ assert((Offset * 8) % GEP->getSourceElementType()->getIntegerBitWidth() ==
+ 0 &&
+ "Invalid offset");
+ int64_t NewIndex =
+ (Offset * 8) / GEP->getSourceElementType()->getIntegerBitWidth() -
+ Index;
+ Value *NewIndexVal =
+ ConstantInt::get(GEP->getOperand(1)->getType(), NewIndex);
+ GEP->setOperand(1, NewIndexVal);
+ }
+
+ Value *StrLenFunc = nullptr;
+ switch (OpWidth) {
+ case 8:
+ if (!TLI->has(LibFunc_strlen))
+ return false;
+ StrLenFunc = emitStrLen(Expanded, Builder, *DL, TLI);
+ break;
+ case 16:
+ case 32:
+ if (!TLI->has(LibFunc_wcslen))
+ return false;
+ StrLenFunc = emitWcsLen(Expanded, Builder, *DL, TLI);
+ }
+
+ assert(StrLenFunc && "Failed to emit strlen function.");
+
+ // Replace LCSSA Phi use with new pointer to the null terminator
+ SmallVector<Value *, 4> NewBaseIndex{StrLenFunc};
+ GetElementPtrInst *NewEndPtr = GetElementPtrInst::Create(
+ ExpandedType, Expanded, NewBaseIndex, "end", Preheader->getTerminator());
+ LCSSAPhi->replaceAllUsesWith(NewEndPtr);
+ RecursivelyDeleteDeadPHINode(LCSSAPhi);
+
+ ConstantInt *NewLoopCond = LoopTerm->getSuccessor(0) == LoopBody
+ ? Builder.getFalse()
+ : Builder.getTrue();
+ LoopTerm->setCondition(NewLoopCond);
+
+ deleteDeadInstruction(cast<Instruction>(LoopCond));
+ deleteDeadInstruction(cast<Instruction>(IncPtr));
+ SE->forgetLoop(CurLoop);
+
+ LLVM_DEBUG(dbgs() << " Formed strlen: " << *StrLenFunc << "\n");
+
+ ORE.emit([&]() {
+ return OptimizationRemark(DEBUG_TYPE, "recognizeAndInsertStrLen",
+ CurLoop->getStartLoc(), Preheader)
+ << "Transformed pointer difference into a call to strlen() function";
+ });
+
+ ++NumStrLen;
+
+ return true;
+}
+
/// Check if the given conditional branch is based on an unsigned less-than
/// comparison between a variable and a constant, and if the comparison is false
/// the control yields to the loop entry. If the branch matches the behaviour,
diff --git a/llvm/lib/Transforms/Utils/BuildLibCalls.cpp b/llvm/lib/Transforms/Utils/BuildLibCalls.cpp
index b0da19813f0a4b..e880d8fc5dd8df 100644
--- a/llvm/lib/Transforms/Utils/BuildLibCalls.cpp
+++ b/llvm/lib/Transforms/Utils/BuildLibCalls.cpp
@@ -1506,6 +1506,15 @@ Value *llvm::emitStrLen(Value *Ptr, IRBuilderBase &B, const DataLayout &DL,
return emitLibCall(LibFunc_strlen, SizeTTy, CharPtrTy, Ptr, B, TLI);
}
+Value *llvm::emitWcsLen(Value *Ptr, IRBuilderBase &B, const DataLayout &DL,
+ const TargetLibraryInfo *TLI) {
+ assert(Ptr && Ptr->getType()->isPointerTy() &&
+ "Argument to wcslen intrinsic must be a pointer.");
+ Type *PtrTy = B.getPtrTy();
+ Type *SizeTTy = getSizeTTy(B, TLI);
+ return emitLibCall(LibFunc_wcslen, SizeTTy, PtrTy, Ptr, B, TLI);
+}
+
Value *llvm::emitStrDup(Value *Ptr, IRBuilderBase &B,
const TargetLibraryInfo *TLI) {
Type *CharPtrTy = B.getPtrTy();
diff --git a/llvm/test/Transforms/LoopIdiom/strlen.ll b/llvm/test/Transforms/LoopIdiom/strlen.ll
new file mode 100644
index 00000000000000..43ed9d0980bc49
--- /dev/null
+++ b/llvm/test/Transforms/LoopIdiom/strlen.ll
@@ -0,0 +1,293 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -passes='loop-idiom' < %s -S | FileCheck %s
+
+declare void @use(ptr)
+
+define i64 @valid_strlen_1(ptr %0) {
+; CHECK-LABEL: define i64 @valid_strlen_1(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[STRLEN:%.*]] = call i64 @strlen(ptr [[TMP0]])
+; CHECK-NEXT: [[DOTLCSSA:%.*]] = getelementptr i8, ptr [[TMP0]], i64 [[STRLEN]]
+; CHECK-NEXT: br label %[[BB2:.*]]
+; CHECK: [[BB2]]:
+; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i8 poison, 0
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i8, ptr poison, i64 1
+; CHECK-NEXT: br i1 true, label %[[BB5:.*]], label %[[BB2]]
+; CHECK: [[BB5]]:
+; CHECK-NEXT: [[TMP12:%.*]] = ptrtoint ptr [[DOTLCSSA]] to i64
+; CHECK-NEXT: [[TMP13:%.*]] = ptrtoint ptr [[TMP0]] to i64
+; CHECK-NEXT: [[TMP14:%.*]] = sub i64 [[TMP12]], [[TMP13]]
+; CHECK-NEXT: ret i64 [[TMP14]]
+;
+ br label %2
+
+2: ; preds = %2, %1
+ %3 = phi ptr [ %0, %1 ], [ %6, %2 ]
+ %4 = load i8, ptr %3, align 1
+ %5 = icmp eq i8 %4, 0
+ %6 = getelementptr inbounds i8, ptr %3, i64 1
+ br i1 %5, label %7, label %2
+
+7: ; preds = %2
+ %8 = ptrtoint ptr %3 to i64
+ %9 = ptrtoint ptr %0 to i64
+ %10 = sub i64 %8, %9
+ ret i64 %10
+}
+
+
+define i32 @valid_strlen_2(ptr %0) {
+; CHECK-LABEL: define i32 @valid_strlen_2(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; CHECK-NEXT: br i1 [[TMP2]], label %[[BB14:.*]], label %[[BB3:.*]]
+; CHECK: [[BB3]]:
+; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr [[TMP0]], align 1
+; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0
+; CHECK-NEXT: br i1 [[TMP5]], label %[[BB14]], label %[[DOTPREHEADER:.*]]
+; CHECK: [[_PREHEADER:.*:]]
+; CHECK-NEXT: [[STR:%.*]] = getelementptr i8, ptr [[TMP0]], i64 0
+; CHECK-NEXT: [[STRLEN:%.*]] = call i64 @strlen(ptr [[STR]])
+; CHECK-NEXT: [[STR_ADDR_0_LCSSA:%.*]] = getelementptr i8, ptr [[STR]], i64 [[STRLEN]]
+; CHECK-NEXT: br label %[[BB6:.*]]
+; CHECK: [[BB6]]:
+; CHECK-NEXT: [[TMP7:%.*]] = phi ptr [ poison, %[[BB6]] ], [ [[TMP0]], %[[DOTPREHEADER]] ]
+; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i8 poison, 0
+; CHECK-NEXT: br i1 true, label %[[BB9:.*]], label %[[BB6]]
+; CHECK: [[BB9]]:
+; CHECK-NEXT: [[SUB_PTR_LHS_CAST:%.*]] = ptrtoint ptr [[STR_ADDR_0_LCSSA]] to i64
+; CHECK-NEXT: [[SUB_PTR_RHS_CAST:%.*]] = ptrtoint ptr [[TMP0]] to i64
+; CHECK-NEXT: [[SUB_PTR_SUB:%.*]] = sub i64 [[SUB_PTR_LHS_CAST]], [[SUB_PTR_RHS_CAST]]
+; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[SUB_PTR_SUB]] to i32
+; CHECK-NEXT: br label %[[BB14]]
+; CHECK: [[BB14]]:
+; CHECK-NEXT: [[TMP15:%.*]] = phi i32 [ [[TMP13]], %[[BB9]] ], [ 0, %[[BB3]] ], [ 0, [[TMP1:%.*]] ]
+; CHECK-NEXT: ret i32 [[TMP15]]
+;
+ %2 = icmp eq ptr %0, null
+ br i1 %2, label %16, label %3
+
+3: ; preds = %1
+ %4 = load i8, ptr %0, align 1
+ %5 = icmp eq i8 %4, 0
+ br i1 %5, label %16, label %6
+
+6: ; preds = %3, %6
+ %7 = phi ptr [ %8, %6 ], [ %0, %3 ]
+ %8 = getelementptr inbounds i8, ptr %7, i64 1
+ %9 = load i8, ptr %8, align 1
+ %10 = icmp eq i8 %9, 0
+ br i1 %10, label %11, label %6
+
+11: ; preds = %6
+ %12 = ptrtoint ptr %8 to i64
+ %13 = ptrtoint ptr %0 to i64
+ %14 = sub i64 %12, %13
+ %15 = trunc i64 %14 to i32
+ br label %16
+
+16: ; preds = %1, %3, %11
+ %17 = phi i32 [ %15, %11 ], [ 0, %3 ], [ 0, %1 ]
+ ret i32 %17
+}
+
+define i64 @valid_strlen_3(ptr %str) local_unnamed_addr #0 {
+; CHECK-LABEL: define i64 @valid_strlen_3(
+; CHECK-SAME: ptr [[STR:%.*]]) local_unnamed_addr {
+; CHECK-NEXT: [[_PREHEADER:.*:]]
+; CHECK-NEXT: [[STRLEN:%.*]] = call i64 @strlen(ptr [[STR]])
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[STR]], i64 [[STRLEN]]
+; CHECK-NEXT: br label %[[WHILE_COND:.*]]
+; CHECK: [[WHILE_COND]]:
+; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i8 poison, 0
+; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i8, ptr poison, i64 1
+; CHECK-NEXT: br i1 true, label %[[WHILE_END:.*]], label %[[WHILE_COND]]
+; CHECK: [[WHILE_END]]:
+; CHECK-NEXT: [[TMP10:%.*]] = ptrtoint ptr [[TMP0]] to i64
+; CHECK-NEXT: [[SUB_PTR_RHS_CAST:%.*]] = ptrtoint ptr [[STR]] to i64
+; CHECK-NEXT: [[TMP13:%.*]] = sub i64 [[TMP10]], [[SUB_PTR_RHS_CAST]]
+; CHECK-NEXT: tail call void @use(ptr [[TMP0]])
+; CHECK-NEXT: tail call void @use(ptr [[STR]])
+; CHECK-NEXT: ret i64 [[TMP13]]
+;
+entry:
+ br label %while.cond
+
+while.cond: ; preds = %while.cond, %entry
+ %str.addr.0 = phi ptr [ %str, %entry ], [ %incdec.ptr, %while.cond ]
+ %0 = load i8, ptr %str.addr.0, align 1
+ %cmp.not = icmp eq i8 %0, 0
+ %incdec.ptr = getelementptr inbounds i8, ptr %str.addr.0, i64 1
+ br i1 %cmp.not, label %while.end, label %while.cond
+
+while.end: ; preds = %while.cond
+ %sub.ptr.lhs.cast = ptrtoint ptr %str.addr.0 to i64
+ %sub.ptr.rhs.cast = ptrtoint ptr %str to i64
+ %sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
+ tail call void @use(ptr %str.addr.0)
+ tail call void @use(ptr %str)
+ ret i64 %sub.ptr.sub
+}
+
+define i64 @valid_strlen_4(ptr %0) {
+; CHECK-LABEL: define i64 @valid_strlen_4(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq ptr [[TMP0]], null
+; CHECK-NEXT: br i1 [[TMP2]], label %[[BB10:.*]], label %[[DOTPREHEADER:.*]]
+; CHECK: [[_PREHEADER:.*:]]
+; CHECK-NEXT: [[NEWGEP:%.*]] = getelementptr i8, ptr [[TMP0]], i64 0
+; CHECK-NEXT: [[STRLEN:%.*]] = call i64 @strlen(ptr [[NEWGEP]])
+; CHECK-NEXT: [[END:%.*]] = getelementptr i8, ptr [[NEWGEP]], i64 [[STRLEN]]
+; CHECK-NEXT: br label %[[BB3:.*]]
+; CHECK: [[BB3]]:
+; CHECK-NEXT: [[TMP4:%.*]] = phi ptr [ poison, %[[BB3]] ], [ [[TMP0]], %[[DOTPREHEADER]] ]
+; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i8 poison, 0
+; CHECK-NEXT: br i1 true, label %[[BB6:.*]], label %[[BB3]]
+; CHECK: [[BB6]]:
+; CHECK-NEXT: [[TMP7:%.*]] = ptrtoint ptr [[END]] to i64
+; CHECK-NEXT: [[TMP8:%.*]] = ptrtoint ptr [[TMP0]] to i64
+; CHECK-NEXT: [[TMP9:%.*]] = sub i64 [[TMP7]], [[TMP8]]
+; CHECK-NEXT: br label %[[BB10]]
+; CHECK: [[BB10]]:
+; CHECK-NEXT: [[TMP11:%.*]] = phi i64 [ ...
[truncated]
|
Hi @efriedma-quic we are continuing some loop upstreaming work which you had a comment on from a few years ago. I have addressed the comment about rewriting the LCSSA Phi Node instead of pattern matching on pointer diff. Sorry for the direct ping, don't have commit access. |
Just a gentle ping for review. Thanks |
if (!LCSSAEv || !dyn_cast<SCEVUnknown>(SE->getPointerBase(LCSSAEv)) || | ||
!LCSSAEv->isAffine()) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this check? getStepRecurrence
has been checked before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failure case that I had in mind (taken from valid_strlen_1):
2:
%3 = phi ptr [ %0, %1 ], [ %6, %2 ]
%4 = load i8, ptr %3, align 1
%5 = icmp eq i8 %4, 0
%6 = getelementptr inbounds i8, ptr %3, i64 1
br i1 %5, label %7, label %2
7:
...
ret i8 %4
This uses %4
instead of %10
. LCSSA will create a PHI for it, but it is not the PHI for %3. I see you counting the number of exit PHIs, the wrong one. https://github.com/llvm/llvm-project/pull/108985/files#diff-0d3d299d16e73bef015777082e043f5353798421299f684d52028b977fea61e1R1654-R1656 seems to catch that. Some more tests for expected failures like this one would be good.
// It should have a preheader containing nothing but an unconditional branch. | ||
auto *Preheader = CurLoop->getLoopPreheader(); | ||
if (!Preheader || &Preheader->front() != Preheader->getTerminator()) | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition seems unnecessarily restrictive. For example, it gets the idiom in check1 but not check2:
#include <stdio.h>
#include <string.h>
void check1(char *s) {
char *p = s;
do {
p++;
} while (*p);
printf("%s %ld\n", s, strlen(s));
printf ("%ld\n", p - s);
}
void check2(char *s) {
printf("%s %ld\n", s, strlen(s));
char *p = s;
do {
p++;
} while (*p);
printf ("%ld\n", p - s);
}
int main(int argc, char **argv) {
check1(argv[0]);
check2(argv[0]);
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this was unnecessary
/// call, and returns true; otherwise, returns false. | ||
/// | ||
/// The core idiom we are trying to detect is: | ||
/// \code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a clearer explanation of what this function does and how it works is needed. The idiom detected is compute the end address of a string. An instance of a strlen computation that has an end address computation of the expected form will be end addr - start addr which will become strlen + start addr - start addr.
f5135b6
to
2c2b30a
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
Finally had some time to work on this patch again. Made some major improvements to increase the range of idioms that this pass would catch.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not find any major issue. LGTM.
You code looks nicely tidy.
630b00e
to
7f6aa84
Compare
It's a pure rebase since the last review. I finally had the chance to look at the CI problems. There were some failing LLDB test cases and a merge conflict. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/17681 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/1377 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/19060 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/174/builds/14565 Here is the relevant piece of the build log for the reference
|
…lvm#108985)" This reverts commit bf6357f.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/14638 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/6210 Here is the relevant piece of the build log for the reference
|
Looks like there was a logic error in if (!isLibFuncEmittable(Preheader->getModule(), TLI, LibFunc_wcslen) &&
!DisableLIRP::Wcslen)
return false; Fixing this and add a test for checking emittable. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/25840 Here is the relevant piece of the build log for the reference
|
…108985)" (#131412) Relands #108985 This PR continues the effort made in https://discourse.llvm.org/t/rfc-strlen-loop-idiom-recognition-folding/55848 and https://reviews.llvm.org/D83392 and https://reviews.llvm.org/D88460 to extend `LoopIdiomRecognize` to find and replace loops of the form ```c base = str; while (*str) ++str; ``` and transforming the `strlen` loop idiom into the appropriate `strlen` and `wcslen` library call which will give a small performance boost if replaced. ```c str = base + strlen(base) len = str - base ```
…nd wcslen (#108985)" (#131412) Relands llvm/llvm-project#108985 This PR continues the effort made in https://discourse.llvm.org/t/rfc-strlen-loop-idiom-recognition-folding/55848 and https://reviews.llvm.org/D83392 and https://reviews.llvm.org/D88460 to extend `LoopIdiomRecognize` to find and replace loops of the form ```c base = str; while (*str) ++str; ``` and transforming the `strlen` loop idiom into the appropriate `strlen` and `wcslen` library call which will give a small performance boost if replaced. ```c str = base + strlen(base) len = str - base ```
This change, as relanded in ac9049d, broke Wine (causing Wine to hang on startup). I'd like to revert this one for now. I'll dig in closer and pinpoint the actual file that is causing the issue and which is being broken by this change, and report back here with the smaller selfcontained reproducer of the problem, but I might not have time to do that investigation until maybe latest on Monday. |
…strlen and wcslen (#108985)" (#131412)" This reverts commit ac9049d. This change broke Wine (causing it to hang on startup), see llvm/llvm-project#108985 for discussion.
No problem, thanks for letting me know. |
@mstorsjo The hang seems to have come from https://github.com/wine-mirror/wine/blob/0927c5c3da7cda8cf476416260286bd299ad6319/dlls/ntdll/string.c#L423 size_t __cdecl strlen( const char *str )
{
const char *s = str;
while (*s) s++;
return s - str;
} Where the strlen idiom is recognized and replaced by a call to it self. These type of scenarios are problematic in general. For example, in GCC without $ gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22)
Copyright (C) 2018 Free Software Foundation, Inc.
void *memset(void *b, int c, size_t len) {
int i;
unsigned char *p = b;
i = 0;
while (len > 0) {
*p = c;
p++;
len--;
}
return b;
}
0000000000400590 <memset>:
400590: 48 85 d2 test %rdx,%rdx
400593: 74 1b je 4005b0 <memset+0x20>
400595: 48 83 ec 08 sub $0x8,%rsp
400599: 40 0f b6 f6 movzbl %sil,%esi
40059d: e8 ee ff ff ff callq 400590 <memset>
4005a2: 48 83 c4 08 add $0x8,%rsp
4005a6: c3 retq
4005a7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
4005ae: 00 00
4005b0: 48 89 f8 mov %rdi,%rax
4005b3: c3 retq
4005b4: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4005bb: 00 00 00
4005be: 66 90 xchg %ax,%ax I'm not sure if there exists a solution in general that can solve all cases of this scenario. The best solution for this is probably to pass Otherwise for this patch, what we can do is skip the loop idiom for functions if they have name matching But we can still get into similar problems with indirect recursion. size_t strlen(const char* str) {
return foo(str);
}
__attribute__((noinline)) size_t foo(const char* str) {
const char *s = str;
while (*s) s++;
return s - str;
} However, I envision that these type of scenarios will be exceedingly rare. |
…lvm#108985)" (llvm#131412) Relands llvm#108985 This PR continues the effort made in https://discourse.llvm.org/t/rfc-strlen-loop-idiom-recognition-folding/55848 and https://reviews.llvm.org/D83392 and https://reviews.llvm.org/D88460 to extend `LoopIdiomRecognize` to find and replace loops of the form ```c base = str; while (*str) ++str; ``` and transforming the `strlen` loop idiom into the appropriate `strlen` and `wcslen` library call which will give a small performance boost if replaced. ```c str = base + strlen(base) len = str - base ```
Thanks for locating the issue, before I had time to dig deeper!
Indeed, this is a recurring problem when implementing many standard C functions. Wine does actually build some source files with
Thanks, that sounds like a good idea! First when I read this, I was a bit hesitant whether you should need to implement something like that just due to this case (as Wine probably should just extend their use of That kind of fix would probably be useful in many other places as well; e.g. if implementing
Yes, that sounds quite rare. And if one does that, one definitely needs to use
I think both sound like a good way forward; I would hope that Wine can extend their use of |
Yes, I agree, we should extend |
…108985" (#132572) Reland #108985 Extend `LoopIdiomRecognize` to find and replace loops of the form ```c base = str; while (*str) ++str; ``` and transforming the `strlen` loop idiom into the appropriate `strlen` and `wcslen` library call which will give a small performance boost if replaced. ```c str = base + strlen(base) len = str - base ```
…nd wcslen #108985" (#132572) Reland llvm/llvm-project#108985 Extend `LoopIdiomRecognize` to find and replace loops of the form ```c base = str; while (*str) ++str; ``` and transforming the `strlen` loop idiom into the appropriate `strlen` and `wcslen` library call which will give a small performance boost if replaced. ```c str = base + strlen(base) len = str - base ```
This PR continues the effort made in https://discourse.llvm.org/t/rfc-strlen-loop-idiom-recognition-folding/55848 and https://reviews.llvm.org/D83392 and https://reviews.llvm.org/D88460 to extend
LoopIdiomRecognize
to find and replace loops of the formand transforming the
strlen
loop idiom into the appropriatestrlen
andwcslen
library call which will give a small performance boost if replaced.