Skip to content

Commit 091422a

Browse files
committed
[LSR] Fix wrapping bug in lsr-term-fold logic
The existing logic was unsound, in two ways. First, due to wrapping on the trip count computation, it could compute a value which convert a loop which exiting on iteration 256, to one which exited at 255. (With i8 trip counts.) Second, it allowed rewriting when the trip count implies wrapping around the alternate IV. As a trivial example, it allowed rewriting an i128 exit test in terms of an i64 IV. This is obviously wrong. Note that the test change is fairly minimal - i.e. only the targeted test - but that's only because I precommitted a change which switched the test from 32 to 64 bit pointers. For 32 bit point architectures with 32 bit primary inductions, this transform is almost always unsound to perform. Differential Revision: https://reviews.llvm.org/D146429
1 parent eecb8c5 commit 091422a

File tree

2 files changed

+24
-8
lines changed

2 files changed

+24
-8
lines changed

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6762,7 +6762,25 @@ canFoldTermCondOfLoop(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
67626762
continue;
67636763
}
67646764

6765-
// FIXME: This does not properly account for overflow.
6765+
// Check that we can compute the value of AddRec on the exiting iteration
6766+
// without soundness problems. There are two cases to be worried about:
6767+
// 1) BECount could be 255 with type i8. Simply adding one would be
6768+
// incorrect. We may need one extra bit to represent the unsigned
6769+
// trip count.
6770+
// 2) The multiplication of stride by TC may wrap around. This is subtle
6771+
// because computing the result accounting for wrap is insufficient.
6772+
// In order to use the result in an exit test, we must also know that
6773+
// AddRec doesn't take the same value on any previous iteration.
6774+
// The simplest case to consider is a candidate IV which is narrower
6775+
// than the trip count (and thus original IV), but this can also
6776+
// happen due to non-unit strides on the candidate IVs.
6777+
ConstantRange StepCR = SE.getSignedRange(AddRec->getStepRecurrence(SE));
6778+
ConstantRange BECountCR = SE.getUnsignedRange(BECount);
6779+
unsigned NoOverflowBitWidth = BECountCR.getActiveBits() + 1 + StepCR.getMinSignedBits();
6780+
unsigned ARBitWidth = SE.getTypeSizeInBits(AddRec->getType());
6781+
if (NoOverflowBitWidth > ARBitWidth)
6782+
continue;
6783+
67666784
const SCEV *TermValueSLocal = SE.getAddExpr(
67676785
AddRec->getOperand(0),
67686786
SE.getTruncateOrZeroExtend(

llvm/test/Transforms/LoopStrengthReduce/lsr-term-fold.ll

Lines changed: 5 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -192,20 +192,18 @@ for.end: ; preds = %for.body
192192
; In this case, the integer IV has a larger bitwidth than the pointer IV.
193193
; This means that the smaller IV may wrap around multiple times before
194194
; the original loop exit is taken.
195-
; FIXME: miscompile
196195
define void @iv_size(ptr %a, i128 %N) {
197196
; CHECK-LABEL: @iv_size(
198197
; CHECK-NEXT: entry:
199-
; CHECK-NEXT: [[TMP0:%.*]] = trunc i128 [[N:%.*]] to i64
200-
; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 2
201-
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP1]]
202198
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
203199
; CHECK: for.body:
204-
; CHECK-NEXT: [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[A]], [[ENTRY:%.*]] ]
200+
; CHECK-NEXT: [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[ENTRY:%.*]] ]
201+
; CHECK-NEXT: [[LSR_IV:%.*]] = phi i128 [ [[LSR_IV_NEXT:%.*]], [[FOR_BODY]] ], [ [[N:%.*]], [[ENTRY]] ]
205202
; CHECK-NEXT: store i32 1, ptr [[LSR_IV1]], align 4
203+
; CHECK-NEXT: [[LSR_IV_NEXT]] = add nsw i128 [[LSR_IV]], -1
206204
; CHECK-NEXT: [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i64 4
207-
; CHECK-NEXT: [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
208-
; CHECK-NEXT: br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
205+
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i128 [[LSR_IV_NEXT]], 0
206+
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
209207
; CHECK: for.end:
210208
; CHECK-NEXT: ret void
211209
;

0 commit comments

Comments
 (0)