Skip to content

Commit 6a5ce1f

Browse files
committed
[LoopInterchange] Prevent from undoing its own transformation
LoopInterchange uses the bubble-sort fashion algorithm to sort the loops in a loop nest. For two loops A and B, it calls isProfitable(A, B) to determine whether these loops should be exchanged. The result of isProfitable(A, B) should be conservative, that is, it should return true only when we are sure exchanging A and B will improve performance. If it's not conservative enough, it's possible that a subsequent isProfitable(B, A) will also return true, in which case LoopInterchange will undo its previous transformation. To make isProfitable conservative enough, isProfitable(B, A) must not return true if isProfitable(A, B) returned true in the past. However, the current implementation can be in such a situation. isProfitable consists of several multiple rules, and they are applied one by one. The first rule is isProfitablePerLoopCacheAnalysis, and the problem is that there are cases where isProfitablePerLoopCacheAnalysis(A, B) returns true but isProfitablePerLoopCacheAnalysis(B, A) returns nullopt. This happens when the cost computed by CacheCostAnalysis is the same for A and B. In this case, the result of isProfitablePerLoopCacheAnalysis depends on the value of CostMap, which maps a loop to an index that is pre-computed in some way: If CostMap[A] is less than CostMap[B], it returns true, otherwise it returns nullopt, where the above problem occurs. To avoid such a situation, if the cost of CacheCostAnalysis is the same for A and B, it should return nullopt regardless of CostMap.
1 parent 3ff3b29 commit 6a5ce1f

File tree

2 files changed

+84
-5
lines changed

2 files changed

+84
-5
lines changed

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1141,17 +1141,15 @@ LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis(
11411141
if (OuterLoopIt == CostMap.end())
11421142
return std::nullopt;
11431143

1144+
if (CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop))
1145+
return std::nullopt;
11441146
unsigned InnerIndex = InnerLoopIt->second;
11451147
unsigned OuterIndex = OuterLoopIt->second;
11461148
LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
11471149
<< ", OuterIndex = " << OuterIndex << "\n");
1148-
if (InnerIndex < OuterIndex)
1149-
return std::optional<bool>(true);
11501150
assert(InnerIndex != OuterIndex && "CostMap should assign unique "
11511151
"numbers to each loop");
1152-
if (CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop))
1153-
return std::nullopt;
1154-
return std::optional<bool>(false);
1152+
return std::optional<bool>(InnerIndex < OuterIndex);
11551153
}
11561154

11571155
std::optional<bool>
Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
; RUN: opt < %s -passes=loop-interchange -cache-line-size=1 -pass-remarks-output=%t -disable-output \
2+
; RUN: -verify-dom-info -verify-loop-info
3+
; RUN: FileCheck -input-file %t %s
4+
5+
6+
; Test that loop-interchange doesn't undo its own transoformation. This is
7+
; the case when the cost computed by CacheCost is the same for the loop of `j`
8+
; and `k`.
9+
;
10+
; #define N 4
11+
; int a[N*N][N*N][N*N];
12+
; void f() {
13+
; for (int i = 0; i < N; i++)
14+
; for (int j = 1; j < 2*N; j++)
15+
; for (int k = 1; k < 2*N; k++)
16+
; a[i][k+1][j-1] -= a[i+N-1][k][j];
17+
; }
18+
19+
; CHECK: --- !Passed
20+
; CHECK-NEXT: Pass: loop-interchange
21+
; CHECK-NEXT: Name: Interchanged
22+
; CHECK-NEXT: Function: f
23+
; CHECK-NEXT: Args:
24+
; CHECK-NEXT: - String: Loop interchanged with enclosing loop.
25+
; CHECK-NEXT: ...
26+
; CHECK-NEXT: --- !Missed
27+
; CHECK-NEXT: Pass: loop-interchange
28+
; CHECK-NEXT: Name: Dependence
29+
; CHECK-NEXT: Function: f
30+
; CHECK-NEXT: Args:
31+
; CHECK-NEXT: - String: Cannot interchange loops due to dependences.
32+
; CHECK-NEXT: ...
33+
; CHECK-NEXT: --- !Missed
34+
; CHECK-NEXT: Pass: loop-interchange
35+
; CHECK-NEXT: Name: InterchangeNotProfitable
36+
; CHECK-NEXT: Function: f
37+
; CHECK-NEXT: Args:
38+
; CHECK-NEXT: - String: Interchanging loops is not considered to improve cache locality nor vectorization.
39+
; CHECK-NEXT: ...
40+
41+
@a = dso_local local_unnamed_addr global [16 x [16 x [16 x i32]]] zeroinitializer, align 4
42+
43+
define dso_local void @f() {
44+
entry:
45+
br label %for.cond1.preheader
46+
47+
for.cond1.preheader:
48+
%indvars.iv46 = phi i64 [ 0, %entry ], [ %indvars.iv.next47, %for.cond.cleanup3 ]
49+
%0 = add nuw nsw i64 %indvars.iv46, 3
50+
br label %for.cond5.preheader
51+
52+
for.cond5.preheader:
53+
%indvars.iv41 = phi i64 [ 1, %for.cond1.preheader ], [ %indvars.iv.next42, %for.cond.cleanup7 ]
54+
%1 = add nsw i64 %indvars.iv41, -1
55+
br label %for.body8
56+
57+
for.cond.cleanup3:
58+
%indvars.iv.next47 = add nuw nsw i64 %indvars.iv46, 1
59+
%exitcond50 = icmp ne i64 %indvars.iv.next47, 4
60+
br i1 %exitcond50, label %for.cond1.preheader, label %for.cond.cleanup
61+
62+
for.cond.cleanup7:
63+
%indvars.iv.next42 = add nuw nsw i64 %indvars.iv41, 1
64+
%exitcond45 = icmp ne i64 %indvars.iv.next42, 8
65+
br i1 %exitcond45, label %for.cond5.preheader, label %for.cond.cleanup3
66+
67+
for.body8:
68+
%indvars.iv = phi i64 [ 1, %for.cond5.preheader ], [ %indvars.iv.next, %for.body8 ]
69+
%arrayidx12 = getelementptr inbounds nuw [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %0, i64 %indvars.iv, i64 %indvars.iv41
70+
%2 = load i32, ptr %arrayidx12, align 4
71+
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
72+
%arrayidx20 = getelementptr inbounds [16 x [16 x [16 x i32]]], ptr @a, i64 0, i64 %indvars.iv46, i64 %indvars.iv.next, i64 %1
73+
%3 = load i32, ptr %arrayidx20, align 4
74+
%sub21 = sub nsw i32 %3, %2
75+
store i32 %sub21, ptr %arrayidx20, align 4
76+
%exitcond = icmp ne i64 %indvars.iv.next, 8
77+
br i1 %exitcond, label %for.body8, label %for.cond.cleanup7
78+
79+
for.cond.cleanup:
80+
ret void
81+
}

0 commit comments

Comments
 (0)