Skip to content

Commit 7f1a744

Browse files
authored
[TailDup][MachineSSAUpdater] Let RewriteUse insert a COPY when needed (#95553)
When running early-tailduplication we've seen problems with machine verifier errors due to register class mismatches after doing the machine SSA updates. Typical scenario is that there is a PHI node and another instruction that is using the same vreg: %othervreg:otherclass = PHI %vreg:origclass, %bb MInstr %vreg:origclass but then after TailDuplicator::tailDuplicateAndUpdate we get %othervreg:otherclass = PHI %vreg:origclass, %bb, ... MInstr %othervreg:otherclass Such rewrites are only valid if 'otherclass' is equal to (or a subclass of) 'origclass'. The solution here is based on adding a COPY instruction to make sure we satisfy constraints given by 'MInstr' in the example. So if 'otherclass' isn't equal to (or a subclass of) 'origclass' we insert a copy after the PHI like this: %othervreg:otherclass = PHI %vreg:origclass, %bb, ... %newvreg:origclass = COPY %othervreg:otherclass MInstr %newvreg:origclass A special case is when it is possible to constrain the register class instead of inserting a COPY. We currently prefer to constrain the register class instead of inserting a COPY, even if it is a bit unclear if that always is better (considering register pressure for the constrained class etc.). Fixes: #62712
1 parent 01fb529 commit 7f1a744

File tree

2 files changed

+319
-1
lines changed

2 files changed

+319
-1
lines changed

llvm/lib/CodeGen/MachineSSAUpdater.cpp

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -209,7 +209,7 @@ Register MachineSSAUpdater::GetValueInMiddleOfBlock(MachineBasicBlock *BB,
209209
// If the client wants to know about all new instructions, tell it.
210210
if (InsertedPHIs) InsertedPHIs->push_back(InsertedPHI);
211211

212-
LLVM_DEBUG(dbgs() << " Inserted PHI: " << *InsertedPHI << "\n");
212+
LLVM_DEBUG(dbgs() << " Inserted PHI: " << *InsertedPHI);
213213
return InsertedPHI.getReg(0);
214214
}
215215

@@ -236,6 +236,22 @@ void MachineSSAUpdater::RewriteUse(MachineOperand &U) {
236236
NewVR = GetValueInMiddleOfBlock(UseMI->getParent());
237237
}
238238

239+
// Insert a COPY if needed to satisfy register class constraints for the using
240+
// MO. Or, if possible, just constrain the class for NewVR to avoid the need
241+
// for a COPY.
242+
if (NewVR) {
243+
const TargetRegisterClass *UseRC =
244+
dyn_cast_or_null<const TargetRegisterClass *>(RegAttrs.RCOrRB);
245+
if (UseRC && !MRI->constrainRegClass(NewVR, UseRC)) {
246+
MachineBasicBlock *UseBB = UseMI->getParent();
247+
MachineInstr *InsertedCopy =
248+
InsertNewDef(TargetOpcode::COPY, UseBB, UseBB->getFirstNonPHI(),
249+
RegAttrs, MRI, TII)
250+
.addReg(NewVR);
251+
NewVR = InsertedCopy->getOperand(0).getReg();
252+
LLVM_DEBUG(dbgs() << " Inserted COPY: " << *InsertedCopy);
253+
}
254+
}
239255
U.setReg(NewVR);
240256
}
241257

Lines changed: 302 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,302 @@
1+
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
2+
# RUN: llc -verify-machineinstrs -mtriple aarch64-none-linux-gnu -run-pass early-tailduplication -o - %s | FileCheck %s
3+
4+
# These tests used to hit verification failure due to register class
5+
# mismatches after tail dupilication (and the SSA form updates).
6+
7+
# In test1 we have a simple case when the COPY already is duplicated, but
8+
# TailDuplication will try to eliminate the PHI in bb.2 by adding a new PHI in
9+
# bb.4. The presence of a PHI node in bb.4, which happens to assign to a
10+
# fpr32 register, was enough to mess up the result. The PHI node was reused
11+
# and the use of %2 in the CBNZW was changed into using %3 instead. But the
12+
# register class of %3 is not correct for CBNZW.
13+
# The fix involves adding a COPY instruction that moves the value to
14+
# a register of correct regclass.
15+
---
16+
name: test1
17+
tracksRegLiveness: true
18+
body: |
19+
; CHECK-LABEL: name: test1
20+
; CHECK: bb.0:
21+
; CHECK-NEXT: successors: %bb.4(0x80000000)
22+
; CHECK-NEXT: {{ $}}
23+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
24+
; CHECK-NEXT: B %bb.4
25+
; CHECK-NEXT: {{ $}}
26+
; CHECK-NEXT: bb.1:
27+
; CHECK-NEXT: successors: %bb.4(0x80000000)
28+
; CHECK-NEXT: {{ $}}
29+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
30+
; CHECK-NEXT: B %bb.4
31+
; CHECK-NEXT: {{ $}}
32+
; CHECK-NEXT: bb.3:
33+
; CHECK-NEXT: successors: %bb.3(0x80000000)
34+
; CHECK-NEXT: {{ $}}
35+
; CHECK-NEXT: B %bb.3
36+
; CHECK-NEXT: {{ $}}
37+
; CHECK-NEXT: bb.4:
38+
; CHECK-NEXT: successors: %bb.5(0x80000000)
39+
; CHECK-NEXT: {{ $}}
40+
; CHECK-NEXT: [[PHI:%[0-9]+]]:fpr32 = PHI %0, %bb.0, %1, %bb.1
41+
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32 = COPY [[PHI]]
42+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit [[PHI]]
43+
; CHECK-NEXT: CBNZW [[COPY]], %bb.5
44+
; CHECK-NEXT: {{ $}}
45+
; CHECK-NEXT: bb.5:
46+
bb.0:
47+
$x0 = COPY undef $x0, implicit-def %0:gpr32
48+
B %bb.2
49+
50+
bb.1:
51+
$x0 = COPY undef $x0, implicit-def %1:gpr32
52+
53+
bb.2:
54+
%2:gpr32 = PHI %1, %bb.1, %0, %bb.0
55+
B %bb.4
56+
57+
bb.3:
58+
B %bb.3
59+
60+
bb.4:
61+
%3:fpr32 = PHI %2, %bb.2
62+
$x0 = COPY undef $x0, implicit %3:fpr32
63+
CBNZW %2, %bb.5
64+
65+
bb.5:
66+
67+
...
68+
69+
# In test2 there are two PHIs already present, one with the wanted register
70+
# class. No idea if this is a common scenario in reality (hand written mir
71+
# test case).
72+
# FIXME: Can we pick the best PHI directly instead of getting a COPY from the
73+
# one with wrong register class?
74+
---
75+
name: test2
76+
tracksRegLiveness: true
77+
body: |
78+
; CHECK-LABEL: name: test2
79+
; CHECK: bb.0:
80+
; CHECK-NEXT: successors: %bb.4(0x80000000)
81+
; CHECK-NEXT: {{ $}}
82+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
83+
; CHECK-NEXT: B %bb.4
84+
; CHECK-NEXT: {{ $}}
85+
; CHECK-NEXT: bb.1:
86+
; CHECK-NEXT: successors: %bb.4(0x80000000)
87+
; CHECK-NEXT: {{ $}}
88+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
89+
; CHECK-NEXT: B %bb.4
90+
; CHECK-NEXT: {{ $}}
91+
; CHECK-NEXT: bb.3:
92+
; CHECK-NEXT: successors: %bb.3(0x80000000)
93+
; CHECK-NEXT: {{ $}}
94+
; CHECK-NEXT: B %bb.3
95+
; CHECK-NEXT: {{ $}}
96+
; CHECK-NEXT: bb.4:
97+
; CHECK-NEXT: successors: %bb.5(0x80000000)
98+
; CHECK-NEXT: {{ $}}
99+
; CHECK-NEXT: [[PHI:%[0-9]+]]:fpr32 = PHI %0, %bb.0, %1, %bb.1
100+
; CHECK-NEXT: [[PHI1:%[0-9]+]]:gpr32 = PHI %0, %bb.0, %1, %bb.1
101+
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32 = COPY [[PHI]]
102+
; CHECK-NEXT: CBNZW [[COPY]], %bb.5
103+
; CHECK-NEXT: {{ $}}
104+
; CHECK-NEXT: bb.5:
105+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit [[PHI]]
106+
bb.0:
107+
$x0 = COPY undef $x0, implicit-def %0:gpr32
108+
B %bb.2
109+
110+
bb.1:
111+
$x0 = COPY undef $x0, implicit-def %1:gpr32
112+
113+
bb.2:
114+
%2:gpr32 = PHI %1, %bb.1, %0, %bb.0
115+
B %bb.4
116+
117+
bb.3:
118+
B %bb.3
119+
120+
bb.4:
121+
%3:fpr32 = PHI %2, %bb.2
122+
%4:gpr32 = PHI %2, %bb.2
123+
CBNZW %2, %bb.5
124+
125+
bb.5:
126+
$x0 = COPY undef $x0, implicit %3:fpr32
127+
...
128+
129+
# In test3 we have multiple uses, and in multiple BBs. This test is to show
130+
# that we get one COPY instruction inserted for each BB where there is a use.
131+
---
132+
name: test3
133+
tracksRegLiveness: true
134+
body: |
135+
; CHECK-LABEL: name: test3
136+
; CHECK: bb.0:
137+
; CHECK-NEXT: successors: %bb.4(0x80000000)
138+
; CHECK-NEXT: {{ $}}
139+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
140+
; CHECK-NEXT: B %bb.4
141+
; CHECK-NEXT: {{ $}}
142+
; CHECK-NEXT: bb.1:
143+
; CHECK-NEXT: successors: %bb.4(0x80000000)
144+
; CHECK-NEXT: {{ $}}
145+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
146+
; CHECK-NEXT: B %bb.4
147+
; CHECK-NEXT: {{ $}}
148+
; CHECK-NEXT: bb.3:
149+
; CHECK-NEXT: successors: %bb.3(0x80000000)
150+
; CHECK-NEXT: {{ $}}
151+
; CHECK-NEXT: B %bb.3
152+
; CHECK-NEXT: {{ $}}
153+
; CHECK-NEXT: bb.4:
154+
; CHECK-NEXT: successors: %bb.5(0x80000000)
155+
; CHECK-NEXT: {{ $}}
156+
; CHECK-NEXT: [[PHI:%[0-9]+]]:fpr32 = PHI %0, %bb.0, %1, %bb.1
157+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit [[PHI]]
158+
; CHECK-NEXT: {{ $}}
159+
; CHECK-NEXT: bb.5:
160+
; CHECK-NEXT: successors: %bb.6(0x80000000)
161+
; CHECK-NEXT: {{ $}}
162+
; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr32 = COPY [[PHI]]
163+
; CHECK-NEXT: CBNZW [[COPY]], %bb.6
164+
; CHECK-NEXT: {{ $}}
165+
; CHECK-NEXT: bb.6:
166+
; CHECK-NEXT: successors: %bb.7(0x80000000)
167+
; CHECK-NEXT: {{ $}}
168+
; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr32 = COPY [[PHI]]
169+
; CHECK-NEXT: CBNZW [[COPY1]], %bb.7
170+
; CHECK-NEXT: {{ $}}
171+
; CHECK-NEXT: bb.7:
172+
bb.0:
173+
$x0 = COPY undef $x0, implicit-def %0:gpr32
174+
B %bb.2
175+
176+
bb.1:
177+
$x0 = COPY undef $x0, implicit-def %1:gpr32
178+
179+
bb.2:
180+
%2:gpr32 = PHI %1, %bb.1, %0, %bb.0
181+
B %bb.4
182+
183+
bb.3:
184+
B %bb.3
185+
186+
bb.4:
187+
%3:fpr32 = PHI %2, %bb.2
188+
$x0 = COPY undef $x0, implicit %3:fpr32
189+
190+
bb.5:
191+
CBNZW %2, %bb.6
192+
193+
bb.6:
194+
CBNZW %2, %bb.7
195+
196+
bb.7:
197+
...
198+
199+
# In test4 we do not need to insert a COPY.
200+
# The register class can be constrained instead.
201+
---
202+
name: test4
203+
tracksRegLiveness: true
204+
body: |
205+
; CHECK-LABEL: name: test4
206+
; CHECK: bb.0:
207+
; CHECK-NEXT: successors: %bb.4(0x80000000)
208+
; CHECK-NEXT: {{ $}}
209+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
210+
; CHECK-NEXT: B %bb.4
211+
; CHECK-NEXT: {{ $}}
212+
; CHECK-NEXT: bb.1:
213+
; CHECK-NEXT: successors: %bb.4(0x80000000)
214+
; CHECK-NEXT: {{ $}}
215+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
216+
; CHECK-NEXT: B %bb.4
217+
; CHECK-NEXT: {{ $}}
218+
; CHECK-NEXT: bb.3:
219+
; CHECK-NEXT: successors: %bb.3(0x80000000)
220+
; CHECK-NEXT: {{ $}}
221+
; CHECK-NEXT: B %bb.3
222+
; CHECK-NEXT: {{ $}}
223+
; CHECK-NEXT: bb.4:
224+
; CHECK-NEXT: successors: %bb.5(0x80000000)
225+
; CHECK-NEXT: {{ $}}
226+
; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr32common = PHI %0, %bb.0, %1, %bb.1
227+
; CHECK-NEXT: CBNZW [[PHI]], %bb.5
228+
; CHECK-NEXT: {{ $}}
229+
; CHECK-NEXT: bb.5:
230+
bb.0:
231+
$x0 = COPY undef $x0, implicit-def %0:gpr32
232+
B %bb.2
233+
234+
bb.1:
235+
$x0 = COPY undef $x0, implicit-def %1:gpr32
236+
237+
bb.2:
238+
%2:gpr32 = PHI %1, %bb.1, %0, %bb.0
239+
B %bb.4
240+
241+
bb.3:
242+
B %bb.3
243+
244+
bb.4:
245+
%3:gpr32sp = PHI %2, %bb.2
246+
CBNZW %2, %bb.5
247+
248+
bb.5:
249+
250+
...
251+
252+
# In test5 we do not need to insert a COPY.
253+
# The register class can be constrained instead.
254+
---
255+
name: test5
256+
tracksRegLiveness: true
257+
body: |
258+
; CHECK-LABEL: name: test5
259+
; CHECK: bb.0:
260+
; CHECK-NEXT: successors: %bb.4(0x80000000)
261+
; CHECK-NEXT: {{ $}}
262+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %0
263+
; CHECK-NEXT: B %bb.4
264+
; CHECK-NEXT: {{ $}}
265+
; CHECK-NEXT: bb.1:
266+
; CHECK-NEXT: successors: %bb.4(0x80000000)
267+
; CHECK-NEXT: {{ $}}
268+
; CHECK-NEXT: $x0 = COPY undef $x0, implicit-def %1
269+
; CHECK-NEXT: B %bb.4
270+
; CHECK-NEXT: {{ $}}
271+
; CHECK-NEXT: bb.3:
272+
; CHECK-NEXT: successors: %bb.3(0x80000000)
273+
; CHECK-NEXT: {{ $}}
274+
; CHECK-NEXT: B %bb.3
275+
; CHECK-NEXT: {{ $}}
276+
; CHECK-NEXT: bb.4:
277+
; CHECK-NEXT: successors: %bb.5(0x80000000)
278+
; CHECK-NEXT: {{ $}}
279+
; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr32common = PHI %0, %bb.0, %1, %bb.1
280+
; CHECK-NEXT: CBNZW [[PHI]], %bb.5
281+
; CHECK-NEXT: {{ $}}
282+
; CHECK-NEXT: bb.5:
283+
bb.0:
284+
$x0 = COPY undef $x0, implicit-def %0:gpr32common
285+
B %bb.2
286+
287+
bb.1:
288+
$x0 = COPY undef $x0, implicit-def %1:gpr32common
289+
290+
bb.2:
291+
%2:gpr32common = PHI %1, %bb.1, %0, %bb.0
292+
B %bb.4
293+
294+
bb.3:
295+
B %bb.3
296+
297+
bb.4:
298+
%3:gpr32 = PHI %2, %bb.2
299+
CBNZW %2, %bb.5
300+
301+
bb.5:
302+
...

0 commit comments

Comments
 (0)