Commit 4e62554

[MCA] Add support for nested and overlapping region markers

This patch fixes PR41523
https://bugs.llvm.org/show_bug.cgi?id=41523

Regions can now nest/overlap provided that they have different names.
Anonymous regions cannot overlap.

Region end markers must specify the region name. The only exception is for when
there is only one user-defined region; in that particular case, the region end
marker doesn't need to specify a name.

Incorrect region end markers are no longer ignored. Instead, the tool reports an
error and we exit with an error code.

Added test cases to verify the new diagnostic error messages.

Updated the llvm-mca docs to reflect this feature change.

Differential Revision: https://reviews.llvm.org/D61676

llvm-svn: 360351

1 parent dcdb3c6 commit 4e62554
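
The matching rules described in the commit message can be modelled in a few lines of Python. This is a minimal sketch, not llvm-mca's C++ implementation: `collect_regions` and `RegionError` are hypothetical names, labels and non-marker comments are simply skipped, and a region still open at the end of the input is assumed to run to the last instruction. The error strings mirror the diagnostics checked by the tests in this commit.

```python
BEGIN = "# LLVM-MCA-BEGIN"
END = "# LLVM-MCA-END"


class RegionError(Exception):
    """Raised for an invalid region marker (hypothetical helper type)."""


def collect_regions(lines):
    """Return [(name, [instructions])] in the order the regions were opened.

    Simplified model of the marker rules; "" names an anonymous region.
    """
    regions = []  # every region seen so far, in begin order
    active = {}   # name -> index into `regions` for regions not yet ended
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if text.startswith(BEGIN):
            name = text[len(BEGIN):].strip()
            if name in active:
                if name:
                    raise RegionError(
                        f"{lineno}: overlapping regions cannot have the same name")
                raise RegionError(
                    f"{lineno}: found multiple overlapping anonymous regions")
            active[name] = len(regions)
            regions.append((name, []))
        elif text.startswith(END):
            name = text[len(END):].strip()
            # An anonymous end marker closes the only active region,
            # whatever its name, when exactly one region is active.
            if not name and len(active) == 1:
                name = next(iter(active))
            if name not in active:
                raise RegionError(
                    f"{lineno}: found an invalid region end directive")
            del active[name]
        elif text and not text.startswith("#") and not text.endswith(":"):
            # An instruction belongs to every region that is currently active.
            for index in active.values():
                regions[index][1].append(text)
    return regions


overlapping = [
    "# LLVM-MCA-BEGIN foo",
    "  add %eax, %edx",
    "# LLVM-MCA-BEGIN bar",
    "  sub %eax, %edx",
    "# LLVM-MCA-END foo",
    "  add %eax, %edx",
    "# LLVM-MCA-END bar",
]
for name, instructions in collect_regions(overlapping):
    print(name, instructions)
```

With the overlapping input above, region `foo` collects the first `add` and the `sub`, while `bar` collects the `sub` and the second `add`. Keeping the active regions in a dictionary keyed by name is what makes the "same name cannot overlap" rule a simple membership check.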

12 files changed: +398 -38 lines changed

llvm/docs/CommandGuide/llvm-mca.rst

Lines changed: 45 additions & 7 deletions
@@ -192,16 +192,54 @@ example:
 
 .. code-block:: none
 
-  # LLVM-MCA-BEGIN My Code Region
+  # LLVM-MCA-BEGIN
     ...
   # LLVM-MCA-END
 
-Multiple regions can be specified provided that they do not overlap. A code
-region can have an optional description. If no user-defined region is specified,
-then :program:`llvm-mca` assumes a default region which contains every
-instruction in the input file. Every region is analyzed in isolation, and the
-final performance report is the union of all the reports generated for every
-code region.
+If no user-defined region is specified, then :program:`llvm-mca` assumes a
+default region which contains every instruction in the input file. Every region
+is analyzed in isolation, and the final performance report is the union of all
+the reports generated for every code region.
+
+Code regions can have names. For example:
+
+.. code-block:: none
+
+  # LLVM-MCA-BEGIN A simple example
+    add %eax, %eax
+  # LLVM-MCA-END
+
+The code from the example above defines a region named "A simple example" with a
+single instruction in it. Note how the region name doesn't have to be repeated
+in the ``LLVM-MCA-END`` directive. In the absence of overlapping regions,
+an anonymous ``LLVM-MCA-END`` directive always ends the currently active user
+defined region.
+
+Example of nesting regions:
+
+.. code-block:: none
+
+  # LLVM-MCA-BEGIN foo
+    add %eax, %edx
+  # LLVM-MCA-BEGIN bar
+    sub %eax, %edx
+  # LLVM-MCA-END bar
+  # LLVM-MCA-END foo
+
+Example of overlapping regions:
+
+.. code-block:: none
+
+  # LLVM-MCA-BEGIN foo
+    add %eax, %edx
+  # LLVM-MCA-BEGIN bar
+    sub %eax, %edx
+  # LLVM-MCA-END foo
+    add %eax, %edx
+  # LLVM-MCA-END bar
+
+Note that multiple anonymous regions cannot overlap. Also, overlapping regions
+cannot have the same name.
 
 Inline assembly directives may be used from source code to annotate the
 assembly text:

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 < %s | FileCheck %s
+
+testloop:
+# LLVM-MCA-BEGIN upper
+leal 42(%rdi), %eax
+# LLVM-MCA-BEGIN lower
+imull %esi, %eax
+# LLVM-MCA-END upper
+leal 42(%rdi), %eax
+# LLVM-MCA-END lower
+imull %esi, %eax
+
+# CHECK: [0] Code Region - upper
+
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 200
+# CHECK-NEXT: Total Cycles: 205
+# CHECK-NEXT: Total uOps: 300
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.46
+# CHECK-NEXT: IPC: 0.98
+# CHECK-NEXT: Block RThroughput: 1.5
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 1 0.50 leal 42(%rdi), %eax
+# CHECK-NEXT: 2 3 1.00 imull %esi, %eax
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: 0.99 1.01 - - - - - - 1.00 - - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: 0.99 0.01 - - - - - - - - - - - - leal 42(%rdi), %eax
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %esi, %eax
+
+# CHECK: [1] Code Region - lower
+
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 200
+# CHECK-NEXT: Total Cycles: 204
+# CHECK-NEXT: Total uOps: 300
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.47
+# CHECK-NEXT: IPC: 0.98
+# CHECK-NEXT: Block RThroughput: 1.5
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 2 3 1.00 imull %esi, %eax
+# CHECK-NEXT: 1 1 0.50 leal 42(%rdi), %eax
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: 1.00 1.00 - - - - - - 1.00 - - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %esi, %eax
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - leal 42(%rdi), %eax

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# RUN: not llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 %s 2>&1 | FileCheck %s
+
+# LLVM-MCA-BEGIN foo
+add %eax, %eax
+# LLVM-MCA-BEGIN foo
+add %eax, %eax
+
+# CHECK: llvm-mca-markers-11.s:5:2: error: overlapping regions cannot have the same name
+# CHECK-NEXT: # LLVM-MCA-BEGIN foo
+# CHECK-NEXT: ^
+# CHECK-NEXT: llvm-mca-markers-11.s:3:2: note: region foo was previously defined here
+# CHECK-NEXT: # LLVM-MCA-BEGIN foo
+# CHECK-NEXT: ^

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# RUN: not llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 %s 2>&1 | FileCheck %s
+
+# LLVM-MCA-BEGIN
+add %eax, %eax
+# LLVM-MCA-BEGIN
+add %eax, %eax
+
+# CHECK: llvm-mca-markers-12.s:5:2: error: found multiple overlapping anonymous regions
+# CHECK-NEXT: # LLVM-MCA-BEGIN
+# CHECK-NEXT: ^
+# CHECK-NEXT: llvm-mca-markers-12.s:3:2: note: Previous anonymous region was defined here
+# CHECK-NEXT: # LLVM-MCA-BEGIN
+# CHECK-NEXT: ^

llvm/test/tools/llvm-mca/X86/llvm-mca-markers-6.s

Lines changed: 5 additions & 3 deletions
@@ -6,7 +6,9 @@
 
 # LLVM-MCA-END
 
-# CHECK: llvm-mca-markers-6.s:5:2: warning: Ignoring invalid region start
-# CHECK-NEXT: # LLVM-MCA-BEGIN bar
+# CHECK: llvm-mca-markers-6.s:7:2: error: found an invalid region end directive
+# CHECK-NEXT: # LLVM-MCA-END
+# CHECK-NEXT: ^
+# CHECK-NEXT: llvm-mca-markers-6.s:7:2: note: unable to find an active anonymous region
+# CHECK-NEXT: # LLVM-MCA-END
 # CHECK-NEXT: ^
-# CHECK-NEXT: error: no assembly instructions found.

llvm/test/tools/llvm-mca/X86/llvm-mca-markers-7.s

Lines changed: 4 additions & 1 deletion
@@ -6,6 +6,9 @@
 
 # LLVM-MCA-END
 
-# CHECK: llvm-mca-markers-7.s:7:2: warning: Ignoring invalid region end
+# CHECK: llvm-mca-markers-7.s:7:2: error: found an invalid region end directive
+# CHECK-NEXT: # LLVM-MCA-END
+# CHECK-NEXT: ^
+# CHECK-NEXT: llvm-mca-markers-7.s:7:2: note: unable to find an active anonymous region
 # CHECK-NEXT: # LLVM-MCA-END
 # CHECK-NEXT: ^

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# RUN: not llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 %s 2>&1 | FileCheck %s
+
+# LLVM-MCA-END foo
+
+# CHECK: llvm-mca-markers-8.s:3:2: error: found an invalid region end directive
+# CHECK-NEXT: # LLVM-MCA-END foo
+# CHECK-NEXT: ^
+# CHECK-NEXT: llvm-mca-markers-8.s:3:2: note: unable to find an active region named foo
+# CHECK-NEXT: # LLVM-MCA-END foo
+# CHECK-NEXT: ^

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 < %s | FileCheck %s
+
+testloop:
+# LLVM-MCA-BEGIN outer
+leal 42(%rdi), %eax
+# LLVM-MCA-BEGIN inner
+imull %esi, %eax
+# LLVM-MCA-END inner
+leal 42(%rdi), %eax
+# LLVM-MCA-END outer
+imull %esi, %eax
+
+# CHECK: [0] Code Region - outer
+
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 300
+# CHECK-NEXT: Total Cycles: 205
+# CHECK-NEXT: Total uOps: 400
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 1.95
+# CHECK-NEXT: IPC: 1.46
+# CHECK-NEXT: Block RThroughput: 2.0
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 1 1 0.50 leal 42(%rdi), %eax
+# CHECK-NEXT: 2 3 1.00 imull %esi, %eax
+# CHECK-NEXT: 1 1 0.50 leal 42(%rdi), %eax
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: 1.00 2.00 - - - - - - 1.00 - - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: - 1.00 - - - - - - - - - - - - leal 42(%rdi), %eax
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %esi, %eax
+# CHECK-NEXT: 1.00 - - - - - - - - - - - - - leal 42(%rdi), %eax
+
+# CHECK: [1] Code Region - inner
+
+# CHECK: Iterations: 100
+# CHECK-NEXT: Instructions: 100
+# CHECK-NEXT: Total Cycles: 303
+# CHECK-NEXT: Total uOps: 200
+
+# CHECK: Dispatch Width: 2
+# CHECK-NEXT: uOps Per Cycle: 0.66
+# CHECK-NEXT: IPC: 0.33
+# CHECK-NEXT: Block RThroughput: 1.0
+
+# CHECK: Instruction Info:
+# CHECK-NEXT: [1]: #uOps
+# CHECK-NEXT: [2]: Latency
+# CHECK-NEXT: [3]: RThroughput
+# CHECK-NEXT: [4]: MayLoad
+# CHECK-NEXT: [5]: MayStore
+# CHECK-NEXT: [6]: HasSideEffects (U)
+
+# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
+# CHECK-NEXT: 2 3 1.00 imull %esi, %eax
+
+# CHECK: Resources:
+# CHECK-NEXT: [0] - JALU0
+# CHECK-NEXT: [1] - JALU1
+# CHECK-NEXT: [2] - JDiv
+# CHECK-NEXT: [3] - JFPA
+# CHECK-NEXT: [4] - JFPM
+# CHECK-NEXT: [5] - JFPU0
+# CHECK-NEXT: [6] - JFPU1
+# CHECK-NEXT: [7] - JLAGU
+# CHECK-NEXT: [8] - JMul
+# CHECK-NEXT: [9] - JSAGU
+# CHECK-NEXT: [10] - JSTC
+# CHECK-NEXT: [11] - JVALU0
+# CHECK-NEXT: [12] - JVALU1
+# CHECK-NEXT: [13] - JVIMUL
+
+# CHECK: Resource pressure per iteration:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - -
+
+# CHECK: Resource pressure by instruction:
+# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
+# CHECK-NEXT: - 1.00 - - - - - - 1.00 - - - - - imull %esi, %eax
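
The summary figures in the CHECK lines of the nested-region test are related by simple ratios, which gives a quick sanity check for autogenerated expectations. The sketch below assumes that IPC is the instruction count divided by total cycles and that uOps Per Cycle is total uOps divided by total cycles, each shown to two decimals. `summary_ratios` is a hypothetical helper, and nothing is claimed here about the tool's exact rounding mode.

```python
def summary_ratios(instructions, total_cycles, total_uops):
    """Return (uOps per cycle, IPC), each rounded to two decimals."""
    uops_per_cycle = round(total_uops / total_cycles, 2)
    ipc = round(instructions / total_cycles, 2)
    return uops_per_cycle, ipc


# (instructions, total cycles, total uOps) copied from the CHECK lines
# for the "outer" and "inner" regions of the nested-region test.
outer = (300, 205, 400)
inner = (100, 303, 200)

print(summary_ratios(*outer))  # (1.95, 1.46)
print(summary_ratios(*inner))  # (0.66, 0.33)
```

Both results agree with the `uOps Per Cycle` and `IPC` values expected for the two regions. The instruction counts are also consistent with 100 iterations of a three-instruction outer region and a one-instruction inner region.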
