Commit 16a13c8

Merge pull request swiftlang#17455 from graydon/batch-mode-docs-update
2 parents 12bb4ef + 19eab4f commit 16a13c8

1 file changed, +72 -34 lines changed

docs/CompilerPerformance.md

Lines changed: 72 additions & 34 deletions
@@ -103,6 +103,13 @@ significant modes are:
   driver is run with the flag `-wmo`, `-whole-module-optimization` or
   `-force-single-frontend-invocation` (all these options are synonymous).

+- **Batch** vs. **single-file** primary-file mode. This distinction refines
+  the behaviour of primary-file mode, with the new batch mode added in the
+  Swift 4.2 release cycle. Batching eliminates much of the overhead of
+  primary-file mode, and will eventually become the default way of running
+  primary-file mode, but until that time it is explicitly enabled by passing
+  the `-enable-batch-mode` flag.
+
 - **Optimizing** vs. **non-optimizing**: this varies depending on whether the
   driver (and thus each frontend) is run with the flags `-O`, `-Osize`, or
   `-Ounchecked` (each of which turns on one or more sets of optimizations), or
@@ -120,39 +127,60 @@ But these parameters can be varied independently and the compiler will spend its
 time very differently depending on their settings, so it's worth understanding
 both dimensions in a bit more detail.

-#### Primary-file vs. WMO
+#### Primary-file (with and without batching) vs. WMO

 This is the most significant variable in how the compiler behaves, so it's worth
 getting perfectly clear:

-- In **primary-file mode**, the driver runs _one frontend job per file_ in the
-  module, merging the results when all the frontends finish. Each frontend job
-  itself reads _all_ the files in the module, and focuses on one _primary_
-  file among the set it read, which it compiles, lazily analyzing other
-  referenced definitions from the module as needed.
+- In **primary-file mode**, the driver divides the work it has to do between
+  multiple frontend processes, emitting partial results and merging those
+  results when all the frontends finish. Each frontend job itself reads _all_
+  the files in the module, and focuses on one or more _primary_ file(s) among
+  the set it read, which it compiles, lazily analyzing other referenced
+  definitions from the module as needed.
+  This mode has two sub-modes:
+
+  - In the **single-file** sub-mode, it runs _one frontend job per file_, with
+    each job having a single primary.
+
+  - In the **batch** sub-mode, it runs _one frontend job per CPU_, identifying an
+    equal-sized "batch" of the module's files as primaries.

 - In **whole-module optimization (WMO) mode**, the driver runs one frontend
   job for the entire module, no matter what. That frontend reads all the files
   in the module _once_ and compiles them all at once.
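
As a rough illustration of the batch sub-mode described above, the following Python sketch splits a module's files into one near-equal batch of primaries per CPU. The helper name `partition_into_batches` and the contiguous-chunk strategy are assumptions made for illustration only; the driver's actual partitioning logic is not shown in this document and may differ.

```python
def partition_into_batches(files, num_jobs):
    """Split `files` into contiguous, near-equal batches, one per job.

    Hypothetical helper: it illustrates "one frontend job per CPU",
    not the Swift driver's real algorithm.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")
    # Never create more jobs than there are files (assumption).
    num_jobs = min(num_jobs, len(files)) or 1
    base, extra = divmod(len(files), num_jobs)
    batches, start = [], 0
    for job in range(num_jobs):
        # The first `extra` jobs take one additional file each.
        size = base + (1 if job < extra else 0)
        batches.append(files[start:start + size])
        start += size
    return batches


files = [f"file{i}.swift" for i in range(100)]
batches = partition_into_batches(files, 4)
print([len(batch) for batch in batches])  # [25, 25, 25, 25]
```

With 100 files and 4 jobs, each job receives 25 primaries; when the file count does not divide evenly, the batch sizes in this sketch differ by at most one.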

-For example: if your module has 100 files in it, running `swiftc *.swift` will
-run 100 frontend subprocesses, each of which will parse all 100 inputs (for a
-total of 10,000 parses), and then each subprocess will (in parallel) compile the
-definitions in its single primary file. In contrast, running `swiftc -wmo
-*.swift` will run _one_ frontend subprocess, which then reads all 100 files
-_once_ and compiles the definitions in all of them, in order (serially).
+For example: if your module has 100 files in it:
+
+- Running `swiftc *.swift` will compile in **single-file mode**, and will thus
+  run 100 frontend subprocesses, each of which will parse all 100 inputs (for
+  a total of 10,000 parses), and then each subprocess will (in parallel)
+  compile the definitions in its single primary file.

-Why do both modes exist? Because they have different strengths and weaknesses;
+- Running `swiftc -enable-batch-mode *.swift` will compile in **batch** mode,
+  and on a system with 4 CPUs will run 4 frontend subprocesses, each of which
+  will parse all 100 inputs (for a total of 400 parses), and then each subprocess
+  will (in parallel) compile the definitions of 25 primary files (one quarter
+  of the module in each process).
+
+- Running `swiftc -wmo *.swift` will compile in **whole-module** mode,
+  and will thus run _one_ frontend subprocess, which then reads all 100 files
+  _once_ (for a total of 100 parses) and compiles the definitions in all of them,
+  in order (serially).
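
The parse counts in the example above follow from one rule: every frontend job parses every file in the module. The short Python sketch below reproduces that arithmetic. The function names and the mode strings are hypothetical, and capping the batch-mode job count at the number of files is an assumption, not documented driver behaviour.

```python
def frontend_jobs(mode, num_files, num_cpus):
    """Number of frontend subprocesses the driver would launch (sketch)."""
    if mode == "single-file":
        return num_files                  # one job per file
    if mode == "batch":
        return min(num_cpus, num_files)   # one job per CPU (capped: assumption)
    if mode == "wmo":
        return 1                          # one job for the whole module
    raise ValueError(f"unknown mode: {mode}")


def total_parses(mode, num_files, num_cpus):
    """Each job parses all files, so parses = jobs * files."""
    return frontend_jobs(mode, num_files, num_cpus) * num_files


for mode in ("single-file", "batch", "wmo"):
    jobs = frontend_jobs(mode, num_files=100, num_cpus=4)
    parses = total_parses(mode, num_files=100, num_cpus=4)
    print(f"{mode:12} jobs={jobs:3} parses={parses}")
```

For 100 files on 4 CPUs this gives 10,000 parses in single-file mode, 400 in batch mode, and 100 in WMO mode, matching the figures in the text.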
+
+Why do multiple modes exist? Because they have different strengths and weaknesses;
 neither is perfect:

-- Primary-file mode's advantages are that the driver can do incremental
-  compilation by only running frontends for files that it thinks are out of
-  date, as well as running multiple frontend jobs at the same time, making use
+- Primary-file mode's advantages are that the driver can do **incremental
+  compilation** by only running frontends for files that it thinks are out of
+  date, as well as running multiple frontend jobs **in parallel**, making use
   of multiple cores. Its disadvantage is that each frontend job has to read
-  _all the source files_ in the module before focusing on its primary-file of
+  _all the source files_ in the module before focusing on its primary-files of
   interest, which means that a _portion_ of the frontend job's work is being
-  done _quadratically_. Usually this portion is relatively small and fast, but
-  because it's quadratic, it can easily go wrong.
+  done _quadratically_ in the number of jobs. Usually this portion is relatively
+  small and fast, but because it's quadratic, it can easily go wrong. The addition
+  of **batch mode** was specifically to eliminate this quadratic increase in
+  early work.

 - WMO mode's advantages are that it can do certain optimizations that only
   work when they are sure they're looking at the entire module, and it avoids
@@ -161,13 +189,17 @@ neither is perfect:
   parallelism worse (at least before LLVM IR code-generation, which is always
   multithreaded).

-Many people get confused by the word `optimization` in the option name
-`-whole-module-optimization`, and assume the option has only to do with enabling
-"very aggressive" optimizations. It does enable such optimizations, but it also
-_significantly changes_ the way the compiler runs, so much so that some other
-people have taken to running the compiler in an unsupported (and somewhat
-unfortunate) hybrid compilation mode `-wmo -Onone`, which combines
-non-optimizing compilation with whole-module compilation.
+Whole-module mode does enable a set of optimizations that are not possible when
+compiling in primary-file mode. In particular, in modules with a lot of private
+dead code, whole-module mode can eliminate the dead code earlier and avoid
+needless work compiling it, making for both smaller output and faster compilation.
+
+It is therefore possible that, in certain cases (such as with limited available
+parallelism / many modules built in parallel), building in whole-module mode
+with optimization disabled can complete in less time than batched primary-file
+mode. This scenario depends on many factors and seldom gives a significant advantage,
+and since using it trades away support for incremental compilation entirely, it
+is not a recommended configuration.

 #### Amount of optimization

@@ -269,7 +301,7 @@ definitions than it should.
 Swift compilation performance varies _significantly_ by at least the following
 parameters:

-- WMO vs. primary-file (non-WMO) mode
+- WMO vs. primary-file (non-WMO) mode, including batching thereof
 - Optimizing vs. non-optimizing mode
 - Quantity of incremental work avoided (if in non-WMO)
 - Quantity of external definitions lazily loaded
@@ -288,7 +320,6 @@ problem you're seeing to some of the existing strategies and plans for
 improvement:

 - Incremental mode is over-approximate, runs too many subprocesses.
-- Name resolution is over-eager, deserializes too many definitions.
 - Too many referenced (non-primary-file) definitions are type-checked beyond
   the point they need to be, during the quadratic phase.
 - Expression type inference solves constraints inefficiently, and can
@@ -516,20 +547,22 @@ compilers on hand while you're working.
   early investigation to see which file in a primary-file-mode compilation is
   taking the majority of time, or is taking more or less time than when
   comparing compilations. Its output looks like this:
+
   ```
   ===-------------------------------------------------------------------------===
-  Driver Time Compilation
+  Driver Compilation Time
   ===-------------------------------------------------------------------------===
-  Total Execution Time: 0.0002 seconds (1.3390 wall clock)
+  Total Execution Time: 0.0001 seconds (0.0490 wall clock)

-  ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
-  0.0000 ( 87.2%) 0.0001 ( 58.7%) 0.0001 ( 67.5%) 1.0983 ( 82.0%) compile t.swift
-  0.0000 ( 12.8%) 0.0000 ( 41.3%) 0.0000 ( 32.5%) 0.2407 ( 18.0%) link t.swift
-  0.0000 (100.0%) 0.0001 (100.0%) 0.0002 (100.0%) 1.3390 (100.0%) Total
+  ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
+  0.0000 ( 82.0%) 0.0001 ( 59.5%) 0.0001 ( 69.0%) 0.0284 ( 58.0%) {compile: t-177627.o <= t.swift}
+  0.0000 ( 18.0%) 0.0000 ( 40.5%) 0.0000 ( 31.0%) 0.0206 ( 42.0%) {link: t <= t-177627.o}
+  0.0001 (100.0%) 0.0001 (100.0%) 0.0001 (100.0%) 0.0490 (100.0%) Total
   ```

 - `-Xfrontend -debug-time-compilation`: asks each frontend to print out timers
   for each phase of its execution. Its output (per-frontend) looks like this:
+
   ```
   ===-------------------------------------------------------------------------===
   Swift compilation
@@ -556,6 +589,7 @@ compilers on hand while you're working.
   taken. The output is therefore voluminous, but can help when reducing a
   testcase to the "one bad function" that causes it. The output looks like
   this:
+
   ```
   9.16ms test.swift:15:6 func find<R>(_ range: R, value: R.Element) -> R where R : IteratorProtocol, R.Element : Eq
   0.28ms test.swift:27:6 func findIf<R>(_ range: R, predicate: (R.Element) -> Bool) -> R where R : IteratorProtocol
@@ -568,6 +602,7 @@ compilers on hand while you're working.
   `-debug-time-function-bodies`, but prints a separate timer for _every
   expression_ in the program, much more detail than just the functions. The
   output looks like this:
+
   ```
   0.20ms test.swift:17:16
   1.82ms test.swift:18:12
@@ -582,6 +617,7 @@ compilers on hand while you're working.
   frontend, printing them out when the frontend exits. By default, most
   statistics are enabled only in assert builds, so in a release build this
   option will do nothing. In an assert build, its output will look like this:
+
   ```
   ===-------------------------------------------------------------------------===
   ... Statistics Collected ...
@@ -612,6 +648,7 @@ compilers on hand while you're working.
   AST reader, which is operated as a subsystem of the Swift compiler when
   importing definitions from C/ObjC. Its output is added to the end of
   whatever output comes from `-print-stats`, and looks like this:
+
   ```
   *** AST File Statistics:
   1/194 source location entries read (0.515464%)
@@ -630,6 +667,7 @@ compilers on hand while you're working.
 - `-Xfrontend -print-stats -Xfrontend -print-inst-counts`: an extended form of
   `-print-stats` that activates a separate statistic counter for every kind of
   SIL instruction generated during compilation. Its output looks like this:
+
   ```
   ...
   163 sil-instcount - Number of AllocStackInst
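
Because the per-function and per-expression timer output shown earlier in this diff is described as voluminous, a small script can help surface the slowest entries. The Python sketch below is one possible approach, not part of the Swift toolchain: the names `TIMING_LINE`, `parse_timings`, and `slowest` are hypothetical, and the regular expression assumes the `<time>ms <file>:<line>:<col> [declaration]` layout seen in the sample output, with file paths that contain no spaces or colons.

```python
import re

# Matches lines such as "9.16ms test.swift:15:6 func find<R>(...)".
# The trailing declaration is optional (per-expression timers omit it).
TIMING_LINE = re.compile(
    r"^\s*(?P<ms>\d+(?:\.\d+)?)ms\s+"
    r"(?P<file>[^\s:]+):(?P<line>\d+):(?P<col>\d+)\s*(?P<decl>.*)$"
)


def parse_timings(text):
    """Return (milliseconds, file, line, column, declaration) tuples."""
    entries = []
    for raw in text.splitlines():
        match = TIMING_LINE.match(raw)
        if match is None:
            continue  # skip anything that is not a timer line
        entries.append((
            float(match["ms"]),
            match["file"],
            int(match["line"]),
            int(match["col"]),
            match["decl"].strip(),
        ))
    return entries


def slowest(text, count=10):
    """Return the `count` entries with the largest times, slowest first."""
    return sorted(parse_timings(text), key=lambda e: e[0], reverse=True)[:count]


# Abbreviated sample lines in the format shown in the document.
sample = """\
0.28ms test.swift:27:6 func findIf<R>(_ range: R, predicate: (R.Element) -> Bool) -> R
9.16ms test.swift:15:6 func find<R>(_ range: R, value: R.Element) -> R
0.20ms test.swift:17:16
"""

for ms, path, line, col, decl in slowest(sample):
    print(f"{ms:8.2f}ms {path}:{line}:{col} {decl}")
```

Sorting by time puts the likely "one bad function" at the top of the list; the same parser accepts the per-expression lines because the declaration field is optional.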
