@@ -103,6 +103,13 @@ significant modes are:
driver is run with the flag `-wmo`, `-whole-module-optimization` or
`-force-single-frontend-invocation` (all these options are synonymous).

+ - **Batch** vs. **single-file** primary-file mode. This distinction refines
+ the behaviour of primary-file mode, with the new batch mode added in the
+ Swift 4.2 release cycle. Batching eliminates much of the overhead of
+ primary-file mode, and will eventually become the default way of running
+ primary-file mode, but until that time it is explicitly enabled by passing
+ the `-enable-batch-mode` flag.
+

- **Optimizing** vs. **non-optimizing**: this varies depending on whether the
driver (and thus each frontend) is run with the flags `-O`, `-Osize`, or
`-Ounchecked` (each of which turns on one or more sets of optimizations), or
@@ -120,39 +127,60 @@ But these parameters can be varied independently and the compiler will spend its
time very differently depending on their settings, so it's worth understanding
both dimensions in a bit more detail.

- #### Primary-file vs. WMO
+ #### Primary-file (with and without batching) vs. WMO

This is the most significant variable in how the compiler behaves, so it's worth
getting perfectly clear:

- - In **primary-file mode**, the driver runs _one frontend job per file_ in the
- module, merging the results when all the frontends finish. Each frontend job
- itself reads _all_ the files in the module, and focuses on one _primary_
- file among the set it read, which it compiles, lazily analyzing other
- referenced definitions from the module as needed.
+ - In **primary-file mode**, the driver divides the work it has to do between
+ multiple frontend processes, emitting partial results and merging those
+ results when all the frontends finish. Each frontend job itself reads _all_
+ the files in the module, and focuses on one or more _primary_ file(s) among
+ the set it read, which it compiles, lazily analyzing other referenced
+ definitions from the module as needed.
+ This mode has two sub-modes:
+
+ - In the **single-file** sub-mode, it runs _one frontend job per file_, with
+ each job having a single primary.
+
+ - In the **batch** sub-mode, it runs _one frontend job per CPU_, identifying an
+ equal-sized "batch" of the module's files as primaries.

- In **whole-module optimization (WMO) mode**, the driver runs one frontend
job for the entire module, no matter what. That frontend reads all the files
in the module _once_ and compiles them all at once.
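The batch sub-mode's division of files into per-CPU groups can be sketched roughly as follows. This is an illustrative simplification, not the driver's actual partitioning code; the function name and ceil-division strategy are assumptions for the sake of the example:

```python
import math

# Rough sketch (NOT the actual driver algorithm): split a module's source
# files into one roughly equal-sized batch of primary files per CPU, so the
# driver can run one frontend job per batch.
def partition_into_batches(files, num_cpus):
    num_batches = max(1, min(num_cpus, len(files)))
    batch_size = math.ceil(len(files) / num_batches)
    # Slice the file list into consecutive batches of batch_size files each.
    return [files[i:i + batch_size]
            for i in range(0, len(files), batch_size)]
```

Under this sketch, a 100-file module on a 4-CPU machine yields 4 batches of 25 files each, matching the example below.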
- For example: if your module has 100 files in it, running `swiftc *.swift` will
- run 100 frontend subprocesses, each of which will parse all 100 inputs (for a
- total of 10,000 parses), and then each subprocess will (in parallel) compile the
- definitions in its single primary file. In contrast, running `swiftc -wmo
- *.swift` will run _one_ frontend subprocess, which then reads all 100 files
- _once_ and compiles the definitions in all of them, in order (serially).
+ For example: if your module has 100 files in it:
+
+ - Running `swiftc *.swift` will compile in **single-file mode**, and will thus
+ run 100 frontend subprocesses, each of which will parse all 100 inputs (for
+ a total of 10,000 parses), and then each subprocess will (in parallel)
+ compile the definitions in its single primary file.

- Why do both modes exist? Because they have different strengths and weaknesses;
+ - Running `swiftc -enable-batch-mode *.swift` will compile in **batch** mode,
+ and on a system with 4 CPUs will run 4 frontend subprocesses, each of which
+ will parse all 100 inputs (for a total of 400 parses), and then each subprocess
+ will (in parallel) compile the definitions of 25 primary files (one quarter
+ of the module in each process).
+
+ - Running `swiftc -wmo *.swift` will compile in **whole-module** mode,
+ and will thus run _one_ frontend subprocess, which then reads all 100 files
+ _once_ (for a total of 100 parses) and compiles the definitions in all of them,
+ in order (serially).
+
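The parse counts in the example above follow from simple arithmetic: every frontend job parses every file, so total parses are the number of jobs times the number of files. A hedged model (the function names and the 4-CPU default are illustrative, not compiler code):

```python
# Hedged model of the example above: how many frontend jobs run, and how many
# file-parses they perform in total, under each compilation mode.
def frontend_jobs(num_files, mode, num_cpus=4):
    if mode == "single-file":
        return num_files                 # one frontend job per file
    if mode == "batch":
        return min(num_cpus, num_files)  # one frontend job per CPU
    if mode == "wmo":
        return 1                         # one job for the entire module
    raise ValueError("unknown mode: %s" % mode)

def total_parses(num_files, mode, num_cpus=4):
    # Every frontend job reads (parses) every file in the module.
    return frontend_jobs(num_files, mode, num_cpus) * num_files
```

For 100 files this gives 10,000 parses in single-file mode, 400 in batch mode on 4 CPUs, and 100 in whole-module mode, as described above.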
+ Why do multiple modes exist? Because they have different strengths and weaknesses;
neither is perfect:

- - Primary-file mode's advantages are that the driver can do incremental
- compilation by only running frontends for files that it thinks are out of
- date, as well as running multiple frontend jobs at the same time, making use
+ - Primary-file mode's advantages are that the driver can do **incremental
+ compilation** by only running frontends for files that it thinks are out of
+ date, as well as running multiple frontend jobs **in parallel**, making use
of multiple cores. Its disadvantage is that each frontend job has to read
- _all the source files_ in the module before focusing on its primary-file of
+ _all the source files_ in the module before focusing on its primary-files of
interest, which means that a _portion_ of the frontend job's work is being
- done _quadratically_. Usually this portion is relatively small and fast, but
- because it's quadratic, it can easily go wrong.
+ done _quadratically_ in the number of jobs. Usually this portion is relatively
+ small and fast, but because it's quadratic, it can easily go wrong. The addition
+ of **batch mode** was specifically to eliminate this quadratic increase in
+ early work.

- WMO mode's advantages are that it can do certain optimizations that only
work when they are sure they're looking at the entire module, and it avoids
@@ -161,13 +189,17 @@ neither is perfect:
parallelism worse (at least before LLVM IR code-generation, which is always
multithreaded).
- Many people get confused by the word `optimization` in the option name
- `-whole-module-optimization`, and assume the option has only to do with enabling
- "very aggressive" optimizations. It does enable such optimizations, but it also
- _significantly changes_ the way the compiler runs, so much so that some other
- people have taken to running the compiler in an unsupported (and somewhat
- unfortunate) hybrid compilation mode `-wmo -Onone`, which combines
- non-optimizing compilation with whole-module compilation.
+ Whole-module mode does enable a set of optimizations that are not possible when
+ compiling in primary-file mode. In particular, in modules with a lot of private
+ dead code, whole-module mode can eliminate the dead code earlier and avoid
+ needless work compiling it, making for both smaller output and faster compilation.
+
+ It is therefore possible that, in certain cases (such as with limited available
+ parallelism / many modules built in parallel), building in whole-module mode
+ with optimization disabled can complete in less time than batched primary-file
+ mode. This scenario depends on many factors, seldom gives a significant advantage,
+ and since using it trades away support for incremental compilation entirely, it
+ is not a recommended configuration.

#### Amount of optimization
@@ -269,7 +301,7 @@ definitions than it should.
Swift compilation performance varies _significantly_ by at least the following
parameters:

- - WMO vs. primary-file (non-WMO) mode
+ - WMO vs. primary-file (non-WMO) mode, including batching thereof
- Optimizing vs. non-optimizing mode
- Quantity of incremental work avoided (if in non-WMO)
- Quantity of external definitions lazily loaded
@@ -288,7 +320,6 @@ problem you're seeing to some of the existing strategies and plans for
improvement:

- Incremental mode is over-approximate, runs too many subprocesses.
- - Name resolution is over-eager, deserializes too many definitions.
- Too many referenced (non-primary-file) definitions are type-checked beyond
the point they need to be, during the quadratic phase.
- Expression type inference solves constraints inefficiently, and can
@@ -516,20 +547,22 @@ compilers on hand while you're working.
early investigation to see which file in a primary-file-mode compilation is
taking the majority of time, or is taking more or less time than when
comparing compilations. Its output looks like this:
+
```
===-------------------------------------------------------------------------===
- Driver Time Compilation
+ Driver Compilation Time
===-------------------------------------------------------------------------===
- Total Execution Time: 0.0002 seconds (1.3390 wall clock)
+ Total Execution Time: 0.0001 seconds (0.0490 wall clock)

- ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
- 0.0000 ( 87.2%)  0.0001 ( 58.7%)  0.0001 ( 67.5%)  1.0983 ( 82.0%)  compile t.swift
- 0.0000 ( 12.8%)  0.0000 ( 41.3%)  0.0000 ( 32.5%)  0.2407 ( 18.0%)  link t.swift
- 0.0000 (100.0%)  0.0001 (100.0%)  0.0002 (100.0%)  1.3390 (100.0%)  Total
+ ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
+ 0.0000 ( 82.0%)  0.0001 ( 59.5%)  0.0001 ( 69.0%)  0.0284 ( 58.0%)  {compile: t-177627.o <= t.swift}
+ 0.0000 ( 18.0%)  0.0000 ( 40.5%)  0.0000 ( 31.0%)  0.0206 ( 42.0%)  {link: t <= t-177627.o}
+ 0.0001 (100.0%)  0.0001 (100.0%)  0.0001 (100.0%)  0.0490 (100.0%)  Total
```
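When comparing such reports across builds, it can help to pull out and rank the per-job wall-clock times with a small script. A hedged sketch: the line format is assumed from the sample output above (four "seconds ( pct%)" columns followed by the job name) and may differ between compiler versions:

```python
import re

# Hedged sketch: rank jobs from -driver-time-compilation output by wall-clock
# time. The line layout is assumed from the sample above and may vary.
_JOB_LINE = re.compile(
    r'^\s*(?:[\d.]+\s+\(\s*[\d.]+%\)\s+){3}'  # user, system, user+system columns
    r'([\d.]+)\s+\(\s*[\d.]+%\)\s+'           # wall-clock seconds (captured)
    r'(.*\S)')                                # job name

def slowest_jobs(report):
    jobs = []
    for line in report.splitlines():
        m = _JOB_LINE.match(line)
        if m and m.group(2) != "Total":       # skip the summary row
            jobs.append((float(m.group(1)), m.group(2)))
    return sorted(jobs, reverse=True)         # slowest first
```

Feeding it the report above would rank the `compile` job ahead of the `link` job by wall-clock time.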
- `-Xfrontend -debug-time-compilation`: asks each frontend to print out timers
for each phase of its execution. Its output (per-frontend) looks like this:
+
```
===-------------------------------------------------------------------------===
Swift compilation
@@ -556,6 +589,7 @@ compilers on hand while you're working.
taken. The output is therefore voluminous, but can help when reducing a
testcase to the "one bad function" that causes it. The output looks like
this:
+
```
9.16ms test.swift:15:6 func find<R>(_ range: R, value: R.Element) -> R where R : IteratorProtocol, R.Element : Eq
0.28ms test.swift:27:6 func findIf<R>(_ range: R, predicate: (R.Element) -> Bool) -> R where R : IteratorProtocol
@@ -568,6 +602,7 @@ compilers on hand while you're working.
`-debug-time-function-bodies`, but prints a separate timer for _every
expression_ in the program, much more detail than just the functions. The
output looks like this:
+
```
0.20ms test.swift:17:16
1.82ms test.swift:18:12
@@ -582,6 +617,7 @@ compilers on hand while you're working.
frontend, printing them out when the frontend exits. By default, most
statistics are enabled only in assert builds, so in a release build this
option will do nothing. In an assert build, its output will look like this:
+
```
===-------------------------------------------------------------------------===
... Statistics Collected ...
@@ -612,6 +648,7 @@ compilers on hand while you're working.
AST reader, which is operated as a subsystem of the swift compiler when
importing definitions from C/ObjC. Its output is added to the end of
whatever output comes from `-print-stats`, and looks like this:
+
```
*** AST File Statistics:
1/194 source location entries read (0.515464%)
@@ -630,6 +667,7 @@ compilers on hand while you're working.
- `-Xfrontend -print-stats -Xfrontend -print-inst-counts`: an extended form of
`-print-stats` that activates a separate statistic counter for every kind of
SIL instruction generated during compilation. Its output looks like this:
+
```
...
163 sil-instcount - Number of AllocStackInst