Support Python 3 in the benchmark suite #30085


Merged
merged 1 commit into swiftlang:master on Apr 3, 2020

Conversation

broadwaylamb
Contributor

@broadwaylamb commented Feb 26, 2020

I've tested this patch with both Python 2 and 3, and it seems to work as expected.

cc @gottesmm @palimondo

@compnerd
Member

@swift-ci please benchmark

@palimondo
Contributor

Could we please hold on this for a moment? It builds on #29719 which I’d like to back out in order to land #26462 first.

@swift-ci
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
FlattenListLoop 2717 3257 +19.9% 0.83x (?)
FlattenListFlatMap 5259 5949 +13.1% 0.88x (?)
 
Improvement OLD NEW DELTA RATIO
ArrayPlusEqualSingleElementCollection 470 423 -10.0% 1.11x (?)

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@broadwaylamb
Contributor Author

Could we please hold on this for a moment? It builds on #29719 which I’d like to back out in order to land #26462 first.

Sure! I’ll try to do the same for other Python scripts for the time being.

@gottesmm
Contributor

@palimondo I don't think it is reasonable to block the Python 2 -> Python 3 transition because of an out-of-tree patch that will not land promptly. The last substantive comment on that other PR is from November, and the patch stream is from September. It is almost March; that is too long to stop progress here IMO, especially since Python 2 is EOL and Swift needs to finish transitioning ASAP.

@compnerd
Member

@palimondo, the measurement improvements are extremely valuable and look like a move in the right direction. I think the Python 2 -> Python 3 conversion is extremely important as well, though. Python 2 has been EOL'd, and there are other places where the conversion is absolutely critical (e.g. lldb requires Python 3 support, but parts of the build are still python2-dependent). I realize that you have been working on the patch set for quite some time, and it does seem to be nearing where you want it to be. Perhaps you could split the work up so that some of the earlier pieces can be rebased and merged into the tree standalone, allowing both sides to make progress? I believe we need to continue to make progress here in order to keep both the build and the debugging scenarios healthy.

@compnerd
Member

@swift-ci please smoke test

@compnerd
Member

@swift-ci please test Windows platform

@palimondo
Contributor

palimondo commented Feb 27, 2020

@gottesmm @compnerd How about you give me this weekend to finish the patch?

@gottesmm
Contributor

@palimondo How big of a change are we talking about? We depend on this infrastructure, so we can't just throw things in. We need to be careful.

@gottesmm
Contributor

@eeckstein I noticed that you are the original reviewer of this code. What are your thoughts?

@palimondo
Contributor

@gottesmm did you mean to ask @eeckstein the above in #26462?

@broadwaylamb
Contributor Author

The tests have failed because the linter complained. I'll update the patch.

@broadwaylamb
Contributor Author

It should be fixed now, could anyone please trigger the CI again?

@compnerd
Member

@swift-ci please smoke test

Comment on lines -490 to +507
runtime = min(
    [
        (result.samples.min - correction)
        for i_series in [
            BenchmarkDoctor._select(measurements, num_iters=i)
            for correction in [(setup / i) for i in [1, 2]]
        ]
        for result in i_series
    ]
)

runtimes = []
for i in range(1, 3):
    correction = setup / i
    i_series = BenchmarkDoctor._select(measurements, num_iters=i)
    for result in i_series:
        runtimes.append(result.samples.min - correction)
runtime = min(runtimes)
Contributor Author

@broadwaylamb commented Feb 27, 2020


I'd like whoever reviews this to pay attention to this particular change.

The old version relied on a Python 2-only feature where the variables defined in a list comprehension leak into the enclosing function scope. But I believe this was erroneous: since list comprehensions in Python 2 are eagerly evaluated, the value of i was always 2 by the time the BenchmarkDoctor._select(measurements, num_iters=i) expression was evaluated. Similarly, when evaluating (result.samples.min - correction), the value of correction was always the same, namely the last value it took while evaluating this expression:

[
    BenchmarkDoctor._select(measurements, num_iters=i)
    for correction in [(setup / i) for i in [1, 2]]
]

Here, I've rewritten it to be a stupid old for loop, which IMHO reads way better, and also seems to behave correctly.

Contributor


Hmmm… I’ll need to check this.

Contributor


I’ve corrected the Python 3 incompatibility by explicitly propagating the variables from the nested scopes. I don’t see any issue of incorrect evaluation. Here’s a standalone, simplified test case based on the original source code, before the automatic formatter made a mess of it in #29719:

$ python
Python 2.7.10 (default, Feb 22 2019, 21:55:15) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> setup = 134
>>> measurements = {1: [[324], [326], [325]], 2: [[257], [259], [258]]}
>>> 
>>> 
>>> def select(measurements, num_iters):
...     return measurements[num_iters]
... 
>>> 
>>> runtimesP3 = [
...     (result[0] - correction) for correction, i_series in
...     [(correction, select(measurements, num_iters=i))
...      for i, correction in [(i, setup // i) for i in [1, 2]]
...      ] for result in i_series]
>>> 
>>> runtimes = [
...     (result[0] - correction) for i_series in
...     [select(measurements, num_iters=i)
...      for correction in [(setup / i) for i in [1, 2]]
...      ] for result in i_series]
>>> 
>>> runtimes == runtimesP3
True
>>> runtimesP3
[190, 192, 191, 190, 192, 191]

It works correctly in Python 3, too:

$ python3
Python 3.6.5 (default, May 24 2018, 10:57:11) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> setup = 134
>>> measurements = {1: [[324], [326], [325]], 2: [[257], [259], [258]]}
>>> 
>>> 
>>> def select(measurements, num_iters):
...     return measurements[num_iters]
... 
>>> 
>>> runtimesP3 = [
...     (result[0] - correction) for correction, i_series in
...     [(correction, select(measurements, num_iters=i))
...      for i, correction in [(i, setup // i) for i in [1, 2]]
...      ] for result in i_series]
>>> 
>>> runtimesP3
[190, 192, 191, 190, 192, 191]

I’ll try to incorporate the Python 3 changes in #26462 (it already has a fix for the assertEqualtats typo…).

Comment on lines -575 to +585
setup = min([result.setup for result in BenchmarkDoctor._select(measurements)])
setup = min(
    [result.setup or 0 for result in BenchmarkDoctor._select(measurements)]
)
Contributor Author


None compares less than anything in Python 2. In Python 3, None is not comparable at all, so I needed to provide a reasonable default value. 0 seems okay, since this function only has an effect if setup is greater than 200000.
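
A quick standalone illustration (not from the patch) of the difference:

    # Python 2: None sorts below every number, so min() can silently return it.
    min([None, 5, 3])    # Python 2: None

    # Python 3: ordering comparisons involving None raise a TypeError instead:
    # min([None, 5, 3])  # TypeError: '<' not supported ...

    # Substituting a default before comparing mirrors the `or 0` fix above:
    print(min(s or 0 for s in [None, 5, 3]))  # 0 on both versions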

Comment on lines +140 to +142
return DTraceResult(
    test_name, int(not foundInstability), results, self.csv_output
)
Contributor Author


DTraceResult's initializer takes exactly 4 arguments. I don't know why it worked before.

@compnerd
Member

@swift-ci please benchmark

@swift-ci
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeFromNSArrayAnyObjectForced 4880 5420 +11.1% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 9367 8518 -9.1% 1.10x (?)

Code size: -O

Performance: -Osize

Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSStringRef 174 158 -9.2% 1.10x (?)
ObjectiveCBridgeFromNSArrayAnyObjectForced 5440 5080 -6.6% 1.07x (?)

Code size: -Osize

Performance: -Onone

Improvement OLD NEW DELTA RATIO
TypeFlood 217 199 -8.3% 1.09x (?)
DictionaryBridgeToObjC_Access 1077 1002 -7.0% 1.07x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@palimondo
Contributor

@swift-ci test

@palimondo
Contributor

palimondo commented Mar 1, 2020

I’m porting these changes into my branch, but I got stuck a bit on the differences between open in Python 2 and 3 (the need to use encoding='utf-8' on the latter) when running test_compare_perf_tests.py. Looks like there’s no elegant solution except explicit version checks… @broadwaylamb, did you work around that in some other way?

@broadwaylamb
Contributor Author

@palimondo can you give an example? What issue are you experiencing, and why do you need to use encoding='utf-8'?

@palimondo
Contributor

palimondo commented Mar 1, 2020

@broadwaylamb The way I’m testing the Python 3 port, as well as continued support for Python 2.7, is by adjusting the shebang between #!/usr/bin/env python and #!/usr/bin/env python3 and running the unit tests, in this case test_compare_perf_tests.py. There is a read/write pair of open calls between compare_perf_tests.py’s main function and execute_main_with_format on the Test_compare_perf_tests_main class in test_compare_perf_tests.py. Switching to python3 with the current code was causing encoding (UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 1064: ordinal not in range(128)) and decoding (UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1064: ordinal not in range(128)) issues.

So far I have this solution and was wondering if that’s the best it can get:

    if args.output:
        if sys.version_info < (3, 0):
            with open(args.output, 'w') as f:
                f.write(report)
        else:
            with open(args.output, 'w', encoding='utf-8') as f:
                f.write(report)

Other solutions I found on Stack Overflow weren’t much better, IMO.
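
One version-agnostic alternative (a sketch, not necessarily what landed in the PR) would be io.open, which accepts encoding= on both Python 2 and 3; on Python 3 it is simply the builtin open. The caveat is that in text mode on Python 2 it expects unicode strings, so report may need decoding first:

    import io

    # io.open takes encoding= on Python 2.6+ and Python 3 alike,
    # avoiding the explicit version check above.
    if args.output:
        with io.open(args.output, 'w', encoding='utf-8') as f:
            f.write(report)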

@eeckstein
Contributor

@gottesmm I'm in favor of landing this change and not waiting for #26462.

@broadwaylamb
Contributor Author

@palimondo sorry, I haven't been able to reproduce this. It's all working for me with Python 3 even without encoding='utf-8'.

@broadwaylamb
Contributor Author

By the way, this PR doesn't change shebangs to use python3, the CI builders will still be using python.

This makes it easy to accidentally break Python 3 compatibility. What do you people think about switching the CI to use Python 3 for the benchmarking suite?

@eeckstein
Contributor

What do you people think about switching the CI to use Python 3 for the benchmarking suite?

@shahmishal Any thoughts on this?

@palimondo
Contributor

palimondo commented Mar 2, 2020

@eeckstein Is there a deadline for Python 3 compatibility you are trying to meet?
I think I could split #26462 into two parts:

@shahmishal
Member

What do you people think about switching the CI to use Python 3 for the benchmarking suite?

@shahmishal Any thoughts on this?

I don't mind if you switch to using python3; however, python3 will only be available via xcrun python3 on macOS nodes.

@eeckstein
Contributor

eeckstein commented Mar 3, 2020

Can we do something like that inside the script (pseudocode):

if running on python2 on mac:
  invoke this script with 'xcrun python3'
  exit

With "script", I mean e.g. 'run_smoke_bench'

@palimondo
Contributor

Or… we could rely on @swift-ci’s test command to catch the problems, if we modified test/benchmark/benchmark-scripts.test-sh from

// RUN: %{python} -m unittest discover -s %swift_src_root/benchmark/scripts/

to

// RUN: %{xcrun python3} -m unittest discover -s %swift_src_root/benchmark/scripts/

@broadwaylamb
Contributor Author

Anyway, if we do this (switch to Python 3), it'd probably make sense to make this change in a separate PR so it can easily be reverted.

What's the plan here? Are we waiting for #30149 and #30158 to land first?

@compnerd
Member

compnerd commented Mar 4, 2020

@swift-ci please test Windows platform

@palimondo
Contributor

@compnerd Does the Windows test run the benchmarks? If so, what version is used to run those python scripts? (I assume that python isn’t shipped on Windows and you control the configuration?)

@compnerd
Member

compnerd commented Mar 5, 2020

@palimondo no, the current scripts don't run it - Windows uses Python 2.7 and Python 3.7.5 IIRC - those are the versions shipped as part of VS2019. Nearly the entirety of the build/test infrastructure is whatever Visual Studio is shipping. Unfortunately, that becomes really slow, so the one special case is CMake, which is 3.16.x rather than the Visual Studio version (3.15.x).

@broadwaylamb
Contributor Author

What would be the plan regarding this change?

@compnerd
Member

compnerd commented Apr 3, 2020

CC: @drexin I expect that this should be safe to merge, thoughts?

@drexin
Contributor

drexin commented Apr 3, 2020

No objections from my end. @shahmishal Are you okay with merging this?

@shahmishal merged commit df53c20 into swiftlang:master on Apr 3, 2020
@shahmishal
Member

Thanks @broadwaylamb!

@broadwaylamb deleted the benchmark-python-3 branch on April 4, 2020 at 08:33