Support Python 3 in the benchmark suite #30085
[Force-pushed from 6c44eea to 1edd93c]
@swift-ci please benchmark

[Benchmark results posted: Performance and Code size at -O, -Osize, and -Onone, plus -swiftlibs code size. The tables report performance differences larger than 8% and code size differences larger than 1%; performance results (not code size) can contain noise.]
@palimondo I don't think it is reasonable to block Python 2 -> Python 3 support because of an out-of-tree patch that will not be promptly landed. The last substantive comment on that other PR is from November and the patch stream is from September. It is almost March; that is too long to stop progress here IMO, especially since Python 2 is EOL and Swift needs to finish transitioning ASAP.
@palimondo, the measurement improvements are extremely valuable and look like a move in the right direction. I think the Python 2 -> Python 3 conversion is extremely important as well, though. Python 2 has been EOLed, and there are other places where the conversion is absolutely critical (e.g. LLDB requires Python 3 support, but parts of the build are still Python 2 dependent). I realize that you have been working on the patch set for quite some time, and it does seem like it is nearing where you want it to be. Perhaps you could split the work up such that some of the earlier pieces can be rebased and merged into the tree standalone to allow both sides to make progress? I believe we need to continue to make progress here in order to keep both the build and the debugging scenarios healthy.
@swift-ci please smoke test

@swift-ci please test Windows platform

@palimondo How big of a change are we talking about? We depend on this infrastructure, so we can't just throw things in. We need to be careful.

@eeckstein I noticed that you are the original reviewer of this code. What are your thoughts?

@gottesmm did you mean to ask @eeckstein the above in #26462?

The tests have failed because the linter complained. I'll update the patch.
[Force-pushed from 1edd93c to dc8efca]
It should be fixed now, could anyone please trigger the CI again?

[Force-pushed from dc8efca to cce9e81]

@swift-ci please smoke test
Old:

    runtime = min(
        [
            (result.samples.min - correction)
            for i_series in [
                BenchmarkDoctor._select(measurements, num_iters=i)
                for correction in [(setup / i) for i in [1, 2]]
            ]
            for result in i_series
        ]
    )
New:

    runtimes = []
    for i in range(1, 3):
        correction = setup / i
        i_series = BenchmarkDoctor._select(measurements, num_iters=i)
        for result in i_series:
            runtimes.append(result.samples.min - correction)
    runtime = min(runtimes)
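As a standalone sanity check (not part of the PR itself), the rewritten loop can be exercised with a hypothetical `select` stub standing in for `BenchmarkDoctor._select` and made-up sample data, mirroring the simplified test case shown later in this thread:

```python
# Hypothetical stand-ins for BenchmarkDoctor._select and real measurement data.
measurements = {1: [[324], [326], [325]], 2: [[257], [259], [258]]}
setup = 134

def select(measurements, num_iters):
    return measurements[num_iters]

runtimes = []
for i in range(1, 3):
    correction = setup // i  # floor division, matching the Python 3 port
    for result in select(measurements, num_iters=i):
        runtimes.append(result[0] - correction)

print(runtimes)       # [190, 192, 191, 190, 192, 191]
print(min(runtimes))  # 190
```

The output matches the values reported in the REPL transcripts below, which is the point of the check.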
I'd like whoever will review this to pay attention to this particular change.

The old version relied on a Python 2-only behavior where variables defined in a list comprehension leak into the enclosing function scope. But I believe this was erroneous: since in Python 2 list comprehensions are eagerly evaluated, the value of i was always 2 when evaluating the BenchmarkDoctor._select(measurements, num_iters=i) expression. Similarly, when evaluating (result.samples.min - correction), the value of correction was always the same, namely, the last value it took while evaluating this expression:
[
BenchmarkDoctor._select(measurements, num_iters=i)
for correction in [(setup / i) for i in [1, 2]]
]
Here, I've rewritten it to be a stupid old for loop, which IMHO reads way better, and also seems to behave correctly.
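For reference, the scope leak this comment describes is gone in Python 3: a list-comprehension variable no longer escapes into the enclosing scope. A minimal illustration (not from the PR):

```python
doubled = [x * 2 for x in range(3)]

# In Python 2, x would still be visible here (with its last value, 2).
# In Python 3 the comprehension has its own scope, so x is undefined.
try:
    x
except NameError:
    print("x does not leak out of the comprehension in Python 3")
```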
Hmmm… I’ll need to check this.
I’ve corrected the Python 3 incompatibility by explicitly propagating the variables from the nested scopes. I don’t see any issue of incorrect evaluation. Here’s a standalone, simplified test case based on the original source code before the automatic formatter made a mess of it in #29719:
$ python
Python 2.7.10 (default, Feb 22 2019, 21:55:15)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> setup = 134
>>> measurements = {1: [[324], [326], [325]], 2: [[257], [259], [258]]}
>>>
>>>
>>> def select(measurements, num_iters):
... return measurements[num_iters]
...
>>>
>>> runtimesP3 = [
... (result[0] - correction) for correction, i_series in
... [(correction, select(measurements, num_iters=i))
... for i, correction in [(i, setup // i) for i in [1, 2]]
... ] for result in i_series]
>>>
>>> runtimes = [
... (result[0] - correction) for i_series in
... [select(measurements, num_iters=i)
... for correction in [(setup / i) for i in [1, 2]]
... ] for result in i_series]
>>>
>>> runtimes == runtimesP3
True
>>> runtimesP3
[190, 192, 191, 190, 192, 191]
It works correctly in Python 3, too:
$ python3
Python 3.6.5 (default, May 24 2018, 10:57:11)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> setup = 134
>>> measurements = {1: [[324], [326], [325]], 2: [[257], [259], [258]]}
>>>
>>>
>>> def select(measurements, num_iters):
... return measurements[num_iters]
...
>>>
>>> runtimesP3 = [
... (result[0] - correction) for correction, i_series in
... [(correction, select(measurements, num_iters=i))
... for i, correction in [(i, setup // i) for i in [1, 2]]
... ] for result in i_series]
>>>
>>> runtimesP3
[190, 192, 191, 190, 192, 191]
I’ll try to incorporate the Python 3 changes in #26462 (it already had a fix for the assertEqualtats typo…)
Old:

    setup = min([result.setup for result in BenchmarkDoctor._select(measurements)])

New:

    setup = min(
        [result.setup or 0 for result in BenchmarkDoctor._select(measurements)]
    )
None compares less than anything in Python 2. In Python 3, None is not comparable at all, so I needed to provide a reasonable default value. 0 seems okay, since this function only has an effect if setup is greater than 200000.
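The behavior difference can be shown directly (illustrative values only; the `or 0` default mirrors the fix above):

```python
values = [None, 300000, 250000]

# Python 3 refuses to order None against an int.
try:
    min(values)
except TypeError:
    print("min() over None and ints raises TypeError in Python 3")

# Coercing None to a default of 0, as in the fix, restores a usable minimum.
setup = min(v or 0 for v in values)
print(setup)  # 0
```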
    return DTraceResult(
        test_name, int(not foundInstability), results, self.csv_output
    )
DTraceResult's initializer takes exactly 4 arguments. I don't know why it worked before.
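The kind of failure being fixed can be reproduced with a hypothetical stand-in (DTraceResultSketch is not the real DTraceResult; only the 4-argument initializer shape is assumed):

```python
class DTraceResultSketch:
    """Hypothetical stand-in with a 4-argument initializer."""
    def __init__(self, test_name, status, results, csv_output):
        self.test_name = test_name
        self.status = status
        self.results = results
        self.csv_output = csv_output

# Passing too few arguments fails loudly at call time.
try:
    DTraceResultSketch("bench", 0, [])
except TypeError:
    print("missing csv_output raises TypeError")

ok = DTraceResultSketch("bench", 0, [], csv_output=False)
print(ok.test_name)  # bench
```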
@swift-ci please benchmark

[Benchmark results posted: Performance and Code size at -O, -Osize, and -Onone, plus -swiftlibs code size. The tables report performance differences larger than 8% and code size differences larger than 1%.]
@swift-ci test

I’m porting these changes into my branch, but I got stuck a bit on the differences between …

@palimondo can you give an example? What issue are you experiencing? Why do you need to use …?
@broadwaylamb The way I’m testing the Python 3 port as well as continued support for Python 2.7 is by adjusting the shebang between … So far I have this solution and was wondering if that’s the best it can get:

    if args.output:
        if sys.version_info < (3, 0):
            with open(args.output, 'w') as f:
                f.write(report)
        else:
            with open(args.output, 'w', encoding='utf-8') as f:
                f.write(report)

Other solutions I found on StackOverflow weren’t IMO much better.
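One commonly suggested alternative (a sketch, not from the PR) is io.open, which accepts an encoding argument on both Python 2 and Python 3 (where it is an alias of the built-in open), so the version check disappears:

```python
import io
import os
import tempfile

report = u"benchmark report \u2713"  # sample text with a non-ASCII character

# io.open takes encoding on both Python 2 and 3, so no version branch is needed.
path = os.path.join(tempfile.mkdtemp(), "report.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(report)

with io.open(path, "r", encoding="utf-8") as f:
    print(f.read() == report)  # True
```

On Python 2 the text written this way must be unicode, which matches how the benchmark report strings would need to be handled anyway.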
@palimondo sorry, I haven't been able to reproduce this. It's all working for me with Python 3 even without …
By the way, this PR doesn't change shebangs to use … This makes it easy to accidentally break Python 3 compatibility. What do you people think about switching the CI to use Python 3 for the benchmarking suite?
@shahmishal Any thoughts on this?

@eeckstein Is there a deadline for Python 3 compatibility you are trying to meet?

I don't mind if you switch to using python3, however python3 will only be available with …
Can we do something like that inside the script (pseudocode)? …

With "script", I mean e.g. 'run_smoke_bench'
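The pseudocode itself is elided from this extract, but one way such a guard inside a script like run_smoke_bench could look (a hypothetical sketch, assuming a python3 executable on PATH) is:

```python
import os
import sys

def ensure_python3():
    """If running under Python 2, replace this process with python3.

    Hypothetical sketch: the real run_smoke_bench logic is not shown here.
    """
    if sys.version_info[0] < 3:
        # os.execvp never returns on success; it replaces the process image.
        os.execvp("python3", ["python3"] + sys.argv)

ensure_python3()
print("running under Python %d" % sys.version_info[0])
```

Under Python 3 the guard is a no-op, so the same script works with either interpreter on the shebang.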
Or… we could rely on @swift-ci's … to …
@swift-ci please test Windows platform

@compnerd Does the Windows test run the benchmarks? If so, what version is used to run those python scripts? (I assume that python isn’t shipped on Windows and you control the configuration?)

@palimondo no, the current scripts don't run it - Windows uses Python 2.7 and Python 3.7.5 IIRC - those are the versions which are shipped as part of VS2019. Nearly the entirety of the build/test infrastructure is whatever Visual Studio is shipping. Unfortunately, that becomes really slow, so the one special case is CMake, which is 3.16.x rather than the Visual Studio version (3.15.x).
What would be the plan regarding this change?

CC: @drexin I expect that this should be safe to merge, thoughts?

No objections from my end. @shahmishal Are you okay with merging this?

Thanks @broadwaylamb!

I've tested this patch with both Python 2 and 3, and it seems to work as expected.

cc @gottesmm @palimondo