bpo-47131: Speedup AST comparisons in test_unparse by using node traversal #32132
Conversation
You appear to only be modifying a test file in this PR — is that deliberate?
Yes, as stated in the BPO issue. Speeding up the test harness is good for the buildbots.
Apologies — I read the BPO description but not the BPO title!
(It would be good to make it clear in the PR title, as well as the BPO title, that this change is specifically related to speeding up the test suite — I assumed it was going to be a performance improvement to a stdlib function.)
@pablogsal @isidentical: Would you mind reviewing this AST change?
Lib/test/test_unparse.py (Outdated)

    @@ -130,7 +130,23 @@ class Foo: pass

    class ASTTestCase(unittest.TestCase):
        def assertASTEqual(self, ast1, ast2):
            self.assertEqual(ast.dump(ast1), ast.dump(ast2))
            missing = object()
            def compare(a, b):
Defining a function at each assertASTEqual() call sounds expensive. Can you pass the self parameter to compare() and move it to the class level (method or staticmethod), or even to the module level (so it is only defined once)?
For example:

    def _assertASTEqual(self, a, b):  # accept AST nodes, lists and objects
        ...  # your code

    def assertASTEqual(self, ast1, ast2):  # accept AST nodes
        self._assertASTEqual(ast1, ast2)
I timed three approaches to see exactly the costs involved. All test runs were performed using:

    amd64\python_d.exe -m test -uall test_unparse

All three versions (inner function, module-level function, and method) come in at 59s +/- 0.2s over 5 runs, contrasted with the current runtime of 95s.
My chosen approach, using the inner function, is modeled loosely after ast.dump(). An advantage of the inner function is that the missing value does not need to pollute the function signature or the global namespace. Ultimately, the design choice is just stylistic.
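To make the discussion concrete, the traversal approach looks roughly like the sketch below. It is pieced together from the diff hunks quoted in this thread, so details (names, exact checks) may differ from the merged code:

    import ast
    import unittest

    class ASTTestCase(unittest.TestCase):
        def assertASTEqual(self, ast1, ast2):
            # Redefined per call so the `missing` sentinel stays out of the
            # signature and module namespace (the stylistic choice discussed here).
            def traverse_compare(a, b, missing=object()):
                if a is b:
                    return  # same object (or both missing): trivially equal
                if isinstance(a, ast.AST):
                    self.assertEqual(type(a), type(b))
                    for field in a._fields:
                        traverse_compare(getattr(a, field, missing),
                                         getattr(b, field, missing))
                elif isinstance(a, list):
                    self.assertEqual(len(a), len(b))
                    for node1, node2 in zip(a, b):
                        traverse_compare(node1, node2)
                else:
                    self.assertEqual(a, b)
            traverse_compare(ast1, ast2)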
If you prefer to redefine the function at each call, would you mind adding a comment explaining why you prefer that?
My worry is mostly about performance. If there is no impact on performance, you can leave it as it is.
I do note, however, that in micro-benchmarks creating inner functions has a substantial runtime cost. But in this case, that time is dwarfed by the time spent calling the function.
Can you provide some before-and-after comparisons of the test running time for this PR, specifying exactly what you are running? That way we can try to reproduce it :)
Lib/test/test_unparse.py (Outdated)

                    traverse_compare(value1, value2)
            elif isinstance(a, list):
                if len(a) != len(b):
                    self.fail("len(a) != len(b)")
You likely want an f-string here, because the names "a" and "b" are meaningless in the error message.
@pablogsal Wouldn't assertNotEqual be better here?

      elif isinstance(a, list):
    -     if len(a) != len(b):
    -         self.fail("len(a) != len(b)")
    +     self.assertNotEqual(len(a), len(b))

Edit: ... since self.fail is used instead of assert anyway.
Not really, because you also want to show a and b somehow in the error, not just that 4 is different from 6.
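For illustration only (the helper name below is hypothetical, not from the PR), a failure message along those lines could surface the compared lists themselves:

    import unittest

    class _Example(unittest.TestCase):
        # Hypothetical helper: fail with the compared lists in the message,
        # so the report shows more than just "4 != 6".
        def _check_lengths(self, a, b):
            if len(a) != len(b):
                self.fail(f"lists differ in length ({len(a)} != {len(b)}):\n"
                          f"  a = {a!r}\n  b = {b!r}")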
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again".
Lib/test/test_unparse.py (Outdated)

            elif isinstance(a, list):
                if len(a) != len(b):
                    self.fail("len(a) != len(b)")
                for node1, node2 in zip(a, b):
You could also use the new zip(a, b, strict=True) and skip the previous test.
I did originally, but using self.fail() from an except block results in a nested exception. I don't know of a way to do raise ... from None without the raise :)
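A sketch of the trade-off being described (the real code recurses with traverse_compare; this is simplified to assertEqual): if zip(..., strict=True) replaces the length check, the mismatch surfaces as a ValueError, and calling self.fail() inside the except block chains that ValueError into the failure report unless the failure is raised from None:

    import unittest

    class _Example(unittest.TestCase):
        def _compare_lists(self, a, b):
            try:
                for node1, node2 in zip(a, b, strict=True):
                    self.assertEqual(node1, node2)
            except ValueError:
                # A plain self.fail() here would print "During handling of the
                # above exception, another exception occurred" along with the
                # ValueError; an explicit raise ... from None suppresses that
                # chaining, which is the point being made above.
                raise self.failureException(
                    f"lengths differ: {len(a)} != {len(b)}") from None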
Lib/test/test_unparse.py (Outdated)

                for field in a._fields:
                    value1 = getattr(a, field, missing)
                    value2 = getattr(b, field, missing)
                    traverse_compare(value1, value2)
You can avoid the call if both are missing
True, but that adds an additional if for the successful case, which IMO should be the common path.
A function call is far more expensive than a conditional, especially if you compare them with is.
As I thought, adding a check for missing ended up being a statistically insignificant slowdown; however, it did lead to the realization of singleton checking, which results in a further 2% improvement.
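For context, the "singleton checking" referred to here amounts to an identity fast path at the top of the comparison; the exact placement below is my assumption, sketched for illustration:

    def traverse_compare(a, b, missing=object()):
        # Identity check first: it covers the both-fields-missing case as well
        # as shared singletons (None, True, False) and identical sub-objects,
        # skipping any further work for them.
        if a is b:
            return
        ...  # AST / list / plain-value comparisons follow here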
Lib/test/test_unparse.py (Outdated)

    # the `missing` value from polluting the "real" function
    # signature or module namespace.
You can make the "missing" value a default argument to achieve the same, and the load will be slightly faster than the DEREF. The function is not public, so that is not a problem.
In the same commit, I moved missing to an argument default of the inner function. I added the detail per Victor's request, but I'd gladly strike it just as well.
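The difference Pablo mentions can be observed with the dis module; the snippet below is only an illustrative comparison, not code from the PR:

    import dis

    def make_closure():
        missing = object()
        def compare(a, b):
            # `missing` is a free variable here: loaded with LOAD_DEREF
            return a is missing or b is missing
        return compare

    def compare_default(a, b, missing=object()):
        # `missing` is an ordinary local (default argument): loaded with LOAD_FAST
        return a is missing or b is missing

    dis.dis(make_closure())   # shows LOAD_DEREF for `missing`
    dis.dis(compare_default)  # shows LOAD_FAST for `missing`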
As stated in the BPO, there is a 33% improvement in runtime. My comment to Victor has the exact timings: 95s before, 59s after.
I've pushed new changes which should resolve all comments.
    # An AST comparison routine modeled after ast.dump(), but
    # instead of string building, it traverses the two trees
    # in lock-step.
    def traverse_compare(a, b, missing=object()):
Since you moved missing inside the function definition, you may now move the function definition to the class level (as a method) or the module level (as a function; in that case you should pass the testcase as an argument), to avoid the cost of defining a new function at each assertASTEqual() call.
If you are concerned about performance, using an inner function is faster than calling a method. True, the setup cost is greater, but that time is more than recovered after a few hundred comparisons, thanks to the attribute lookup each method call pays. Global (module-level) functions suffer the same fate as well, but require more comparisons before being overtaken.
After instrumenting test_unparse to get the number of calls to assertASTEqual and compares, I whipped up a reproducible minimal test script to show the timings. The script is here for you to run yourself.
Using main on Win10:

    100 -> inner: 29 usec    method: 27.4 usec
    200 -> inner: 56.8 usec  method: 48 usec
    300 -> inner: 76.3 usec  method: 86.4 usec

    CosmeticTestCase (14 compares)
        inner:  88.9 usec
        method: 77.9 usec
    DirectoryTestCase (400 compares)
        inner:  74.1 msec
        method: 79.5 msec
    UnparseTestCase (400 compares)
        inner:  91 msec
        method: 97.6 msec
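The linked script is not reproduced in this thread. As a rough idea of the kind of timeit comparison being described (the workload, helper names, and iteration counts below are stand-ins, not the actual benchmark), something like the following could be used:

    import ast
    import timeit

    # Stand-in workload: two identical ASTs parsed from this very file; the
    # real script presumably exercises the test_unparse inputs instead.
    with open(__file__) as fp:
        source = fp.read()
    tree1, tree2 = ast.parse(source), ast.parse(source)

    def compare_module_level(a, b, missing=object()):
        # Module-level variant: defined once, no closure.
        if a is b:
            return True
        if isinstance(a, ast.AST):
            if type(a) is not type(b):
                return False
            return all(compare_module_level(getattr(a, f, missing),
                                            getattr(b, f, missing))
                       for f in a._fields)
        if isinstance(a, list):
            return len(a) == len(b) and all(map(compare_module_level, a, b))
        return a == b

    def compare_inner(t1, t2):
        # Inner-function variant: the traversal is redefined on every call,
        # as in the PR.  (A method variant would attach it to a class.)
        missing = object()
        def traverse(a, b):
            if a is b:
                return True
            if isinstance(a, ast.AST):
                if type(a) is not type(b):
                    return False
                return all(traverse(getattr(a, f, missing),
                                    getattr(b, f, missing))
                           for f in a._fields)
            if isinstance(a, list):
                return len(a) == len(b) and all(map(traverse, a, b))
            return a == b
        return traverse(t1, t2)

    print("module:", timeit.timeit(lambda: compare_module_level(tree1, tree2), number=100))
    print("inner: ", timeit.timeit(lambda: compare_inner(tree1, tree2), number=100))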
As I wrote previously, if there is a good reason to define a "local" (nested?) function, please add a comment here for future readers. Otherwise, someone who doesn't like nested functions can move the function (during a random refactoring) without knowing that you ran benchmarks and the code is faster when written like that. It's very easy to lose information when code is read/modified 5, 10 or 20 years later. And yes, it happens to me often to read 20 years old Python code and I have to dig into Git history (with unpleasant issues about CVS, SVN and HG commit numbers) and very old bug tracker issues to attempt to rebuild the rationale for the existing code before feeling safe to change it.
The performance of the method/module-level code doesn't make sense to me; it should be as fast or faster. But my only god is benchmarks: I only trust benchmark numbers :-)
I'm sorry but I need to comment, as this is preventing me from doing further work. This discussion has left a sour taste in my mouth. I've been contributing to Python in some form or another (patches, reviews, PRs, buildbots, monetarily) for 20+ years. Out of the desire to see my buildbots (and by extension, all of the others) run quickly and failure free, I have offered an improvement that reduces runtime by 33% (depending on the machine, ~30s) and, at least on my buildbot, results in a total runtime reduction of 10%.

There have been quibbles over 0.1% differences, overlooked information that was available here or on BPO, and misunderstandings of just how some things work in practice. I personally try not to give uninformed advice and would hope others respond in kind. As gatekeepers to Python's source, I get that its quality must be preserved; however, uninformed pushback is not helpful. Or at least, it feels uninformed to me. Maybe it's just a lack of time to do more thorough responses; I get that Python development time can be a scarce resource. Would it not, then, be better spent encouraging more contributors to lessen that burden? One could assume, given that I run buildbots, that I care about testing? Good faith should mean I have no desire to degrade the test suite; quite the opposite.

Anyway, I've provided detailed testing of my approach. I would like to see this merged and believe that this implementation is currently as good as it can be without getting obscure and difficult to reason about. I have no desire to make any more changes. Merge or close; at this point, it matters little to me. Thank you for your consideration. If there is any part of this that people would like to respond to, I'm more than happy to have that conversation, public or private. This was not meant to offend anyone, but if I didn't write this, I would not be inclined to continue providing any PRs.
Thanks for that :-)
Don't assume ill intent on our part. Reviewing takes a lot of time. It's not just about reading the code, but changing context at each PR/issue, reading the surrounding code, understanding what's going on, trying to dig into the history, considering better alternatives, etc. I personally have limited bandwidth to care about the Windows buildbots, but I know that you spent a lot of energy on fixing tons of Windows issues that no one else cares about, so I'm forcing myself to spend more time on this specific PR :-) AST is out of my scope/knowledge; I prefer to leave it to the local experts, Pablo and Batuhan. Pablo has limited availability and Batuhan simply seems unavailable. Be patient, it will be fine ;-) While you have a laser focus on Windows buildbots, there are always more urgent blocker issues to fix, other bugs to fix, etc.
The price of merging any PR is that the person who merges it takes responsibility for regressions and future issues around the change for the next 5 years. When I don't want to take this responsibility, I prefer not to merge, or even not to review.
Contributing to Python is very frustrating for contributors, but also for core developers, who are usually seen as the persons in charge of reviews (whereas in practice anyone is free to review).
IMO the review so far has been productive; the changes have already made the code better.
Core developers are in charge of the merge and would be remiss not to do a review (as in making sure things work as they should, not strictly a GH code review) prior to merging anyway. I suppose some may see it as core-only, but it really is core-finally. I know time is precious, so I have no (well, minimal, TBH) issue with letting changes sit for weeks, months and, yes, even years.

I have no issues with changes that align with my stated goal (speed); Pablo pointed out some of those. The problem is the conflation of changes being requested. For example, changes were requested to further improve failure messages. Existing failure reporting (comparing string results) is abysmal at best, so anything I added is already an improvement, but just as a side effect. If improved error reporting is desired, it should have been done as a subsequent PR, but now I'm roped into doing that too just to have any hope of getting my speed-up merged. It took longer to develop the improved error reporting than to do the initial PR. Perfect is the enemy of good.

I hope you can see how frustrating it can be to be asked to do things that have nothing to do with your stated goal, just because of "hey, look at what's possible now!", and without stating what your actual goals are. Basically, state why you would want a particular change contrasted with the current state of affairs, not just the changed lines in isolation. I'm not here to downplay any effort already put forth, but my hope is that insight into where I've had frustrations will help improve interactions for everyone else as well.
Thanks for the PR and for addressing the changes @jkloth!
https://bugs.python.org/issue47131