[mypyc] Build lists using a primitive op #10807

97littleleaf11 · 2021-07-12T09:07:28Z

Description

This PR adds a primitive op and a C helper function for building lists.

Test Plan

This change helps reduce the generated code size, however adds some overhead in calling.

Microbenchmark tested on master branch:

interpreted: 0.000256s (avg of 3582 iterations; stdev 2.2%)
compiled:    0.000063s (avg of 3582 iterations; stdev 2.1%)

compiled is 4.073x faster

On this PR:

interpreted: 0.000275s (avg of 3331 iterations; stdev 4.4%)
compiled:    0.000083s (avg of 3331 iterations; stdev 4.5%)

compiled is 3.321x faster

@benchmark
def list_build() -> None:
    n = 0
    for i in range(1000):
        x = ["x", "y", "1", "2", str(i)]
        n += len(x)
    assert n == 5000, n

TH3CHARLie · 2021-07-12T09:43:07Z

Well, #9378

97littleleaf11 · 2021-07-12T10:11:41Z

Oops, it seems like a revert

JukkaL · 2021-07-12T15:01:56Z

The performance impact seems a bit too big to make this worthwhile as is. However, maybe we can come up with something reasonable with some tweaks.

First, in most cases the constructed list has 0 or 1 items. In this case I'd expect the performance impact to be bigger than for larger lists. It would be good to have microbenchmark results for these cases as well (esp. when the item is a literal, such as a string).

Since 0 and 1 items are very common, I think that we can't really afford to lose any performance there. One option would be to fall back to the original behavior for small lists -- at least 0 and 1 items, possibly up to 3-5 items or so. The code size impact becomes significant for larger lists, but it's not such a problem for small lists.

Second, this will sometimes result in extra incref/decref that we don't really need. I think that we can avoid the incref in the function by stealing the items (see StealsDescription).

97littleleaf11 · 2021-07-13T15:48:38Z

Microbenchmark results

	SetMem	list_build_op	ratio
0	0.000120	0.000148	0.81
1	0.000228	0.000252	0.90
5	0.000289	0.000353	0.82
10	0.000349	0.000448	0.78

Empty list:

setmem:

interpreted: 0.000743s (avg of 1268 iterations; stdev 1.4%)
compiled:    0.000120s (avg of 1268 iterations; stdev 0.57%)

compiled is 6.166x faster

list_build_op:

interpreted: 0.000740s (avg of 1293 iterations; stdev 1.3%)
compiled:    0.000148s (avg of 1293 iterations; stdev 0.74%)

compiled is 5.006x faster

One item: 

setmem

interpreted: 0.000967s (avg of 942 iterations; stdev 2.3%)
compiled:    0.000228s (avg of 942 iterations; stdev 3.4%)

compiled is 4.241x faster

Op

interpreted: 0.000952s (avg of 1009 iterations; stdev 0.96%)
compiled:    0.000252s (avg of 1009 iterations; stdev 1.2%)

compiled is 3.779x faster

5 items:

setmem

interpreted: 0.001188s (avg of 814 iterations; stdev 0.96%)
compiled:    0.000289s (avg of 814 iterations; stdev 0.86%)

compiled is 4.107x faster

op

interpreted: 0.001189s (avg of 812 iterations; stdev 0.96%)
compiled:    0.000353s (avg of 812 iterations; stdev 1.2%)

compiled is 3.369x faster

10 items:

setmem

interpreted: 0.001386s (avg of 683 iterations; stdev 2.5%)
compiled:    0.000349s (avg of 683 iterations; stdev 1.5%)

compiled is 3.973x faster

op

interpreted: 0.001394s (avg of 674 iterations; stdev 2.5%)
compiled:    0.000448s (avg of 674 iterations; stdev 0.6%)

compiled is 3.111x faster

97littleleaf11 · 2021-07-14T10:55:53Z

I probably would set 5 as the threshold. Larger lists should use list_build_op and small ones keep the current logic.

TH3CHARLie · 2021-07-14T13:14:50Z

I feel that given the performance numbers you've collected in this PR, 10 would be the ideal threshold. Also I feel 10 is the upper bound people tend to write inline list construction code so making larger than 10 items a single function call is totally acceptable.

Also, there's another more pragmatic way to do this is to collect stats from open-source code and analyze them to see what's the threshold value to cover, say, 95% cases and make the decision based on this observation.

97littleleaf11 · 2021-07-15T15:50:02Z

mypyc/irbuild/ll_builder.py

@@ -927,23 +927,8 @@ def new_list_op_with_length(self, length: Value, line: int) -> Value:
        return self.call_c(new_list_op, [length], line)

    def new_list_op(self, values: List[Value], line: int) -> Value:
-        length = Integer(len(values), c_pyssize_t_rprimitive, line)


Is make_list a good name? since we have another new_list_op_with_length for creating empty list.

Ummm, I saw we have different namings now like make_dict and new_set_op, I think we should adopt one and stick to it, like make_xxx or new_xxx_op but not both.

ilevkivskyi

Thanks! I have few comments here.

ilevkivskyi · 2021-07-22T11:13:19Z

mypyc/test-data/irbuild-lists.test

 from typing import List
-def f(a: List[int]) -> None:
-    b = a * 2
-    b = 3 * [4]


Why are you deleting this test? It looks like i served some unrelated purpose (check list multiplication)

oops, it seems that I mistakenly overwritten it.

ilevkivskyi · 2021-07-22T11:14:46Z

mypyc/test-data/run-lists.test

    l.pop()
    l.pop()
-    assert l == [1, 2, 10, 3]
+    l.pop()
+    assert l == [1, 2, 10, 3, 4, 5, 7]


Please keep test for both short and long lists, and add a comment explaining this, that mentions the LIST_BUILDING_EXPANSION_THRESHOLD,

ilevkivskyi · 2021-07-22T11:15:17Z

mypyc/test-data/irbuild-lists.test

+    r8 = 'i'
+    r9 = 'j'
+    r10 = CPyList_Build(10, r0, r1, r2, r3, r4, r5, r6, r7, r8, r9)
+    x = r10


Do we have IR tests for both short and long list literals?

We have dozens of list building cases, whose length is less than 10, in other irbuilding tests. So I think we only need one more test for building a 10 items list.

ilevkivskyi · 2021-07-22T11:17:56Z

mypyc/irbuild/ll_builder.py

+        length: List[Value] = [Integer(len(values), c_pyssize_t_rprimitive, line)]
+        if len(values) >= LIST_BUILDING_EXPANSION_THRESHOLD:
+            return self.call_c(list_build_op, length + values, line)
+        else:


The else is not needed here, since if body is a plain return. This will make the diff simpler.

ilevkivskyi · 2021-07-22T11:19:36Z

mypyc/irbuild/ll_builder.py

+        if len(values) >= LIST_BUILDING_EXPANSION_THRESHOLD:
+            return self.call_c(list_build_op, length + values, line)
+        else:
+            result_list = self.call_c(new_list_op, length, line)


Add a comment to explain that this tricky code is here to speed-up creation of short list literals.

97littleleaf11 added 4 commits July 11, 2021 07:22

Add list_build_op

3e7a562

Fix IR build cases

ac96099

Add Py_INCREF()

78820e2

Remove deleted code

22dc25e

Fix issue

1ab6915

Steals ref

a79f98e

97littleleaf11 commented Jul 15, 2021

View reviewed changes

97littleleaf11 added 3 commits July 17, 2021 19:50

Add threshold for switching two ways

a47ec79

Add run test and ir building test

d66f7a0

Fix the typing error and several coding style warnings

b81bc92

ilevkivskyi reviewed Jul 22, 2021

View reviewed changes

97littleleaf11 added 2 commits July 22, 2021 20:39

Revert irbuild test and and new run tests for list building

101b2a6

Fix

c25232b

97littleleaf11 requested a review from ilevkivskyi July 22, 2021 15:24

ilevkivskyi approved these changes Jul 22, 2021

View reviewed changes

ilevkivskyi merged commit 6bb2266 into python:master Jul 22, 2021

97littleleaf11 deleted the list-build-op branch July 22, 2021 20:28

Uh oh!

[mypyc] Build lists using a primitive op #10807

[mypyc] Build lists using a primitive op #10807

Uh oh!

Conversation

97littleleaf11 commented Jul 12, 2021

Description

Test Plan

Uh oh!

TH3CHARLie commented Jul 12, 2021

Uh oh!

97littleleaf11 commented Jul 12, 2021

Uh oh!

JukkaL commented Jul 12, 2021

Uh oh!

97littleleaf11 commented Jul 13, 2021

Uh oh!

97littleleaf11 commented Jul 14, 2021

Uh oh!

TH3CHARLie commented Jul 14, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TH3CHARLie Jul 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ilevkivskyi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

TH3CHARLie Jul 15, 2021 •

edited

Loading