Skip to content

[mypyc] Build lists using a primitive op #10807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 22, 2021

Conversation

97littleleaf11
Copy link
Collaborator

Description

Closes mypyc/mypyc#264

This PR adds a primitive op and a C helper function for building lists.

Test Plan

This change helps reduce the generated code size, however adds some overhead in calling.

Microbenchmark tested on master branch:

interpreted: 0.000256s (avg of 3582 iterations; stdev 2.2%)
compiled:    0.000063s (avg of 3582 iterations; stdev 2.1%)

compiled is 4.073x faster

On this PR:

interpreted: 0.000275s (avg of 3331 iterations; stdev 4.4%)
compiled:    0.000083s (avg of 3331 iterations; stdev 4.5%)

compiled is 3.321x faster
@benchmark
def list_build() -> None:
    n = 0
    for i in range(1000):
        x = ["x", "y", "1", "2", str(i)]
        n += len(x)
    assert n == 5000, n

@TH3CHARLie
Copy link
Collaborator

Well, #9378

@97littleleaf11
Copy link
Collaborator Author

Oops, it seems like a revert

@JukkaL
Copy link
Collaborator

JukkaL commented Jul 12, 2021

The performance impact seems a bit too big to make this worthwhile as is. However, maybe we can come up with something reasonable with some tweaks.

First, in most cases the constructed list has 0 or 1 items. In this case I'd expect the performance impact to be bigger than for larger lists. It would be good to have microbenchmark results for these cases as well (esp. when the item is a literal, such as a string).

Since 0 and 1 items are very common, I think that we can't really afford to lose any performance there. One option would be to fall back to the original behavior for small lists -- at least 0 and 1 items, possibly up to 3-5 items or so. The code size impact becomes significant for larger lists, but it's not such a problem for small lists.

Second, this will sometimes result in extra incref/decref that we don't really need. I think that we can avoid the incref in the function by stealing the items (see StealsDescription).

@97littleleaf11
Copy link
Collaborator Author

Microbenchmark results

SetMem list_build_op ratio
0 0.000120 0.000148 0.81
1 0.000228 0.000252 0.90
5 0.000289 0.000353 0.82
10 0.000349 0.000448 0.78

Empty list:

setmem:

interpreted: 0.000743s (avg of 1268 iterations; stdev 1.4%)
compiled:    0.000120s (avg of 1268 iterations; stdev 0.57%)

compiled is 6.166x faster

list_build_op:

interpreted: 0.000740s (avg of 1293 iterations; stdev 1.3%)
compiled:    0.000148s (avg of 1293 iterations; stdev 0.74%)

compiled is 5.006x faster

One item:


setmem

interpreted: 0.000967s (avg of 942 iterations; stdev 2.3%)
compiled:    0.000228s (avg of 942 iterations; stdev 3.4%)

compiled is 4.241x faster

Op

interpreted: 0.000952s (avg of 1009 iterations; stdev 0.96%)
compiled:    0.000252s (avg of 1009 iterations; stdev 1.2%)

compiled is 3.779x faster

5 items:

setmem

interpreted: 0.001188s (avg of 814 iterations; stdev 0.96%)
compiled:    0.000289s (avg of 814 iterations; stdev 0.86%)

compiled is 4.107x faster

op

interpreted: 0.001189s (avg of 812 iterations; stdev 0.96%)
compiled:    0.000353s (avg of 812 iterations; stdev 1.2%)

compiled is 3.369x faster

10 items:

setmem

interpreted: 0.001386s (avg of 683 iterations; stdev 2.5%)
compiled:    0.000349s (avg of 683 iterations; stdev 1.5%)

compiled is 3.973x faster

op

interpreted: 0.001394s (avg of 674 iterations; stdev 2.5%)
compiled:    0.000448s (avg of 674 iterations; stdev 0.6%)

compiled is 3.111x faster

@97littleleaf11
Copy link
Collaborator Author

I probably would set 5 as the threshold. Larger lists should use list_build_op and small ones keep the current logic.

@TH3CHARLie
Copy link
Collaborator

I feel that given the performance numbers you've collected in this PR, 10 would be the ideal threshold. Also I feel 10 is the upper bound people tend to write inline list construction code so making larger than 10 items a single function call is totally acceptable.

Also, there's another more pragmatic way to do this is to collect stats from open-source code and analyze them to see what's the threshold value to cover, say, 95% cases and make the decision based on this observation.

@@ -927,23 +927,8 @@ def new_list_op_with_length(self, length: Value, line: int) -> Value:
return self.call_c(new_list_op, [length], line)

def new_list_op(self, values: List[Value], line: int) -> Value:
length = Integer(len(values), c_pyssize_t_rprimitive, line)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is make_list a good name? since we have another new_list_op_with_length for creating empty list.

Copy link
Collaborator

@TH3CHARLie TH3CHARLie Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ummm, I saw we have different namings now like make_dict and new_set_op, I think we should adopt one and stick to it, like make_xxx or new_xxx_op but not both.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have few comments here.

from typing import List
def f(a: List[int]) -> None:
b = a * 2
b = 3 * [4]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you deleting this test? It looks like i served some unrelated purpose (check list multiplication)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, it seems that I mistakenly overwritten it.

l.pop()
l.pop()
assert l == [1, 2, 10, 3]
l.pop()
assert l == [1, 2, 10, 3, 4, 5, 7]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep test for both short and long lists, and add a comment explaining this, that mentions the LIST_BUILDING_EXPANSION_THRESHOLD,

r8 = 'i'
r9 = 'j'
r10 = CPyList_Build(10, r0, r1, r2, r3, r4, r5, r6, r7, r8, r9)
x = r10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have IR tests for both short and long list literals?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have dozens of list building cases, whose length is less than 10, in other irbuilding tests. So I think we only need one more test for building a 10 items list.

length: List[Value] = [Integer(len(values), c_pyssize_t_rprimitive, line)]
if len(values) >= LIST_BUILDING_EXPANSION_THRESHOLD:
return self.call_c(list_build_op, length + values, line)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else is not needed here, since if body is a plain return. This will make the diff simpler.

if len(values) >= LIST_BUILDING_EXPANSION_THRESHOLD:
return self.call_c(list_build_op, length + values, line)
else:
result_list = self.call_c(new_list_op, length, line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to explain that this tricky code is here to speed-up creation of short list literals.

@ilevkivskyi ilevkivskyi merged commit 6bb2266 into python:master Jul 22, 2021
@97littleleaf11 97littleleaf11 deleted the list-build-op branch July 22, 2021 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimized construction of collection literals
4 participants