Skip to content

[mypyc] Promotion to Value Type optimization #17932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

jairov4
Copy link
Contributor

@jairov4 jairov4 commented Oct 13, 2024

This PR introduce to mypyc the ability to generate efficient classes passed by value
unleashing deeper optimizations in the C generated code.
With this change mypyc will be finding immutable classes to be promoted as Value Types.
Value types are unboxed representation of classes that can be passed by value avoiding
dynamic memory allocations and replacing those with stack based values.

For example:

For this class

@final
@mypyc_attr(value_type=True)
class Vector2:
    def __init__(self, x: i32, y: i32) -> None:
        self.x: Final = x
        self.y: Final = y

    def __add__(self, other: "Vector2") -> "Vector2"
        return Vector2(self.x + other.x, self.y + other.y)

The add method will produce the code:

Vector2Object CPyDef_Vector2_____add__(Vector2Object cpy_r_self, Vector2Object cpy_r_other) {
    int32_t cpy_r_r0;
    int32_t cpy_r_r1;
    int32_t cpy_r_r2;
    int32_t cpy_r_r3;
    int32_t cpy_r_r4;
    int32_t cpy_r_r5;
    Vector2Object cpy_r_r6;
    Vector2Object cpy_r_r7;
    cpy_r_r0 = cpy_r_self._x;
    cpy_r_r1 = cpy_r_other._x;
    cpy_r_r2 = cpy_r_r0 + cpy_r_r1;
    cpy_r_r3 = cpy_r_self._y;
    cpy_r_r4 = cpy_r_other._y;
    cpy_r_r5 = cpy_r_r3 + cpy_r_r4;
    cpy_r_r6 = CPyDef_Vector2(cpy_r_r2, cpy_r_r5);
    if (unlikely(cpy_r_r6.vtable == NULL)) {
        CPy_AddTraceback("engine/vectors2.py", "__add__", 12, CPyStatic_globals);
        goto CPyL2;
    }
    return cpy_r_r6;
CPyL2: ;
    cpy_r_r7 = ((Vector2Object){0});
    return cpy_r_r7;
}

Note the values passed as structs directly and returning it directly too.
It has an impact similar to the RTuple but should work transparently with more flexible definitions due the class syntax.
It works similar to Value Types in C# (CLR) or structures in Swift.
It will allow the language be a more suitable option for performance centric tasks like video game engines or real time simulation computation.

Implementation details

It uses the vtable field to signal if any exception happen similar to the empty flag in RTuple.
A new RType called RInstanceValue is created to represent value types.
All semantics of boxed and unboxed values are preserved where (RInstanceValue is unboxed) and RInstance continues being (boxed).
In order to easily box values using the existing infrastructure, the type's setup method has been used and promoted to exported function.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 15, 2024

Value types are a great feature, thanks! This seems to address mypyc/mypyc#841 -- what do you think?

Since using a value type can change semantics (at least is / object identity), and sometimes a heap type is preferred even if all attributes are final (e.g. there are dozens of attributes -> copying values would be slow), it may be better to require value classes to be annotated explicitly. We could start with @mypyc_attr(<something>=True) and come up with a better decorator name in mypy_extensions. See the referred issue for more about this.

@jairov4 jairov4 force-pushed the mypyc-opt-value-types branch from 5471267 to 6868d34 Compare October 15, 2024 22:11
@jairov4 jairov4 force-pushed the mypyc-opt-value-types branch from 6755c87 to 51379e2 Compare October 17, 2024 04:20
@jairov4
Copy link
Contributor Author

jairov4 commented Oct 17, 2024

@JukkaL

Value types are a great feature, thanks! This seems to address mypyc/mypyc#841 -- what do you think?

Yes, it goes in that direction, however the implementation presented here do not cover the @record for your use case.
Additionally, I think that functionality is already provided by python using NamedTuple, where a NamedTuple._replace() method (maybe ugly named) can be used for modifying the object keeping the immutability (thats the way I use today to avoid extra deps in some contexts). Not sure if frozen dataclasses have something similar. We could adapt it to namedtuple as next step to avoid additional syntatic sugar

Since using a value type can change semantics (at least is / object identity), and sometimes a heap type is preferred even if all attributes are final (e.g. there are dozens of attributes -> copying values would be slow), it may be better to require value classes to be annotated explicitly. We could start with @mypyc_attr(<something>=True) and come up with a better decorator name in mypy_extensions. See the referred issue for more about this.

For sure makes sense, I already updated the PR for using @mypyc_attr(value_type=True)

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 17, 2024

Additionally, I think that functionality is already provided by python using NamedTuple

I don't think NamedTuple can be a value type, since it supports subclassing and subclasses can add settable attributes. The idea with @record (or whatever syntax we'd use for this) is that it's both a value type and makes it easy to modify values in-place.

Not sure if frozen dataclasses have something similar.

I don't think they anything like that, and besides dataclasses have reference semantics, even if they are frozen, and support subclassing.

@jairov4
Copy link
Contributor Author

jairov4 commented Oct 17, 2024

makes sense, I didnt saw that NamedTuples can be subclassed and add new mutable members

@jairov4
Copy link
Contributor Author

jairov4 commented Oct 18, 2024

@JukkaL can you review at this point. I think is ready due the tests I have made, let me know if you consider any other test.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 18, 2024

Can you merge master? It now includes the dunder changes. This is a big PR so it will take some time to review and test. I'll start looking into this early next week probably.

@jairov4 jairov4 force-pushed the mypyc-opt-value-types branch from d926eb6 to 4fde6f9 Compare October 18, 2024 14:47
@jairov4
Copy link
Contributor Author

jairov4 commented Oct 19, 2024

Done

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a quick review pass, focusing on the semantics of value types (didn't look at implementation details carefully yet).

@JukkaL JukkaL changed the title [mpypc] Promotion to Value Type optimization [mypyc] Promotion to Value Type optimization Oct 22, 2024
@JukkaL
Copy link
Collaborator

JukkaL commented Oct 22, 2024

These features may not have tests (and some are probably not implemented):

  • Fixed-length tuple items with value types
  • Undefined attribute/local variable with value type (e.g. conditionally assign value)
  • Module-level attribute with value type (may involve boxing, unfortunately)
  • Final module-level attribute with value type (shouldn't involve boxing)
  • Default argument value with value type
  • Reference counting when value type attribute uses reference counting (similar to tuples)

If some of these are unclear, I can help. Also, implementing all of these in a single PR may make the PR too big for comfortable reviewing. It's okay to implement these in multiple PRs, if the functionality is behind a feature flag until complete. We have way of hiding features behind feature flags (--enable-incomplete-feature, see INCOMPLETE_FEATURES in mypy.options).

This comment has been minimized.

This comment has been minimized.

@jairov4
Copy link
Contributor Author

jairov4 commented Nov 24, 2024

@JukkaL Hey, I have made some modifications there.
Including hiding it behind a feature flag.
I can make the additional verification for the cases you mentioned in a new PR to avoid making this grow more

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@jairov4
Copy link
Contributor Author

jairov4 commented Nov 28, 2024

@JukkaL any other idea of aspect to test? I have covered:

  • Fixed-length tuple items with value types
  • Module-level attribute with value type (may involve boxing, unfortunately)
  • Final module-level attribute with value type (shouldn't involve boxing)
  • Default argument value with value type
  • Reference counting when value type attribute uses reference counting (similar to tuples)

@jairov4
Copy link
Contributor Author

jairov4 commented Dec 5, 2024

any comment?

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the updates, and apologies for the delay. Did another review pass, focused on the big picture. Left some comments and questions. Looks better now.

@@ -39,10 +39,15 @@ def main() -> None:
opt_level = os.getenv("MYPYC_OPT_LEVEL", "3")
debug_level = os.getenv("MYPYC_DEBUG_LEVEL", "1")
strict_dunder_typing = bool(int(os.getenv("MYPYC_STRICT_DUNDER_TYPING", "0")))
compiler = os.getenv("MYPYC_COMPILER", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the compiler type related changes related to the value type optimization? If not, it would be cleaner to separate them into another PR.

@@ -0,0 +1,35 @@
[case testValueTypeNotFinal]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you switch this file to use unix line endings, for consistency?

return self.name


class RInstanceValue(RInstance):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inheriting from RInstance seems a bit dangerous -- it's an easy mistake to do isinstance(x, RInstance) and assume that x is a boxed type. I think making this a subclass of RType would be cleaner, even if it will require some additional code.

def test_value_type_operator_is() -> None:
# The 'is' operator have undefined result for value types.
# however it must not raise an error.
Vector2I(1, 2) is Vector2I(1, 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be reasonable to generate a compile error in mypyc when we encounter is operation where one of the operands is a value type (but it doesn't need to happen in this PR).

@mypyc_attr(value_type=True)
class A:
def __init__(self, x: i32, y: i32) -> None:
self.x: Final = x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The y parameter is unused. Is this intentional?

What happens if an uninitialized attribute is accessed after construction?

l = [A(1, 2), A(5, 7)]
assert l[0].x == 1
assert l[1].x == 5
assert l[0] != A(1, 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels unintuitive, but it's fine as a temporary approach. Eventually we should require a class decorator for value types that synthesizes an __eq__ method. This could also allow equality without boxing.

assert l[0] != A(1, 2)
assert isinstance(l[0], A)
assert l[0].x == 1
assert l[1] == "test"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test also accessing the boxed object via an Any type, if it works (or add a TODO). E.g. l: List[Any] = [...] and so on. Eventually it would be good to have tests that all operations work via boxed objects using Any, i.e. Python semantics.


def test_bool() -> None:
assert bool(A(1, 2))
assert not bool(A(0, 2))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about if A(1, 2): ... etc?


@final
@mypyc_attr(value_type=True)
class B(A): pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there is a non-final attribute?

I think long-term all attributes of value classes should be implicitly final, but it can be implemented later.

def __init__(self, x: i32, y: i32) -> None:
self.x: Final = x

def __hash__(self) -> int:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are non-dunder methods currently supported? If yes, can you add some tests? If not, can you make them generate a compile error, or at least add a TODO in this test file? What about properties?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants