Skip to content

Allow serializers to accept a model instance #437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 17, 2023
Merged

Conversation

adriangb
Copy link
Member

@adriangb adriangb commented Mar 12, 2023

Closes #435

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 12, 2023

CodSpeed Performance Report

Merging #437 serializer-self (9b1acd2) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 93 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

@codecov-commenter
Copy link

codecov-commenter commented Mar 12, 2023

Codecov Report

Merging #437 (9b1acd2) into main (5efeaf9) will decrease coverage by 0.07%.
The diff coverage is 88.70%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
- Coverage   94.90%   94.84%   -0.07%     
==========================================
  Files          93       93              
  Lines       11157    11257     +100     
  Branches       22       25       +3     
==========================================
+ Hits        10589    10677      +88     
- Misses        563      573      +10     
- Partials        5        7       +2     
Impacted Files Coverage Δ
src/serializers/type_serializers/function.rs 91.81% <86.15%> (-1.76%) ⬇️
src/serializers/type_serializers/typed_dict.rs 94.39% <86.66%> (+0.24%) ⬆️
pydantic_core/core_schema.py 96.68% <89.65%> (-0.33%) ⬇️
src/serializers/extra.rs 95.42% <100.00%> (+0.16%) ⬆️
src/serializers/type_serializers/model.rs 90.00% <100.00%> (+2.24%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5efeaf9...9b1acd2. Read the comment docs.

Comment on lines 129 to 184
class SerializePlainFunctionInstanceMethod(Protocol): # pragma: no cover
def __call__(self, __self: Any, __input_value: Any, __info: SerializationInfo) -> Any:
...


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal with this is to keep pydantic-core strongly typed and free of introspection. This forces the caller to determine which type of function they are passing (a free standing function / bound class or static method or an unbound instance method).

Comment on lines 60 to 139
let info = SerializationInfo::new(py, include, exclude, extra);
if self.when_used.should_use(value, extra) {
self.func
.call1(py, (value, SerializationInfo::new(py, include, exclude, extra)))
if self.needs_model_instance {
self.func.call1(py, (extra.model.unwrap(), value, info))
} else {
self.func.call1(py, (value, info))
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is nothing stopping the model from being passed to a non-field serializer. That is, you could in theory do something like:

class CustomType(int):
  def __pydantic_core_serializer__(self):
      return core_schema. function_plain_ser_schema_model(lambda model, value, info: str(value))

class Model(BaseModel):
  x: CustomType

(Or the equivalent with Annotated once we support that).

And then the model parameter in the lambda would be an instance of Model, which is super weird.

But I don't think it's worth trying to block this, it would add complexity and I think users are judicious enough to not do that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not worth trying to block. However, I think it's probably worth us being clear that we aren't making a lot of promises about the long-term stability of the pydantic-core api; ideally we could make changes in the future to block this if we find a better approach or whatever.

@@ -185,6 +191,7 @@ pub struct FunctionWrapSerializer {
function_name: String,
json_return_ob_type: Option<ObType>,
when_used: WhenUsed,
needs_model_instance: bool,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this as a bool since there's only two options, it could be nicer as an enum but that'd be adding boilerplate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a bool is fine here, probably worth minimizing boilerplate for now considering this is nascent enough that it might still get significant changes for reasons we haven't thought of yet.

if self.allow_value(value, extra)? {
let dict = object_to_dict(value, true, extra)?;
self.serializer.to_python(dict, include, exclude, extra)
let extra = extra.with_model(Some(value));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the comment above this keeps getting passed down until another model is encountered as is the case for data in validators.

Comment on lines 359 to 361
def ser_x(self, v: Any, serializer: core_schema.SerializeWrapHandler, _) -> str:
x = serializer(v)
return f'{x:_}'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having implemented this what comes across as a bit weird is that these functions accept self and the value. If you have access to `self, why do you need us to pass in the value? Besides, if you get it from the model instance it will already be typed and it has a good name. It's pretty common for validators at least to duplicate the field name and type hint into the validator signature. Which makes me think it would make more sense to just not pass in the value here as a parameter.

# if you want to customize serialization for a type you can use Annotated
# or create a custom type with __pydantic_core_serializer__ implemented
MyInt = Annotated[int, PlainSerializer(lambda x, _: x * 2)]

class Model(BaseModel):
   # if you don't need the model
   # and want to customize serialization for just this field (not the type)
   # you can do that using Annotated in the field type hint

   # you get just the field's value
    x: Annotated[int, PlainSerializer(lambda x, _: x * 2)]
    # and can nest it inside a generic type
    y: Dict[str, Annotated[int, PlainSerializer(lambda x, _: x * 2)]]
    # wrap serializer applied to a field
    z: Annotated[List[int], WrapSerializer(lambda x, ser, _: ser(item, idx) for idx, item in enumerate(x))]

   # but if you want the whole model, either because
   # you want to customize serialization based on some other attribute of the model
   # (e.g. internationalization)
   # you can use @serialize applied to an instance method

    # you get the whole model but only serialize one field
    # this gets applied at the `typed-dict-field` level
    @serializer("x")
    def ser_x(self, info: core_schema.SerializationInfo) -> str:
        return f'{self.x:_}'

    # you get the whole model and serialize the entire thing
    # this gets applied at the `model` schema level
    @serializer
    def ser_model(self, info: core_schema.SerializationInfo) -> str:
        return json.dumps(self.model_dict())

    # you can also use a class or instance method
    # which would be the same as using Annotated on the field
    @serializer("x")
    @staticmethod
    def ser_x(x: int, info: core_schema.SerializationInfo) -> str:
        return f'{self.x:_}'

Obviously we're not going to implement all of this for this PR. But I think what would make sense is to change this PR to support two signatures:

  • (__self, __info)
  • (__value, __info)
    The two signatures are compatible, we'll have to "choose" one from pydantic depending on (1) if it was done via Annotated, a custom type or the @serialize decorator and (2) in the case of the @serialize decorator we'll introspect to detect if the first argument is self or slf and if so assume it's an unbound instance method, else assume it's a static or class method that accepts just the value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmontagu your thoughts on this would be appreciated

Copy link
Collaborator

@dmontagu dmontagu Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a very strong opinion about what should be done here, but I will say that I don't fully buy into the "If you have access to self, why do you need us to pass in the value? Besides, if you get it from the model instance it will already be typed and it has a good name." line of reasoning — in general, I would expect interacting with self to be significantly less common than just interacting with the value alone, and I think it's okay if there's multiple ways of getting at the value if you do choose to interact with self. (Again, that's not to argue for or against supporting instance methods, I see good reasons both for and against.)

I think ease-of-use/simplicity for users is significantly more important than "normalizing" the API, and to that end I'm inclined to try to reduce the number of supported signatures, rather than have extra logic/patterns for people to learn. I think my gut reaction is that the more options we expose, the more challenging it will be for typical users to get comfortable with the full spectrum of functionality (especially considering that, in my experience, most typical users are writing at most a handful of custom validators in their whole application, built over months/years).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, I would expect interacting with self to be significantly less common than just interacting with the value alone

I'm inclined to try to reduce the number of supported signatures, rather than have extra logic/patterns for people to learn.

Just to be clear, I think we're always going to have to support two signatures. It's either going to be (__self, __info) and (__value, __info) or (__self, __value, __info) and (__value, __info). Or are you suggesting something else?

Copy link
Collaborator

@dmontagu dmontagu Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm not understanding; I'm mostly just concerned with making sure that it's hard to find yourself in a confusing situation as a user (and to be clear — I don't have strong opinions beyond ensuring this is the case).

(EDIT: And yes, I suspect that (__self, __value, __info) and (__value, __info) would be a little more consistent from users' perspectives, though I guess it depends a lot on how it's implemented; in particular, my gut reaction is that I don't really like the idea that if you define a validator as a method you are forced to not use the form that accepts the value you are validating, but I'm not sure whether that's what you had in mind.)

When you say:

I think what would make sense is to change this PR to support two signatures:

  • (__self, __info)
  • (__value, __info)

I see the double-leading-underscore and think this means these variables will be treated as positional, and my concern is that whatever logic gets used to determine which one is used could result in surprises for the user and/or be something that is hard to discover without reading the docs. (Maybe that's unavoidable but the more "discoverable" we can make this sort of thing, the better, I think. For example, self could live on info — that would potentially make it more discoverable, though has the obvious shortcoming of losing type-hints. So I don't actually think that's a good alternative, just making the point. Perhaps a better alternative is just listing all the allowed signatures in the docstring for the validator decorator.)

Ultimately I think it's okay as long as users get early and clear feedback about what they've done wrong; I think it would be more of a problem if they don't find out the signature is wrong until the function gets called to validate data. (As opposed to during class initialization.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken. But unless we want to not support instance method validators we're going to have to deal with a signature including self. I think we can get it to work nicely using callable protocols and a bit of introspection in pydantic:

https://mypy-play.net/?mypy=latest&python=3.11&gist=a2ae207967b569eff9fee5960d65cd08

Pylance/Pyright

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about this more I'm definitely more in favor of dropping value as an input if you are getting self. As long as that doesn't make the implementation more complicated. If it does, I'm fine with keeping value as an input to the function call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't be much different, I'll change this to be (self, info)

Copy link
Collaborator

@dmontagu dmontagu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some new notes and some comments on existing ones

)


def function_plain_ser_schema_model(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another minor naming thing — I would prefer method_plain_ser_schema or instance_method_plain_ser_schema over function_plain_ser_schema_model. Same for the wrap version.

Also not a strong opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would model_field_function_plain_ser_schema work? That's what I have over in #439

function: SerializePlainFunction | SerializePlainFunctionInstanceMethod,
json_return_type: JsonReturnTypes | None,
when_used: WhenUsed,
needs_model_instance: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor naming thing, I think I would prefer is_instance_method as the name of this argument. Same for the wrap version below.

Not a strong opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've iterated through several. None seem great because I think fundamentally this is the wrong sorta abstraction (we should be calling the method on the instance or something so that the caller doesn't care). But I'm good with whatever, we should just make this and #439 match

needs_model_instance: bool


def _function_plain_ser_schema(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if there was something indicating that this function exists for the purpose of sharing logic between the other functions that call it. That could be just an update to the name (something like _common_function_plain_ser_schema), or could be a very brief docstring.

Without a more explicit indication, while skimming a file like this I get concerned that there's a broader convention I'm not aware of that would apply to the other schema types too, and I have to dig into the code enough to realize by actually understanding the code that this is a special case. With a little more naming clarity I can more quickly recognize that it's safe for me to not pay attention to this if it's not something I'm actively concerned with.

Same for the wrap version below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll get rid of these functions and just call dict_not_none directly like I do in #439. It's a couple more LOC but worth avoiding the private function.

self.func
.call1(py, (value, SerializationInfo::new(py, include, exclude, extra)))
if self.needs_model_instance {
self.func.call1(py, (extra.model.unwrap(), value, info))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, I think it would be better to actually check that extra.model is not None; don't want a schema generation bug to cause a rust panic.

I'm not quite sure yet how the needs_model_instance and extra.model values get plumbed, but I almost think you could change if self.needs_model_instance to if let Some(model) = extra.model; if you do find a way to produce a scenario where these conditions aren't equivalent, I would think you might even get a useful python error (saying something to the effect of 'you didn't provide enough arguments', or 'you provided too many arguments').

Comment on lines 60 to 139
let info = SerializationInfo::new(py, include, exclude, extra);
if self.when_used.should_use(value, extra) {
self.func
.call1(py, (value, SerializationInfo::new(py, include, exclude, extra)))
if self.needs_model_instance {
self.func.call1(py, (extra.model.unwrap(), value, info))
} else {
self.func.call1(py, (value, info))
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not worth trying to block. However, I think it's probably worth us being clear that we aren't making a lot of promises about the long-term stability of the pydantic-core api; ideally we could make changes in the future to block this if we find a better approach or whatever.

@@ -185,6 +191,7 @@ pub struct FunctionWrapSerializer {
function_name: String,
json_return_ob_type: Option<ObType>,
when_used: WhenUsed,
needs_model_instance: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a bool is fine here, probably worth minimizing boilerplate for now considering this is nascent enough that it might still get significant changes for reasons we haven't thought of yet.

Comment on lines 359 to 361
def ser_x(self, v: Any, serializer: core_schema.SerializeWrapHandler, _) -> str:
x = serializer(v)
return f'{x:_}'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about this more I'm definitely more in favor of dropping value as an input if you are getting self. As long as that doesn't make the implementation more complicated. If it does, I'm fine with keeping value as an input to the function call.

@adriangb adriangb self-assigned this Mar 15, 2023
@adriangb
Copy link
Member Author

Ok this has been rebased and rewritten to match the feedback and implementation in #439. @dmontagu @samuelcolvin please review when you can

@adriangb adriangb enabled auto-merge (squash) March 17, 2023 23:33
@adriangb adriangb merged commit dfb5be6 into main Mar 17, 2023
@adriangb adriangb deleted the serializer-self branch March 17, 2023 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support accessing a model instance on (some) Serializers
3 participants