Allow serializers to accept a model instance #437

adriangb · 2023-03-12T18:27:33Z

Closes #435

codspeed-hq · 2023-03-12T18:31:39Z

CodSpeed Performance Report

Merging #437 serializer-self (9b1acd2) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 93 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

codecov-commenter · 2023-03-12T18:31:40Z

Codecov Report

Merging #437 (9b1acd2) into main (5efeaf9) will decrease coverage by 0.07%.
The diff coverage is 88.70%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
- Coverage   94.90%   94.84%   -0.07%     
==========================================
  Files          93       93              
  Lines       11157    11257     +100     
  Branches       22       25       +3     
==========================================
+ Hits        10589    10677      +88     
- Misses        563      573      +10     
- Partials        5        7       +2

Impacted Files	Coverage Δ
src/serializers/type_serializers/function.rs	`91.81% <86.15%> (-1.76%)`	⬇️
src/serializers/type_serializers/typed_dict.rs	`94.39% <86.66%> (+0.24%)`	⬆️
pydantic_core/core_schema.py	`96.68% <89.65%> (-0.33%)`	⬇️
src/serializers/extra.rs	`95.42% <100.00%> (+0.16%)`	⬆️
src/serializers/type_serializers/model.rs	`90.00% <100.00%> (+2.24%)`	⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5efeaf9...9b1acd2. Read the comment docs.

adriangb · 2023-03-12T18:34:48Z

pydantic_core/core_schema.py

+class SerializePlainFunctionInstanceMethod(Protocol):  # pragma: no cover
+    def __call__(self, __self: Any, __input_value: Any, __info: SerializationInfo) -> Any:
+        ...
+
+


My goal with this is to keep pydantic-core strongly typed and free of introspection. This forces the caller to determine which type of function they are passing (a free standing function / bound class or static method or an unbound instance method).

adriangb · 2023-03-12T18:40:10Z

src/serializers/type_serializers/function.rs

+        let info = SerializationInfo::new(py, include, exclude, extra);
        if self.when_used.should_use(value, extra) {
-            self.func
-                .call1(py, (value, SerializationInfo::new(py, include, exclude, extra)))
+            if self.needs_model_instance {
+                self.func.call1(py, (extra.model.unwrap(), value, info))
+            } else {
+                self.func.call1(py, (value, info))
+            }


Note that there is nothing stopping the model from being passed to a non-field serializer. That is, you could in theory do something like:

class CustomType(int): def __pydantic_core_serializer__(self): return core_schema. function_plain_ser_schema_model(lambda model, value, info: str(value)) class Model(BaseModel): x: CustomType

(Or the equivalent with Annotated once we support that).

And then the model parameter in the lambda would be an instance of Model, which is super weird.

But I don't think it's worth trying to block this, it would add complexity and I think users are judicious enough to not do that.

I agree it's not worth trying to block. However, I think it's probably worth us being clear that we aren't making a lot of promises about the long-term stability of the pydantic-core api; ideally we could make changes in the future to block this if we find a better approach or whatever.

adriangb · 2023-03-12T18:41:03Z

src/serializers/type_serializers/function.rs

@@ -185,6 +191,7 @@ pub struct FunctionWrapSerializer {
    function_name: String,
    json_return_ob_type: Option<ObType>,
    when_used: WhenUsed,
+    needs_model_instance: bool,


I kept this as a bool since there's only two options, it could be nicer as an enum but that'd be adding boilerplate.

I think a bool is fine here, probably worth minimizing boilerplate for now considering this is nascent enough that it might still get significant changes for reasons we haven't thought of yet.

adriangb · 2023-03-12T18:41:41Z

src/serializers/type_serializers/model.rs

-        if self.allow_value(value, extra)? {
-            let dict = object_to_dict(value, true, extra)?;
-            self.serializer.to_python(dict, include, exclude, extra)
+        let extra = extra.with_model(Some(value));


As per the comment above this keeps getting passed down until another model is encountered as is the case for data in validators.

adriangb · 2023-03-12T19:00:32Z

tests/serializers/test_functions.py

+        def ser_x(self, v: Any, serializer: core_schema.SerializeWrapHandler, _) -> str:
+            x = serializer(v)
+            return f'{x:_}'


Having implemented this what comes across as a bit weird is that these functions accept self and the value. If you have access to `self, why do you need us to pass in the value? Besides, if you get it from the model instance it will already be typed and it has a good name. It's pretty common for validators at least to duplicate the field name and type hint into the validator signature. Which makes me think it would make more sense to just not pass in the value here as a parameter.

# if you want to customize serialization for a type you can use Annotated # or create a custom type with __pydantic_core_serializer__ implemented MyInt = Annotated[int, PlainSerializer(lambda x, _: x * 2)] class Model(BaseModel): # if you don't need the model # and want to customize serialization for just this field (not the type) # you can do that using Annotated in the field type hint # you get just the field's value x: Annotated[int, PlainSerializer(lambda x, _: x * 2)] # and can nest it inside a generic type y: Dict[str, Annotated[int, PlainSerializer(lambda x, _: x * 2)]] # wrap serializer applied to a field z: Annotated[List[int], WrapSerializer(lambda x, ser, _: ser(item, idx) for idx, item in enumerate(x))] # but if you want the whole model, either because # you want to customize serialization based on some other attribute of the model # (e.g. internationalization) # you can use @serialize applied to an instance method # you get the whole model but only serialize one field # this gets applied at the `typed-dict-field` level @serializer("x") def ser_x(self, info: core_schema.SerializationInfo) -> str: return f'{self.x:_}' # you get the whole model and serialize the entire thing # this gets applied at the `model` schema level @serializer def ser_model(self, info: core_schema.SerializationInfo) -> str: return json.dumps(self.model_dict()) # you can also use a class or instance method # which would be the same as using Annotated on the field @serializer("x") @staticmethod def ser_x(x: int, info: core_schema.SerializationInfo) -> str: return f'{self.x:_}'

Obviously we're not going to implement all of this for this PR. But I think what would make sense is to change this PR to support two signatures:

(__self, __info)

(__value, __info)
The two signatures are compatible, we'll have to "choose" one from pydantic depending on (1) if it was done via Annotated, a custom type or the @serialize decorator and (2) in the case of the @serialize decorator we'll introspect to detect if the first argument is self or slf and if so assume it's an unbound instance method, else assume it's a static or class method that accepts just the value.

@dmontagu your thoughts on this would be appreciated

I don't have a very strong opinion about what should be done here, but I will say that I don't fully buy into the "If you have access to self, why do you need us to pass in the value? Besides, if you get it from the model instance it will already be typed and it has a good name." line of reasoning — in general, I would expect interacting with self to be significantly less common than just interacting with the value alone, and I think it's okay if there's multiple ways of getting at the value if you do choose to interact with self. (Again, that's not to argue for or against supporting instance methods, I see good reasons both for and against.)

I think ease-of-use/simplicity for users is significantly more important than "normalizing" the API, and to that end I'm inclined to try to reduce the number of supported signatures, rather than have extra logic/patterns for people to learn. I think my gut reaction is that the more options we expose, the more challenging it will be for typical users to get comfortable with the full spectrum of functionality (especially considering that, in my experience, most typical users are writing at most a handful of custom validators in their whole application, built over months/years).

in general, I would expect interacting with self to be significantly less common than just interacting with the value alone

I'm inclined to try to reduce the number of supported signatures, rather than have extra logic/patterns for people to learn.

Just to be clear, I think we're always going to have to support two signatures. It's either going to be (__self, __info) and (__value, __info) or (__self, __value, __info) and (__value, __info). Or are you suggesting something else?

Maybe I'm not understanding; I'm mostly just concerned with making sure that it's hard to find yourself in a confusing situation as a user (and to be clear — I don't have strong opinions beyond ensuring this is the case).

(EDIT: And yes, I suspect that (__self, __value, __info) and (__value, __info) would be a little more consistent from users' perspectives, though I guess it depends a lot on how it's implemented; in particular, my gut reaction is that I don't really like the idea that if you define a validator as a method you are forced to not use the form that accepts the value you are validating, but I'm not sure whether that's what you had in mind.)

When you say:

I think what would make sense is to change this PR to support two signatures:

(__self, __info)

(__value, __info)

I see the double-leading-underscore and think this means these variables will be treated as positional, and my concern is that whatever logic gets used to determine which one is used could result in surprises for the user and/or be something that is hard to discover without reading the docs. (Maybe that's unavoidable but the more "discoverable" we can make this sort of thing, the better, I think. For example, self could live on info — that would potentially make it more discoverable, though has the obvious shortcoming of losing type-hints. So I don't actually think that's a good alternative, just making the point. Perhaps a better alternative is just listing all the allowed signatures in the docstring for the validator decorator.)

Ultimately I think it's okay as long as users get early and clear feedback about what they've done wrong; I think it would be more of a problem if they don't find out the signature is wrong until the function gets called to validate data. (As opposed to during class initialization.)

Point taken. But unless we want to not support instance method validators we're going to have to deal with a signature including self. I think we can get it to work nicely using callable protocols and a bit of introspection in pydantic:

https://mypy-play.net/?mypy=latest&python=3.11&gist=a2ae207967b569eff9fee5960d65cd08

Pylance/Pyright

Having thought about this more I'm definitely more in favor of dropping value as an input if you are getting self. As long as that doesn't make the implementation more complicated. If it does, I'm fine with keeping value as an input to the function call.

It won't be much different, I'll change this to be (self, info)

dmontagu

Added some new notes and some comments on existing ones

dmontagu · 2023-03-15T20:33:14Z

pydantic_core/core_schema.py

+    )
+
+
+def function_plain_ser_schema_model(


Another minor naming thing — I would prefer method_plain_ser_schema or instance_method_plain_ser_schema over function_plain_ser_schema_model. Same for the wrap version.

Also not a strong opinion.

Would model_field_function_plain_ser_schema work? That's what I have over in #439

dmontagu · 2023-03-15T20:34:56Z

pydantic_core/core_schema.py

+    function: SerializePlainFunction | SerializePlainFunctionInstanceMethod,
+    json_return_type: JsonReturnTypes | None,
+    when_used: WhenUsed,
+    needs_model_instance: bool,


Minor naming thing, I think I would prefer is_instance_method as the name of this argument. Same for the wrap version below.

Not a strong opinion.

I've iterated through several. None seem great because I think fundamentally this is the wrong sorta abstraction (we should be calling the method on the instance or something so that the caller doesn't care). But I'm good with whatever, we should just make this and #439 match

dmontagu · 2023-03-15T20:41:40Z

pydantic_core/core_schema.py

+    needs_model_instance: bool
+
+
+def _function_plain_ser_schema(


I would prefer if there was something indicating that this function exists for the purpose of sharing logic between the other functions that call it. That could be just an update to the name (something like _common_function_plain_ser_schema), or could be a very brief docstring.

Without a more explicit indication, while skimming a file like this I get concerned that there's a broader convention I'm not aware of that would apply to the other schema types too, and I have to dig into the code enough to realize by actually understanding the code that this is a special case. With a little more naming clarity I can more quickly recognize that it's safe for me to not pay attention to this if it's not something I'm actively concerned with.

Same for the wrap version below.

I think I'll get rid of these functions and just call dict_not_none directly like I do in #439. It's a couple more LOC but worth avoiding the private function.

dmontagu · 2023-03-15T20:45:37Z

src/serializers/type_serializers/function.rs

-            self.func
-                .call1(py, (value, SerializationInfo::new(py, include, exclude, extra)))
+            if self.needs_model_instance {
+                self.func.call1(py, (extra.model.unwrap(), value, info))


Unless I'm missing something, I think it would be better to actually check that extra.model is not None; don't want a schema generation bug to cause a rust panic.

I'm not quite sure yet how the needs_model_instance and extra.model values get plumbed, but I almost think you could change if self.needs_model_instance to if let Some(model) = extra.model; if you do find a way to produce a scenario where these conditions aren't equivalent, I would think you might even get a useful python error (saying something to the effect of 'you didn't provide enough arguments', or 'you provided too many arguments').

dmontagu · 2023-03-15T20:48:28Z

src/serializers/type_serializers/function.rs

+        let info = SerializationInfo::new(py, include, exclude, extra);
        if self.when_used.should_use(value, extra) {
-            self.func
-                .call1(py, (value, SerializationInfo::new(py, include, exclude, extra)))
+            if self.needs_model_instance {
+                self.func.call1(py, (extra.model.unwrap(), value, info))
+            } else {
+                self.func.call1(py, (value, info))
+            }


I agree it's not worth trying to block. However, I think it's probably worth us being clear that we aren't making a lot of promises about the long-term stability of the pydantic-core api; ideally we could make changes in the future to block this if we find a better approach or whatever.

dmontagu · 2023-03-15T20:49:39Z

src/serializers/type_serializers/function.rs

@@ -185,6 +191,7 @@ pub struct FunctionWrapSerializer {
    function_name: String,
    json_return_ob_type: Option<ObType>,
    when_used: WhenUsed,
+    needs_model_instance: bool,


I think a bool is fine here, probably worth minimizing boilerplate for now considering this is nascent enough that it might still get significant changes for reasons we haven't thought of yet.

dmontagu · 2023-03-15T20:51:15Z

tests/serializers/test_functions.py

+        def ser_x(self, v: Any, serializer: core_schema.SerializeWrapHandler, _) -> str:
+            x = serializer(v)
+            return f'{x:_}'


Having thought about this more I'm definitely more in favor of dropping value as an input if you are getting self. As long as that doesn't make the implementation more complicated. If it does, I'm fine with keeping value as an input to the function call.

adriangb · 2023-03-17T22:55:13Z

Ok this has been rebased and rewritten to match the feedback and implementation in #439. @dmontagu @samuelcolvin please review when you can

pydantic_core/core_schema.py

adriangb commented Mar 12, 2023

View reviewed changes

adriangb mentioned this pull request Mar 13, 2023

Add ValidationInfo param in @validator & add backwards compat shims pydantic/pydantic#5153

Merged

dmontagu reviewed Mar 15, 2023

View reviewed changes

adriangb self-assigned this Mar 15, 2023

adriangb force-pushed the serializer-self branch from 7912122 to 7521456 Compare March 17, 2023 22:51

dmontagu reviewed Mar 17, 2023

View reviewed changes

pydantic_core/core_schema.py Show resolved Hide resolved

adriangb added 2 commits March 17, 2023 18:32

Pass field name and model into field serializers

6a0effd

update docstring

9b1acd2

adriangb force-pushed the serializer-self branch from 759c640 to 9b1acd2 Compare March 17, 2023 23:32

adriangb enabled auto-merge (squash) March 17, 2023 23:33

adriangb merged commit dfb5be6 into main Mar 17, 2023

adriangb deleted the serializer-self branch March 17, 2023 23:50

samuelcolvin pushed a commit that referenced this pull request Mar 20, 2023

Allow serializers to accept a model instance (#437)

18b8b97

Allow serializers to accept a model instance #437

Allow serializers to accept a model instance #437

Uh oh!

Conversation

adriangb commented Mar 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Mar 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

codecov-commenter commented Mar 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmontagu Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmontagu Mar 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmontagu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adriangb commented Mar 17, 2023

Uh oh!

Uh oh!

Uh oh!

adriangb commented Mar 12, 2023 •

edited

Loading

codspeed-hq bot commented Mar 12, 2023 •

edited

Loading

codecov-commenter commented Mar 12, 2023 •

edited

Loading

dmontagu Mar 13, 2023 •

edited

Loading

dmontagu Mar 13, 2023 •

edited

Loading