-
Notifications
You must be signed in to change notification settings - Fork 291
Allow serializers to accept a model instance #437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CodSpeed Performance ReportMerging #437 Summary
|
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## main #437 +/- ##
==========================================
- Coverage 94.90% 94.84% -0.07%
==========================================
Files 93 93
Lines 11157 11257 +100
Branches 22 25 +3
==========================================
+ Hits 10589 10677 +88
- Misses 563 573 +10
- Partials 5 7 +2
Continue to review full report in Codecov by Sentry.
|
pydantic_core/core_schema.py
Outdated
class SerializePlainFunctionInstanceMethod(Protocol): # pragma: no cover | ||
def __call__(self, __self: Any, __input_value: Any, __info: SerializationInfo) -> Any: | ||
... | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal with this is to keep pydantic-core strongly typed and free of introspection. This forces the caller to determine which type of function they are passing (a free standing function / bound class or static method or an unbound instance method).
let info = SerializationInfo::new(py, include, exclude, extra); | ||
if self.when_used.should_use(value, extra) { | ||
self.func | ||
.call1(py, (value, SerializationInfo::new(py, include, exclude, extra))) | ||
if self.needs_model_instance { | ||
self.func.call1(py, (extra.model.unwrap(), value, info)) | ||
} else { | ||
self.func.call1(py, (value, info)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is nothing stopping the model from being passed to a non-field serializer. That is, you could in theory do something like:
class CustomType(int):
def __pydantic_core_serializer__(self):
return core_schema. function_plain_ser_schema_model(lambda model, value, info: str(value))
class Model(BaseModel):
x: CustomType
(Or the equivalent with Annotated
once we support that).
And then the model
parameter in the lambda would be an instance of Model
, which is super weird.
But I don't think it's worth trying to block this, it would add complexity and I think users are judicious enough to not do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's not worth trying to block. However, I think it's probably worth us being clear that we aren't making a lot of promises about the long-term stability of the pydantic-core api; ideally we could make changes in the future to block this if we find a better approach or whatever.
@@ -185,6 +191,7 @@ pub struct FunctionWrapSerializer { | |||
function_name: String, | |||
json_return_ob_type: Option<ObType>, | |||
when_used: WhenUsed, | |||
needs_model_instance: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept this as a bool
since there's only two options, it could be nicer as an enum but that'd be adding boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a bool is fine here, probably worth minimizing boilerplate for now considering this is nascent enough that it might still get significant changes for reasons we haven't thought of yet.
if self.allow_value(value, extra)? { | ||
let dict = object_to_dict(value, true, extra)?; | ||
self.serializer.to_python(dict, include, exclude, extra) | ||
let extra = extra.with_model(Some(value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the comment above this keeps getting passed down until another model is encountered as is the case for data
in validators.
tests/serializers/test_functions.py
Outdated
def ser_x(self, v: Any, serializer: core_schema.SerializeWrapHandler, _) -> str: | ||
x = serializer(v) | ||
return f'{x:_}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having implemented this what comes across as a bit weird is that these functions accept self
and the value. If you have access to `self, why do you need us to pass in the value? Besides, if you get it from the model instance it will already be typed and it has a good name. It's pretty common for validators at least to duplicate the field name and type hint into the validator signature. Which makes me think it would make more sense to just not pass in the value here as a parameter.
# if you want to customize serialization for a type you can use Annotated
# or create a custom type with __pydantic_core_serializer__ implemented
MyInt = Annotated[int, PlainSerializer(lambda x, _: x * 2)]
class Model(BaseModel):
# if you don't need the model
# and want to customize serialization for just this field (not the type)
# you can do that using Annotated in the field type hint
# you get just the field's value
x: Annotated[int, PlainSerializer(lambda x, _: x * 2)]
# and can nest it inside a generic type
y: Dict[str, Annotated[int, PlainSerializer(lambda x, _: x * 2)]]
# wrap serializer applied to a field
z: Annotated[List[int], WrapSerializer(lambda x, ser, _: ser(item, idx) for idx, item in enumerate(x))]
# but if you want the whole model, either because
# you want to customize serialization based on some other attribute of the model
# (e.g. internationalization)
# you can use @serialize applied to an instance method
# you get the whole model but only serialize one field
# this gets applied at the `typed-dict-field` level
@serializer("x")
def ser_x(self, info: core_schema.SerializationInfo) -> str:
return f'{self.x:_}'
# you get the whole model and serialize the entire thing
# this gets applied at the `model` schema level
@serializer
def ser_model(self, info: core_schema.SerializationInfo) -> str:
return json.dumps(self.model_dict())
# you can also use a class or instance method
# which would be the same as using Annotated on the field
@serializer("x")
@staticmethod
def ser_x(x: int, info: core_schema.SerializationInfo) -> str:
return f'{self.x:_}'
Obviously we're not going to implement all of this for this PR. But I think what would make sense is to change this PR to support two signatures:
(__self, __info)
(__value, __info)
The two signatures are compatible, we'll have to "choose" one frompydantic
depending on (1) if it was done viaAnnotated
, a custom type or the@serialize
decorator and (2) in the case of the@serialize
decorator we'll introspect to detect if the first argument isself
orslf
and if so assume it's an unbound instance method, else assume it's a static or class method that accepts just the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmontagu your thoughts on this would be appreciated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a very strong opinion about what should be done here, but I will say that I don't fully buy into the "If you have access to self
, why do you need us to pass in the value? Besides, if you get it from the model instance it will already be typed and it has a good name." line of reasoning — in general, I would expect interacting with self
to be significantly less common than just interacting with the value alone, and I think it's okay if there's multiple ways of getting at the value if you do choose to interact with self
. (Again, that's not to argue for or against supporting instance methods, I see good reasons both for and against.)
I think ease-of-use/simplicity for users is significantly more important than "normalizing" the API, and to that end I'm inclined to try to reduce the number of supported signatures, rather than have extra logic/patterns for people to learn. I think my gut reaction is that the more options we expose, the more challenging it will be for typical users to get comfortable with the full spectrum of functionality (especially considering that, in my experience, most typical users are writing at most a handful of custom validators in their whole application, built over months/years).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general, I would expect interacting with
self
to be significantly less common than just interacting with the value alone
I'm inclined to try to reduce the number of supported signatures, rather than have extra logic/patterns for people to learn.
Just to be clear, I think we're always going to have to support two signatures. It's either going to be (__self, __info)
and (__value, __info)
or (__self, __value, __info)
and (__value, __info)
. Or are you suggesting something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm not understanding; I'm mostly just concerned with making sure that it's hard to find yourself in a confusing situation as a user (and to be clear — I don't have strong opinions beyond ensuring this is the case).
(EDIT: And yes, I suspect that (__self, __value, __info)
and (__value, __info)
would be a little more consistent from users' perspectives, though I guess it depends a lot on how it's implemented; in particular, my gut reaction is that I don't really like the idea that if you define a validator as a method you are forced to not use the form that accepts the value you are validating, but I'm not sure whether that's what you had in mind.)
When you say:
I think what would make sense is to change this PR to support two signatures:
- (__self, __info)
- (__value, __info)
I see the double-leading-underscore and think this means these variables will be treated as positional, and my concern is that whatever logic gets used to determine which one is used could result in surprises for the user and/or be something that is hard to discover without reading the docs. (Maybe that's unavoidable but the more "discoverable" we can make this sort of thing, the better, I think. For example, self
could live on info
— that would potentially make it more discoverable, though has the obvious shortcoming of losing type-hints. So I don't actually think that's a good alternative, just making the point. Perhaps a better alternative is just listing all the allowed signatures in the docstring for the validator decorator.)
Ultimately I think it's okay as long as users get early and clear feedback about what they've done wrong; I think it would be more of a problem if they don't find out the signature is wrong until the function gets called to validate data. (As opposed to during class initialization.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point taken. But unless we want to not support instance method validators we're going to have to deal with a signature including self
. I think we can get it to work nicely using callable protocols and a bit of introspection in pydantic
:
https://mypy-play.net/?mypy=latest&python=3.11&gist=a2ae207967b569eff9fee5960d65cd08
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought about this more I'm definitely more in favor of dropping value
as an input if you are getting self
. As long as that doesn't make the implementation more complicated. If it does, I'm fine with keeping value
as an input to the function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't be much different, I'll change this to be (self, info)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some new notes and some comments on existing ones
pydantic_core/core_schema.py
Outdated
) | ||
|
||
|
||
def function_plain_ser_schema_model( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another minor naming thing — I would prefer method_plain_ser_schema
or instance_method_plain_ser_schema
over function_plain_ser_schema_model
. Same for the wrap version.
Also not a strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would model_field_function_plain_ser_schema
work? That's what I have over in #439
pydantic_core/core_schema.py
Outdated
function: SerializePlainFunction | SerializePlainFunctionInstanceMethod, | ||
json_return_type: JsonReturnTypes | None, | ||
when_used: WhenUsed, | ||
needs_model_instance: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor naming thing, I think I would prefer is_instance_method
as the name of this argument. Same for the wrap version below.
Not a strong opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've iterated through several. None seem great because I think fundamentally this is the wrong sorta abstraction (we should be calling the method on the instance or something so that the caller doesn't care). But I'm good with whatever, we should just make this and #439 match
pydantic_core/core_schema.py
Outdated
needs_model_instance: bool | ||
|
||
|
||
def _function_plain_ser_schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if there was something indicating that this function exists for the purpose of sharing logic between the other functions that call it. That could be just an update to the name (something like _common_function_plain_ser_schema
), or could be a very brief docstring.
Without a more explicit indication, while skimming a file like this I get concerned that there's a broader convention I'm not aware of that would apply to the other schema types too, and I have to dig into the code enough to realize by actually understanding the code that this is a special case. With a little more naming clarity I can more quickly recognize that it's safe for me to not pay attention to this if it's not something I'm actively concerned with.
Same for the wrap version below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll get rid of these functions and just call dict_not_none
directly like I do in #439. It's a couple more LOC but worth avoiding the private function.
self.func | ||
.call1(py, (value, SerializationInfo::new(py, include, exclude, extra))) | ||
if self.needs_model_instance { | ||
self.func.call1(py, (extra.model.unwrap(), value, info)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something, I think it would be better to actually check that extra.model is not None; don't want a schema generation bug to cause a rust panic.
I'm not quite sure yet how the needs_model_instance and extra.model values get plumbed, but I almost think you could change if self.needs_model_instance
to if let Some(model) = extra.model
; if you do find a way to produce a scenario where these conditions aren't equivalent, I would think you might even get a useful python error (saying something to the effect of 'you didn't provide enough arguments', or 'you provided too many arguments').
let info = SerializationInfo::new(py, include, exclude, extra); | ||
if self.when_used.should_use(value, extra) { | ||
self.func | ||
.call1(py, (value, SerializationInfo::new(py, include, exclude, extra))) | ||
if self.needs_model_instance { | ||
self.func.call1(py, (extra.model.unwrap(), value, info)) | ||
} else { | ||
self.func.call1(py, (value, info)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's not worth trying to block. However, I think it's probably worth us being clear that we aren't making a lot of promises about the long-term stability of the pydantic-core api; ideally we could make changes in the future to block this if we find a better approach or whatever.
@@ -185,6 +191,7 @@ pub struct FunctionWrapSerializer { | |||
function_name: String, | |||
json_return_ob_type: Option<ObType>, | |||
when_used: WhenUsed, | |||
needs_model_instance: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a bool is fine here, probably worth minimizing boilerplate for now considering this is nascent enough that it might still get significant changes for reasons we haven't thought of yet.
tests/serializers/test_functions.py
Outdated
def ser_x(self, v: Any, serializer: core_schema.SerializeWrapHandler, _) -> str: | ||
x = serializer(v) | ||
return f'{x:_}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought about this more I'm definitely more in favor of dropping value
as an input if you are getting self
. As long as that doesn't make the implementation more complicated. If it does, I'm fine with keeping value
as an input to the function call.
7912122
to
7521456
Compare
Ok this has been rebased and rewritten to match the feedback and implementation in #439. @dmontagu @samuelcolvin please review when you can |
759c640
to
9b1acd2
Compare
Closes #435