-
Notifications
You must be signed in to change notification settings - Fork 291
Add uuid validator #772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add uuid validator #772
Conversation
CodSpeed Performance ReportMerging #772 will not alter performanceComparing Summary
Benchmarks breakdown
|
b80f4d6
to
69fa15e
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #772 +/- ##
==========================================
- Coverage 93.53% 93.50% -0.03%
==========================================
Files 100 102 +2
Lines 14376 14587 +211
Branches 25 25
==========================================
+ Hits 13446 13639 +193
- Misses 924 942 +18
Partials 6 6
... and 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
please review |
99348cb
to
adcee23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This has been on my list of things to tackle so I'm grateful to have this picked up.
I've given a few initial thoughts. I guess we need to be careful to keep the pydantic
behaviour the same. Maybe create a corresponding PR in pydantic
which uses pydantic-core
from this branch so we can see how the whole thing fits together end-to-end?
Also while reading around I saw uuid-rs/uuid#406, so it sounds like we need to be careful with endianness with the integer conversions going on here.
I've opened this MR pydantic/pydantic#6711 to test integration @davidhewitt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Overall it looks like this implementation is good to me, I've added some final comments and then otherwise I think this is mergeable.
Before we merge it would be great to get the downstream PR in pydantic
passing against this git branch to prove that this behaves correctly end-to-end. That will also have the effect of getting us a patch ready to merge when we update pydantic
to pull in a future pydantic-core
with this validator.
@pytest.mark.benchmark(group='uuid str') | ||
def test_core_str(self, benchmark, uuid_str): | ||
v = SchemaValidator({'type': 'uuid'}) | ||
|
||
benchmark(v.validate_python, uuid_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to compare these benchmarks against the current Pydantic implementation (in Python), if you're willing to do that? That would confirm our assumption that migrating this to Rust is worth it :)
It should be possible to do that by building the same core schema as Pydantic currently does (maybe with some copy-paste) and feeding that to a validator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, do you have an example to create this benchmark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me update #763 later with a similar benchmark and get back to you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done d9e45c1, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! So I see we're about 33% faster to have a Rust implementation, makes sense to me as the heavy lifting in the Python UUID implementation will be implemented in C anyway; what we're shaving off is the dynamic overheads of checking & constructing a UUID.
Looks great, please give me a bit of time to produce an example benchmark for the decimal case and loop back here! |
f16aed7
to
4cc1c93
Compare
(responding to comment about Yes, we would need to keep hex string support if the pydantic test suite already suggests we accept hex. I've used the |
I deleted the comment, I made a mistake the line of code works fine, sorry |
9cfb7eb
to
da1f5f6
Compare
@davidhewitt I fixed a regression in this commit 8dffaeb (compare version when it's an UUID object) |
Co-authored-by: David Hewitt <[email protected]>
da1f5f6
to
79eb871
Compare
e85c501
to
42e40c0
Compare
42e40c0
to
98f4a4a
Compare
I've seen another regression with this python input None => {
let either_bytes = input
.validate_bytes(true)
.map_err(|_| ValError::new(ErrorType::UuidType, input))?;
let py_object = either_bytes.into_py(py);
let py_bytes = py_object.downcast::<PyBytes>(py)?;
if py_bytes.len()? > 16 {
let uuid_str = from_utf8(py_bytes.as_bytes()).map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?;
Uuid::parse_str(&uuid_str)
.map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?
} else {
Uuid::from_slice(py_bytes.as_bytes())
.map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?
}
} |
Looks like this is the relevant equivalent in our Python implementation: try:
return UUID(input_value.decode())
except ValueError:
# 16 bytes in big-endian order as the bytes argument fail
# the above check
return UUID(bytes=input_value) So we currently try to convert bytes to a string, and if that fails, parse as bytes. I think what you're doing with size is probably better, maybe the following is most efficient to get the data into Rust fastest: // Add this method to get the data
impl<'a> EitherBytes<'a> {
pub fn as_slice(&'a self) -> &[u8] {
match self {
EitherBytes::Cow(bytes) => Ok(bytes),
EitherBytes::Py(py_bytes) => py_bytes.as_bytes(),
}
}
}
// and later on
let either_bytes = input
.validate_bytes(true)
.map_err(|_| ValError::new(ErrorType::UuidType, input))?;
let bytes_slice = either_bytes.as_bytes();
let uuid = 'parse: {
// Try parsing as utf8, but don't care if it fails
if let Ok(utf8_str) = from_utf8(bytes_slice) {
if let Ok(uuid) = Uuid::parse_str(utf8_str) {
break 'parse uuid;
}
}
Uuid::from_slice(bytes_slice)
.map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, thanks for all the effort on this. I think we're pretty much feature complete here now?
I have just one observation about a serializer branch that I think needs adjusting, otherwise let's merge! We can make a pydantic-core
release today and then integrate this upstream into pydantic
shortly after 🚀
src/serializers/infer.rs
Outdated
@@ -187,6 +188,7 @@ pub(crate) fn infer_to_python_known( | |||
let py_url: PyMultiHostUrl = value.extract()?; | |||
py_url.__str__().into_py(py) | |||
} | |||
ObType::Uuid => serialize_dict(uuid_to_dict(value)?)?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be uuid_to_string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, WDYT?
src/serializers/infer.rs
Outdated
@@ -483,6 +485,7 @@ pub(crate) fn infer_serialize_known<S: Serializer>( | |||
pydantic_serializer.serialize(serializer) | |||
} | |||
ObType::Dataclass => serialize_dict!(dataclass_to_dict(value).map_err(py_err_se_err)?), | |||
ObType::Uuid => serialize_dict!(uuid_to_dict(value).map_err(py_err_se_err)?), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here I think we just want the string serialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/serializers/shared.rs
Outdated
pub(super) fn uuid_to_dict(dc: &PyAny) -> PyResult<&PyDict> { | ||
let py = dc.py(); | ||
let dc_slots: &PyDict = dc.getattr(intern!(py, "__slots__"))?.downcast()?; | ||
let dict = PyDict::new(py); | ||
|
||
for (field_name, _) in dc_slots.iter() { | ||
let field_name: &PyString = field_name.downcast()?; | ||
dict.set_item(field_name, dc.getattr(field_name)?)?; | ||
} | ||
Ok(dict) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(And in general I guess I don't understand why we'd want this conversion instead of just going to string?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do the same as dataclass, would you rather transform into a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me, thanks so much for all your work on it!
Change Summary
Adds a Rust implementation of UUID validation and serialization.
Related issue number
Resolves #554
Checklist
pydantic-core
(except for expected changes)Selected Reviewer: @davidhewitt