Add uuid validator #772

JeanArhancet · 2023-07-14T07:17:26Z

Change Summary

Adds a Rust implementation of UUID validation and serialization.

Related issue number

Resolves #554

Checklist

Unit tests for the changes exist
Documentation reflects the changes where applicable
Pydantic tests pass with this pydantic-core (except for expected changes)
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @davidhewitt

codspeed-hq · 2023-07-14T07:24:54Z

CodSpeed Performance Report

Merging #772 will not alter performance

_{Comparing JeanArhancet:feat/add-uuid-validator (de5c8bf) with main (f38bea6)}

Summary

✅ 126 untouched benchmarks

🆕 9 new benchmarks

Benchmarks breakdown

	Benchmark	`main`	`JeanArhancet:feat/add-uuid-validator`	Change
🆕	`test_uuid_from_string_core`	N/A	60.6 µs	N/A
🆕	`test_uuid_from_string_pyd`	N/A	87.5 µs	N/A
🆕	`test_uuid_from_uuid_core`	N/A	14.9 µs	N/A
🆕	`test_uuid_from_uuid_pyd`	N/A	15.8 µs	N/A
🆕	`test_core_python`	N/A	41.3 µs	N/A
🆕	`test_model_core_json`	N/A	85.6 µs	N/A
🆕	`test_core_raw`	N/A	14.9 µs	N/A
🆕	`test_core_str`	N/A	60.7 µs	N/A
🆕	`test_uuid`	N/A	66.3 µs	N/A

codecov · 2023-07-16T19:04:27Z

Codecov Report

Merging #772 (de5c8bf) into main (9b540d0) will decrease coverage by 0.03%.
The diff coverage is 88.62%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #772      +/-   ##
==========================================
- Coverage   93.53%   93.50%   -0.03%     
==========================================
  Files         100      102       +2     
  Lines       14376    14587     +211     
  Branches       25       25              
==========================================
+ Hits        13446    13639     +193     
- Misses        924      942      +18     
  Partials        6        6

Impacted Files	Coverage Δ
src/serializers/shared.rs	`80.54% <0.00%> (-0.44%)`	⬇️
src/serializers/infer.rs	`95.39% <33.33%> (-1.56%)`	⬇️
src/errors/types.rs	`84.23% <77.77%> (-0.16%)`	⬇️
src/input/return_enums.rs	`85.08% <80.00%> (-0.05%)`	⬇️
src/validators/uuid.rs	`90.35% <90.35%> (ø)`
src/serializers/type_serializers/uuid.rs	`94.00% <94.00%> (ø)`
python/pydantic_core/core_schema.py	`96.70% <100.00%> (+0.03%)`	⬆️
src/input/input_python.rs	`98.11% <100.00%> (+0.01%)`	⬆️
src/serializers/ob_type.rs	`86.03% <100.00%> (+0.65%)`	⬆️
src/validators/function.rs	`90.65% <100.00%> (ø)`
... and 1 more

... and 3 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f38bea6...de5c8bf. Read the comment docs.

JeanArhancet · 2023-07-17T07:31:37Z

please review

davidhewitt

Thanks! This has been on my list of things to tackle so I'm grateful to have this picked up.

I've given a few initial thoughts. I guess we need to be careful to keep the pydantic behaviour the same. Maybe create a corresponding PR in pydantic which uses pydantic-core from this branch so we can see how the whole thing fits together end-to-end?

Also while reading around I saw uuid-rs/uuid#406, so it sounds like we need to be careful with endianness with the integer conversions going on here.

src/validators/uuid.rs

tests/serializers/test_uuid.py

src/serializers/type_serializers/uuid.rs

src/validators/uuid.rs

JeanArhancet · 2023-07-17T12:17:35Z

Thanks! This has been on my list of things to tackle so I'm grateful to have this picked up.

I've given a few initial thoughts. I guess we need to be careful to keep the pydantic behaviour the same. Maybe create a corresponding PR in pydantic which uses pydantic-core from this branch so we can see how the whole thing fits together end-to-end?

Also while reading around I saw uuid-rs/uuid#406, so it sounds like we need to be careful with endianness with the integer conversions going on here.

I've opened this MR pydantic/pydantic#6711 to test integration @davidhewitt

src/validators/uuid.rs

davidhewitt

Thanks! Overall it looks like this implementation is good to me, I've added some final comments and then otherwise I think this is mergeable.

Before we merge it would be great to get the downstream PR in pydantic passing against this git branch to prove that this behaves correctly end-to-end. That will also have the effect of getting us a patch ready to merge when we update pydantic to pull in a future pydantic-core with this validator.

src/validators/uuid.rs

tests/validators/test_uuid.py

davidhewitt · 2023-07-18T09:12:22Z

tests/benchmarks/test_micro_benchmarks.py

+    @pytest.mark.benchmark(group='uuid str')
+    def test_core_str(self, benchmark, uuid_str):
+        v = SchemaValidator({'type': 'uuid'})
+
+        benchmark(v.validate_python, uuid_str)


It would be great to compare these benchmarks against the current Pydantic implementation (in Python), if you're willing to do that? That would confirm our assumption that migrating this to Rust is worth it :)

It should be possible to do that by building the same core schema as Pydantic currently does (maybe with some copy-paste) and feeding that to a validator.

Good idea, do you have an example to create this benchmark?

Let me update #763 later with a similar benchmark and get back to you :)

Done d9e45c1, WDYT?

Looks great! So I see we're about 33% faster to have a Rust implementation, makes sense to me as the heavy lifting in the Python UUID implementation will be implemented in C anyway; what we're shaving off is the dynamic overheads of checking & constructing a UUID.

davidhewitt · 2023-07-18T11:09:57Z

Looks great, please give me a bit of time to produce an example benchmark for the decimal case and loop back here!

davidhewitt · 2023-07-18T16:22:52Z

(responding to comment about hex, which I can't find above)

Yes, we would need to keep hex string support if the pydantic test suite already suggests we accept hex. I've used the hex crate before, should be suitable.

JeanArhancet · 2023-07-18T16:34:15Z

(responding to comment about hex, which I can't find above)

Yes, we would need to keep hex string support if the pydantic test suite already suggests we accept hex. I've used the hex crate before, should be suitable.

I deleted the comment, I made a mistake the line of code works fine, sorry

src/validators/uuid.rs

JeanArhancet · 2023-07-19T18:35:51Z

@davidhewitt I fixed a regression in this commit 8dffaeb (compare version when it's an UUID object)

Co-authored-by: David Hewitt <[email protected]>

JeanArhancet · 2023-07-20T14:29:26Z

I've seen another regression with this python input b'ebcdab58-6eb8-46fb-a190-d07a33e9eac8'. Do you have any idea how to transform this? I can add a size condition but maybe there's a better solution:

 None => {
                let either_bytes = input
                    .validate_bytes(true)
                    .map_err(|_| ValError::new(ErrorType::UuidType, input))?;
                let py_object = either_bytes.into_py(py);
                let py_bytes = py_object.downcast::<PyBytes>(py)?;
                if py_bytes.len()? > 16 {
                    let uuid_str = from_utf8(py_bytes.as_bytes()).map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?;
                    Uuid::parse_str(&uuid_str)
                        .map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?
                } else {
                    Uuid::from_slice(py_bytes.as_bytes())
                        .map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?
                }
            }

davidhewitt · 2023-07-20T15:27:00Z

Looks like this is the relevant equivalent in our Python implementation:

try:
    return UUID(input_value.decode())
except ValueError:
    # 16 bytes in big-endian order as the bytes argument fail
    # the above check
    return UUID(bytes=input_value)

So we currently try to convert bytes to a string, and if that fails, parse as bytes.

I think what you're doing with size is probably better, maybe the following is most efficient to get the data into Rust fastest:

// Add this method to get the data
impl<'a> EitherBytes<'a> {
    pub fn as_slice(&'a self) -> &[u8] {
        match self {
            EitherBytes::Cow(bytes) => Ok(bytes),
            EitherBytes::Py(py_bytes) => py_bytes.as_bytes(),
        }
    }
}

// and later on
let either_bytes = input
    .validate_bytes(true)
    .map_err(|_| ValError::new(ErrorType::UuidType, input))?;
let bytes_slice = either_bytes.as_bytes();
let uuid = 'parse: {
    // Try parsing as utf8, but don't care if it fails
    if let Ok(utf8_str) = from_utf8(bytes_slice) {
        if let Ok(uuid) = Uuid::parse_str(utf8_str) {
            break 'parse uuid;
        }
    }
    Uuid::from_slice(bytes_slice)
        .map_err(|e| ValError::new(ErrorType::UuidParsing { error: e.to_string() }, input))?
}

davidhewitt

Sorry for the delay, thanks for all the effort on this. I think we're pretty much feature complete here now?

I have just one observation about a serializer branch that I think needs adjusting, otherwise let's merge! We can make a pydantic-core release today and then integrate this upstream into pydantic shortly after 🚀

davidhewitt · 2023-07-24T09:28:54Z

src/serializers/infer.rs

@@ -187,6 +188,7 @@ pub(crate) fn infer_to_python_known(
                let py_url: PyMultiHostUrl = value.extract()?;
                py_url.__str__().into_py(py)
            }
+            ObType::Uuid => serialize_dict(uuid_to_dict(value)?)?,


Should this just be uuid_to_string?

Done, WDYT?

davidhewitt · 2023-07-24T09:29:09Z

src/serializers/infer.rs

@@ -483,6 +485,7 @@ pub(crate) fn infer_serialize_known<S: Serializer>(
            pydantic_serializer.serialize(serializer)
        }
        ObType::Dataclass => serialize_dict!(dataclass_to_dict(value).map_err(py_err_se_err)?),
+        ObType::Uuid => serialize_dict!(uuid_to_dict(value).map_err(py_err_se_err)?),


Similarly here I think we just want the string serialized?

davidhewitt · 2023-07-24T09:30:03Z

src/serializers/shared.rs

+pub(super) fn uuid_to_dict(dc: &PyAny) -> PyResult<&PyDict> {
+    let py = dc.py();
+    let dc_slots: &PyDict = dc.getattr(intern!(py, "__slots__"))?.downcast()?;
+    let dict = PyDict::new(py);
+
+    for (field_name, _) in dc_slots.iter() {
+        let field_name: &PyString = field_name.downcast()?;
+        dict.set_item(field_name, dc.getattr(field_name)?)?;
+    }
+    Ok(dict)
+}


(And in general I guess I don't understand why we'd want this conversion instead of just going to string?)

I tried to do the same as dataclass, would you rather transform into a string?

davidhewitt

This looks great to me, thanks so much for all your work on it!

JeanArhancet changed the title ~~feat: add uuid validator~~ Add uuid validator Jul 14, 2023

JeanArhancet marked this pull request as draft July 14, 2023 07:17

JeanArhancet force-pushed the feat/add-uuid-validator branch from b80f4d6 to 69fa15e Compare July 16, 2023 19:03

JeanArhancet marked this pull request as ready for review July 17, 2023 07:27

pydantic-hooky bot added the ready for review label Jul 17, 2023

pydantic-hooky bot assigned davidhewitt Jul 17, 2023

JeanArhancet force-pushed the feat/add-uuid-validator branch from 99348cb to adcee23 Compare July 17, 2023 08:09

davidhewitt reviewed Jul 17, 2023

View reviewed changes

JeanArhancet mentioned this pull request Jul 17, 2023

Add pydantic core uuid pydantic/pydantic#6711

Closed

5 tasks

davidhewitt reviewed Jul 17, 2023

View reviewed changes

davidhewitt mentioned this pull request Jul 17, 2023

Add UUID validator #584

Closed

davidhewitt reviewed Jul 18, 2023

View reviewed changes

JeanArhancet force-pushed the feat/add-uuid-validator branch from f16aed7 to 4cc1c93 Compare July 18, 2023 14:48

davidhewitt reviewed Jul 19, 2023

View reviewed changes

src/validators/uuid.rs Outdated Show resolved Hide resolved

JeanArhancet force-pushed the feat/add-uuid-validator branch from 9cfb7eb to da1f5f6 Compare July 19, 2023 18:34

JeanArhancet added 7 commits July 19, 2023 20:45

feat: add uuid validator

d67d20b

refactor: serializer and validators

cc00408

feat: add benchmarks

acc9928

refactor: infer uuid

7b364a2

fix: apply feedback

a43ec76

refactor: supports bytes and string

1a78c6f

refactor: create py uid and add tests

7e72b6d

JeanArhancet and others added 2 commits July 19, 2023 20:45

fix: check version if uuid object

932facb

Update src/validators/uuid.rs

79eb871

Co-authored-by: David Hewitt <[email protected]>

JeanArhancet force-pushed the feat/add-uuid-validator branch from da1f5f6 to 79eb871 Compare July 19, 2023 18:55

test: add benchmarck core-pydantic

ff17506

JeanArhancet force-pushed the feat/add-uuid-validator branch 3 times, most recently from e85c501 to 42e40c0 Compare July 20, 2023 11:41

refactor: error message uuid version

98f4a4a

JeanArhancet force-pushed the feat/add-uuid-validator branch from 42e40c0 to 98f4a4a Compare July 20, 2023 12:33

refactor: parse bytes

025998c

davidhewitt reviewed Jul 24, 2023

View reviewed changes

refactor: uuid to string

de5c8bf

davidhewitt approved these changes Jul 24, 2023

View reviewed changes

davidhewitt merged commit c18c875 into pydantic:main Jul 24, 2023

This was referenced Jul 20, 2024

use little endian int fft001/pydantic-core#1

Open

use little endian int #1372

Merged

Add uuid validator #772

Add uuid validator #772

Uh oh!

Conversation

JeanArhancet commented Jul 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Change Summary

Related issue number

Checklist

Uh oh!

codspeed-hq bot commented Jul 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging #772 will not alter performance

Summary

Benchmarks breakdown

Uh oh!

codecov bot commented Jul 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

JeanArhancet commented Jul 17, 2023

Uh oh!

davidhewitt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

JeanArhancet commented Jul 17, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

davidhewitt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

davidhewitt commented Jul 18, 2023

Uh oh!

davidhewitt commented Jul 18, 2023

Uh oh!

JeanArhancet commented Jul 18, 2023

Uh oh!

Uh oh!

JeanArhancet commented Jul 19, 2023

Uh oh!

JeanArhancet commented Jul 20, 2023

Uh oh!

davidhewitt commented Jul 20, 2023

Uh oh!

davidhewitt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

JeanArhancet commented Jul 14, 2023 •

edited

Loading

codspeed-hq bot commented Jul 14, 2023 •

edited

Loading

codecov bot commented Jul 16, 2023 •

edited

Loading