Skip to content

Add UUID validator #584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

Conversation

martinabeleda
Copy link

@martinabeleda martinabeleda commented May 5, 2023

Closes #554

@codspeed-hq
Copy link

codspeed-hq bot commented May 5, 2023

CodSpeed Performance Report

Merging #584 martinabeleda:uuid-validator (3562f0c) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 114 untouched benchmarks

🆕 3 new benchmarks
⁉️ 0 dropped benchmarks

Benchmarks breakdown

Benchmark main martinabeleda:uuid-validator Change
🆕 test_core_str N/A 19.7 µs N/A
🆕 test_core_root_model N/A 113.6 µs N/A
🆕 test_model_core_json N/A 48.5 µs N/A

@samuelcolvin
Copy link
Member

Before we get too far on this, it would be good to see the performance difference between this and the python implementation we currently have in pydantic.

@codecov-commenter
Copy link

codecov-commenter commented May 5, 2023

Codecov Report

Merging #584 (3562f0c) into main (4767abf) will decrease coverage by 0.42%.
The diff coverage is 61.97%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #584      +/-   ##
==========================================
- Coverage   94.12%   93.71%   -0.42%     
==========================================
  Files          96       99       +3     
  Lines       13021    13292     +271     
  Branches       24       24              
==========================================
+ Hits        12256    12456     +200     
- Misses        759      830      +71     
  Partials        6        6              
Impacted Files Coverage Δ
pydantic_core/__init__.py 81.81% <ø> (ø)
src/input/input_abstract.rs 85.29% <0.00%> (-2.59%) ⬇️
src/serializers/shared.rs 89.65% <ø> (ø)
src/serializers/type_serializers/uuid.rs 15.55% <15.55%> (ø)
src/validators/uuid.rs 57.14% <57.14%> (ø)
src/errors/types.rs 84.29% <84.61%> (+0.01%) ⬆️
src/uuid.rs 94.54% <94.54%> (ø)
pydantic_core/core_schema.py 96.99% <100.00%> (+0.03%) ⬆️
src/input/input_python.rs 98.01% <100.00%> (-0.73%) ⬇️
src/lib.rs 100.00% <100.00%> (ø)
... and 1 more

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4767abf...3562f0c. Read the comment docs.

@martinabeleda
Copy link
Author

martinabeleda commented May 6, 2023

I implemented some benchmarks in dba435a. Here's the results:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃  Group                           ┃  Test Name                    ┃             Best (ns/iter)  ┃         Relative  ┃              Stddev (ns/iter)  ┃          Iterations  ┃  Note      ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│  uuid model - JSON               │  model_core_json              │                      490.0  │                   │                          4.98  │           1,000,000  │            │
│                                  │  model_pyd_json               │                    2_676.3  │            x5.46  │                         62.75  │           1,000,000  │            │
├──────────────────────────────────┼───────────────────────────────┼─────────────────────────────┼───────────────────┼────────────────────────────────┼──────────────────────┼────────────┤
│  uuid str                        │  core_str                     │                      131.0  │                   │                          2.65  │           1,000,000  │            │
└──────────────────────────────────┴───────────────────────────────┴─────────────────────────────┴───────────────────┴────────────────────────────────┴──────────────────────┴────────────┘

I also assume we'd want to do something similar to datetime where we can parse strings and python objects?

edit: In saying that, I can't see a type for uuid.UUID in pyo3.types

@martinabeleda martinabeleda marked this pull request as ready for review May 6, 2023 18:21
Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in principle looks good.

But we need a significant change to use the standard lib UUID type, not a new rust type.

def variant(self) -> str: ...
@property
def version(self) -> str: ...
def __init__(self, uuid: str) -> None: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__init__ should come first.

@@ -3503,6 +3503,38 @@ def multi_host_url_schema(
)


class UuidSchema(TypedDict, total=False):
type: Required[Literal['uuid']]
version: int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this should be something like Literal[3, 4, 5] - not sure exactly what's allowed, I guess that's kind of the point of using a Literal.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should at least have Literal[1, 3, 4, 5] | None so that it's possible to not enforce checking of the version. There are cases, e.g. the Nil UUID or Max UUID, where it may be desirable to not enforce a version.

Note that Python doesn't support generating UUID v2 and nor does the rust uuid crate, but perhaps that should still be included as a valid instance could still be instantiated for a v2 UUID. In addition there is an update in the works to introduce v6, v7 and v8. Not sure whether it's worth adding a comment that this might need looking into in the future? Interestingly the uuid crate does already have preliminary support for generating them, even if Python does not.

```

Args:
version: The UUID version number as per RFC 4122
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say what it defaults to if ommitted.

@@ -3503,6 +3503,38 @@ def multi_host_url_schema(
)


class UuidSchema(TypedDict, total=False):
type: Required[Literal['uuid']]
version: int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this should be something like Literal[3, 4, 5] - not sure exactly what's allowed, I guess that's kind of the point of using a Literal.

@@ -533,6 +547,9 @@ impl ErrorType {
Self::UrlSyntaxViolation {..} => "Input violated strict URL syntax rules, {error}",
Self::UrlTooLong {..} => "URL should have at most {max_length} characters",
Self::UrlScheme {..} => "URL scheme should be {expected_schemes}",
Self::UuidType => "Input should be a string",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Self::UuidType => "Input should be a string",
Self::UuidType => "UUID Input should be a string or UUID object",

or something, you'll also want a custom error message for JSON, see the function below.


static SCHEMA_DEFINITION_UUID: GILOnceCell<SchemaValidator> = GILOnceCell::new();

#[pyclass(name = "Uuid", module = "pydantic_core._pydantic_core")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want a custom type here - the standard library has an existing UUID implementation that we should leverage.

}
}

impl TypeSerializer for UuidSerializer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you need this, the existing ToStringSerializer should probably surfice.

}
}

impl TypeSerializer for UuidSerializer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you need this, the existing ToStringSerializer should probably surfice.

}
}

impl TypeSerializer for UuidSerializer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you need this, the existing ToStringSerializer should probably suffice.

@samuelcolvin
Copy link
Member

Closing as no response.

@samuelcolvin
Copy link
Member

To get fast UUID validation, we should do the following:

  • perform the UUID validation in Rust
  • then construct the UUID instance directly by setting the attributes, like here

@davidhewitt
Copy link
Contributor

I think this has now been superseded by #772.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a built-in validator for UUID
5 participants