
Check iterable max_length as we validate #602


Closed

adriangb wants to merge 17 commits into main from any-iterable

Conversation

adriangb
Member

@adriangb adriangb commented May 12, 2023

This is set up so that it will work for all iterables, but I'm only implementing it for lists in this PR to keep things self-contained. That also means the diff is much bigger than it needs to be since I'm not deleting the code this is replacing (I estimate it's about a 1:1 LOC swap).

Selected Reviewer: @dmontagu
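For illustration, a minimal sketch of the behavior this PR is after, written against the public pydantic_core API (the schema, the limit, and the input below are invented for the example, not taken from the PR):

from pydantic_core import SchemaValidator, ValidationError, core_schema

# list[int] capped at 3 validated items (illustrative schema)
v = SchemaValidator(core_schema.list_schema(core_schema.int_schema(), max_length=3))

try:
    v.validate_python(list(range(100)))
except ValidationError as exc:
    # with the check applied as we validate, the too_long error is raised
    # once the output exceeds 3 items, rather than after all 100 are validated
    print(exc.errors(include_url=False))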

@codspeed-hq

codspeed-hq bot commented May 12, 2023

CodSpeed Performance Report

Merging #602 any-iterable (33af7fa) will degrade performance by 41.4%.

Summary

🔥 2 improvements
❌ 9 regressions
✅ 107 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | any-iterable | Change |
| --- | --- | --- | --- |
| test_complete_core_lax | 1.3 ms | 1.5 ms | -12.32% |
| test_complete_core_strict | 1.5 ms | 1.7 ms | -12.66% |
| 🔥 test_complete_core_error | 8.5 ms | 6.6 ms | 23.20% |
| 🔥 test_complete_core_isinstance | 8.4 ms | 6.4 ms | 23.73% |
| test_list_of_any_core_py | 25 µs | 1,211.3 µs | -4736.39% |
| test_set_of_ints_core | 9.6 ms | 10.8 ms | -12.37% |
| test_set_of_ints_core_duplicates | 7.8 ms | 8.7 ms | -11.94% |
| test_set_of_ints_core_json | 6 ms | 7 ms | -15.61% |
| test_set_of_ints_core_json_duplicates | 4.6 ms | 5.1 ms | -10.35% |
| test_frozenset_of_ints_core | 1.9 ms | 2.6 ms | -36.73% |
| test_frozenset_of_ints_duplicates_core | 1.1 ms | 1.5 ms | -34.39% |

@codecov-commenter

codecov-commenter commented May 12, 2023

Codecov Report

Merging #602 (f36e34c) into main (47c2df1) will decrease coverage by 0.49%.
The diff coverage is 90.77%.


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #602      +/-   ##
==========================================
- Coverage   94.04%   93.55%   -0.49%     
==========================================
  Files          98      100       +2     
  Lines       13177    13454     +277     
  Branches       25       25              
==========================================
+ Hits        12392    12587     +195     
- Misses        779      861      +82     
  Partials        6        6              
| Impacted Files | Coverage Δ |
| --- | --- |
| pydantic_core/core_schema.py | 96.87% <ø> (-0.01%) ⬇️ |
| src/input/input_abstract.rs | 81.70% <ø> (-3.59%) ⬇️ |
| src/input/mod.rs | 100.00% <ø> (ø) |
| src/input/return_enums.rs | 60.30% <ø> (-30.58%) ⬇️ |
| src/input/generic_iterable.rs | 55.63% <55.63%> (ø) |
| src/input/input_python.rs | 98.25% <96.55%> (-0.47%) ⬇️ |
| src/validators/frozenset.rs | 98.88% <98.30%> (+0.84%) ⬆️ |
| src/validators/set.rs | 99.06% <98.41%> (+0.41%) ⬆️ |
| src/validators/list.rs | 98.40% <98.52%> (-1.60%) ⬇️ |
| src/validators/tuple.rs | 98.54% <98.84%> (+<0.01%) ⬆️ |

... and 3 more

... and 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 47c2df1...f36e34c.

@adriangb

This comment was marked as outdated.

@adriangb
Member Author

I see what was going on: we originally didn't even copy the data for list[Any], but this new implementation does copy it. I would argue that we should take the performance hit and copy the data so that we are always consistent in doing so. list[Any] is probably also not even that common, and it's already going to be faster than doing any sort of validation.
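A small sketch of the consistency being argued for here, using the public pydantic_core API (schema and data invented for the example):

from pydantic_core import SchemaValidator, core_schema

# list[Any]: no per-item validation happens at all
v = SchemaValidator(core_schema.list_schema(core_schema.any_schema()))

data = [1, 'a', object()]
result = v.validate_python(data)

# under the behavior proposed here the input is always copied,
# so the result never aliases the caller's list
assert result == data
assert result is not data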

@adriangb
Member Author

please review

Collaborator

@dmontagu dmontagu left a comment

I have no issues with what I see here, though I think @samuelcolvin should also review.

@adriangb
Member Author

For the record, somewhat unrelated to this PR, I tried both VecDeque and unsafe PyList usage (both for initialization and setting items, skipping bounds checks, etc.) and neither is really faster than safe Vec usage.

@adriangb adriangb marked this pull request as draft May 14, 2023 00:54
Comment on lines -31 to +41
([], Err('Input should be a valid dictionary [type=dict_type,')),
([('x', 'y')], Err('Input should be a valid dictionary [type=dict_type,')),
([('x', 'y'), ('z', 'z')], Err('Input should be a valid dictionary [type=dict_type,')),
((), Err('Input should be a valid dictionary [type=dict_type,')),
((('x', 'y'),), Err('Input should be a valid dictionary [type=dict_type,')),
((type('Foobar', (), {'x': 1})()), Err('Input should be a valid dictionary [type=dict_type,')),
([], {}),
([('x', '1')], {'x': 1}),
([('x', '1'), ('z', b'2')], {'x': 1, 'z': 2}),
((), {}),
((('x', '1'),), {'x': 1}),
Member Author

These are all accepted now. I also changed the value type to be int so that we know we're not swapping keys and values validators.
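For context, a rough sketch of what these updated cases exercise on this branch, assuming a core dict schema with str keys and int values (the schema is illustrative, not copied from the test):

from pydantic_core import SchemaValidator, core_schema

# dict[str, int]; the values validator coerces '1' and b'2' to ints
v = SchemaValidator(
    core_schema.dict_schema(core_schema.str_schema(), core_schema.int_schema())
)

# iterables of key/value pairs are now accepted, not just mappings
assert v.validate_python([('x', '1'), ('z', b'2')]) == {'x': 1, 'z': 2}
assert v.validate_python((('x', '1'),)) == {'x': 1}
assert v.validate_python([]) == {}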

Err('Frozenset should have at most 30 items after validation, not 31 [type=too_long,'),
Err('Frozenset should have at most 3 items after validation, not >= 4 [type=too_long,'),
Member Author

@adriangb adriangb May 15, 2023

This is because it used to hit the default 10x max_length "input" size; now we're measuring output size, which is kind of the whole point of this PR.
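Roughly the difference being described, with an illustrative frozenset schema (limit and input invented for the example):

from pydantic_core import SchemaValidator, ValidationError, core_schema

# frozenset[int] with max_length=3
v = SchemaValidator(core_schema.frozenset_schema(core_schema.int_schema(), max_length=3))

try:
    v.validate_python(set(range(100)))
except ValidationError as exc:
    # on this branch the error is triggered by the validated output exceeding 3
    # (hence "not >= 4"), not by the old 10x-input-size guard
    print(exc.errors(include_url=False)[0]['type'])  # 'too_long'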

Comment on lines -52 to -61
([1, 2, '3'], [1, 2, 3]),
((1, 2, '3'), [1, 2, 3]),
(deque((1, 2, '3')), [1, 2, 3]),
({1, 2, '3'}, Err('Input should be a valid list [type=list_type,')),
(gen_ints(), [1, 2, 3]),
(frozenset({1, 2, '3'}), Err('Input should be a valid list [type=list_type,')),
({1: 10, 2: 20, '3': '30'}.keys(), [1, 2, 3]),
({1: 10, 2: 20, '3': '30'}.values(), [10, 20, 30]),
({1: 10, 2: 20, '3': '30'}, Err('Input should be a valid list [type=list_type,')),
((x for x in [1, 2, '3']), [1, 2, 3]),
Member Author

These tests are covered above

Comment on lines -89 to +184
@pytest.mark.parametrize(
'input_value,expected',
[
([], []),
([1, '2', b'3'], [1, '2', b'3']),
(frozenset([1, '2', b'3']), Err('Input should be a valid list [type=list_type,')),
((), []),
((1, '2', b'3'), [1, '2', b'3']),
(deque([1, '2', b'3']), [1, '2', b'3']),
({1, '2', b'3'}, Err('Input should be a valid list [type=list_type,')),
],
)
@pytest.mark.parametrize('input_value,expected', [([], []), ([1, '2', b'3'], [1, '2', b'3'])])
Member Author

This is covered by tests above since we no longer special case "any" item validators.

),
],
)
def test_allow_any_iter(input_value, expected):
Member Author

The allow_any_iter parameter is now a no-op; I'll remove it in a follow-up PR.

Comment on lines -147 to +149
({1, 2, '3'}, Err('Input should be a valid tuple [type=tuple_type,')),
(frozenset([1, 2, '3']), Err('Input should be a valid tuple [type=tuple_type,')),
({1, 2, '3'}, AnyOf([tuple(o) for o in sorted(itertools.permutations([1, 2, 3]))])),
(frozenset([1, 2, '3']), AnyOf([tuple(o) for o in sorted(itertools.permutations([1, 2, 3]))])),
Member Author

Accounting for the fact that the resultant tuple can come out in any order.

@adriangb adriangb force-pushed the any-iterable branch 2 times, most recently from 6cc77e6 to 174c3d4 Compare May 16, 2023 15:16
@adriangb adriangb marked this pull request as ready for review May 16, 2023 17:37
@@ -509,7 +509,7 @@ def test_multiple_tuple_recursion(multiple_tuple_schema: SchemaValidator):
with pytest.raises(ValidationError) as exc_info:
multiple_tuple_schema.validate_python({'f1': data, 'f2': data})

assert exc_info.value.errors(include_url=False) == [
assert exc_info.value.errors(include_url=False) == Contains(
Member Author

We used to only return one of (1) the errors that occurred for each item (e.g. this recursion error) or (2) the length constraint error if the result was too short. This was inconsistent across tuples and lists. Further, having both is helpful: with just one or the other, you kind of have to guess why tuple[int] was too short when you gave it ['a'], which clearly has 1 item, or vice versa.

I'm now returning both the errors for each item and the final failure of the length constraint, which @dmontagu had suggested previously.
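A sketch of the combined reporting with an illustrative schema (a list of ints with a minimum length; the exact error types shown are what this branch is expected to produce):

from pydantic_core import SchemaValidator, ValidationError, core_schema

# list[int] requiring at least 2 validated items
v = SchemaValidator(core_schema.list_schema(core_schema.int_schema(), min_length=2))

try:
    v.validate_python([1, 'a'])
except ValidationError as exc:
    # both the per-item failure for 'a' and the final length-constraint
    # failure are now reported, instead of one or the other
    print([e['type'] for e in exc.errors(include_url=False)])
    # e.g. ['int_parsing', 'too_short']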

Comment on lines +236 to +240
assert exc_info.value.errors(include_url=False) == Contains(
{'type': 'missing', 'loc': (1,), 'msg': 'Field required', 'input': [1]},
{'type': 'missing', 'loc': (2,), 'msg': 'Field required', 'input': [1]},
{'type': 'missing', 'loc': (3,), 'msg': 'Field required', 'input': [1]},
]
)
Member Author

As per the comment in test_definitions_recursive.py, we are now including not just the error for each location but also the final length constraint failure, so I'm using Contains to ignore the length error since that's not what matters here.

@adriangb
Member Author

Please review

@adriangb
Member Author

I tested this with pydantic and it works; it just requires a couple of error message changes. I might even revert some of them in this PR to keep things simple.

@dmontagu
Collaborator

dmontagu commented May 16, 2023

Do you feel the performance consequences are necessary and/or negligible? It seems a lot of the benchmarks have gotten worse (looking at the CodSpeed output at the top of the PR, and assuming that's up to date), but I don't know how practically significant that is.

(Happy to ignore the list[Any] thing, but now it seems there are many that have gotten noticeably worse, even without Any being referenced in the name)

@adriangb
Member Author

adriangb commented May 16, 2023

Unfortunately, I think they are inherent to "doing work on every iteration" instead of doing work just at the end, and also to using a PySet instead of a Vec for sets.

I do think we may be able to improve this down the road. I found that test_list_of_nullable_core got back to baseline if I uncommented this:

// stop early if we've read more items from the input than the input-size cap allows
if let Some(max_length) = self.max_input_length {
    self.check_max_length(self.input_length, max_length, input)?;
}
// stop early once the validated output (plus collected errors) would exceed max_length
if let Some(max_length) = self.max_length {
    self.check_max_length(self.output_length + self.errors.len(), max_length, input)?;
}

This is weird because that should be very cheap to do, even if it's happening for each item. All I can guess is that it's causing differences in how the code gets compiled.

Not much we can do about the set thing: adding items to a PySet one by one is just slower than adding them to a Vec and then creating the PySet in one go. The tradeoff is that we now get better error handling, since max_length is applied to the output rather than to 10x the input size, and more predictable performance (with a lot of duplicates we would previously have vastly exploded memory usage, which is probably not what someone expects when they use a set type).
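A toy illustration of the tradeoff in plain Python (not the Rust internals; names and the length check are invented for the example):

def build_via_buffer(validated_items):
    # old approach: accumulate everything, duplicates included, then
    # build the set in one go - memory is proportional to the input size
    buf = [x for x in validated_items]
    return set(buf)

def build_incrementally(validated_items, max_length=None):
    # this branch: insert as we go, so memory is bounded by the number of
    # unique values and max_length is checked against the output size
    out = set()
    for x in validated_items:
        out.add(x)
        if max_length is not None and len(out) > max_length:
            raise ValueError(f'set should have at most {max_length} items')
    return out

# a million duplicates of one value: the first approach buffers a million
# entries before deduplicating; the second never holds more than one
assert build_incrementally(0 for _ in range(1_000_000)) == {0}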

All the rest of the tests (e.g. the model_complete tests) are, I think, linked to these two issues.

Comment on lines 133 to 196
#[allow(clippy::too_many_arguments)]
pub fn validate_infallible_iterator<'s, 'data, V, O, W, L>(
py: Python<'data>,
input: &'data impl Input<'data>,
extra: &'s Extra<'s>,
definitions: &'data Definitions<CombinedValidator>,
recursion_guard: &'s mut RecursionGuard,
checks: &mut IterableValidationChecks<'data>,
iter: impl Iterator<Item = &'data V>,
items_validator: &'s CombinedValidator,
output: &mut O,
write: &mut W,
len: &L,
) -> ValResult<'data, ()>
where
V: Input<'data> + 'data,
W: FnMut(&mut O, PyObject) -> PyResult<()>,
L: Fn(&O) -> usize,
{
for (index, value) in iter.enumerate() {
let result = items_validator
.validate(py, value, extra, definitions, recursion_guard)
.map_err(|e| e.with_outer_location(index.into()));
if let Some(value) = checks.filter_validation_result(result, input)? {
write(output, value)?;
checks.check_output_length(len(output), input)?;
}
}
checks.finish(input)?;
Ok(())
}

#[allow(clippy::too_many_arguments)]
pub fn validate_fallible_iterator<'s, 'data, V, O, W, L>(
py: Python<'data>,
input: &'data impl Input<'data>,
extra: &'s Extra<'s>,
definitions: &'data Definitions<CombinedValidator>,
recursion_guard: &'s mut RecursionGuard,
checks: &mut IterableValidationChecks<'data>,
iter: impl Iterator<Item = ValResult<'data, &'data V>>,
items_validator: &'s CombinedValidator,
output: &mut O,
write: &mut W,
len: &L,
) -> ValResult<'data, ()>
where
V: Input<'data> + 'data,
W: FnMut(&mut O, PyObject) -> PyResult<()>,
L: Fn(&O) -> usize,
{
for (index, result) in iter.enumerate() {
let value = result?;
let result = items_validator
.validate(py, value, extra, definitions, recursion_guard)
.map_err(|e| e.with_outer_location(index.into()));
if let Some(value) = checks.filter_validation_result(result, input)? {
write(output, value)?;
checks.check_output_length(len(output), input)?;
}
}
checks.finish(input)?;
Ok(())
}
Member Author

Splitting these up (same for the dict.rs versions) was an attempt to improve performance for the non-fallible case. It seemed to make a tiny difference, but to be honest it might be worth just collapsing into a single function to save a couple hundred LOC.

@adriangb
Member Author

adriangb commented May 16, 2023

I did some more testing on those LOC referenced. Boxing input doesn't help. Each one of the 3 max length checks has a cost and accounts for about 30% of the performance difference. I don't really understand why these checks aren't basically "free" since they're just doing some integer math in Rust. According to #608 (comment) just commenting out those lines makes all of the difference.

@samuelcolvin
Member

Okay, I think the breadth and size of the performance degradation here is significant.

I'm also somewhat nervous about making a change this big at this stage; the current code has been fairly well tested over the last few months. If we release this in the beta, we lose all the value of the usage testing of the current code.

Let's defer this for now and I'll try to have a look at applying some of the behaviour changes included here in the current code base with a smaller change.

@samuelcolvin samuelcolvin mentioned this pull request May 17, 2023
@samuelcolvin
Member

I have an alternative take on this #610 which has a smaller footprint and I think has a smaller impact on performance.

But having looked at this a bit more, my opinion is that the current behaviour of checking length after finishing validating elements is correct. I have the following rationale:

  • if you error as soon as the length exceeds max_length, you have to change the error message to basically "the input was longer than the max length allowed of 42"; you can't say "has the length X" reliably since you haven't finished validation
  • you don't validate all elements, so you don't necessarily show all errors
  • I'm not sure what we did in V1, but generally V2 will be sufficiently faster to make up for validating longer lists etc.
  • the case of the input being a generator isn't really applicable to a DoS attack - the "attacker" is writing Python inside your system and can do much worse things than define an infinite generator (although we already have the generator_max_length check for that)
  • you have to change how sets are validated to build the set as you go along, which is slower than building a vec, then constructing a set from there

It seems like @adriangb and I disagree; let's see what @dmontagu thinks. I'm happy to defer if you both agree, but I think #610 might be a safer way to go, even though it will require a little more work.

@adriangb
Member Author

Just to be clear, #610 is not handling our current V1 breaking changes w.r.t. allowed inputs, right? That's a good chunk of this PR. If we remove the set changes here (comment on that below) and fix the performance degradation for the length checks (which, again, is not inherent to the change but rather related to whatever the compiler is doing), we'd end up with just that part, which is something we should do regardless.

For the sets thing, there are more factors than performance. Memory is one of them. If you accumulate in a vec you end up keeping multiple copies of each item in memory, which could be arbitrarily large. I'd argue that's unexpected as a user since it may very well be the reason they're using sets.

@samuelcolvin
Member

I think allowing sets as inputs to list and tuple is easy.

Building a set as we go along involves a lot of copying, maybe even a macro, but isn't conceptually hard.

I haven't made those changes until we agree whether the change to length checking is necessary.

@dmontagu
Collaborator

dmontagu commented May 17, 2023

Okay, I think in an ideal world, it would be a config flag so you could decide between stopping early (with possible performance/arguably security benefits) and displaying all errors. I would suggest the default behavior be the more development-time-friendly (imo) show-all-errors. (I consider this more development-time-friendly because it has the chance to reveal additional errors in a single call.) But I see the benefit for production-mode web apps being able to say they want early stopping behavior.

I don't think it would be crazy to change to always stopping early, but when in doubt, I guess I'd prefer to preserve v1 behavior? (Not sure what it is, though I suspect it's validate-everything..)

Something I would find a lot more convincing about what the behavior should be would be any (GitHub) issues requesting one behavior vs. the other. If there are no requests or stated concerns about this, I think I'd err on the side of what's easier to implement and/or how it works in v1.


Regarding the vec vs. one-at-a-time set-building, I see the arguments on both sides. I think from a theoretical perspective the worst case scenario is significantly worse on memory using vec than on performance using one-at-a-time-python-set. However, from a practical perspective, it's harder for me to say; the set of scenarios where memory consumption would be the constraint seems small, but wall-clock performance improvements are likely to benefit everyone using it to some (admittedly, likely small) degree. I don't think using Set[x] for validation of very-large items is terribly common either way, so I don't think this decision is likely to be a usage deal-breaker either way (though common enough to be worth the thought). It might be interesting to see what other high-performance libraries do when building Python sets.

One possible option that I suspect neither of you want to do would be a hybrid approach that uses a pure vec up to a certain size, then converts to set beyond that. (There are 1000 heuristic approaches you could take to optimize beyond that but I sort of think anything beyond this is either going to be obvious to you both or too complex to be desirable. Probably the latter.)
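A rough sketch of that hybrid idea in plain Python (the threshold and names are invented for illustration):

def collect_to_set(validated_items, switch_at=1000):
    # buffer in a list for small inputs; switch to a set past a threshold
    it = iter(validated_items)
    buf = []
    for item in it:
        buf.append(item)
        if len(buf) >= switch_at:
            # large input: convert what we have and deduplicate as we go
            out = set(buf)
            for rest in it:
                out.add(rest)
            return out
    # small input: build the set in one shot, as the current code does
    return set(buf)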

@dmontagu
Collaborator

dmontagu commented May 17, 2023

I'll add that early stopping behavior would be a significant improvement for developer experience for people who want to use this as a way to limit request sizes, as it lets you specify what you care about much more directly than a blanket request size cap, etc. So I see this as a pretty good argument in favor of the (eventual) config setting proposed above if we don't change this behavior now.

But I don't know what fraction of people building pydantic-backed webapps find themselves caring deeply about granular request-size capping like this.

I guess one other important question is — under what scenarios are these limits typically used? If the typical usage is to prevent large payloads (perhaps misguided since as we've noted it won't stop full validation), then that's an argument for early stopping.

If the typical usage is for business logic reasons, I think that would be more ambiguous, but leaning toward show-all-errors. Might be worth a skim of sourcegraph usage of these constraints.

@adriangb
Member Author

I don't think it would be crazy to change to always stopping early, but when in doubt, I guess I'd prefer to preserve v1 behavior? (Not sure what it is, though I suspect it's validate-everything..)

V1 stops early:

from pydantic import BaseModel, ValidationError, Field, validator


# record every value the inner validator is called with
calls: list[int] = []


class Inner(BaseModel):
    x: int

    @validator('x')
    def val_x(cls, v: int) -> int:
        calls.append(v)
        assert v < 10
        return v


class Outer(BaseModel):
    inner: list[Inner] = Field(max_items=10)


try:
    Outer(inner=[{'x': n} for n in range(100)])
except ValidationError:
    pass
else:
    assert False

# V1 stops early, so the item validator runs at most max_items times
assert len(calls) <= 10

The same code in V2 fails on the final assertion.

Something I would find a lot more convincing about what the behavior should be would be any (GitHub) issues requesting one behavior vs. the other. If there are no requests or stated concerns about this, I think I'd err on the side of what's easier to implement and/or how it works in v1.

I don't recall any off the top of my head, but maybe that's because the V1 behavior is the right behavior? For all we know if we change the V1 behavior we'll be flooded with issues and security reports.

Regarding the vec vs. one-at-a-time set-building, I see the arguments on both sides. I think from a theoretical perspective the worst-case scenario is significantly worse on memory using vec than on performance using one-at-a-time-python-set. However, from a practical perspective, it's harder for me to say; the set of scenarios where memory consumption would be the constraint seems small, but wall-clock performance improvements are likely to benefit everyone using it to some (admittedly, likely small) degree.

I agree, but keep in mind that we're comparing a 10% constant factor worst-case performance degradation against unbounded worst-case memory degradation. And at some point allocating the entire Vec up front (what we currently do, or conversely bump allocating it as we go) could lead to worse performance, or at the very least unpredictable performance that depends on the input.

@samuelcolvin
Member

Okay, I think those issues are now fixed on #610 - please review and let me know what you think.

I think the smaller footprint and fewer performance regressions make it better.

@dmontagu
Collaborator

Yeah, if v1 did the early stopping that's a pretty strong argument in favor of it imo.

@adriangb
Member Author

This is now +200 LOC vs. main, so it's really not that much more code, though still a large diff (1,104 insertions(+), 844 deletions(-)). A lot of the diff size is coming from the added benchmarks and tests.
