Strengthen the sanity checks, and fix some minor style or bugs uncovered #579

Julian · 2022-07-31T19:04:30Z

This looks a bit noisy, especially due to fixing across multiple drafts, but it does a few simple things to make the sanity checks a bit better. Only one is newish (mentioned below), the rest are really things we've always really "wanted" but was apparently missing from the sanity check. Specifically the changes here:

makes the sanity check use "subtests", which basically just means that when CI runs and a sanity test fails because one test in the suite is somehow invalid, the rest of the tests will still be sanity checked, and the PR author will see all the invalid tests not just the first one
case descriptions are checked for uniqueness too, not just test descriptions (this I could have sworn was already here, but apparently it wasn't, so there were a small number of tests which had non-unique descriptions, including a minor "bug" where some tests were running twice because they were accidentally copy pasted)
case descriptions are checked for length too, again same as test descriptions (the reasoning is similar for both, though given there are some long case descriptions, I left the length limit longer for now)
Prevent the use of "test that" or "X should do Y" in test descriptions. This is "new" in the sense that I've only mentioned this in PR reviews once or twice, but it's a stylistic thing which I don't like in test cases (in any language, but here too). Some reasoning is in the commit if that seems controversial, but there were only a few of these in the suite, which were easily removed. Essentially these are "noisy" words which usually can be removed without changing the meaning of the description.

A bit more detail is in each commit message if you're even more curious, but hopefully it's clear that each description changed is at least as good as before.

Feel free to ignore reviewing the Python code if you (future reviewer) aren't comfortable reading it (or feel free to press through and review it).

A subtest in Python means that it will continue running the other examples (tests, cases, whatever) and show an error for each one, rather than simply stopping on the first invalid (test, case, whatever). This should make it easier for seeing what's wrong in CI runs without needing to repeatedly run each time. A further improvement would be grouping tests "horizontally" across versions rather than just within a version, as often authors are copy pasting tests across each version it applies to. With a subtest, they'll see N failures for N versions, whereas it'd be better to see 1 failure for e.g. test description which is too long across all N versions.

The reason we do this is both to force conciseness but also because runners of the suite likely turn one or more of the combination of case description and test description into an identifier within the implementer's test framework. If descriptions are really long, it makes things like copy pasting test names more annoying or difficult. 150 is quite long -- we use 70 for test descriptions, so in total this means potentially 220 characters for the combined length. I'd like to bring this down a bit but this is the number that passes all existing descriptions at least.

The reason we do this for tests is to allow implementers to use test descriptions as unique identifiers, and also because there's a small chance a duplicate description is a copy paste error, or some other test bug. The same reasoning applies to cases. 0da2aaa is another good reason (which in theory applies to both test cases and individual tests) where a case was accidentally duplicated a second time during backporting (where it was already present). See e.g. the first file fixed here, where descriptions should really indicate what validator is being tested for clarity. We don't however check directly for schema duplication in cases, as one could imagine distinct scenarios being tested with the same underlying schema. Fixing further files will be done in successive commits.

Fixes a minor inefficiency in draft3 where separate cases were used instead of just 2 tests.

These were fully duplicated accidentally (which the uniqueness check now finds).

It makes descriptions longer (which we independently check for) but also is stylistically a bit awkward. https://jml.io/pages/test-docstrings.html is a good post on the subject, with a sort of algorithm for removing this and related words. Fix a first file (anchor) where descriptions are shortened and clarified a bit. Further files will be fixed in successive commits.

…ription.

There aren't any currently (though one was just removed in a previous 'should' commit).

handrews

Looks good, but would have been much easier to review as one PR for the python changes and one for all of the more-or-less trivial description changes.

Julian · 2022-08-06T07:10:55Z

Thanks!

And fix those which fail.

Julian added 13 commits July 31, 2022 21:44

Fix two last clashing $ids in draft-next too.

8ade923

Make sure bignum cases have unique descriptions.

a5b459f

Fixes a minor inefficiency in draft3 where separate cases were used instead of just 2 tests.

Remove mistakenly duplicated anyOf tests in draft{4,6,7}.

6383ef6

These were fully duplicated accidentally (which the uniqueness check now finds).

Fix the last non-unique case descriptions.

68f380c

Remove 'should' from a format test which anyhow needed a clearer desc…

b84b900

…ription.

More description cleanup for Bengali non-ASCII tests.

85899fd

Kill more shoulds from ECMA and ref tests.

b7da24f

Remove the last few 'shoulds' from items descriptions.

eaf06b1

Also prevent 'tests that' from descriptions.

e5d5e0a

There aren't any currently (though one was just removed in a previous 'should' commit).

Julian requested a review from a team as a code owner July 31, 2022 19:04

handrews self-assigned this Aug 5, 2022

handrews approved these changes Aug 5, 2022

View reviewed changes

Julian merged commit f827640 into main Aug 6, 2022

Julian deleted the sanity-check-strengthening branch August 6, 2022 07:11

Julian added a commit that referenced this pull request Aug 6, 2022

Apply #579 to case descriptions as well.

fe3eda2

And fix those which fail.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Strengthen the sanity checks, and fix some minor style or bugs uncovered #579

Strengthen the sanity checks, and fix some minor style or bugs uncovered #579

Uh oh!

Julian commented Jul 31, 2022 •

edited

Loading

Uh oh!

handrews left a comment

Uh oh!

Julian commented Aug 6, 2022

Uh oh!

Uh oh!

Uh oh!

Strengthen the sanity checks, and fix some minor style or bugs uncovered #579

Strengthen the sanity checks, and fix some minor style or bugs uncovered #579

Uh oh!

Conversation

Julian commented Jul 31, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

handrews left a comment

Choose a reason for hiding this comment

Uh oh!

Julian commented Aug 6, 2022

Uh oh!

Uh oh!

Julian commented Jul 31, 2022 •

edited

Loading