Skip to content

bpo-40331: Increase test coverage for the statistics module #19608

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 13, 2020
4 changes: 2 additions & 2 deletions Lib/statistics.py
Original file line number Diff line number Diff line change
Expand Up @@ -267,15 +267,15 @@ def _convert(value, T):


def _find_lteq(a, x):
'Locate the leftmost value exactly equal to x'
"""Locate the leftmost value exactly equal to x"""
i = bisect_left(a, x)
if i != len(a) and a[i] == x:
return i
raise ValueError


def _find_rteq(a, l, x):
'Locate the rightmost value exactly equal to x'
"""Locate the rightmost value exactly equal to x"""
i = bisect_right(a, x, lo=l)
if i != (len(a)+1) and a[i-1] == x:
return i-1
Expand Down
45 changes: 45 additions & 0 deletions Lib/test/test_statistics.py
Original file line number Diff line number Diff line change
Expand Up @@ -1004,6 +1004,9 @@ def test_nan(self):
x = statistics._convert(nan, type(nan))
self.assertTrue(_nan_equal(x, nan))

def test_raise_type_error(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this test_invalid_input_type()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.assertRaises(TypeError, statistics._convert, None, float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find using with self.assertRaises(): much more readable, please consider changing this to that style.

Suggested change
self.assertRaises(TypeError, statistics._convert, None, float)
with self.assertRaises(TypeError):
statistics._convert(None, float)

(Also in following new tests in this PR.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will change it here, as well as in the FindLteqTest and FindRteqTest tests, as you suggested.

Copy link
Contributor

@aeros aeros Apr 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to mention it just after you made the change already, but I find that using assertRaisesRegex() tends to be a better practice than just assertRaises(). It's more significant of an issue for things like deprecation warnings, but it can help quite a bit with ensuring that you're getting the correct exception or warning.

So, it may not be necessary in the above case, but it's worth some consideration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that using assertRaisesRegex() tends to be a better practice than just assertRaises().

That may be true in general, but I think that in this case it's most appropriate to simply check that a TypeError is thrown.

Copy link
Contributor

@aeros aeros Apr 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, looking at the above test a bit more closely, it seems highly unlikely for any other TypeError to occur besides the specific one being tested. So, assertRaises() is probably fine here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestion @aeros. As you have both already said, it may not be useful in this particular test, but maybe, it would be useful in the test_single_value_unsupported_type test (introduced in this PR) because it targets a specific TypeError with a custom message, while it is possible for the function to raise other TypeErrors too (through the _fail_neg function).

If you don't have any objections, I will go and modify the test_single_value_unsupported_type to use the assertRaisesRegex instead of the assertRaises.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f you don't have any objections, I will go and modify the test_single_value_unsupported_type to use the assertRaisesRegex instead of the assertRaises.

Go ahead.



class FailNegTest(unittest.TestCase):
"""Test _fail_neg private function."""
Expand Down Expand Up @@ -1033,6 +1036,45 @@ def test_error_msg(self):
self.assertEqual(errmsg, msg)


class FindLteqTest(unittest.TestCase):
# Test _find_lteq private function.

def test_raise_value_error(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also find a better name for this, e.g. test_invalid_input_values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for a, x in [
([], 1), # empty a triggers `i != len(a)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a while to think about this, but I've decided that we should remove all of the comments referencing the specific statements that this is intended to exercise. My main reason is that the referenced code will likely change over time, and these comments are very likely to become stale.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more general reason is that tests must not test the implementation, but only the desired behavior. We might take inspiration for relevant test cases from the code, but the tests should not need to be changed when the code is changed in ways that don't affect its behavior, such as refactoring and optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for describing your reasoning behind this decision. I will remove those comments. I agree that good tests assert the desired behavior and not the actual implementation. I put them only for helping the reader understand why each of them was added, but all of them, trigger the same behavior (throwing the ValueError). And as you said, they may become obsolete if the implementation change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments explaining why, conceptually, each of the test cases should result in such an error, or perhaps which edge-case it is testing, could be welcome. This is true general, and specifically in this case. They're not required though; I'm just adding this note for completeness :)

([1, 2], 3), # non-existing max value triggers `i != len(a)`
([1, 3], 2) # non-existing value triggers `a[i] == x`
]:
self.assertRaises(ValueError, statistics._find_lteq, a, x)

def test_locate_successfully(self):
for a, x, expected_i in [
([1, 1, 1, 2, 3], 1, 0),
([0, 1, 1, 1, 2, 3], 1, 1),
([1, 2, 3, 3, 3], 3, 2)
]:
self.assertEqual(expected_i, statistics._find_lteq(a, x))


class FindRteqTest(unittest.TestCase):
# Test _find_rteq private function.

def test_raise_value_error(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test method could also be renamed, similarly to the previous ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have changed this too.

for a, l, x in [
([1], 2, 1), # when l=len(a)+1 triggers `i != (len(a)+1)`
([1, 3], 0, 2) # non-existing value triggers `a[i-1] == x`
]:
self.assertRaises(ValueError, statistics._find_rteq, a, l, x)

def test_locate_successfully(self):
for a, l, x, expected_i in [
([1, 1, 1, 2, 3], 0, 1, 2),
([0, 1, 1, 1, 2, 3], 0, 1, 3),
([1, 2, 3, 3, 3], 0, 3, 4)
]:
self.assertEqual(expected_i, statistics._find_rteq(a, l, x))


# === Tests for public functions ===

class UnivariateCommonMixin:
Expand Down Expand Up @@ -1454,6 +1496,9 @@ class TestHarmonicMean(NumericTestCase, AverageMixin, UnivariateTypeMixin):
def setUp(self):
self.func = statistics.harmonic_mean

def test_single_value_unsupported_type(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come after the prepare_ methods.

Also, why test only a an array with a single value of an unsupported type? This would be a good opportunity for other unsupported cases, such as longer arrays, and arrays with values of mixed types, some supported and some not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I will put it right after the test_negative_error test.

I tested it with a single value of an unsupported type because the statement that explicitly raises this TypeError is part of the elif n == 1 branch, where n is the length of the given list. So a longer list would not trigger this specific TypeError, which was the only statement without being covered in this function. [1]

If you think that it would be useful to have a test with longer lists (only with unsupported values, both supported/unsupported values) and assert that a TypeError is raised, I could add one in the next commit. However, this would be for a TypeError that is not explicitly raised by this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding two assertions:

  1. multiple string values
  2. mixed strings and valid values (i.e. positive numbers).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions @taleinat. I have added a test in the latest commit that asserts those cases.

self.assertRaises(TypeError, self.func, ['3.14'])

def prepare_data(self):
# Override mixin method.
values = super().prepare_data()
Expand Down