
Commit b8c6677

More refinements to the statistics docs (GH-15713) (GH-15715)
(cherry picked from commit d8c93aa)
Co-authored-by: Raymond Hettinger <[email protected]>
1 parent 7eaedda commit b8c6677


1 file changed: +33 -27 lines changed


Doc/library/statistics.rst

Lines changed: 33 additions & 27 deletions
@@ -19,17 +19,21 @@
 --------------
 
 This module provides functions for calculating mathematical statistics of
-numeric (:class:`Real`-valued) data.
-
-.. note::
-
-Unless explicitly noted otherwise, these functions support :class:`int`,
-:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
-Behaviour with other types (whether in the numeric tower or not) is
-currently unsupported. Collections with a mix of types are also undefined
-and implementation-dependent. If your input data consists of mixed types,
-you may be able to use :func:`map` to ensure a consistent result, for
-example: ``map(float, input_data)``.
+numeric (:class:`~numbers.Real`-valued) data.
+
+The module is not intended to be a competitor to third-party libraries such
+as `NumPy <https://numpy.org>`_, `SciPy <https://www.scipy.org/>`_, or
+proprietary full-featured statistics packages aimed at professional
+statisticians such as Minitab, SAS and Matlab. It is aimed at the level of
+graphing and scientific calculators.
+
+Unless explicitly noted, these functions support :class:`int`,
+:class:`float`, :class:`~decimal.Decimal` and :class:`~fractions.Fraction`.
+Behaviour with other types (whether in the numeric tower or not) is
+currently unsupported. Collections with a mix of types are also undefined
+and implementation-dependent. If your input data consists of mixed types,
+you may be able to use :func:`map` to ensure a consistent result, for
+example: ``map(float, input_data)``.
 
 Averages and measures of central location
 -----------------------------------------
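The revised text suggests ``map(float, input_data)`` when the input mixes numeric types. A minimal sketch of that advice, assuming Python 3.8 or later and a made-up list mixing ``int``, ``float``, ``Fraction`` and ``Decimal`` values: converting every element to ``float`` up front hands ``statistics.mean`` a single, consistent type.

```python
import statistics
from decimal import Decimal
from fractions import Fraction

# Hypothetical mixed-type input: int, float, Fraction and Decimal.
mixed = [1, 2.5, Fraction(1, 2), Decimal("0.25")]

# Convert every element to float first so the data has one consistent type.
as_floats = list(map(float, mixed))

result = statistics.mean(as_floats)
print(as_floats)  # [1.0, 2.5, 0.5, 0.25]
print(result)     # 1.0625
```

Passing ``mixed`` directly is exactly the "undefined and implementation-dependent" case the text warns about (current CPython may raise ``TypeError`` when ``float`` and ``Decimal`` meet), so the explicit conversion is the safer habit.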
@@ -107,7 +111,7 @@ However, for reading convenience, most of the examples show sorted sequences.
 :func:`median` and :func:`mode`.
 
 The sample mean gives an unbiased estimate of the true population mean,
-which means that, taken on average over all the possible samples,
+so that when taken on average over all the possible samples,
 ``mean(sample)`` converges on the true mean of the entire population. If
 *data* represents the entire population rather than a sample, then
 ``mean(data)`` is equivalent to calculating the true population mean μ.
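The "taken on average over all the possible samples" wording can be checked exactly on a tiny case. This sketch assumes a hypothetical six-value population and enumerates every sample of size 3 drawn without replacement using ``itertools.combinations``; ``Fraction`` keeps the arithmetic exact, so the average of the sample means equals the population mean rather than merely approaching it.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean

# Hypothetical population (illustrative values only), stored as exact Fractions.
population = [Fraction(x) for x in (2, 3, 5, 7, 11, 13)]

# Every possible sample of size 3, drawn without replacement.
samples = list(combinations(population, 3))

# Mean of each sample, then the mean of all those sample means.
sample_means = [mean(s) for s in samples]
average_of_sample_means = mean(sample_means)

print(len(samples))             # 20
print(average_of_sample_means)  # 41/6
print(mean(population))         # 41/6
```

With a random sample instead of full enumeration, the same quantity would only be close to 41/6 on average, which is what "unbiased" promises.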
@@ -163,8 +167,16 @@ However, for reading convenience, most of the examples show sorted sequences.
 will be equivalent to ``3/(1/a + 1/b + 1/c)``.
 
 The harmonic mean is a type of average, a measure of the central
-location of the data. It is often appropriate when averaging quantities
-which are rates or ratios, for example speeds. For example:
+location of the data. It is often appropriate when averaging
+rates or ratios, for example speeds.
+
+Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr.
+What is the average speed?
+
+.. doctest::
+
+>>> harmonic_mean([40, 60])
+48.0
 
 Suppose an investor purchases an equal value of shares in each of
 three companies, with P/E (price/earning) ratios of 2.5, 3 and 10.
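The car example added in this hunk can be checked from first principles: average speed is total distance divided by total time, and for legs of equal length that works out to the harmonic mean of the speeds. A minimal sketch, assuming Python 3.8+; the arithmetic mean is printed alongside for contrast, and results are rounded only to hide floating-point noise.

```python
import math
from statistics import harmonic_mean, mean

leg_km = 10        # each leg of the trip is 10 km
speeds = [40, 60]  # km/hr on the first and second leg

# Average speed = total distance / total time, where time = distance / speed.
total_time_hr = sum(leg_km / v for v in speeds)
average_speed = (leg_km * len(speeds)) / total_time_hr

print(round(average_speed, 9))                           # 48.0
print(round(harmonic_mean(speeds), 9))                   # 48.0
print(math.isclose(average_speed, harmonic_mean(speeds)))  # True
print(mean(speeds))  # 50, which overstates the true average speed
```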
@@ -175,9 +187,6 @@ However, for reading convenience, most of the examples show sorted sequences.
 >>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio.
 3.6
 
-Using the arithmetic mean would give an average of about 5.167, which
-is well over the aggregate P/E ratio.
-
 :exc:`StatisticsError` is raised if *data* is empty, or any element
 is less than zero.
 
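This hunk drops the sentence contrasting the harmonic mean with the arithmetic mean (about 5.167). A minimal sketch, assuming a hypothetical equal stake in each company, reproduces both figures: the portfolio's aggregate P/E is total price paid over total earnings bought, and that matches ``harmonic_mean``.

```python
import math
from statistics import harmonic_mean, mean

pe_ratios = [2.5, 3, 10]
stake = 1.0  # hypothetical equal amount invested in each company

# Earnings bought from each company = amount paid / (price/earnings ratio).
total_earnings = sum(stake / pe for pe in pe_ratios)
total_paid = stake * len(pe_ratios)
aggregate_pe = total_paid / total_earnings

print(round(aggregate_pe, 6))               # 3.6
print(round(harmonic_mean(pe_ratios), 6))   # 3.6
print(round(mean(pe_ratios), 3))            # 5.167
print(math.isclose(aggregate_pe, harmonic_mean(pe_ratios)))  # True
```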
@@ -190,9 +199,9 @@ However, for reading convenience, most of the examples show sorted sequences.
 middle two" method. If *data* is empty, :exc:`StatisticsError` is raised.
 *data* can be a sequence or iterator.
 
-The median is a robust measure of central location, and is less affected by
-the presence of outliers in your data. When the number of data points is
-odd, the middle data point is returned:
+The median is a robust measure of central location and is less affected by
+the presence of outliers. When the number of data points is odd, the
+middle data point is returned:
 
 .. doctest::
 
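The reworded paragraph says the median resists outliers and, for an odd number of points, returns the middle one. A minimal sketch with made-up data: replacing the largest of five values with an extreme one leaves the median unchanged while the mean jumps.

```python
from statistics import mean, median

data = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 1000]  # hypothetical: last value replaced by an outlier

print(median(data), mean(data))  # 5 5
print(median(with_outlier))      # 5 (still the middle data point)
print(mean(with_outlier))        # 203.2
```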
@@ -210,13 +219,10 @@ However, for reading convenience, most of the examples show sorted sequences.
 This is suited for when your data is discrete, and you don't mind that the
 median may not be an actual data point.
 
-If your data is ordinal (supports order operations) but not numeric (doesn't
-support addition), you should use :func:`median_low` or :func:`median_high`
+If the data is ordinal (supports order operations) but not numeric (doesn't
+support addition), consider using :func:`median_low` or :func:`median_high`
 instead.
 
-.. seealso:: :func:`median_low`, :func:`median_high`, :func:`median_grouped`
-
-
 .. function:: median_low(data)
 
 Return the low median of numeric data. If *data* is empty,
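For ordinal data that can be sorted but not added, ``median_low`` and ``median_high`` always return an actual data point. A minimal sketch using hypothetical string labels: with an even number of items, plain ``median`` tries to average the two middle values, which fails for strings. In current CPython that failure shows up as ``TypeError``; treat this as observed behaviour, not a documented guarantee.

```python
from statistics import median, median_high, median_low

# Hypothetical ordinal labels: they sort, but cannot be meaningfully averaged.
labels = ["b", "d", "a", "c"]

low = median_low(labels)    # lower of the two middle items after sorting
high = median_high(labels)  # higher of the two middle items after sorting
print(low, high)            # b c

try:
    median(labels)          # even count: attempts ("b" + "c") / 2
    median_failed = False
except TypeError:
    median_failed = True
print(median_failed)        # True
```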
@@ -319,7 +325,7 @@ However, for reading convenience, most of the examples show sorted sequences.
 desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
 If the input *data* is empty, :exc:`StatisticsError` is raised.
 
-``mode`` assumes discrete data, and returns a single value. This is the
+``mode`` assumes discrete data and returns a single value. This is the
 standard treatment of the mode as commonly taught in schools:
 
 .. doctest::
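The context lines recommend ``min(multimode(data))`` or ``max(multimode(data))`` when a different tie-break from ``mode`` is wanted. A minimal sketch, assuming Python 3.8+ (where ``multimode`` exists and ``mode`` returns the first mode encountered), using made-up data that contains a tie:

```python
from statistics import mode, multimode

colours = ["red", "blue", "blue", "red", "green", "red"]
print(mode(colours))         # red

# "b" and "a" both occur twice; "b" is encountered first.
tied = ["b", "a", "a", "b", "c"]
print(multimode(tied))       # ['b', 'a']
print(mode(tied))            # b
print(min(multimode(tied)))  # a
print(max(multimode(tied)))  # b
```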
@@ -522,7 +528,7 @@ However, for reading convenience, most of the examples show sorted sequences.
 cut-point will evaluate to ``104``.
 
 The *method* for computing quantiles can be varied depending on
-whether the data in *data* includes or excludes the lowest and
+whether the *data* includes or excludes the lowest and
 highest possible values from the population.
 
 The default *method* is "exclusive" and is used for data sampled from
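To see what the *method* choice changes, here is a minimal sketch, assuming Python 3.8+ and a hypothetical sample of the integers 1 through 10. ``"inclusive"`` treats the sample minimum and maximum as the 0th and 100th percentiles; the default ``"exclusive"`` allows for population values beyond the observed extremes, so its outer cut points sit further out.

```python
from statistics import quantiles

data = list(range(1, 11))  # hypothetical sample: 1, 2, ..., 10

# Default "exclusive" method.
exclusive = quantiles(data, n=4)
# "inclusive": the sample min and max count as the 0th and 100th percentiles.
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)  # [2.75, 5.5, 8.25]
print(inclusive)  # [3.25, 5.5, 7.75]
```

Both methods agree on the median (5.5) here; they differ only in how far the lower and upper quartiles are pushed toward the extremes.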
