Commit 293a479

[3.12] Sync main docs and docstring for median_grouped(). (gh-117214) (gh-117241)
1 parent 9359fdd commit 293a479

1 file changed, +39 -38 lines changed

Doc/library/statistics.rst

Lines changed: 39 additions & 38 deletions
@@ -79,7 +79,7 @@ or sample.
 :func:`median`           Median (middle value) of data.
 :func:`median_low`       Low median of data.
 :func:`median_high`      High median of data.
-:func:`median_grouped`   Median, or 50th percentile, of grouped data.
+:func:`median_grouped`   Median (50th percentile) of grouped data.
 :func:`mode`             Single mode (most common value) of discrete or nominal data.
 :func:`multimode`        List of modes (most common values) of discrete or nominal data.
 :func:`quantiles`        Divide data into intervals with equal probability.
@@ -329,55 +329,56 @@ However, for reading convenience, most of the examples show sorted sequences.
    be an actual data point rather than interpolated.


-.. function:: median_grouped(data, interval=1)
+.. function:: median_grouped(data, interval=1.0)

-   Return the median of grouped continuous data, calculated as the 50th
-   percentile, using interpolation. If *data* is empty, :exc:`StatisticsError`
-   is raised. *data* can be a sequence or iterable.
+   Estimates the median for numeric data that has been `grouped or binned
+   <https://en.wikipedia.org/wiki/Data_binning>`_ around the midpoints
+   of consecutive, fixed-width intervals.

-   .. doctest::
+   The *data* can be any iterable of numeric data with each value being
+   exactly the midpoint of a bin. At least one value must be present.

-      >>> median_grouped([52, 52, 53, 54])
-      52.5
+   The *interval* is the width of each bin.

-   In the following example, the data are rounded, so that each value represents
-   the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5--1.5, 2
-   is the midpoint of 1.5--2.5, 3 is the midpoint of 2.5--3.5, etc. With the data
-   given, the middle value falls somewhere in the class 3.5--4.5, and
-   interpolation is used to estimate it:
+   For example, demographic information may have been summarized into
+   consecutive ten-year age groups with each group being represented
+   by the 5-year midpoints of the intervals:

    .. doctest::

-      >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
-      3.7
-
-   Optional argument *interval* represents the class interval, and defaults
-   to 1. Changing the class interval naturally will change the interpolation:
+      >>> from collections import Counter
+      >>> demographics = Counter({
+      ... 25: 172, # 20 to 30 years old
+      ... 35: 484, # 30 to 40 years old
+      ... 45: 387, # 40 to 50 years old
+      ... 55: 22, # 50 to 60 years old
+      ... 65: 6, # 60 to 70 years old
+      ... })
+      ...
+
+   The 50th percentile (median) is the 536th person out of the 1071
+   member cohort. That person is in the 30 to 40 year old age group.
+
+   The regular :func:`median` function would assume that everyone in the
+   tricenarian age group was exactly 35 years old. A more tenable
+   assumption is that the 484 members of that age group are evenly
+   distributed between 30 and 40. For that, we use
+   :func:`median_grouped`:

    .. doctest::

-      >>> median_grouped([1, 3, 3, 5, 7], interval=1)
-      3.25
-      >>> median_grouped([1, 3, 3, 5, 7], interval=2)
-      3.5
-
-   This function does not check whether the data points are at least
-   *interval* apart.
-
-   .. impl-detail::
-
-      Under some circumstances, :func:`median_grouped` may coerce data points to
-      floats. This behaviour is likely to change in the future.
-
-   .. seealso::
+      >>> data = list(demographics.elements())
+      >>> median(data)
+      35
+      >>> round(median_grouped(data, interval=10), 1)
+      37.5

-      * "Statistics for the Behavioral Sciences", Frederick J Gravetter and
-        Larry B Wallnau (8th Edition).
+   The caller is responsible for making sure the data points are separated
+   by exact multiples of *interval*. This is essential for getting a
+   correct result. The function does not check this precondition.

-      * The `SSMEDIAN
-        <https://help.gnome.org/users/gnumeric/stable/gnumeric.html#gnumeric-function-SSMEDIAN>`_
-        function in the Gnome Gnumeric spreadsheet, including `this discussion
-        <https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
+   Inputs may be any numeric type that can be coerced to a float during
+   the interpolation step.


 .. function:: mode(data)
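
The arithmetic in the new example (a 1071-person cohort whose median is the 536th person, who falls in the 30 to 40 age group) can be checked with running totals over the bin counts. The sketch below is illustrative only: it reuses the counts from the new doctest, and the variable names are not part of the statistics module.

```python
from collections import Counter
from itertools import accumulate

# Age-group midpoints and head counts copied from the new doctest.
demographics = Counter({25: 172, 35: 484, 45: 387, 55: 22, 65: 6})

midpoints = sorted(demographics)
counts = [demographics[m] for m in midpoints]
total = sum(counts)              # size of the cohort
position = (total + 1) // 2      # 1-based rank of the median person (valid for an odd total)

# Running totals give how many people fall at or below each age group.
running = list(accumulate(counts))
group = next(m for m, upto in zip(midpoints, running) if upto >= position)

print(total, position, group)    # 1071 536 35
```

Because 172 people fall below the 30 to 40 group and 656 fall at or below it, the 536th person lands inside that group.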

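The interpolation that median_grouped() is documented to perform can be sketched with the standard grouped-median formula L + interval * (n/2 - cf) / f, where L is the lower boundary of the bin holding the middle value, cf counts the values in lower bins, and f counts the values in that bin. The helper grouped_median_sketch is hypothetical and written only for illustration; it is not the module's source, though it reproduces the 37.5 result from the new doctest and the outputs of the examples this commit removed.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def grouped_median_sketch(data, interval=1.0):
    """Hypothetical helper: interpolated median of binned data.

    Assumes each value is the midpoint of a bin of width *interval*,
    the same precondition the new median_grouped() docs state.
    """
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    x = data[n // 2]                # midpoint of the bin holding the middle value
    cf = bisect_left(data, x)       # values in lower bins (cumulative frequency)
    f = bisect_right(data, x) - cf  # values in the median bin (frequency)
    lower = x - interval / 2        # lower boundary of the median bin
    return lower + interval * (n / 2 - cf) / f


demographics = Counter({25: 172, 35: 484, 45: 387, 55: 22, 65: 6})
data = list(demographics.elements())
print(round(grouped_median_sketch(data, interval=10), 1))  # 37.5

# Examples removed by this commit give the outputs they documented.
print(grouped_median_sketch([52, 52, 53, 54]))              # 52.5
print(grouped_median_sketch([1, 3, 3, 5, 7], interval=1))  # 3.25
print(grouped_median_sketch([1, 3, 3, 5, 7], interval=2))  # 3.5
```

For the demographics data, cf is 172 and f is 484, so the estimate is 30 + 10 * (535.5 - 172) / 484, about 37.51, which rounds to the 37.5 shown in the doctest.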
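
Since the new text says median_grouped() does not verify that data points are separated by exact multiples of *interval*, a caller could check this beforehand. The helper spaced_by_interval below is a hypothetical sketch, not part of the statistics module; it only tests spacing relative to the smallest value, and the tolerance is an assumption chosen to absorb floating-point rounding.

```python
import math


def spaced_by_interval(data, interval=1.0, tol=1e-9):
    """Hypothetical guard for the precondition median_grouped() leaves unchecked.

    Returns True when every value sits a whole number of *interval* widths
    away from the smallest value (within *tol*, to absorb float rounding).
    """
    values = list(data)
    if not values:
        return False  # at least one value is required anyway
    lowest = min(values)
    for value in values:
        steps = (value - lowest) / interval
        if not math.isclose(steps, round(steps), rel_tol=tol, abs_tol=tol):
            return False
    return True


print(spaced_by_interval([25, 35, 45, 55, 65], interval=10))  # True
print(spaced_by_interval([25, 35, 47], interval=10))          # False
```

A False result means at least one value cannot be a bin midpoint for the given width, which is the situation the docs warn will give an incorrect result.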