@@ -26,10 +26,10 @@ numeric (:class:`Real`-valued) data.
Unless explicitly noted otherwise, these functions support :class:`int`,
:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
Behaviour with other types (whether in the numeric tower or not) is
- currently unsupported. Mixed types are also undefined and
- implementation-dependent. If your input data consists of mixed types,
- you may be able to use :func:`map` to ensure a consistent result, e.g.
- ``map(float, input_data)``.
+ currently unsupported. Collections with a mix of types are also undefined
+ and implementation-dependent. If your input data consists of mixed types,
+ you may be able to use :func:`map` to ensure a consistent result, for
+ example: ``map(float, input_data)``.
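The advice about mixed types can be sketched as follows. The sample values are invented for illustration; only the ``map(float, input_data)`` idiom comes from the text above.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import mean

# Hypothetical input mixing int, float, Fraction and Decimal values.
input_data = [1, 2.5, Fraction(7, 2), Decimal("4")]

# Coerce every element to float so that mean() sees a single type.
result = mean(map(float, input_data))
print(result)  # 2.75
```

Converting to ``float`` trades the exactness of ``Fraction`` and ``Decimal`` for a consistent result type, which may or may not be acceptable for a given dataset.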

Averages and measures of central location
-----------------------------------------
@@ -102,11 +102,9 @@ However, for reading convenience, most of the examples show sorted sequences.
.. note::

The mean is strongly affected by outliers and is not a robust estimator
- for central location: the mean is not necessarily a typical example of the
- data points. For more robust, although less efficient, measures of
- central location, see :func:`median` and :func:`mode`. (In this case,
- "efficient" refers to statistical efficiency rather than computational
- efficiency.)
+ for central location: the mean is not necessarily a typical example of
+ the data points. For more robust measures of central location, see
+ :func:`median` and :func:`mode`.
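A small sketch of the point made in this note, using an invented salary list with one extreme value:

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands) with a single extreme outlier.
salaries = [30, 32, 32, 35, 38, 40, 1000]

average = mean(salaries)    # pulled far above every typical value
middle = median(salaries)   # 35, regardless of how large the outlier is
common = mode(salaries)     # 32, the most frequent value

print(average, middle, common)
```

Here the mean lands above 170 even though six of the seven values are at most 40, while the median and mode stay among the typical values.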

The sample mean gives an unbiased estimate of the true population mean,
which means that, taken on average over all the possible samples,
@@ -120,9 +118,8 @@ However, for reading convenience, most of the examples show sorted sequences.
Convert *data* to floats and compute the arithmetic mean.

This runs faster than the :func:`mean` function and it always returns a
- :class:`float`. The result is highly accurate but not as perfect as
- :func:`mean`. If the input dataset is empty, raises a
- :exc:`StatisticsError`.
+ :class:`float`. The *data* may be a sequence or iterator. If the input
+ dataset is empty, raises a :exc:`StatisticsError`.
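A minimal sketch of the behaviour described for :func:`fmean`, using made-up readings supplied through a generator:

```python
from statistics import StatisticsError, fmean

# A generator is consumed in a single pass; no list is required.
readings = (x / 4 for x in [14, 16, 21])   # yields 3.5, 4.0, 5.25
average = fmean(readings)                   # 4.25

# Integer input still produces a float result.
from_ints = fmean([1, 2, 3])                # 2.0

# An empty dataset is rejected.
try:
    fmean([])
except StatisticsError:
    empty_rejected = True
else:
    empty_rejected = False
```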

.. doctest::

@@ -136,15 +133,20 @@ However, for reading convenience, most of the examples show sorted sequences.

Convert *data* to floats and compute the geometric mean.

+ The geometric mean indicates the central tendency or typical value of the
+ *data* using the product of the values (as opposed to the arithmetic mean
+ which uses their sum).
+
Raises a :exc:`StatisticsError` if the input dataset is empty,
if it contains a zero, or if it contains a negative value.
+ The *data* may be a sequence or iterator.

No special efforts are made to achieve exact results.
(However, this may change in the future.)

.. doctest::

- >>> round(geometric_mean([54, 24, 36]), 9)
+ >>> round(geometric_mean([54, 24, 36]), 1)
36.0

.. versionadded:: 3.8
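To make the "product of the values" description concrete, the sketch below compares :func:`geometric_mean` with two textbook formulations. These are illustrations only and are not claimed to be how the library computes its result.

```python
from math import exp, isclose, log, prod
from statistics import fmean, geometric_mean

data = [54, 24, 36]

gm = geometric_mean(data)

# n-th root of the product of the n values (54 * 24 * 36 == 36 ** 3).
by_product = prod(data) ** (1 / len(data))

# Equivalent formulation: exponential of the mean of the logarithms.
by_logs = exp(fmean(map(log, data)))

print(isclose(gm, 36.0), isclose(by_product, 36.0), isclose(by_logs, 36.0))
```

The results are compared with ``isclose`` rather than ``==`` because, as noted above, no special effort is made to achieve exact results.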
@@ -174,7 +176,7 @@ However, for reading convenience, most of the examples show sorted sequences.
3.6

Using the arithmetic mean would give an average of about 5.167, which
- is too high.
+ is well over the aggregate P/E ratio.

:exc:`StatisticsError` is raised if *data* is empty, or any element
is less than zero.
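The comparison with the aggregate P/E ratio can be checked with a short sketch. The three P/E ratios below are an assumption (the input data is not shown in this excerpt); they were chosen because they reproduce the 3.6 and 5.167 figures quoted above.

```python
from math import isclose
from statistics import harmonic_mean, mean

# Assumed P/E ratios of three companies (not taken from this excerpt).
pe_ratios = [2.5, 3.0, 10.0]

# Invest the same amount in each company, then divide the total price
# paid by the total earnings bought to get the portfolio's aggregate P/E.
invested = 1.0
total_price = invested * len(pe_ratios)
total_earnings = sum(invested / pe for pe in pe_ratios)
aggregate_pe = total_price / total_earnings

hm = harmonic_mean(pe_ratios)   # matches the aggregate P/E ratio
am = mean(pe_ratios)            # noticeably higher

print(round(hm, 3), round(aggregate_pe, 3), round(am, 3))
```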
@@ -312,10 +314,10 @@ However, for reading convenience, most of the examples show sorted sequences.
The mode (when it exists) is the most typical value and serves as a
measure of central location.

- If there are multiple modes, returns the first one encountered in the *data*.
- If the smallest or largest of multiple modes is desired instead, use
- ``min(multimode(data))`` or ``max(multimode(data))``. If the input *data* is
- empty, :exc:`StatisticsError` is raised.
+ If there are multiple modes with the same frequency, returns the first one
+ encountered in the *data*. If the smallest or largest of those is
+ desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
+ If the input *data* is empty, :exc:`StatisticsError` is raised.
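The tie-breaking rule can be sketched with a small invented dataset in which two values share the highest frequency:

```python
from statistics import StatisticsError, mode, multimode

data = [3, 1, 3, 1, 2]   # 3 and 1 both occur twice

first = mode(data)                # 3: the first mode encountered
all_modes = multimode(data)       # [3, 1]: in order of first appearance
smallest = min(multimode(data))   # 1
largest = max(multimode(data))    # 3

# Empty input is rejected by mode().
try:
    mode([])
except StatisticsError:
    empty_rejected = True
else:
    empty_rejected = False
```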

``mode`` assumes discrete data, and returns a single value. This is the
standard treatment of the mode as commonly taught in schools:
@@ -325,8 +327,8 @@ However, for reading convenience, most of the examples show sorted sequences.
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3

- The mode is unique in that it is the only statistic which also applies
- to nominal (non-numeric) data:
+ The mode is unique in that it is the only statistic in this package that
+ also applies to nominal (non-numeric) data:

.. doctest::

@@ -368,15 +370,16 @@ However, for reading convenience, most of the examples show sorted sequences.

.. function:: pvariance(data, mu=None)

- Return the population variance of *data*, a non-empty iterable of real-valued
- numbers. Variance, or second moment about the mean, is a measure of the
- variability (spread or dispersion) of data. A large variance indicates that
- the data is spread out; a small variance indicates it is clustered closely
- around the mean.
+ Return the population variance of *data*, a non-empty sequence or iterator
+ of real-valued numbers. Variance, or second moment about the mean, is a
+ measure of the variability (spread or dispersion) of data. A large
+ variance indicates that the data is spread out; a small variance indicates
+ it is clustered closely around the mean.

- If the optional second argument *mu* is given, it should be the mean of
- *data*. If it is missing or ``None`` (the default), the mean is
- automatically calculated.
+ If the optional second argument *mu* is given, it is typically the mean of
+ the *data*. It can also be used to compute the second moment around a
+ point that is not the mean. If it is missing or ``None`` (the default),
+ the arithmetic mean is automatically calculated.
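The role of *mu* can be sketched as below. ``second_moment`` is a hypothetical helper written for this illustration, not part of the module; it spells out what "second moment around a point" means so the two cases can be compared.

```python
from statistics import mean, pvariance

data = [1, 2, 3, 4]
mu = mean(data)   # 2.5

# Passing the precomputed mean gives the same result as omitting mu.
with_mu = pvariance(data, mu)
without_mu = pvariance(data)

def second_moment(values, point):
    """Hypothetical helper: mean squared deviation of values about point."""
    values = list(values)
    return sum((x - point) ** 2 for x in values) / len(values)

about_mean = second_moment(data, mu)   # 1.25, the population variance
about_zero = second_moment(data, 0)    # 7.5, a moment about another point

print(with_mu, without_mu, about_mean, about_zero)
```

The second moment is smallest when taken about the mean, which is why a moment about any other point (here zero) is larger than the variance.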

Use this function to calculate the variance from the entire population. To
estimate the variance from a sample, the :func:`variance` function is usually
@@ -401,10 +404,6 @@ However, for reading convenience, most of the examples show sorted sequences.
>>> pvariance(data, mu)
1.25

- This function does not attempt to verify that you have passed the actual mean
- as *mu*. Using arbitrary values for *mu* may lead to invalid or impossible
- results.
-
Decimals and Fractions are supported:

.. doctest::
@@ -423,11 +422,11 @@ However, for reading convenience, most of the examples show sorted sequences.
σ². When called on a sample instead, this is the biased sample variance
s², also known as variance with N degrees of freedom.

- If you somehow know the true population mean μ, you may use this function
- to calculate the variance of a sample, giving the known population mean as
- the second argument. Provided the data points are representative
- (e.g. independent and identically distributed), the result will be an
- unbiased estimate of the population variance.
+ If you somehow know the true population mean μ, you may use this
+ function to calculate the variance of a sample, giving the known
+ population mean as the second argument. Provided the data points are a
+ random sample of the population, the result will be an unbiased estimate
+ of the population variance.
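The unbiasedness claim can be explored with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with mean 0 and standard deviation 1, and the seed, sample size and number of trials are arbitrary choices.

```python
from random import Random
from statistics import fmean, pvariance

rng = Random(2024)                 # fixed seed so the sketch is repeatable
true_mu, true_sigma = 0.0, 1.0     # assumed population parameters
sample_size, trials = 5, 4_000

known_mu_estimates = []
sample_mean_estimates = []
for _ in range(trials):
    sample = [rng.gauss(true_mu, true_sigma) for _ in range(sample_size)]
    # Deviations measured from the known population mean.
    known_mu_estimates.append(pvariance(sample, true_mu))
    # Deviations measured from the sample's own mean (biased low).
    sample_mean_estimates.append(pvariance(sample))

avg_known = fmean(known_mu_estimates)       # expected near sigma**2 = 1.0
avg_sample = fmean(sample_mean_estimates)   # expected near (n - 1) / n = 0.8

print(round(avg_known, 2), round(avg_sample, 2))
```

Averaged over many samples, the estimates that use the known mean should settle near the true variance, while those that use each sample's own mean should settle near ``(n - 1) / n`` times it.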


.. function:: stdev(data, xbar=None)
@@ -502,19 +501,19 @@ However, for reading convenience, most of the examples show sorted sequences.
:func:`pvariance` function as the *mu* parameter to get the variance of a
sample.

- .. function:: quantiles(dist, *, n=4, method='exclusive')
+ .. function:: quantiles(data, *, n=4, method='exclusive')

- Divide *dist* into *n* continuous intervals with equal probability.
+ Divide *data* into *n* continuous intervals with equal probability.
Returns a list of ``n - 1`` cut points separating the intervals.

Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set
*n* to 100 for percentiles which gives the 99 cut points that separate
- *dist* in to 100 equal sized groups. Raises :exc:`StatisticsError` if *n*
+ *data* into 100 equal-sized groups. Raises :exc:`StatisticsError` if *n*
is not at least 1.

- The *dist* can be any iterable containing sample data or it can be an
+ The *data* can be any iterable containing sample data or it can be an
instance of a class that defines an :meth:`~inv_cdf` method. For meaningful
- results, the number of data points in *dist* should be larger than *n*.
+ results, the number of data points in *data* should be larger than *n*.
Raises :exc:`StatisticsError` if there are not at least two data points.
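A short sketch of the basic call patterns described above, using a small invented dataset. ``raises_statistics_error`` is a hypothetical helper used only to keep the error checks compact.

```python
from statistics import StatisticsError, quantiles

data = [1, 2, 3, 4, 5, 6, 7]

quartiles = quantiles(data)        # n=4 by default: [2.0, 4.0, 6.0]
deciles = quantiles(data, n=10)    # always n - 1 cut points

def raises_statistics_error(func, *args, **kwargs):
    """Hypothetical helper: report whether the call raises StatisticsError."""
    try:
        func(*args, **kwargs)
    except StatisticsError:
        return True
    return False

bad_n = raises_statistics_error(quantiles, data, n=0)   # n must be >= 1
no_data = raises_statistics_error(quantiles, [])        # empty input

print(quartiles, len(deciles), bad_n, no_data)
```

Seven points are fewer than the ten groups requested for the deciles, so per the text above those cut points exist but are not especially meaningful.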

For sample data, the cut points are linearly interpolated from the
@@ -523,7 +522,7 @@ However, for reading convenience, most of the examples show sorted sequences.
cut-point will evaluate to ``104``.

The *method* for computing quantiles can be varied depending on
- whether the data in *dist* includes or excludes the lowest and
+ whether the *data* includes or excludes the lowest and
highest possible values from the population.

The default *method* is "exclusive" and is used for data sampled from
@@ -535,14 +534,14 @@ However, for reading convenience, most of the examples show sorted sequences.

Setting the *method* to "inclusive" is used for describing population
data or for samples that are known to include the most extreme values
- from the population. The minimum value in *dist* is treated as the 0th
+ from the population. The minimum value in *data* is treated as the 0th
percentile and the maximum value is treated as the 100th percentile.
The portion of the population falling below the *i-th* of *m* sorted
data points is computed as ``(i - 1) / (m - 1)``. Given 11 sample
values, the method sorts them and assigns the following percentiles:
0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.

- If *dist* is an instance of a class that defines an
+ If *data* is an instance of a class that defines an
:meth:`~inv_cdf` method, setting *method* has no effect.
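The difference between the two methods can be sketched with eleven invented values, chosen so that each value equals the percentile the "inclusive" rule assigns to it:

```python
from statistics import quantiles

# Eleven sorted values; under "inclusive" they sit at 0%, 10%, ..., 100%.
data = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
m = len(data)

# Portion of the population below the i-th sorted point: (i - 1) / (m - 1).
percentiles = [round(100 * (i - 1) / (m - 1)) for i in range(1, m + 1)]

inclusive = quantiles(data, n=4, method="inclusive")   # [25.0, 50.0, 75.0]
exclusive = quantiles(data, n=4, method="exclusive")   # [20.0, 50.0, 80.0]

print(percentiles == data, inclusive, exclusive)
```

The "exclusive" quartiles fall further from the centre because that method assumes the population may contain values more extreme than any in the sample.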

.. doctest::
@@ -580,7 +579,7 @@ A single exception is defined:
:class:`NormalDist` is a tool for creating and manipulating normal
distributions of a `random variable
<http://www.stat.yale.edu/Courses/1997-98/101/ranvar.htm>`_. It is a
- composite class that treats the mean and standard deviation of data
+ class that treats the mean and standard deviation of data
measurements as a single entity.

Normal distributions arise from the `Central Limit Theorem
@@ -616,13 +615,14 @@ of applications in statistics.

.. classmethod:: NormalDist.from_samples(data)

- Makes a normal distribution instance computed from sample data. The
- *data* can be any :term:`iterable` and should consist of values that
- can be converted to type :class:`float`.
+ Makes a normal distribution instance with *mu* and *sigma* parameters
+ estimated from the *data* using :func:`fmean` and :func:`stdev`.

- If *data* does not contain at least two elements, raises
- :exc:`StatisticsError` because it takes at least one point to estimate
- a central value and at least two points to estimate dispersion.
+ The *data* can be any :term:`iterable` and should consist of values
+ that can be converted to type :class:`float`. If *data* does not
+ contain at least two elements, raises :exc:`StatisticsError` because it
+ takes at least one point to estimate a central value and at least two
+ points to estimate dispersion.
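A minimal sketch of the relationship described above, using an invented dataset. The comparison uses ``isclose`` because the estimates may differ from :func:`fmean` and :func:`stdev` in the last floating-point digit.

```python
from math import isclose
from statistics import NormalDist, StatisticsError, fmean, stdev

data = [2.5, 3.1, 2.1, 2.4, 2.7, 3.5]

dist = NormalDist.from_samples(data)
mean_matches = isclose(dist.mean, fmean(data))
stdev_matches = isclose(dist.stdev, stdev(data))

# One point is enough for a centre but not for a spread.
try:
    NormalDist.from_samples([2.5])
except StatisticsError:
    single_point_rejected = True
else:
    single_point_rejected = False

print(dist, mean_matches, stdev_matches, single_point_rejected)
```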

.. method:: NormalDist.samples(n, *, seed=None)

@@ -636,10 +636,10 @@ of applications in statistics.
.. method:: NormalDist.pdf(x)

Using a `probability density function (pdf)
- <https://en.wikipedia.org/wiki/Probability_density_function>`_,
- compute the relative likelihood that a random variable *X* will be near
- the given value *x*. Mathematically, it is the ratio ``P(x <= X <
- x+dx) / dx``.
+ <https://en.wikipedia.org/wiki/Probability_density_function>`_, compute
+ the relative likelihood that a random variable *X* will be near the
+ given value *x*. Mathematically, it is the limit of the ratio ``P(x <=
+ X < x+dx) / dx`` as *dx* approaches zero.
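The limit can be observed numerically. The sketch below assumes a standard normal distribution and an arbitrary evaluation point, and forms the ratio from differences of the cumulative distribution function:

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)   # assumed distribution for the sketch
x = 1.0                             # arbitrary evaluation point

# Difference quotients P(x <= X < x+dx) / dx for shrinking dx.
ratios = []
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    probability = X.cdf(x + dx) - X.cdf(x)
    ratios.append(probability / dx)

density = X.pdf(x)
errors = [abs(ratio - density) for ratio in ratios]

print(density, ratios, errors)
```

Each tenfold reduction in ``dx`` brings the ratio roughly ten times closer to ``X.pdf(x)``. Much smaller steps would eventually be dominated by floating-point cancellation in the difference of the two ``cdf`` values.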

The relative likelihood is computed as the probability of a sample
occurring in a narrow range divided by the width of the range (hence
@@ -667,8 +667,10 @@ of applications in statistics.

.. method:: NormalDist.overlap(other)

- Returns a value between 0.0 and 1.0 giving the overlapping area for
- the two probability density functions.
+ Measures the agreement between two normal probability distributions.
+ Returns a value between 0.0 and 1.0 giving `the overlapping area for
+ the two probability density functions
+ <https://www.rasch.org/rmt/rmt101r.htm>`_.
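As a rough check of the "overlapping area" description, the sketch below compares ``overlap()`` with a numerical integral of the pointwise minimum of the two density functions. The distribution parameters, integration bounds and step count are arbitrary choices for illustration.

```python
from statistics import NormalDist

N1 = NormalDist(2.4, 1.6)
N2 = NormalDist(3.2, 2.0)

overlap = N1.overlap(N2)

# Midpoint-rule integral of min(pdf1, pdf2) over a range wide enough
# that both tails are negligible.
lo, hi, steps = -20.0, 25.0, 45_000
width = (hi - lo) / steps
area = width * sum(
    min(N1.pdf(lo + (k + 0.5) * width), N2.pdf(lo + (k + 0.5) * width))
    for k in range(steps)
)

identical = N1.overlap(N1)   # a distribution fully agrees with itself

print(round(overlap, 4), round(area, 4), identical)
```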

Instances of :class:`NormalDist` support addition, subtraction,
multiplication and division by a constant. These operations
@@ -740,12 +742,11 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
...     return (3*x + 7*x*y - 5*y) / (11 * z)
...
>>> n = 100_000
- >>> seed = 86753099035768
- >>> X = NormalDist(10, 2.5).samples(n, seed=seed)
- >>> Y = NormalDist(15, 1.75).samples(n, seed=seed)
- >>> Z = NormalDist(50, 1.25).samples(n, seed=seed)
- >>> NormalDist.from_samples(map(model, X, Y, Z))  # doctest: +SKIP
- NormalDist(mu=1.8661894803304777, sigma=0.65238717376862)
+ >>> X = NormalDist(10, 2.5).samples(n, seed=3652260728)
+ >>> Y = NormalDist(15, 1.75).samples(n, seed=4582495471)
+ >>> Z = NormalDist(50, 1.25).samples(n, seed=6582483453)
+ >>> quantiles(map(model, X, Y, Z))  # doctest: +SKIP
+ [1.4591308524824727, 1.8035946855390597, 2.175091447274739]

Normal distributions commonly arise in machine learning problems.
