@@ -467,6 +467,201 @@ A single exception is defined:

     Subclass of :exc:`ValueError` for statistics-related exceptions.

+
+ :class:`NormalDist` objects
+ ===========================
+
+ A :class:`NormalDist` is a composite class that treats the mean and standard
+ deviation of data measurements as a single entity. It is a tool for creating
+ and manipulating normal distributions of a random variable.
+
+ Normal distributions arise from the `Central Limit Theorem
+ <https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
+ of applications in statistics, including simulations and hypothesis testing.
+
+ .. class:: NormalDist(mu=0.0, sigma=1.0)
+
+     Returns a new *NormalDist* object where *mu* represents the `arithmetic
+     mean <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of the data and
+     *sigma* represents the `standard deviation
+     <https://en.wikipedia.org/wiki/Standard_deviation>`_ of the data.
+
+     If *sigma* is negative, raises :exc:`StatisticsError`.
+
+     .. attribute:: mu
+
+        The mean of a normal distribution.
+
+     .. attribute:: sigma
+
+        The standard deviation of a normal distribution.
+
+     .. attribute:: variance
+
+        A read-only property representing the `variance
+        <https://en.wikipedia.org/wiki/Variance>`_ of a normal
+        distribution. Equal to the square of the standard deviation.
+
+     .. classmethod:: NormalDist.from_samples(data)
+
+        Class method that makes a normal distribution instance
+        from sample data. The *data* can be any :term:`iterable`
+        and should consist of values that can be converted to type
+        :class:`float`.
+
+        If *data* does not contain at least two elements, raises
+        :exc:`StatisticsError` because it takes at least one point to estimate
+        a central value and at least two points to estimate dispersion.
+
+     .. method:: NormalDist.samples(n, seed=None)
+
+        Generates *n* random samples for a given mean and standard deviation.
+        Returns a :class:`list` of :class:`float` values.
+
+        If *seed* is given, creates a new instance of the underlying random
+        number generator. This is useful for creating reproducible results,
+        even in a multi-threaded context.
+
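The effect of *seed* can be sketched as follows. The distribution parameters and the seed value are arbitrary choices for illustration; *seed* is passed by keyword:

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)

# Two calls with the same seed each build a fresh generator, so they
# produce identical sample lists.
first = dist.samples(5, seed=42)
second = dist.samples(5, seed=42)

# Without a seed, the shared module-level generator is used and the
# values differ from run to run.
unseeded = dist.samples(5)

print(first == second)
print(len(unseeded))
```

Because a seeded call does not touch the shared generator, other threads drawing random numbers should not disturb the seeded sequence.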
+     .. method:: NormalDist.pdf(x)
+
+        Using a `probability density function (pdf)
+        <https://en.wikipedia.org/wiki/Probability_density_function>`_,
+        compute the relative likelihood that a random sample *X* will be near
+        the given value *x*. Mathematically, it is the limit of the ratio
+        ``P(x <= X < x+dx) / dx`` as *dx* approaches zero.
+
+        Note that the relative likelihood of *x* can be greater than `1.0`. The
+        probability for a specific point on a continuous distribution is `0.0`,
+        so the :func:`pdf` is used instead. It gives the probability of a
+        sample occurring in a narrow range around *x*, divided by the width
+        of that range (hence the word "density").
+
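To make the "density" idea concrete, the sketch below uses an illustrative narrow distribution (`sigma=0.1`, chosen only for this example) and compares `pdf` with the ratio `P(x <= X < x+dx) / dx` computed from `cdf` for a small `dx`:

```python
from statistics import NormalDist

# A narrow distribution: a small sigma concentrates the density near mu.
narrow = NormalDist(mu=0.0, sigma=0.1)

# The density at the peak exceeds 1.0; it is not a probability.
peak_density = narrow.pdf(0.0)

# Approximate the density at x as P(x <= X < x+dx) / dx for a small dx.
x, dx = 0.05, 1e-6
approx_density = (narrow.cdf(x + dx) - narrow.cdf(x)) / dx

print(peak_density > 1.0)
print(abs(approx_density - narrow.pdf(x)) < 1e-3)
```

The tolerance in the last line is deliberately loose, since the finite-difference ratio only approaches the density as `dx` shrinks.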
+     .. method:: NormalDist.cdf(x)
+
+        Using a `cumulative distribution function (cdf)
+        <https://en.wikipedia.org/wiki/Cumulative_distribution_function>`_,
+        compute the probability that a random sample *X* will be less than or
+        equal to *x*. Mathematically, it is written ``P(X <= x)``.
+
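As a small illustration, differences of `cdf` values give the probability of landing in an interval. The sketch below uses the default standard normal distribution; the variable names are illustrative:

```python
from statistics import NormalDist

standard = NormalDist()  # defaults: mu=0.0, sigma=1.0

# P(X <= 0) is one half because the distribution is symmetric about mu.
half = standard.cdf(0.0)

# P(-1 < X <= 1): the share of samples within one standard deviation,
# roughly 68%.
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)

print(half, round(within_one_sigma, 4))
```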
+     Instances of :class:`NormalDist` support addition, subtraction,
+     multiplication and division by a constant. These operations
+     are used for translation and scaling. For example:
+
+     .. doctest::
+
+         >>> temperature_february = NormalDist(5, 2.5)      # Celsius
+         >>> temperature_february * (9/5) + 32              # Fahrenheit
+         NormalDist(mu=41.0, sigma=4.5)
+
+     Dividing a constant by an instance of :class:`NormalDist` is not supported.
+
+     Since normal distributions arise from additive effects of independent
+     variables, it is possible to `add and subtract two normally distributed
+     random variables
+     <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
+     represented as instances of :class:`NormalDist`. For example:
+
+     .. doctest::
+
+         >>> birth_weights = NormalDist.from_samples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5])
+         >>> drug_effects = NormalDist(0.4, 0.15)
+         >>> combined = birth_weights + drug_effects
+         >>> f'mu={combined.mu:.1f} sigma={combined.sigma:.1f}'
+         'mu=3.1 sigma=0.5'
+
+     .. versionadded:: 3.8
+
+
+ :class:`NormalDist` Examples and Recipes
+ ----------------------------------------
+
+ A :class:`NormalDist` readily solves classic probability problems.
+
+ For example, given `historical data for SAT exams
+ <https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
+ are normally distributed with a mean of 1060 and a standard deviation of 195,
+ determine the percentage of students with scores between 1100 and 1200:
+
+ .. doctest::
+
+     >>> sat = NormalDist(1060, 195)
+     >>> fraction = sat.cdf(1200) - sat.cdf(1100)
+     >>> f'{fraction * 100:.1f}% score between 1100 and 1200'
+     '18.2% score between 1100 and 1200'
+
+ To estimate the distribution for a model that isn't easy to solve
+ analytically, :class:`NormalDist` can generate input samples for a `Monte
+ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_ of the
+ model:
+
+ .. doctest::
+
+     >>> n = 100_000
+     >>> X = NormalDist(350, 15).samples(n)
+     >>> Y = NormalDist(47, 17).samples(n)
+     >>> Z = NormalDist(62, 6).samples(n)
+     >>> model_simulation = [x * y / z for x, y, z in zip(X, Y, Z)]
+     >>> NormalDist.from_samples(model_simulation)           # doctest: +SKIP
+     NormalDist(mu=267.6516398754636, sigma=101.357284306067)
+
+ Normal distributions commonly arise in machine learning problems.
+
+ Wikipedia has a `nice example with a Naive Bayesian Classifier
+ <https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge
+ is to guess a person's gender from measurements of normally distributed
+ features including height, weight, and foot size.
+
+ The `prior probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of
+ being male or female is 50%:
+
+ .. doctest::
+
+     >>> prior_male = 0.5
+     >>> prior_female = 0.5
+
+ We also have a training dataset with measurements for eight people. These
+ measurements are assumed to be normally distributed, so we summarize the data
+ with :class:`NormalDist`:
+
+ .. doctest::
+
+     >>> height_male = NormalDist.from_samples([6, 5.92, 5.58, 5.92])
+     >>> height_female = NormalDist.from_samples([5, 5.5, 5.42, 5.75])
+     >>> weight_male = NormalDist.from_samples([180, 190, 170, 165])
+     >>> weight_female = NormalDist.from_samples([100, 150, 130, 150])
+     >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
+     >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])
+
+ We observe a new person whose feature measurements are known but whose gender
+ is unknown:
+
+ .. doctest::
+
+     >>> ht = 6.0        # height
+     >>> wt = 130        # weight
+     >>> fs = 8          # foot size
+
+ The posterior is the product of the prior and the likelihood of each
+ feature measurement given the gender:
+
+ .. doctest::
+
+     >>> posterior_male = (prior_male * height_male.pdf(ht) *
+     ...                   weight_male.pdf(wt) * foot_size_male.pdf(fs))
+
+     >>> posterior_female = (prior_female * height_female.pdf(ht) *
+     ...                     weight_female.pdf(wt) * foot_size_female.pdf(fs))
+
+ The final prediction is awarded to the largest posterior -- this is known as
+ the `maximum a posteriori
+ <https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:
+
+ .. doctest::
+
+     >>> 'male' if posterior_male > posterior_female else 'female'
+     'female'
+
+
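The products computed above are unnormalized scores; dividing each by their sum turns them into probabilities. The following self-contained sketch repeats the classifier in loop form under that assumption. The dictionary layout and the names `features`, `scores`, and `probabilities` are illustrative and not part of the patch:

```python
from statistics import NormalDist

# Training data and observation copied from the example above.
features = {
    'male': ([6, 5.92, 5.58, 5.92], [180, 190, 170, 165], [12, 11, 12, 10]),
    'female': ([5, 5.5, 5.42, 5.75], [100, 150, 130, 150], [6, 8, 7, 9]),
}
priors = {'male': 0.5, 'female': 0.5}
observation = (6.0, 130, 8)   # height, weight, foot size

# Unnormalized posterior: prior times the pdf of each observed feature.
scores = {}
for label, columns in features.items():
    score = priors[label]
    for samples, value in zip(columns, observation):
        score *= NormalDist.from_samples(samples).pdf(value)
    scores[label] = score

# Dividing by the total turns the scores into probabilities summing to 1.
evidence = sum(scores.values())
probabilities = {label: score / evidence for label, score in scores.items()}

# The MAP prediction is the label with the largest posterior.
prediction = max(probabilities, key=probabilities.get)
print(prediction)
```

Normalizing does not change which label wins, so the MAP prediction matches the comparison of raw products.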
  ..
     # This modelines must appear within the last ten lines of the file.
     kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;