68–95–99.7 rule
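In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: approximately 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.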
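In mathematical notation, these facts can be expressed as follows, where Pr() is the probability function, X is an observation from a normally distributed random variable, μ (mu) is the mean of the distribution, and σ (sigma) is its standard deviation:

$$\Pr(\mu - 1\sigma \le X \le \mu + 1\sigma) \approx 68.27\%$$
$$\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 95.45\%$$
$$\Pr(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 99.73\%$$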
The usefulness of this heuristic especially depends on the question under consideration.
In the
In the
A weaker three-sigma rule can be derived from Chebyshev's inequality, stating that even for non-normally distributed variables, at least 88.8% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the probability of being within the interval is at least 95% by the Vysochanskij–Petunin inequality. There may be certain assumptions for a distribution that force this probability to be at least 98%.[3]
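The two distribution-free bounds quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names are illustrative and not taken from any library. It uses the standard forms of the inequalities: Chebyshev gives at least 1 − 1/k² within k standard deviations, and Vysochanskij–Petunin gives at least 1 − 4/(9k²) for unimodal distributions when k > √(8/3).

```python
import math


def chebyshev_lower_bound(k: float) -> float:
    """Lower bound on P(|X - mu| < k*sigma) for any finite-variance distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1.0 - 1.0 / k**2)


def vysochanskij_petunin_lower_bound(k: float) -> float:
    """Lower bound on P(|X - mu| < k*sigma) for unimodal distributions.

    This form of the inequality only applies for k > sqrt(8/3).
    """
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("this form of the bound requires k > sqrt(8/3)")
    return 1.0 - 4.0 / (9.0 * k**2)


# Three-sigma intervals: 8/9 (about 88.9%) and 77/81 (about 95.1%).
print(chebyshev_lower_bound(3))
print(vysochanskij_petunin_lower_bound(3))
```

For comparison, the normal distribution places about 99.73% of values inside the same three-sigma interval, well above either bound.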
Proof
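We have that

$$\Pr(\mu - n\sigma \le X \le \mu + n\sigma) = \int_{\mu - n\sigma}^{\mu + n\sigma} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\,dx.$$

With the change of variable to the standard score z = (x − μ)/σ, this becomes

$$\frac{1}{\sqrt{2\pi}} \int_{-n}^{n} e^{-\frac{z^2}{2}}\,dz,$$

and this integral is independent of μ and σ. We only need to calculate each integral for the cases n = 1, 2, 3:

$$\Pr(\mu - 1\sigma \le X \le \mu + 1\sigma) = \frac{1}{\sqrt{2\pi}} \int_{-1}^{1} e^{-\frac{z^2}{2}}\,dz \approx 0.6827$$
$$\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) = \frac{1}{\sqrt{2\pi}} \int_{-2}^{2} e^{-\frac{z^2}{2}}\,dz \approx 0.9545$$
$$\Pr(\mu - 3\sigma \le X \le \mu + 3\sigma) = \frac{1}{\sqrt{2\pi}} \int_{-3}^{3} e^{-\frac{z^2}{2}}\,dz \approx 0.9973$$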
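The claim that the probability of falling within n standard deviations of the mean does not depend on the mean or the standard deviation can be checked numerically. The sketch below integrates the normal density with Simpson's rule; the helper names (`normal_pdf`, `simpson`, `prob_within`) and the example parameters are assumptions made for illustration.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def simpson(f, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson's rule on [a, b]; steps is rounded up to an even number."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def prob_within(n: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability that a normal variable lies within n standard deviations of its mean."""
    return simpson(lambda x: normal_pdf(x, mu, sigma), mu - n * sigma, mu + n * sigma)


for n in (1, 2, 3):
    # The second call uses arbitrary example parameters; both columns should agree.
    print(n, round(prob_within(n), 4), round(prob_within(n, mu=10.0, sigma=2.5), 4))
# 1 0.6827 0.6827
# 2 0.9545 0.9545
# 3 0.9973 0.9973
```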
Cumulative distribution function
These numerical values "68%, 95%, 99.7%" come from the cumulative distribution function of the normal distribution.
The prediction interval for any standard score z corresponds numerically to (1 − (1 − Φ_{μ,σ²}(z)) · 2).
For example, Φ(2) ≈ 0.9772, or Pr(X ≤ μ + 2σ) ≈ 0.9772, corresponding to a prediction interval of (1 − (1 − 0.97725) · 2) = 0.9545 = 95.45%. The value Φ(2) by itself does not describe a symmetrical interval; it is merely the probability that an observation is less than μ + 2σ. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):

$$\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) = \Phi(2) - \Phi(-2) \approx 0.9772 - (1 - 0.9772) \approx 0.9545$$
This is related to the confidence interval as used in statistics: X̄ ± 2σ/√n is approximately a 95% confidence interval when X̄ is the average of a sample of size n.
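The relationship between the cumulative distribution function and the symmetric interval can be sketched with the error function from Python's standard library. The function names below are illustrative, and `approx_95_ci` assumes the population standard deviation is known and passed in by the caller.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def central_probability(z: float) -> float:
    """Probability of lying within z standard deviations of the mean: 1 - (1 - Phi(z)) * 2."""
    return 1.0 - (1.0 - phi(z)) * 2.0


def approx_95_ci(sample, sigma: float):
    """Approximate 95% confidence interval for the mean: sample mean +/- 2*sigma/sqrt(n).

    Assumes sigma is the known population standard deviation.
    """
    n = len(sample)
    mean = sum(sample) / n
    half_width = 2.0 * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


print(round(phi(2), 4))                  # one-sided probability, about 0.9772
print(round(central_probability(2), 4))  # two-sided probability, about 0.9545
print(approx_95_ci([1.0, 2.0, 3.0, 4.0], sigma=1.0))
```

The factor of 2 in the interval is the rounded rule-of-thumb value; the exact two-sided 95% point of the normal distribution is closer to 1.96.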
Normality tests
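The "68–95–99.7 rule" is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test for outliers if the population is assumed normal, and as a normality test if the population is potentially not normal.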
To pass from a sample to a number of standard deviations, one first computes the
To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, one can compute the
One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a Poisson distribution, but simply, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or question the assumed normality of the distribution.
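The Poisson approximation mentioned above can be sketched as follows. The expected number of moves of at least k standard deviations in a sample is the sample size times the two-sided normal tail probability, and that expectation is used as the Poisson rate. The function names are illustrative, and independence of the observations is assumed.

```python
import math


def two_sided_tail(k: float) -> float:
    """Probability that a normal observation lies more than k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2.0))


def prob_at_least(m: int, sample_size: int, k: float) -> float:
    """Poisson approximation to P(at least m moves of size >= k sigma in the sample).

    Assumes independent, normally distributed observations.
    """
    lam = sample_size * two_sided_tail(k)  # expected number of such moves
    p_fewer = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(m))
    return 1.0 - p_fewer


# Two or more 4-sigma moves in a sample of 1,000: roughly a 0.2% chance under normality.
print(prob_at_least(2, 1000, 4.0))
```

A probability this small is what justifies treating multiple 4 standard deviation moves in such a sample as outliers or as evidence against normality.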
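For example, a 6σ event corresponds to a chance of about two parts per billion (1.973 ppb in the table below). If events are taken to occur daily, this corresponds to an event expected about once every 1.38 million years.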
In The Black Swan, Nassim Nicholas Taleb gives the example of risk models according to which the Black Monday crash would correspond to a 36-σ event: the occurrence of such an event should instantly suggest that the model is flawed, i.e. that the process under consideration is not satisfactorily modeled by a normal distribution. Refined models should then be considered, e.g. by the introduction of
Table of numerical values
Because of the exponentially decreasing tails of the normal distribution, odds of higher deviations decrease very quickly. From the rules for normally distributed data for a daily event:
Range | Expected fraction of population inside range | Expected fraction of population outside range | Approx. expected frequency outside range | Approx. frequency outside range for daily event
---|---|---|---|---
μ ± 0.5σ | 0.382924922548026 | 6.171E-01 = 61.71 % | 3 in 5 | Four or five times a week
μ ± σ | 0.682689492137086[4] | 3.173E-01 = 31.73 % | 1 in 3 | Twice or thrice a week
μ ± 1.5σ | 0.866385597462284 | 1.336E-01 = 13.36 % | 2 in 15 | Weekly
μ ± 2σ | 0.954499736103642[5] | 4.550E-02 = 4.550 % | 1 in 22 | Every three weeks
μ ± 2.5σ | 0.987580669348448 | 1.242E-02 = 1.242 % | 1 in 81 | Quarterly
μ ± 3σ | 0.997300203936740[6] | 2.700E-03 = 0.270 % = 2.700 ‰ | 1 in 370 | Yearly
μ ± 3.5σ | 0.999534741841929 | 4.653E-04 = 0.04653 % = 465.3 ppm | 1 in 2149 | Every 6 years
μ ± 4σ | 0.999936657516334 | 6.334E-05 = 63.34 ppm | 1 in 15787 | Every 43 years (twice in a lifetime)
μ ± 4.5σ | 0.999993204653751 | 6.795E-06 = 6.795 ppm | 1 in 147160 | Every 403 years (once in the modern era)
μ ± 5σ | 0.999999426696856 | 5.733E-07 = 0.5733 ppm = 573.3 ppb | 1 in 1744278 | Every 4776 years (once in recorded history)
μ ± 5.5σ | 0.999999962020875 | 3.798E-08 = 37.98 ppb | 1 in 26330254 | Every 72090 years (thrice in history of modern humankind)
μ ± 6σ | 0.999999998026825 | 1.973E-09 = 1.973 ppb | 1 in 506797346 | Every 1.38 million years (twice in history of humankind)
μ ± 6.5σ | 0.999999999919680 | 8.032E-11 = 0.08032 ppb = 80.32 ppt | 1 in 12450197393 | Every 34 million years (twice since the extinction of dinosaurs)
μ ± 7σ | 0.999999999997440 | 2.560E-12 = 2.560 ppt | 1 in 390682215445 | Every 1.07 billion years (four occurrences in history of Earth)
μ ± 7.5σ | 0.999999999999936 | 6.382E-14 = 63.82 ppq | 1 in 15669601204101 | Once every 43 billion years (never in the history of the Universe, twice in the future of the Local Group before its merger)
μ ± 8σ | 0.999999999999999 | 1.244E-15 = 1.244 ppq | 1 in 803734397655348 | Once every 2.2 trillion years (never in the history of the Universe, once during the life of a red dwarf)
μ ± xσ | erf(x/√2) | 1 − erf(x/√2) | 1 in 1/(1 − erf(x/√2)) | Every 1/(1 − erf(x/√2)) days
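The rows of the table can be reproduced from the error function. The following is a rough sketch: the function name `table_row` is illustrative, and converting days to years with 365.25 days per year is an assumption made here rather than something stated in the table.

```python
import math

DAYS_PER_YEAR = 365.25  # assumed conversion factor for the "daily event" column


def table_row(x: float):
    """Return (inside, outside, one_in, years) for the range mu +/- x*sigma."""
    # erfc(t) = 1 - erf(t), computed directly to keep precision in the far tails.
    outside = math.erfc(x / math.sqrt(2.0))
    inside = 1.0 - outside
    one_in = 1.0 / outside            # expected number of events per occurrence outside
    years = one_in / DAYS_PER_YEAR    # waiting time if one event happens per day
    return inside, outside, one_in, years


for x in (1, 2, 3, 4, 5, 6):
    inside, outside, one_in, years = table_row(x)
    print(f"mu ± {x}σ | {inside:.15f} | {outside:.3E} | 1 in {one_in:.0f} | every {years:.3g} years")
```

Using `math.erfc` rather than `1 - math.erf(...)` matters for the larger ranges, where the outside fraction is far smaller than the rounding error of a value close to 1.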
References
- ISBN 9780190845414.
- ^ This usage of "three-sigma rule" entered common usage in the 2000s, e.g. cited in
- Schaum's Outline of Business Statistics. McGraw Hill Professional. 2003. p. 359. ISBN 9780071398763
- Grafarend, Erik W. (2006). Linear and Nonlinear Models: Fixed Effects, Random Effects, and Mixed Models. Walter de Gruyter. p. 553. ISBN 9783110162165.
- ^ See:
- Wheeler, D. J.; Chambers, D. S. (1992). Understanding Statistical Process Control. SPC Press. ISBN 9780945320135.
- ISBN 9780898713947.
- Pukelsheim, F. (1994). "The Three Sigma Rule". American Statistician. 48 (2): 88–91. JSTOR 2684253.
- ^ Sloane, N. J. A. (ed.). "Sequence A178647". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.
- ^ Sloane, N. J. A. (ed.). "Sequence A110894". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.
- ^ Sloane, N. J. A. (ed.). "Sequence A270712". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.
External links
- Calculate percentage proportion within x sigmas at WolframAlpha