68–95–99.7 rule

[Figure: plot with a logarithmically scaled y-axis (the values on the axis are not modified).]

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: approximately 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.

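In mathematical notation, these facts can be expressed as follows, where $\Pr()$ is the probability function, $X$ is an observation from a normally distributed random variable, $\mu$ (mu) is the mean of the distribution, and $\sigma$ (sigma) is its standard deviation: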

$$\begin{aligned}\Pr(\mu -1\sigma \leq X\leq \mu +1\sigma )&\approx 68.27\%\\\Pr(\mu -2\sigma \leq X\leq \mu +2\sigma )&\approx 95.45\%\\\Pr(\mu -3\sigma \leq X\leq \mu +3\sigma )&\approx 99.73\%\end{aligned}$$
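As a quick numerical check of the three values above, the following Python sketch evaluates the probability of falling within $n$ standard deviations through the identity $\Pr(|X-\mu| \leq n\sigma) = \operatorname{erf}(n/\sqrt{2})$, the same closed form given in the last row of the table of numerical values. The function name `within_sigmas` is illustrative and not part of any library.

```python
import math


def within_sigmas(n: float) -> float:
    """Probability that a normal variable lies within n standard deviations of its mean."""
    return math.erf(n / math.sqrt(2.0))


# Evaluate the rule for one, two, and three standard deviations.
for n in (1, 2, 3):
    print(f"within {n} sigma: {within_sigmas(n):.4%}")
```

The result does not depend on $\mu$ or $\sigma$, which is why the function takes only the number of standard deviations.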

The usefulness of this heuristic especially depends on the question under consideration.

In the empirical sciences, the so-called three-sigma rule of thumb (or $3\sigma$ rule) expresses a conventional heuristic that nearly all values are taken to lie within three standard deviations of the mean, and thus it is empirically useful to treat 99.7% probability as near certainty.^[2]

In the social sciences, a result may be considered significant if its confidence level is of the order of a two-sigma effect (95%), while in particle physics, there is a convention of a five-sigma effect (99.99994% confidence) being required to qualify as a discovery.

A weaker three-sigma rule can be derived from Chebyshev's inequality, stating that even for non-normally distributed variables, at least 88.8% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the probability of being within the interval is at least 95% by the Vysochanskij–Petunin inequality. There may be certain assumptions for a distribution that force this probability to be at least 98%.^[3]
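The two distribution-free bounds mentioned above can be evaluated directly. The sketch below assumes the standard forms of the inequalities: Chebyshev gives at least $1 - 1/k^{2}$ within $k$ standard deviations, and Vysochanskij–Petunin gives at least $1 - 4/(9k^{2})$ for unimodal distributions when $k > \sqrt{8/3}$. Both function names are illustrative.

```python
import math


def chebyshev_lower_bound(k: float) -> float:
    """Lower bound on Pr(|X - mu| <= k*sigma) for any distribution with finite variance."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1.0 - 1.0 / k**2)


def vysochanskij_petunin_lower_bound(k: float) -> float:
    """Lower bound on Pr(|X - mu| <= k*sigma) for unimodal distributions, k > sqrt(8/3)."""
    if k <= math.sqrt(8.0 / 3.0):
        raise ValueError("this form of the bound requires k > sqrt(8/3)")
    return 1.0 - 4.0 / (9.0 * k**2)


print(f"Chebyshev, k=3:            {chebyshev_lower_bound(3):.4f}")
print(f"Vysochanskij-Petunin, k=3: {vysochanskij_petunin_lower_bound(3):.4f}")
```

For $k = 3$ these evaluate to $8/9$ and $77/81$, consistent with the "at least 88.8%" and "at least 95%" figures quoted above.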

Proof

We have that

$$\Pr(\mu -n\sigma \leq X\leq \mu +n\sigma )=\int _{\mu -n\sigma }^{\mu +n\sigma }{\frac {1}{{\sqrt {2\pi }}\sigma }}e^{-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}}\,dx.$$

With the change of variable $u={\frac {x-\mu }{\sigma }}$, so that $dx=\sigma \,du$, this becomes

$${\frac {1}{\sqrt {2\pi }}}\int _{-n}^{n}e^{-{\frac {u^{2}}{2}}}\,du,$$

and this integral is independent of $\mu$ and $\sigma$. We only need to calculate each integral for the cases $n=1,2,3$:

$$\begin{aligned}&\Pr(\mu -1\sigma \leq X\leq \mu +1\sigma )={\frac {1}{\sqrt {2\pi }}}\int _{-1}^{1}e^{-{\frac {u^{2}}{2}}}\,du\approx 0.6827\\&\Pr(\mu -2\sigma \leq X\leq \mu +2\sigma )={\frac {1}{\sqrt {2\pi }}}\int _{-2}^{2}e^{-{\frac {u^{2}}{2}}}\,du\approx 0.9545\\&\Pr(\mu -3\sigma \leq X\leq \mu +3\sigma )={\frac {1}{\sqrt {2\pi }}}\int _{-3}^{3}e^{-{\frac {u^{2}}{2}}}\,du\approx 0.9973.\end{aligned}$$
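The three integrals have no elementary antiderivative, so in practice they are evaluated numerically. As a minimal sketch, the code below approximates them with composite Simpson's rule applied to the standard normal density. The choice of Simpson's rule and the names `standard_normal_pdf` and `simpson` are assumptions made for illustration; any quadrature method would serve.

```python
import math


def standard_normal_pdf(u: float) -> float:
    """Density of the standard normal distribution at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def simpson(f, a: float, b: float, intervals: int = 1000) -> float:
    """Composite Simpson's rule on [a, b]; the number of intervals must be even."""
    if intervals % 2 != 0:
        raise ValueError("intervals must be even")
    h = (b - a) / intervals
    total = f(a) + f(b)
    for i in range(1, intervals):
        # Interior points alternate between weight 4 (odd index) and 2 (even index).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


for n in (1, 2, 3):
    print(n, round(simpson(standard_normal_pdf, -n, n), 4))
```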

Cumulative distribution function

These numerical values "68%, 95%, 99.7%" come from the cumulative distribution function of the normal distribution.

The prediction interval for any standard score $z$ corresponds numerically to $(1 - (1 - \Phi_{\mu,\sigma^{2}}(z)) \cdot 2)$.

For example, $\Phi(2) \approx 0.9772$, or $\Pr(X \leq \mu + 2\sigma) \approx 0.9772$, corresponding to a prediction interval of $(1 - (1 - 0.97725)\cdot 2) = 0.9545 = 95.45\%$. Note that $\Phi(2)$ by itself does not describe a symmetrical interval; it is merely the probability that an observation is less than $\mu + 2\sigma$. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):

$$\Pr(\mu -2\sigma \leq X\leq \mu +2\sigma )=\Phi (2)-\Phi (-2)\approx 0.9772-(1-0.9772)\approx 0.9545$$
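The distinction between the one-sided value $\Phi(2)$ and the symmetric interval can be illustrated in a few lines of Python. This sketch assumes the standard relation $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$; the helper name `phi` is illustrative.

```python
import math


def phi(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


one_sided = phi(2)                   # Pr(X <= mu + 2*sigma)
two_sided = phi(2) - phi(-2)         # Pr(mu - 2*sigma <= X <= mu + 2*sigma)
via_formula = 1 - (1 - phi(2)) * 2   # the prediction-interval expression

print(f"one-sided:   {one_sided:.5f}")
print(f"two-sided:   {two_sided:.5f}")
print(f"via formula: {via_formula:.5f}")
```

The last two values agree because $\Phi(-z) = 1 - \Phi(z)$ by symmetry of the normal density.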

This is related to the confidence interval as used in statistics: ${\bar {X}}\pm 2{\frac {\sigma }{\sqrt {n}}}$ is approximately a 95% confidence interval when ${\bar {X}}$ is the average of a sample of size $n$.
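A minimal sketch of this interval follows, assuming that the population standard deviation $\sigma$ is known. The sample values, the value of `sigma`, and the function name `two_sigma_interval` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist, fmean


def two_sigma_interval(sample, sigma: float):
    """Return (low, high) for the interval mean +/- 2*sigma/sqrt(n), with sigma assumed known."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    centre = fmean(sample)
    half_width = 2.0 * sigma / math.sqrt(n)
    return centre - half_width, centre + half_width


# Hypothetical measurements with an assumed known sigma of 0.5.
sample = [9.8, 10.1, 10.4, 9.9, 10.3, 10.0, 9.7, 10.2]
low, high = two_sigma_interval(sample, sigma=0.5)
print(f"interval: ({low:.3f}, {high:.3f})")

std = NormalDist()
nominal_coverage = std.cdf(2) - std.cdf(-2)   # coverage of the 2-sigma interval
exact_multiplier = std.inv_cdf(0.975)         # multiplier for exactly 95% coverage
print(f"coverage of +/-2 sigma: {nominal_coverage:.4f}")
print(f"multiplier for 95%:     {exact_multiplier:.4f}")
```

The multiplier 2 is a rounding of roughly 1.96, which is why the interval is only approximately a 95% interval.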

Normality tests

The "68–95–99.7 rule" is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test for outliers if the population is assumed normal, and as a normality test if the population is potentially not normal.

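To pass from a sample to a number of standard deviations, one first computes the deviation, either the error or the residual, depending on whether one knows the population mean or only estimates it. The next step is standardizing (dividing by the population standard deviation), if the population parameters are known, or studentizing (dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated.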
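The two cases can be sketched as follows. This is a simplified illustration: `standardize` assumes the population mean and standard deviation are known, while `studentize` divides residuals from the sample mean by the sample standard deviation and omits the leverage correction used for studentized residuals in regression. Both function names and the data are hypothetical.

```python
from statistics import fmean, stdev


def standardize(values, mu: float, sigma: float):
    """Errors divided by the known population standard deviation."""
    return [(x - mu) / sigma for x in values]


def studentize(values):
    """Residuals from the sample mean divided by the sample standard deviation (n - 1 denominator)."""
    m = fmean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
print(standardize(values, mu=5.0, sigma=2.0))
print([round(t, 3) for t in studentize(values)])
```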

To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, one can compute the studentized residuals and compare these to the expected frequency: points that fall more than 3 standard deviations from the norm are likely outliers (unless the sample size is significantly large, by which point one expects a sample this extreme), and if there are many points more than 3 standard deviations from the norm, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.
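The comparison described above can be sketched by counting the points that lie beyond $k$ sample standard deviations and setting that count against the number expected under normality, $n\,(1 - \operatorname{erf}(k/\sqrt{2}))$. The function names and the data set are hypothetical, and using the sample mean and sample standard deviation as estimates is an assumption of this sketch.

```python
import math
from statistics import fmean, stdev


def expected_outside(n_points: int, k: float) -> float:
    """Expected number of points more than k standard deviations from the mean under normality."""
    return n_points * math.erfc(k / math.sqrt(2.0))


def count_outside(values, k: float) -> int:
    """Number of points more than k sample standard deviations from the sample mean."""
    m = fmean(values)
    s = stdev(values)
    return sum(1 for x in values if abs(x - m) > k * s)


# Hypothetical data: 100 unremarkable points plus one extreme value.
data = [1.0, -1.0] * 50 + [10.0]
print("observed beyond 3 sigma:", count_outside(data, 3))
print("expected beyond 3 sigma:", round(expected_outside(len(data), 3), 3))
```

A single exceedance in a small sample suggests an outlier; many exceedances relative to the expected count suggest that the normal model itself is questionable.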

One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a Poisson distribution, but simply, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or question the assumed normality of the distribution.
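A rough version of this Poisson calculation is sketched below for the stated case of a sample of size 1,000. It assumes that "4 standard deviation moves" are counted in both directions, so the per-observation probability is the two-sided tail $1 - \operatorname{erf}(4/\sqrt{2})$. The function names are illustrative.

```python
import math


def two_sided_tail(k: float) -> float:
    """Probability that a normal variable deviates from its mean by more than k standard deviations."""
    return math.erfc(k / math.sqrt(2.0))


def prob_at_least(m: int, lam: float) -> float:
    """Pr(N >= m) for N following a Poisson distribution with mean lam."""
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(m))
    return 1.0 - below


lam = 1000 * two_sided_tail(4)   # expected number of 4-sigma moves in 1,000 observations
p_multiple = prob_at_least(2, lam)
print(f"expected count: {lam:.4f}")
print(f"Pr(two or more): {p_multiple:.4f}")
```

Under normality the expected count is about 0.06 and the chance of seeing two or more such moves is roughly 0.2%, which is why multiple occurrences give strong reason to doubt the model.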

For example, a 6σ event corresponds to a chance of about two parts per billion. For illustration, if events are taken to occur daily, this would correspond to an event expected every 1.4 million years. This gives a simple normality test: if one witnesses a 6σ event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.
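The arithmetic behind the 6σ figures can be reproduced as follows. The sketch assumes a two-sided tail and a year of 365.25 days; the variable names are illustrative.

```python
import math

# Two-sided probability of a deviation larger than 6 standard deviations.
p_six_sigma = math.erfc(6 / math.sqrt(2.0))

# With one observation per day, the expected waiting time is 1/p days.
days_between = 1.0 / p_six_sigma
years_between = days_between / 365.25

print(f"probability:   {p_six_sigma:.3e}")
print(f"days between:  {days_between:.3e}")
print(f"years between: {years_between:.3e}")
```

`math.erfc` is used instead of `1 - math.erf(...)` because the subtraction loses precision when the tail probability is this small.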

In The Black Swan, Nassim Nicholas Taleb gives the example of risk models according to which the Black Monday crash would correspond to a 36-σ event: the occurrence of such an event should instantly suggest that the model is flawed, i.e. that the process under consideration is not satisfactorily modeled by a normal distribution. Refined models should then be considered, e.g. by the introduction of stochastic volatility. Statistical hypothesis testing works not so much by confirming a hypothesis considered to be likely, but by refuting hypotheses considered unlikely.

Table of numerical values

Because of the exponentially decreasing tails of the normal distribution, odds of higher deviations decrease very quickly. From the rules for normally distributed data for a daily event:

Range	Expected fraction of population inside range	Expected fraction of population outside range	Approx. expected frequency outside range	Approx. frequency outside range for daily event
μ ± 0.5σ	0.382924922548026	6.171E-01 = 61.71 %	3 in 5	Four or five times a week
μ ± σ	0.682689492137086^[4]	3.173E-01 = 31.73 %	1 in 3	Twice or thrice a week
μ ± 1.5σ	0.866385597462284	1.336E-01 = 13.36 %	2 in 15	Weekly
μ ± 2σ	0.954499736103642^[5]	4.550E-02 = 4.550 %	1 in 22	Every three weeks
μ ± 2.5σ	0.987580669348448	1.242E-02 = 1.242 %	1 in 81	Quarterly
μ ± 3σ	0.997300203936740^[6]	2.700E-03 = 0.270 % = 2.700 ‰	1 in 370	Yearly
μ ± 3.5σ	0.999534741841929	4.653E-04 = 0.04653 % = 465.3 ppm	1 in 2149	Every 6 years
μ ± 4σ	0.999936657516334	6.334E-05 = 63.34 ppm	1 in 15787	Every 43 years (twice in a lifetime)
μ ± 4.5σ	0.999993204653751	6.795E-06 = 6.795 ppm	1 in 147160	Every 403 years (once in the modern era)
μ ± 5σ	0.999999426696856	5.733E-07 = 0.5733 ppm = 573.3 ppb	1 in 1744278	Every 4776 years (once in recorded history)
μ ± 5.5σ	0.999999962020875	3.798E-08 = 37.98 ppb	1 in 26330254	Every 72090 years (thrice in history of modern humankind)
μ ± 6σ	0.999999998026825	1.973E-09 = 1.973 ppb	1 in 506797346	Every 1.38 million years (twice in history of humankind)
μ ± 6.5σ	0.999999999919680	8.032E-11 = 0.08032 ppb = 80.32 ppt	1 in 12450197393	Every 34 million years (twice since the extinction of dinosaurs)
μ ± 7σ	0.999999999997440	2.560E-12 = 2.560 ppt	1 in 390682215445	Every 1.07 billion years (four occurrences in history of Earth)
μ ± 7.5σ	0.999999999999936	6.382E-14 = 63.82 ppq	1 in 15669601204101	Once every 43 billion years (never in the history of the Universe, twice in the future of the Local Group before its merger)
μ ± 8σ	0.999999999999999	1.244E-15 = 1.244 ppq	1 in 803734397655348	Once every 2.2 trillion years (never in the history of the Universe, once during the life of a red dwarf)
μ ± $x\sigma$	$\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)$	$1-\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)$	1 in ${\tfrac {1}{1-\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)}}$	Every ${\tfrac {1}{1-\operatorname {erf} \left({\frac {x}{\sqrt {2}}}\right)}}$ days
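The general formulas in the last row are enough to regenerate the numeric columns of the table. The following sketch does so for a few ranges; the function name `table_row` and the conversion of days to years using 365.25 days are assumptions made for illustration.

```python
import math


def table_row(x: float):
    """Return (inside, outside, one_in, years) for the range mu +/- x*sigma and a daily event."""
    arg = x / math.sqrt(2.0)
    inside = math.erf(arg)       # expected fraction inside the range
    outside = math.erfc(arg)     # expected fraction outside, computed without cancellation
    one_in = 1.0 / outside       # "1 in N" frequency, which is also the number of days between events
    years = one_in / 365.25      # waiting time in years for a daily event
    return inside, outside, one_in, years


for x in (1, 2, 3, 4, 5, 6):
    inside, outside, one_in, years = table_row(x)
    print(f"{x} sigma  inside={inside:.12f}  outside={outside:.3e}  "
          f"1 in {one_in:,.0f}  every {years:,.2f} years")
```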

References

[1] ISBN 9780190845414.

[2] This usage of "three-sigma rule" entered common usage in the 2000s, e.g. cited in Schaum's Outline of Business Statistics. McGraw Hill Professional. 2003. p. 359. ISBN 9780071398763; and in Grafarend, Erik W. (2006). Linear and Nonlinear Models: Fixed Effects, Random Effects, and Mixed Models. Walter de Gruyter. p. 553. ISBN 9783110162165.

[3] See: Wheeler, D. J.; Chambers, D. S. (1992). Understanding Statistical Process Control. SPC Press. ISBN 9780945320135. ISBN 9780898713947. Pukelsheim, F. (1994). "The Three Sigma Rule". American Statistician. 48 (2): 88–91. JSTOR 2684253.

[4] Sloane, N. J. A. (ed.). "Sequence A178647". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.

[5] Sloane, N. J. A. (ed.). "Sequence A110894". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.

[6] Sloane, N. J. A. (ed.). "Sequence A270712". The On-Line Encyclopedia of Integer Sequences. OEIS Foundation.

External links

"Calculate percentage proportion within x sigmas" at WolframAlpha


