History of statistics

sovereign states.

In early times, the meaning was restricted to information about states, particularly

temperature record, and analytical work which requires statistical inference. Statistical activities are often associated with models expressed using probabilities, hence the connection with probability theory. The large requirements of data processing have made statistics a key application of computing. A number of statistical concepts have had an important impact on a wide range of sciences. These include the design of experiments and approaches to statistical inference such as Bayesian inference, each of which can be considered to have its own sequence in the development of the ideas underlying modern statistics.

Introduction

By the 18th century, the term "

demographic and economic data by states. For at least two millennia, these data were mainly tabulations of human and material resources that might be taxed or put to military use. In the early 19th century, collection intensified, and the meaning of "statistics" broadened to include the discipline concerned with the collection, summary, and analysis of data. Today, data are collected and statistics are computed and widely distributed in government, business, most of the sciences and sports, and even for many pastimes. Electronic computers have expedited more elaborate statistical computation even as they have facilitated the collection and aggregation of data. A single data analyst may have available a set of data files with millions of records, each with dozens or hundreds of separate measurements. These were collected over time from computer activity (for example, a stock exchange) or from computerized sensors, point-of-sale registers, and so on. Computers then produce simple, accurate summaries, and allow more tedious analyses, such as those that require inverting a large matrix or performing hundreds of steps of iteration, that would never be attempted by hand. Faster computing has allowed statisticians to develop "computer-intensive" methods which may look at all permutations, or use randomization to look at 10,000 permutations of a problem, to estimate answers that are not easy to quantify by theory alone.
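The idea of examining randomly chosen permutations can be illustrated with a two-sample permutation test. The following is a minimal Python sketch, not a method taken from the sources above: the function name `permutation_test`, the difference-in-means statistic, the seed and the default of 10,000 random relabelings are all assumptions made for illustration.

```python
# Illustrative sketch; the function name and data are hypothetical.
import random
from statistics import mean


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Approximate two-sided p-value for a difference in group means.

    Rather than enumerating every split of the pooled data, draw n_perm
    random relabelings and count how often the shuffled difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance so rounding does not hide ties with the observed value.
        if diff >= observed - 1e-12:
            extreme += 1
    # Adding one to both counts keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    treated = [12.1, 13.4, 11.8, 14.2, 13.9]  # hypothetical measurements
    control = [10.2, 11.1, 9.8, 10.7, 11.5]   # hypothetical measurements
    print(permutation_test(treated, control))
```

With ten observations there are only 252 distinct ways to split the data into two groups of five, so complete enumeration would be cheap here; random sampling of permutations pays off when the number of possible relabelings is far too large to list.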

The term "

inductive logic and the scientific method, which are concerns that move statisticians away from the narrower area of mathematical statistics. Much of the theoretical work was readily available by the time computers became available to exploit it. By the 1970s, Johnson and Kotz had produced a four-volume Compendium on Statistical Distributions (1st ed., 1969–1972), which is still an invaluable resource.

Applied statistics can be regarded as not a field of

decision science. Because of its concern with searching and effectively presenting data, statistics overlaps with information science and computer science.

Etymology


The term statistics is ultimately derived from the

Statistical Account of Scotland.^[1]

Origins in probability theory

Basic forms of statistics have been used since the beginning of civilization. Early empires often collated censuses of the population or recorded the trade in various commodities. The Han dynasty and the Roman Empire were among the first states to gather extensive data on the size of their populations, geographical area and wealth.

The use of statistical methods dates back to at least the 5th century BCE. The historian Thucydides in his History of the Peloponnesian War^[2] describes how the Athenians calculated the height of the wall of Plataea by counting the number of bricks in an unplastered section of the wall sufficiently near them to be able to count them. The count was repeated several times by a number of soldiers. The most frequent value so determined (in modern terminology, the mode) was taken to be the most likely value of the number of bricks. Multiplying this value by the height of the bricks used in the wall allowed the Athenians to determine the height of the ladders necessary to scale the walls.^{[citation needed]}
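The procedure described by Thucydides amounts to taking the mode of repeated counts and scaling it. The Python sketch below only illustrates that arithmetic: the counts, the height per brick and the helper name `most_frequent` are invented for the example and are not figures from Thucydides.

```python
# Illustrative sketch; the helper name and all numbers are hypothetical.
from collections import Counter


def most_frequent(counts):
    """Return the modal value (the most common entry) of a list of counts."""
    value, _frequency = Counter(counts).most_common(1)[0]
    return value


# Hypothetical brick counts for the same section of wall, one per soldier.
counts = [382, 385, 385, 380, 385, 382, 387]
brick_height = 0.25  # hypothetical height of one brick, in metres

modal_count = most_frequent(counts)
wall_height = modal_count * brick_height
print(modal_count, wall_height)
```

Using the mode rather than the mean means that a few careless counts, as long as they disagree with one another, do not shift the result at all.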

The Trial of the Pyx is a test of the purity of the coinage of the Royal Mint which has been held on a regular basis since the 12th century. The Trial itself is based on statistical sampling methods. After minting a series of coins (originally from ten pounds of silver), a single coin was placed in the Pyx, a box in Westminster Abbey. After a given period (now once a year), the coins are removed and weighed. A sample of the coins removed from the box is then tested for purity.

The Nuova Cronica, a 14th-century history of Florence by the Florentine banker and official Giovanni Villani, includes much statistical information on population, ordinances, commerce and trade, education, and religious facilities and has been described as the first introduction of statistics as a positive element in history,^[3] though neither the term nor the concept of statistics as a specific field yet existed.

The arithmetic mean, although a concept known to the Greeks, was not generalised to more than two values until the 16th century. The introduction of decimal fractions by Simon Stevin in 1585 seems likely to have facilitated these calculations. Averaging was first adopted in astronomy by Tycho Brahe, who was attempting to reduce the errors in his estimates of the locations of various celestial bodies.

The idea of the median originated in Edward Wright's book on navigation (Certaine Errors in Navigation) in 1599, in a section concerning the determination of location with a compass. Wright felt that this value was the most likely to be the correct value in a series of observations. The difference between the mean and the median was noticed in 1669 by Christiaan Huygens in the context of using Graunt's tables.^[4]
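The difference between the two summaries is easiest to see when one observation is badly wrong. In this small Python illustration the bearings are hypothetical; it uses only the standard `statistics` module.

```python
# Illustrative sketch; the bearings are hypothetical.
from statistics import mean, median

# Hypothetical repeated compass bearings in degrees; the last one is a gross error.
bearings = [41.0, 42.0, 41.5, 42.5, 41.0, 60.0]

print(mean(bearings))    # the single outlier pulls the mean upwards
print(median(bearings))  # the middle value is hardly affected
```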

The term 'statistic' was introduced by the Italian scholar

Laplace in 1802 estimated the population of France with a similar method; see Ratio estimator § History for details.

Although the original scope of statistics was limited to data useful for governance, the approach was extended to many fields of a scientific or commercial nature during the 19th century. The mathematical foundations of the subject drew heavily on the new

Jakob Bernoulli's Ars Conjectandi (posthumous, 1713) and Abraham de Moivre's The Doctrine of Chances (1718) treated the subject as a branch of mathematics. In his book, Bernoulli introduced the idea of representing complete certainty as one and probability as a number between zero and one.

A key early application of statistics in the 18th century was to the

nonparametric test …",^[9] specifically the sign test; see details at Sign test § History.
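The sign test reduces to a binomial calculation: if there is no systematic effect, each non-tied comparison is equally likely to come out "plus" or "minus". The following Python sketch computes an exact two-sided p-value under that assumption; the function name and the doubling convention for the two-sided value are choices made here for illustration, not details from the cited history.

```python
# Illustrative sketch; the function name is hypothetical.
from math import comb


def sign_test_p_value(n_plus, n_minus):
    """Exact two-sided sign test p-value (ties already discarded).

    Under the null hypothesis the number of plus signs is Binomial(n, 1/2),
    so the one-sided tail is a sum of binomial coefficients divided by 2**n;
    the two-sided value doubles the smaller tail, capped at 1.
    """
    n = n_plus + n_minus
    k = min(n_plus, n_minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test_p_value(10, 0))
```

For example, ten comparisons that all point the same way give a p-value of 2/1024, roughly 0.002, so such a run would be expected by chance only about twice in a thousand repetitions.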

The formal study of

theory of errors may be traced back to Roger Cotes' Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given. Simpson discussed several possible distributions of error. He first considered the uniform distribution and then the discrete symmetric triangular distribution, followed by the continuous symmetric triangular distribution. Tobias Mayer, in his study of the libration of the moon (Kosmographische Nachrichten, Nuremberg, 1750), invented the first formal method for estimating unknown quantities by generalizing the averaging of observations made under identical circumstances to the averaging of groups of similar equations.
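Mayer's idea of combining groups of equations can be sketched for the simplest case of two unknowns. In this minimal Python illustration (not a reconstruction of Mayer's own computation) the overdetermined equations a·x + b·y = c are split into two groups, each group is summed into a single equation, and the resulting 2 × 2 system is solved exactly. How to split the equations is an assumption here: the sketch simply takes the first half and the second half.

```python
# Illustrative sketch; the function name and the grouping rule are hypothetical.
def method_of_averages(rows):
    """Estimate (x, y) from equations a*x + b*y = c given as (a, b, c) tuples.

    The rows are split into two groups (first half, second half); each group
    is summed into one equation, and the two summed equations are solved by
    Cramer's rule.
    """
    half = len(rows) // 2
    groups = [rows[:half], rows[half:]]
    sums = [tuple(sum(r[j] for r in g) for j in range(3)) for g in groups]
    (a1, b1, c1), (a2, b2, c2) = sums
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("grouped equations are singular; choose another grouping")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


# Four consistent equations generated from x = 2, y = -1.
print(method_of_averages([(1, 1, 1), (2, 1, 3), (1, 3, -1), (3, 1, 5)]))
```

With consistent equations the method recovers the exact solution; with noisy observations the answer depends on how the equations are grouped, which is why the choice of grouping mattered in practice.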

In 1755 Roger Joseph Boscovich, in his work on the shape of the earth, proposed in his book De Litteraria expeditione per pontificiam ditionem ad dimetiendos duos meridiani gradus a PP. Maire et Boscovicli that the true value of a series of observations would be that which minimises the sum of absolute errors. In modern terminology this value is the median. The first example of what later became known as the normal curve was studied by Abraham de Moivre, who plotted this curve on November 12, 1733.^[14] De Moivre was studying the number of heads that occurred when a 'fair' coin was tossed.
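Boscovich's criterion can be checked numerically: among all candidate values, the one with the smallest total absolute error is the median. The Python sketch below scans a grid of candidates; the observations, the grid and the helper name `sum_abs_errors` are hypothetical, and this brute-force check is not Boscovich's own procedure.

```python
# Illustrative sketch; the helper name, observations and grid are hypothetical.
from statistics import median


def sum_abs_errors(values, centre):
    """Total absolute deviation of the observations from a candidate value."""
    return sum(abs(v - centre) for v in values)


# Hypothetical observations of the same quantity.
obs = [3.0, 7.0, 8.0, 9.0, 20.0]

# Scan candidate "true values" from 0.00 to 25.00 in steps of 0.01.
candidates = [i / 100 for i in range(0, 2501)]
best = min(candidates, key=lambda c: sum_abs_errors(obs, c))
print(best, median(obs))
```

With an odd number of observations the minimum is unique and falls exactly on the middle observation; with an even number, every value between the two middle observations gives the same minimum.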

In 1763 Richard Price transmitted to the Royal Society Thomas Bayes's proof of a rule for using a binomial distribution to calculate a posterior probability on a prior event.
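Bayes's problem can be stated in modern terms as follows: given k successes in n binomial trials and a uniform prior for the unknown probability p, how likely is it that p lies between two bounds? The Python sketch below answers this by numerical integration of the posterior density. The uniform prior, the midpoint rule and the function name are assumptions of this illustration, not a reconstruction of Bayes's own calculation.

```python
# Illustrative sketch; the function name and step count are hypothetical.
from math import comb


def posterior_prob_between(k, n, lo, hi, steps=100_000):
    """P(lo < p < hi | k successes in n trials), uniform prior on p.

    With a uniform prior the posterior density is proportional to the
    binomial likelihood p**k * (1 - p)**(n - k). Its normalising constant
    is (n + 1) * C(n, k), because the integral of that likelihood over
    [0, 1] equals 1 / ((n + 1) * C(n, k)).
    """
    norm = (n + 1) * comb(n, k)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * width  # midpoint rule
        total += p ** k * (1 - p) ** (n - k)
    return norm * total * width


# Posterior probability that p exceeds 1/2 after 3 successes in 5 trials.
print(posterior_prob_between(3, 5, 0.5, 1.0))
```

For 3 successes in 5 trials the posterior probability that p exceeds one half is about 0.656.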

In 1765 Joseph Priestley invented the first timeline charts.

Johann Heinrich Lambert in his 1765 book Anlage zur Architectonic proposed the semicircle as a distribution of errors:

$$f(x) = \frac{1}{2}\sqrt{1 - x^{2}}, \qquad -1 < x < 1.$$

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve and deduced a formula for the mean of three observations.

Laplace in 1774 noted that the frequency of an error could be expressed as an exponential function of its magnitude once its sign was disregarded.^[15]^[16] This distribution is now known as the Laplace distribution. Lagrange proposed a parabolic fractal distribution of errors in 1776.
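In modern notation, Laplace's first law is usually written as below, where the constant m > 0 (a symbol chosen here for illustration) sets the spread of the errors. The factor m/2 is what makes the total probability equal to one:

$$\varphi(x) = \frac{m}{2}\, e^{-m\lvert x\rvert}, \qquad \int_{-\infty}^{\infty} \frac{m}{2}\, e^{-m\lvert x\rvert}\,dx = 2\cdot\frac{m}{2}\int_{0}^{\infty} e^{-m x}\,dx = m\cdot\frac{1}{m} = 1.$$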

Laplace in 1778 published his second law of errors wherein he noted that the frequency of an error was proportional to the exponential of the square of its magnitude. This was subsequently rediscovered by

C. S. Peirce in 1873 who was studying measurement errors when an object was dropped onto a wooden base.^[18]

He chose the term normal because of its frequent occurrence in naturally occurring variables.

Lagrange also suggested in 1781 two other distributions for errors: a raised cosine distribution and a logarithmic distribution.

Laplace gave (1781) a formula for the law of facility of error (a term due to Joseph Louis Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

In 1786 William Playfair (1759-1823) introduced the idea of graphical representation into statistics. He invented the line chart, bar chart and histogram and incorporated them into his works on economics, the Commercial and Political Atlas. This was followed in 1795 by his invention of the pie chart and circle chart which he used to display the evolution of England's imports and exports. These latter charts came to general attention when he published examples in his Statistical Breviary in 1801.

Laplace, in an investigation of the motions of Saturn and Jupiter in 1787, generalized Mayer's method by using different linear combinations of a single group of equations.

In 1791 Sir John Sinclair introduced the term 'statistics' into English in his Statistical Account of Scotland.

In 1802 Laplace estimated the population of France to be 28,328,612.^[19] He calculated this figure using the number of births in the previous year and census data for three communities. The census data of these communities showed that they had 2,037,615 persons and that the number of births was 71,866. Assuming that these samples were representative of France, Laplace produced his estimate for the entire population.
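The calculation is a ratio estimate: the sampled communities give a number of inhabitants per annual birth, and multiplying the national count of births by that ratio gives the population. The Python sketch below reproduces the ratio from the figures quoted above. The national number of births is not given here, so the value in the last line is a hypothetical placeholder, and the function name is likewise hypothetical.

```python
# Illustrative sketch; the function name and national birth count are hypothetical.
def ratio_estimate(total_births, sample_population, sample_births):
    """Estimate a total population from a national count of births.

    The sample supplies the ratio of inhabitants to births; the estimate
    assumes that the same ratio holds for the whole country.
    """
    ratio = sample_population / sample_births
    return ratio * total_births


# Figures for the three sampled communities, as quoted in the text.
sample_population = 2_037_615
sample_births = 71_866
print(sample_population / sample_births)  # inhabitants per annual birth

# Hypothetical national birth count, for illustration only.
print(ratio_estimate(900_000, sample_population, sample_births))
```

The quoted figures give roughly 28.35 inhabitants per annual birth.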

method of least squares in 1809.

The method of least squares, which was used to minimize errors in data measurement, was published independently by Adrien-Marie Legendre (1805), Robert Adrain (1808), and Carl Friedrich Gauss (1809). Gauss had used the method in his famous 1801 prediction of the location of the dwarf planet Ceres. The observations on which Gauss based his calculations were made by the Italian monk Piazzi.
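For the simplest case, fitting a straight line y = a + b·x, the method of least squares has a closed-form solution obtained by setting the derivatives of the sum of squared residuals to zero (the "normal equations"). The Python sketch below shows that case only; the function name is hypothetical, and the astronomical problems treated by Legendre and Gauss involved more unknowns.

```python
# Illustrative sketch; the function name is hypothetical.
def least_squares_line(xs, ys):
    """Fit y = a + b*x by minimising the sum of squared residuals.

    Setting the derivatives of sum((y - a - b*x)**2) with respect to a and b
    to zero gives the normal equations, whose solution is the closed form
    below: slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


print(least_squares_line([0, 1, 2, 3], [1, 2, 2, 4]))
```

In Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept.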

The method of least squares was preceded by the use of a median regression slope, a method that minimizes the sum of the absolute deviations. Roger Joseph Boscovich invented a method of estimating this slope in 1760 and applied it to astronomy.

The term probable error (der wahrscheinliche Fehler), the median deviation from the mean, was introduced in 1815 by the German astronomer Friedrich Wilhelm Bessel. Antoine Augustin Cournot in 1843 was the first to use the term median (valeur médiane) for the value that divides a probability distribution into two equal halves.

Other contributors to the theory of errors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875).^{[citation needed]} Peters's (1856) formula for $r$, the "probable error" of a single observation, was widely used and inspired early robust statistics (resistant to outliers: see Peirce's criterion).

In the 19th century authors on

Laurent (1873), Liagre, Didion, De Morgan and Boole.

Gustav Theodor Fechner used the median (Centralwerth) in the study of sociological and psychological phenomena.^[20] It had earlier been used only in astronomy and related fields. Francis Galton used the English term median for the first time in 1881, having earlier used the terms middle-most value in 1869 and the medium in 1880.^[21]

suicide rates.^[22]

The first tests of the normal distribution were invented by the German statistician Wilhelm Lexis in the 1870s. The only data sets available to him that he was able to show to be normally distributed were birth rates.

Development of modern statistics

Although the origins of statistical theory lie in the 18th-century advances in probability, the modern field of statistics only emerged in the late 19th and early 20th centuries in three stages. The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. The second wave of the 1910s and 20s was initiated by William Sealy Gosset, and reached its culmination in the insights of Ronald Fisher. This involved the development of better models for the design of experiments, hypothesis testing and techniques for use with small data samples. The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s.^[23] Today, statistical methods are applied in all fields that involve decision making, both to draw accurate inferences from a collated body of data and to make decisions in the face of uncertainty.

The first statistical bodies were established in the early 19th century. The Royal Statistical Society was founded in 1834, and Florence Nightingale, its first female member, pioneered the application of statistical analysis to health problems for the furtherance of epidemiological understanding and public health practice. However, the methods then used would not be considered modern statistics today.

The

Edgeworth expansion, the Edgeworth series, the method of variate transformation and the asymptotic theory of maximum likelihood estimates.

The Norwegian

random sampling techniques. His efforts culminated in his New Survey of London Life and Labour.^[26]

Francis Galton is credited as one of the principal founders of statistical theory. His contributions to the field included introducing the concepts of standard deviation, correlation and regression, and applying these methods to the study of the variety of human characteristics, such as height, weight and eyelash length. He found that many of these could be fitted to a normal distribution.^[27]

Galton submitted a paper to Nature in 1907 on the usefulness of the median.^[28] He examined the accuracy of 787 guesses of the weight of an ox at a country fair. The actual weight was 1208 pounds: the median guess was 1198. The guesses were markedly non-normally distributed (cf. Wisdom of the Crowd).

Galton's publication of Natural Inheritance in 1889 sparked the interest of a brilliant mathematician,

biometry, and Galton, he founded the journal Biometrika as the first journal of mathematical statistics and biometry.

His work, and that of Galton, underpins many of the 'classical' statistical methods which are in common use today, including the correlation coefficient, defined as a product-moment;^[31] the method of moments for the fitting of distributions to samples; Pearson's system of continuous curves that forms the basis of the now conventional continuous probability distributions; chi distance, a precursor and special case of the Mahalanobis distance;^[32] and the P-value, defined as the probability measure of the complement of the ball with the hypothesized value as center point and chi distance as radius.^[32]

He also introduced the term 'standard deviation'.
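The product-moment definition of correlation divides the sum of products of deviations from the two means by the square root of the product of the two sums of squared deviations. A minimal Python sketch of that definition follows; the function name and the example values are hypothetical.

```python
# Illustrative sketch; the function name and data are hypothetical.
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation: co-deviation divided by the product of spreads."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired measurements.
print(pearson_r([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```

The coefficient is +1 or -1 exactly when the points lie on a straight line, and it is unchanged if either variable is rescaled or shifted.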

He also founded the

statistical hypothesis testing theory,^[32] Pearson's chi-squared test and principal component analysis.^[33]^[34] In 1911 he founded the world's first university statistics department at University College London.
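Pearson's chi-squared statistic compares observed counts O with the counts E expected under a hypothesis by summing (O − E)²/E over all categories. The Python sketch below computes the statistic for a hypothetical set of die rolls; the function name and data are illustrative only.

```python
# Illustrative sketch; the function name and the die rolls are hypothetical.
def chi_squared_statistic(observed, expected):
    """Pearson's statistic: sum over cells of (O - E)**2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 60 rolls of a die tested against equal frequencies.
rolls = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
print(chi_squared_statistic(rolls, expected))
```

For six categories the statistic is compared with a chi-squared distribution with 5 degrees of freedom, whose upper 5% point is about 11.07, so these hypothetical rolls (statistic 4.2) give no evidence against a fair die.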

The second wave of mathematical statistics was pioneered by

Rothamsted Experimental Station he started a major study of the extensive collections of data recorded over many years. This resulted in a series of reports under the general title Studies in Crop Variation. In 1930 he published The Genetical Theory of Natural Selection, where he applied statistics to evolution.

Over the next seven years, he pioneered the principles of the design of experiments (see below) and elaborated his studies of analysis of variance. He furthered his studies of the statistics of small samples. Perhaps even more important, he began his systematic approach to the analysis of real data as the springboard for the development of new statistical methods. He developed computational algorithms for analyzing data from his balanced experimental designs. In 1925, this work resulted in the publication of his first book, Statistical Methods for Research Workers.^[35] This book went through many editions and translations in later years, and it became the standard reference work for scientists in many disciplines. In 1935, this book was followed by The Design of Experiments, which was also widely used.
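In its simplest, one-way form, analysis of variance splits the total variation into a part between group means and a part within groups, and compares the two through the ratio of their mean squares (the F statistic). The Python sketch below covers only that one-way case with hypothetical data; Fisher's balanced designs (blocks, factorial layouts) partition the variation further, and the function name is hypothetical.

```python
# Illustrative sketch; the function name and data are hypothetical.
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical yields from three treatments, three plots each.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A large F means that the group means differ by more than the plot-to-plot variation would lead one to expect.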

In addition to analysis of variance, Fisher named and promoted the method of

F distribution).^[36]

The 5% level of significance appears to have been introduced by Fisher in 1925.^[37] Fisher stated that deviations exceeding twice the standard deviation are regarded as significant. Before this, deviations exceeding three times the probable error were considered significant. For a symmetrical distribution the probable error is half the interquartile range. For a normal distribution the probable error is approximately 2/3 of the standard deviation. It appears that Fisher's 5% criterion was rooted in previous practice.
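The comparison can be checked with the normal distribution in Python's standard library (`statistics.NormalDist`, available since Python 3.8). The sketch assumes exactly normal errors and uses a hypothetical helper `two_sided_tail`: it computes the probable error in units of the standard deviation, and the chance of exceeding each of the two thresholds.

```python
# Illustrative sketch; assumes normally distributed errors. Helper name is hypothetical.
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Probable error: half the interquartile range, i.e. the upper quartile.
probable_error = std_normal.inv_cdf(0.75)


def two_sided_tail(k):
    """Probability that a normal deviate lies more than k SDs from the mean."""
    return 2 * (1 - std_normal.cdf(k))


print(f"probable error = {probable_error:.4f} SD")
print(f"P(|Z| > 2 SD) = {two_sided_tail(2):.4f}")
print(f"P(|Z| > 3 PE) = {two_sided_tail(3 * probable_error):.4f}")
```

The probable error comes out at about 0.674 standard deviations, so three probable errors is about 2.02 standard deviations. The two rules therefore flag roughly 4.3% and 4.6% of deviations respectively, both close to 5%.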

Other important contributions at this time included Charles Spearman's rank correlation coefficient, a useful extension of the Pearson correlation coefficient. William Sealy Gosset, the English statistician better known under his pseudonym of Student, introduced Student's t-distribution, a continuous probability distribution useful in situations where the sample size is small and the population standard deviation is unknown.
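Both ideas fit in a few lines. Spearman's coefficient is the correlation of ranks; without ties it equals 1 − 6Σd²/(n(n² − 1)), where d is the difference between paired ranks. Student's one-sample t statistic divides the difference between the sample mean and a hypothesized mean by the estimated standard error, and is referred to the t-distribution with n − 1 degrees of freedom. The Python sketch below assumes distinct values (no tie handling), and its function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical and ties are not handled.
from math import sqrt
from statistics import mean, stdev


def ranks(values):
    """Ranks 1..n of distinct values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(xs, ys):
    """Spearman's rho for distinct values: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def one_sample_t(sample, mu0):
    """Student's t statistic for H0: population mean == mu0 (n - 1 df)."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))  # monotone but not linear
print(one_sample_t([5, 7, 9], 5))
```

Because it only uses ranks, Spearman's coefficient equals 1 for any strictly increasing relationship, even one that is far from a straight line.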

Egon Pearson (Karl's son) and Jerzy Neyman introduced the concepts of "Type II" error, power of a test and confidence intervals. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.^[38]
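A stratified random sample draws separately from each stratum of the population and weights each stratum's sample mean by that stratum's share of the population. The Python sketch below shows only that estimator, with hypothetical stratum sizes and values; it does not reproduce Neyman's comparison with purposive sampling, and the function name is hypothetical.

```python
# Illustrative sketch; the function name and data are hypothetical.
from statistics import mean


def stratified_mean(strata):
    """Estimate a population mean from a stratified random sample.

    `strata` is a list of (stratum_population_size, sampled_values) pairs.
    Each stratum's sample mean is weighted by the stratum's share of the
    population, so every stratum counts in its true proportion.
    """
    total = sum(size for size, _ in strata)
    return sum(size / total * mean(values) for size, values in strata)


# Hypothetical population: 1,000 units in one stratum, 3,000 in another.
print(stratified_mean([(1_000, [10, 12, 14]), (3_000, [20, 22, 24])]))
```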

Design of experiments

James Lind carried out the first ever clinical trial in 1747, in an effort to find a treatment for scurvy.

In 1747, while serving as surgeon on HM Bark Salisbury, James Lind carried out a controlled experiment to develop a cure for scurvy.^[39] In this study his subjects' cases "were as similar as I could have them", that is, he imposed strict entry requirements to reduce extraneous variation. The men were paired, which provided blocking. From a modern perspective, the main thing that is missing is randomized allocation of subjects to treatments.

Lind is today often described as a one-factor-at-a-time experimenter.

John Lawes to determine the optimal inorganic fertilizer for use on wheat.^[40]

A theory of statistical inference was developed by

blinded, repeated-measures design to evaluate their ability to discriminate weights.^[41]^[42]^[43]^[44]

Peirce's experiment inspired other researchers in psychology and education, who developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s.

Gergonne in 1815.^{[citation needed]} In 1918 Kirstine Smith published optimal designs for polynomials of degree six (and less).^[46]

The use of a sequence of experiments, where the design of each may depend on the results of previous experiments, including the possible decision to stop experimenting, was pioneered^[47] by Abraham Wald in the context of sequential tests of statistical hypotheses.^[48] Surveys are available of optimal sequential designs,^[49] and of adaptive designs.^[50] One specific type of sequential design is the "two-armed bandit", generalized to the multi-armed bandit, on which early work was done by Herbert Robbins in 1952.^[51]
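Wald's sequential probability ratio test adds up the log-likelihood ratio one observation at a time and stops as soon as the running total crosses one of two boundaries. The Python sketch below applies it to 0/1 outcomes, testing p = p0 against p = p1. The function name and the error rates α = β = 0.05 are assumptions made for illustration; the boundaries are Wald's usual approximations A = (1 − β)/α and B = β/(1 − α).

```python
# Illustrative sketch; the function name and default error rates are hypothetical.
from math import log


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Sequential probability ratio test for a Bernoulli parameter.

    Tests H0: p = p0 against H1: p = p1 using the approximate boundaries
    log((1 - beta) / alpha) and log(beta / (1 - alpha)).
    Returns (decision, number of observations used).
    """
    upper = log((1 - beta) / alpha)
    lower = log(beta / (1 - alpha))
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        # Log-likelihood ratio contribution of one 0/1 observation.
        llr += log(p1 / p0) if x else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(observations)


print(sprt_bernoulli([1] * 20, p0=0.5, p1=0.8))
```

With these settings a run of successes leads to accepting H1 after seven observations and a run of failures leads to accepting H0 after four. A fixed-sample test would have committed in advance to a larger sample.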

The term "design of experiments" (DOE) derives from early statistical work performed by Sir Ronald Fisher. He was described by Anders Hald as "a genius who almost single-handedly created the foundations for modern statistical science."^[52] Fisher initiated the principles of design of experiments and elaborated on his studies of "analysis of variance". Perhaps even more important, Fisher began his systematic approach to the analysis of real data as the springboard for the development of new statistical methods. He began to pay particular attention to the labour involved in the necessary computations performed by hand, and developed methods that were as practical as they were founded in rigour. In 1925, this work culminated in the publication of his first book, Statistical Methods for Research Workers.^[53] This went into many editions and translations in later years, and became a standard reference work for scientists in many disciplines.^[54]

A methodology for designing experiments was proposed by Ronald A. Fisher, in his innovative book The Design of Experiments (1935) which also became a standard.^[55]^[56]^[57]^[58] As an example, he described how to test the hypothesis that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. While this sounds like a frivolous application, it allowed him to illustrate the most important ideas of experimental design: see Lady tasting tea.
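In the version of the experiment usually told, there are eight cups, four with milk poured first and four with tea poured first, and the lady must pick out the four milk-first cups. Under pure guessing every choice of four cups is equally likely, so the number she gets right follows a hypergeometric distribution. The Python sketch below takes that eight-cup layout as its assumption and uses a hypothetical function name.

```python
# Illustrative sketch; the eight-cup layout is assumed and the function name is hypothetical.
from math import comb

CUPS = 8        # assumed: eight cups in total
MILK_FIRST = 4  # assumed: four prepared milk-first, and she must pick them out


def prob_at_least(correct, cups=CUPS, milk_first=MILK_FIRST):
    """Chance of identifying at least `correct` milk-first cups by guessing."""
    total = comb(cups, milk_first)  # equally likely selections
    favourable = sum(
        comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
        for k in range(correct, milk_first + 1)
    )
    return favourable / total


print(prob_at_least(4))  # all four right: 1/70
print(prob_at_least(3))  # at least three right: 17/70
```

Only a perfect score (probability 1/70, about 1.4%) falls below the 5% level. Three or more correct would happen by chance about 24% of the time.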

Advances in agricultural science served to meet the combination of larger city populations and fewer farms. But for crop scientists to take due account of widely differing geographical growing climates and needs, it was important to differentiate local growing conditions. To extrapolate experiments on local crops to a national scale, they had to extend crop sample testing economically to overall populations. As statistical methods advanced (primarily the efficacy of designed experiments instead of one-factor-at-a-time experimentation), representative factorial design of experiments began to enable the meaningful extension, by inference, of experimental sampling results to the population as a whole.^{[citation needed]} But it was hard to decide how representative the chosen crop sample was.^{[citation needed]} Factorial design methodology showed how to estimate and correct for any random variation within the sample and also in the data collection procedures.

Bayesian statistics

The term Bayesian refers to

principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes^[62]).

After the 1920s,

frequentist statistics.^[62] Fisher rejected the Bayesian view, writing that "the theory of inverse probability is founded upon an error, and must be wholly rejected".^[63] At the end of his life, however, Fisher expressed greater respect for the essay of Bayes, which Fisher believed to have anticipated his own, fiducial approach to probability; Fisher still maintained that Laplace's views on probability were "fallacious rubbish".^[63] Neyman started out as a "quasi-Bayesian", but subsequently developed confidence intervals (a key method in frequentist statistics) because "the whole theory would look nicer if it were built from the start without reference to Bayesianism and priors".^[64]

The word Bayesian appeared around 1950, and by the 1960s it became the term preferred by those dissatisfied with the limitations of frequentist statistics.^[62]^[65]

In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objectivist stream, the statistical analysis depends on only the model assumed and the data analysed.^[66] No subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.

In the further development of Laplace's ideas, subjective ideas predate objectivist positions. The idea that 'probability' should be interpreted as 'subjective degree of belief in a proposition' was proposed, for example, by

frequentist definition of probability but also with the earlier, objectivist approach of Laplace.^[66] The subjective Bayesian methods were further developed and popularized in the 1950s by L.J. Savage.^{[citation needed]}

Objective Bayesian inference was further developed by

B.O. Koopman, Howard Raiffa, Robert Schlaifer and Alan Turing.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and to an increasing interest in nonstandard, complex applications.^[71] Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics.^[72] Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.^[73]
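Markov chain Monte Carlo draws samples from a posterior distribution that is known only up to a normalising constant. The Python sketch below uses random-walk Metropolis, one of the simplest MCMC algorithms. The target (the posterior for a binomial probability after 2 successes in 3 trials, with a uniform prior), the step size, the seed and the function names are all assumptions made for illustration.

```python
# Illustrative sketch; target, step size, seed and function names are hypothetical.
import math
import random


def metropolis(log_density, start, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density may be unnormalised: only differences between its values
    are used, so the normalising constant never has to be computed.
    """
    rng = random.Random(seed)
    x = start
    current = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        candidate = log_density(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            x, current = proposal, candidate
        samples.append(x)
    return samples


def log_posterior(p, successes=2, trials=3):
    """Unnormalised log posterior for a binomial probability, uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


if __name__ == "__main__":
    draws = metropolis(log_posterior, start=0.5, n_samples=20_000)
    kept = draws[1_000:]  # discard an initial burn-in period
    print(sum(kept) / len(kept))  # should be close to the exact posterior mean 3/5
```

Here the exact answer is known (the posterior mean is 3/5), which makes it easy to check the sampler. In realistic problems the sampler is used precisely because no closed form exists.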

Important contributors to statistics

References

ISBN 978-0-374-53041-9.

^ Thucydides (1985). History of the Peloponnesian War. New York: Penguin Books, Ltd. p. 204.

Encyclopædia Britannica 2006 Ultimate Reference Suite DVD. Retrieved on 2008-03-04.

ISSN 1573-0816.

doi:10.15611/sps.2014.12.04.

ISBN 978-0231555647.

ISBN 978-1-4020-6036-6.

S2CID 186209819.

^ ISBN 978-0-471-16068-7

ISBN 978-0-412-44980-2

ISBN 978-0-674-40341-3.

ISBN 978-0-387-95329-8

^ Hald, Anders (1998), "Chapter 4. Chance or Design: Tests of Significance", A History of Mathematical Statistics from 1750 to 1930, Wiley, p. 65

^ de Moivre, A. (1738) The doctrine of chances. Woodfall

^ Laplace, P-S (1774). "Mémoire sur la probabilité des causes par les évènements". Mémoires de l'Académie Royale des Sciences Présentés par Divers Savants. 6: 621–656.

JSTOR 2965467

^ Havil J (2003) Gamma: Exploring Euler's Constant. Princeton, NJ: Princeton University Press, p. 157

C. S. Peirce (1873) Theory of errors of observations. Report of the Superintendent US Coast Survey, Washington, Government Printing Office. Appendix no. 21: 200–224

ISBN 978-1483237930

^ Keynes, JM (1921) A treatise on probability. Pt II Ch XVII §5 (p 201)

^ Galton F (1881) Report of the Anthropometric Committee pp 245-260. Report of the 51st Meeting of the British Association for the Advancement of Science

^ Stigler (1986, Chapter 5: Quetelet's Two Attempts)

ISBN 9780405066283.

^ (Stigler 1986, Chapter 9: The Next Generation: Edgeworth)

^ Bellhouse DR (1988) A brief history of random sampling methods. Handbook of statistics. Vol 6 pp 1-14 Elsevier

JSTOR 2339344.

doi:10.1038/015492a0.

S2CID 4053860.

^ Stigler (1986, Chapter 10: Pearson and Yule)

JSTOR 27956805.

doi:10.1214/ss/1177012580.

^ doi:10.1080/14786440009463897.

doi:10.1080/14786440109462720.

^ Jolliffe, I. T. (2002). Principal Component Analysis, 2nd ed. New York: Springer-Verlag.

^ Box, R. A. Fisher, pp 93–166

S2CID 18896230.

^ Fisher RA (1925) Statistical methods for research workers, Edinburgh: Oliver & Boyd

JSTOR 2342192.

PMID 9059193.

^ ISBN 9780470530689.

^ ^a ^b Charles Sanders Peirce and Joseph Jastrow (1885). "On Small Differences in Sensation". Memoirs of the National Academy of Sciences. 3: 73–83.

^ S2CID 52201011.

^ S2CID 143685203.

^ S2CID 23526321.

JSTOR 168276.

JSTOR 2331929.

^ Johnson, N.L. (1961). "Sequential analysis: a survey." Journal of the Royal Statistical Society, Series A. Vol. 124 (3), 372–411. (pages 375–376)

^ Wald, A. (1945) "Sequential Tests of Statistical Hypotheses", Annals of Mathematical Statistics, 16 (2), 117–186.

ISBN 978-0898710069

ISBN 0-444-82061-2 (pages 151–180).

doi:10.1090/S0002-9904-1952-09620-8.

^ Hald, Anders (1998) A History of Mathematical Statistics. New York: Wiley. ^{[page needed]}

ISBN 0-471-09300-9 (pp 93–166)

ISBN 9780444508713.

S2CID 145725524.

JSTOR 2682986.

JSTOR 2528399.

S2CID 145725524.

^ ^a ^b ^c Stigler (1986, Chapter 3: Inverse Probability)

^ Hald (1998)^{[page needed]}

^ Lucien Le Cam (1986) Asymptotic Methods in Statistical Decision Theory: Pages 336 and 618–621 (von Mises and Bernstein).

^ ^a ^b ^c Stephen E. Fienberg (2006). When did Bayesian Inference become "Bayesian"? Archived 2014-09-10 at the Wayback Machine. Bayesian Analysis, 1 (1), 1–40. See page 5.

^ doi:10.1214/08-ba306.

S2CID 46968744.

^ Jeff Miller, "Earliest Known Uses of Some of the Words of Mathematics (B)" "The term Bayesian entered circulation around 1950. R. A. Fisher used it in the notes he wrote to accompany the papers in his Contributions to Mathematical Statistics (1950). Fisher thought Bayes's argument was all but extinct for the only recent work to take it seriously was Harold Jeffreys's Theory of Probability (1939). In 1951 L. J. Savage, reviewing Wald's Statistical Decisions Functions, referred to "modern, or unBayesian, statistical theory" ("The Theory of Statistical Decision," Journal of the American Statistical Association, 46, p. 58.). Soon after, however, Savage changed from being an unBayesian to being a Bayesian."

^ ISBN 9780444515391.

ISBN 0-415-18276-X, pp 50–1

ISBN 0-521-59271-2

^ O'Connor, John J.; Robertson, Edmund F., "History of statistics", MacTutor History of Mathematics Archive, University of St Andrews

^ Bernardo, J. M. and Smith, A. F. M. (1994). "Bayesian Theory". Chichester: Wiley.

MR 2082155.

^ Bernardo, J. M. (2006). "A Bayesian Mathematical Statistics Primer" (PDF). Proceedings of the Seventh International Conference on Teaching Statistics [CDROM]. Salvador (Bahia), Brazil: International Association for Statistical Education.

ISBN 978-0387310732

Bibliography

Freedman, D. (1999). "From association to causation: Some remarks on the history of statistics". Statistical Science. 14 (3): 243–258. doi:10.1214/ss/1009212409. (Revised version, 2002)

ISBN 978-0-471-47129-5.

ISBN 978-0-471-17912-2.

Kotz, S., Johnson, N.L. (1992, 1992, 1997). Breakthroughs in Statistics, Vols I, II, III. Springer. ISBN 0-387-94989-5

ISBN 978-0-02-850120-8.

ISBN 0-7167-4106-7

ISBN 978-0-674-40341-3.

Stigler, Stephen M. (1999) Statistics on the Table: The History of Statistical Concepts and Methods. Harvard University Press. ISBN 0-674-83601-4

David, H. A. (1995). "First (?) Occurrence of Common Terms in Mathematical Statistics". JSTOR 2684625.

External links


JEHPS: Recent publications in the history of probability and statistics

Electronic Journ@l for History of Probability and Statistics/Journ@l Electronique d'Histoire des Probabilités et de la Statistique

Figures from the History of Probability and Statistics (Univ. of Southampton)

Materials for the History of Statistics (Univ. of York)

Probability and Statistics on the Earliest Uses Pages (Univ. of Southampton)

Earliest Uses of Symbols in Probability and Statistics on Earliest Uses of Various Mathematical Symbols
