t-statistic

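In statistics, the t-statistic is the ratio of the difference between a parameter's estimated value and its hypothesized value to the standard error of the estimate. It is used, for example, when estimating a population mean from sample means if the population standard deviation is unknown. It is also used along with the p-value when running hypothesis tests, where the p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis is true.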

Definition and features

Let ${\hat {\beta }}$ be an estimator of parameter β in some statistical model. Then a t-statistic for this parameter is any quantity of the form

t_{\hat {\beta }}={\frac {{\hat {\beta }}-\beta _{0}}{\operatorname {s.e.} ({\hat {\beta }})}},

where β₀ is a non-random, known constant, which may or may not match the actual unknown parameter value β, and $\operatorname {s.e.} ({\hat {\beta }})$ is the standard error of the estimator ${\hat {\beta }}$ for β.

By default, statistical packages report the t-statistic with β₀ = 0 (these t-statistics are used to test the significance of the corresponding regressor). However, when a t-statistic is needed to test a hypothesis of the form H₀: β = β₀, a non-zero β₀ may be used.
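As a minimal sketch of the definition above, the ratio can be computed directly once an estimate, a hypothesized value β₀, and a standard error are available. The function name `t_statistic` and the sample values are illustrative assumptions, not part of any statistical package. The example uses the sample mean as the estimator, with standard error s/√n.

```python
import math
import statistics

def t_statistic(estimate, null_value, std_error):
    """Return (estimate - null_value) / std_error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - null_value) / std_error

# Hypothetical sample; the estimator here is the sample mean.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
n = len(sample)
mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)

t_default = t_statistic(mean, 0.0, std_error)  # beta_0 = 0, the usual default
t_custom = t_statistic(mean, 5.0, std_error)   # tests H0: mean = 5.0
print(t_default, t_custom)
```

The two results differ only through β₀: the same estimate and standard error give a large statistic against 0 and a much smaller one against 5.0.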

If ${\hat {\beta }}$ is an ordinary least squares estimator in the classical linear regression model (that is, with normally distributed and homoscedastic error terms), and if the true value of the parameter β is equal to β₀, then the sampling distribution of the t-statistic is the Student's t-distribution with (n − k) degrees of freedom, where n is the number of observations, and k is the number of regressors (including the intercept).^{[citation needed]}
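The regression case can be sketched for a simple linear regression with an intercept and one slope, so that k = 2. The function `slope_t_statistic` and the data are hypothetical; the formulas are the textbook closed-form least squares expressions, with the residual variance divided by n − k.

```python
import math

def slope_t_statistic(x, y, null_value=0.0):
    """t-statistic for the slope of y = a + b*x fitted by least squares.

    Returns (slope, standard error, t-statistic, degrees of freedom).
    """
    n = len(x)
    k = 2  # regressors: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # The residual variance estimate uses n - k degrees of freedom.
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - k
    sigma2_hat = ssr / df
    std_error = math.sqrt(sigma2_hat / s_xx)
    return slope, std_error, (slope - null_value) / std_error, df

# Hypothetical data, roughly following y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se, t_stat, df = slope_t_statistic(x, y)
print(slope, se, t_stat, df)
```

Under the classical assumptions, `t_stat` would be compared with a Student's t-distribution with `df` = n − k degrees of freedom (here 3).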

In the majority of models, the estimator ${\hat {\beta }}$ is asymptotically normally distributed. If the true value of the parameter β is equal to β₀, and the quantity $\operatorname {s.e.} ({\hat {\beta }})$ is computed from a correct estimate of the asymptotic variance of this estimator, then the t-statistic will asymptotically have the standard normal distribution.
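The asymptotic claim can be illustrated with a small simulation, offered here as a rough sketch rather than a proof. The data are drawn from an exponential distribution (clearly non-normal) with true mean 1, and the t-statistic of the sample mean is compared with the two-sided 5% standard normal cutoff 1.96. The function name, sample size, replication count, and seed are all arbitrary assumptions.

```python
import math
import random

def simulate_rejection_rate(n=200, reps=2000, seed=0):
    """Share of simulated t-statistics with |t| > 1.96 under a true null."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of an exponential distribution with rate 1
    rejections = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        mean = sum(sample) / n
        variance = sum((v - mean) ** 2 for v in sample) / (n - 1)
        std_error = math.sqrt(variance / n)
        t_stat = (mean - true_mean) / std_error
        if abs(t_stat) > 1.96:  # two-sided 5% standard normal cutoff
            rejections += 1
    return rejections / reps

rate = simulate_rejection_rate()
print(rate)
```

If the normal approximation is adequate at this sample size, the printed share should be in the neighborhood of 0.05. It will not match exactly, both because of simulation noise and because the approximation is only asymptotic.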

In some models the distribution of the t-statistic is different from the normal distribution, even asymptotically. For example, when a time series with a unit root is regressed in the augmented Dickey–Fuller test, the test t-statistic will asymptotically have one of the Dickey–Fuller distributions (depending on the test setting).

Use

Most frequently, t-statistics are used in statistical hypothesis testing and in the computation of certain confidence intervals.

The key property of the t statistic is that it is a pivotal quantity – while defined in terms of the sample mean, its sampling distribution does not depend on the population parameters, and thus it can be used regardless of what these may be.
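One way to see why the population parameters drop out is that the t-statistic of the sample mean is unchanged when the data and the hypothesized mean are shifted and rescaled together. The sketch below checks this numerically; `one_sample_t`, the sample values, and the constants `a` and `b` are illustrative assumptions.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t-statistic of the sample mean about a hypothesized mean mu."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu) / std_error

# Hypothetical draws, viewed as coming from a population with mean mu.
sample = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1]
mu = 0.0

# Shift and rescale the data; the population mean becomes a + b*mu.
a, b = 10.0, 2.5  # b must be positive to keep the sign of t
rescaled = [a + b * v for v in sample]

t_original = one_sample_t(sample, mu)
t_rescaled = one_sample_t(rescaled, a + b * mu)
print(t_original, t_rescaled)
```

Both values agree up to floating-point rounding. Since any normal sample can be written as μ + σZ with Z standard normal, the statistic has the same distribution whatever μ and σ are.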

One can also divide a residual by the sample standard deviation:

g(x,X)={\frac {x-{\overline {X}}}{s}}

to compute an estimate for the number of standard deviations a given sample is from the mean, as a sample version of a z-score, the z-score requiring the population parameters.
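The quantity g(x, X) can be computed with the standard library alone. In this sketch the function name `sample_z_score` and the observations are hypothetical, and s is taken to be the sample standard deviation with the n − 1 divisor.

```python
import statistics

def sample_z_score(x, observations):
    """(x - sample mean) / sample standard deviation."""
    x_bar = statistics.mean(observations)
    s = statistics.stdev(observations)  # divides by n - 1
    return (x - x_bar) / s

# Hypothetical observations with sample mean 5.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_z_score(9.0, observations))
```

A true z-score would use the population mean and standard deviation in place of `x_bar` and `s`.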

Prediction

Given a normal distribution $N(\mu ,\sigma ^{2})$ with unknown mean and variance, the t-statistic of a future observation $X_{n+1},$ after one has made n observations, is an ancillary statistic – a pivotal quantity (does not depend on the values of μ and σ²) that is a statistic (computed from observations). This allows one to compute a frequentist prediction interval (a predictive confidence interval), via the following t-distribution:

{\frac {X_{n+1}-{\overline {X}}_{n}}{s_{n}{\sqrt {1+n^{-1}}}}}\sim T^{n-1}.

Solving for $X_{n+1}$ yields the prediction distribution

{\overline {X}}_{n}+s_{n}{\sqrt {1+n^{-1}}}\cdot T^{n-1},

from which one may compute predictive confidence intervals – given a probability p, one may compute intervals such that 100p% of the time, the next observation $X_{n+1}$ will fall in that interval.
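A prediction interval of this kind can be sketched as follows. Because the Python standard library has no t-distribution quantile function, the critical value is passed in by the caller. The value 2.262 used below is the 0.975 quantile for 9 degrees of freedom as found in standard t-tables. The function name and the data are assumptions made for illustration.

```python
import math
import statistics

def prediction_interval(observations, t_critical):
    """Two-sided prediction interval for the next observation.

    t_critical is the upper quantile of the t-distribution with
    n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(observations)
    x_bar = statistics.mean(observations)
    s = statistics.stdev(observations)
    half_width = t_critical * s * math.sqrt(1 + 1 / n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical measurements, n = 10, so n - 1 = 9 degrees of freedom.
observations = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.2, 9.9]
# 0.975 quantile for 9 degrees of freedom, from a standard t-table.
lower, upper = prediction_interval(observations, t_critical=2.262)
print(lower, upper)
```

The factor √(1 + 1/n) makes this interval wider than a confidence interval for the mean, since it accounts for the variability of the new observation as well as the uncertainty in the sample mean.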

History

The term "t-statistic" is abbreviated from "hypothesis test statistic".^[1]^{[citation needed]} In statistics, the t-distribution was first derived as a posterior distribution in 1876 by Helmert^[2]^[3]^[4] and Lüroth.^[5]^[6]^[7] The t-distribution also appeared in a more general form as the Pearson Type IV distribution in Karl Pearson's 1895 paper.^[8] However, the t-distribution, also known as Student's t-distribution, gets its name from William Sealy Gosset, who was the first to publish the result in English, in his 1908 paper "The Probable Error of a Mean" (in Biometrika), under the pseudonym "Student".^[9]^[10] He used the pseudonym because his employer preferred its staff to use pen names rather than their real names when publishing scientific papers.^[11]

Gosset worked at the Guinness Brewery in Dublin, Ireland, and was interested in the problems of small samples, for example the chemical properties of barley, where sample sizes might be as small as 3. A second version of the etymology of the term "Student" is that Guinness did not want its competitors to know that it was using the t-test to determine the quality of raw material. Although the term "Student" refers to William Gosset, it was through the work of Ronald Fisher that the distribution became well known as "Student's distribution"^[12]^[13] and the test as "Student's t-test".

Related concepts

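z-score (standardization): If the population parameters are known, then rather than computing the t-statistic, one can compute the z-score. This is rare outside of standardized testing.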
Studentized residual: In regression analysis, the standard errors of the estimators at different data points vary (compare the middle versus endpoints of a simple linear regression), and thus one must divide the different residuals by different estimates for the error, yielding what are called studentized residuals.

References

Retrieved from "https://en.wikipedia.org/w/index.php?title=T-statistic&oldid=1216622126"

[1] ISBN 978-0-12-820001-8.

[2] ISBN 978-3-540-13293-6.

[3] S2CID 27311567.

[4] doi:10.1002/asna.18760880802.

[5] doi:10.1002/asna.18760871402.

[6] MR 1766040.

[7] S2CID 121241599.

[8] ISSN 1364-503X.

[9] JSTOR 2331554.

[10] "T Table | History of T Table, Etymology, one-tail T Table, two-tail T Table and T-statistic".

[11] PMID 27013722.

[12] doi:10.15438/rr.v4i2.72.

[13] OCLC 818811849.
