Truncation (statistics)
In statistics, truncation limits values from above or below, producing a truncated sample.[1] A random variable y is said to be truncated from below if, for some threshold value c, the exact value of y is known for all cases y > c, but unknown for all cases y ≤ c. Similarly, truncation from above means the exact value of y is known in cases where y < c, but unknown when y ≥ c.[2]
Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept. With statistical censoring, a note would be recorded documenting which bound (upper or lower) had been exceeded and the value of that bound. With truncated sampling, no note is recorded.
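The distinction can be illustrated with a minimal Python sketch. The function names, the sample values, and the threshold c = 1.0 are hypothetical, chosen only for illustration; the sketch treats values at or below c as falling outside the bound.

```python
def truncate_below(values, c):
    """Left-truncation: values at or below c are dropped and no count is kept."""
    return [v for v in values if v > c]


def censor_below(values, c):
    """Left-censoring: values at or below c are recorded as the bound c,
    together with a flag noting that the bound was exceeded."""
    return [(c, True) if v <= c else (v, False) for v in values]


raw = [0.4, 2.5, 1.1, 3.7, 0.9, 5.2]  # hypothetical underlying sample
c = 1.0                               # hypothetical lower bound

truncated = truncate_below(raw, c)
censored = censor_below(raw, c)

print(truncated)      # [2.5, 1.1, 3.7, 5.2]
print(len(censored))  # 6
```

The truncated sample carries no trace of the two omitted values, whereas the censored sample keeps one record per original observation and so still reveals how many values fell outside the bound.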
Applications
Usually the values that
Probability distributions
Truncation can be applied to any probability distribution. This will usually lead to a new distribution, not one within the same family. Thus, if a random variable X has F(x) as its distribution function, the new random variable Y defined as having the distribution of X truncated to the semi-open interval (a, b] has the distribution function
$$F_Y(y) = \frac{F(y) - F(a)}{F(b) - F(a)}$$
for y in the interval (a, b], and 0 below the interval or 1 above it. If truncation were to the closed interval [a, b], the distribution function would be
$$F_Y(y) = \frac{F(y) - F(a^-)}{F(b) - F(a^-)}$$
for y in the interval [a, b], and 0 below the interval or 1 above it, where F(a^-) denotes the left-hand limit of F at a.
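As a rough sketch, the semi-open case can be written in Python by rescaling the untruncated distribution function F so that it rises from 0 at a to 1 at b. The helper name `truncated_cdf` and the choice of a standard normal truncated to (-1, 2] are assumptions made for illustration only.

```python
from statistics import NormalDist


def truncated_cdf(cdf, a, b):
    """Return the distribution function of X truncated to (a, b].

    `cdf` is the distribution function F of the untruncated variable X.
    """
    fa, fb = cdf(a), cdf(b)
    if fb <= fa:
        raise ValueError("the interval (a, b] must have positive probability")

    def g(y):
        if y <= a:
            return 0.0  # below the interval
        if y > b:
            return 1.0  # above the interval
        return (cdf(y) - fa) / (fb - fa)

    return g


# Assumed example: a standard normal truncated to (-1, 2].
G = truncated_cdf(NormalDist(0.0, 1.0).cdf, -1.0, 2.0)
print(G(-1.0), G(2.0))  # 0.0 1.0
```

For a continuous distribution such as the normal, F has no jump at a, so truncating to the closed interval [a, b] gives the same function; the two cases differ only when X has positive probability of being exactly a.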
Data analysis
The analysis of data where observations are treated as being from truncated versions of standard distributions can be undertaken using
In practice, if the fraction truncated is very small, the effect of truncation might be ignored when analysing data. For example, it is common to use a normal distribution to model data whose values can only be positive but for which the typical range of values is well away from zero. In such cases, a truncated or censored version of the normal distribution may formally be preferable (although there would be alternatives), but the more complicated analysis would change the results very little. However, software is readily available for maximum-likelihood estimation of even moderately complicated models, such as regression models, for truncated data.[3]
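The size of this effect can be sketched numerically. Assuming a normal model truncated from below at zero, the truncated log-likelihood differs from the ordinary one only by a term involving the probability of surviving truncation. The function names, the data, and the parameter values below are hypothetical and serve only to illustrate the comparison.

```python
import math
from statistics import NormalDist


def normal_loglik(data, mu, sigma):
    """Log-likelihood of an ordinary (untruncated) normal model."""
    dist = NormalDist(mu, sigma)
    return sum(math.log(dist.pdf(x)) for x in data)


def left_truncated_normal_loglik(data, mu, sigma, c=0.0):
    """Log-likelihood of a normal model truncated from below at c.

    Each density is divided by P(X > c) = 1 - F(c), the probability
    that an observation survives truncation.
    """
    dist = NormalDist(mu, sigma)
    survive = 1.0 - dist.cdf(c)
    return normal_loglik(data, mu, sigma) - len(data) * math.log(survive)


data = [9.1, 10.4, 11.8, 8.7, 10.0]  # hypothetical positive measurements

# Mean five standard deviations above zero: the correction is negligible.
far = left_truncated_normal_loglik(data, 10.0, 2.0) - normal_loglik(data, 10.0, 2.0)

# Mean one standard deviation above zero: the correction is substantial.
near = left_truncated_normal_loglik(data, 10.0, 10.0) - normal_loglik(data, 10.0, 10.0)

print(far < 1e-5, near > 0.5)  # True True
```

When the mean lies several standard deviations from the truncation point, the two log-likelihoods are almost identical, which is why the simpler untruncated analysis is often acceptable in that situation.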
In econometrics, truncated dependent variables are variables for which observations cannot be made for certain values in some range.[4] Regression models with such dependent variables require special care that properly recognizes the truncated nature of the variable. Such truncated regression models can be estimated in parametric,[5][6][7] or semi- and non-parametric frameworks.[8][9]
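One possible parametric formulation is sketched below. It assumes a single regressor, normally distributed errors, and truncation from below at a known threshold c; these are illustrative assumptions, not necessarily the specification used in the cited works. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def truncated_regression_loglik(xs, ys, beta0, beta1, sigma, c):
    """Log-likelihood of y = beta0 + beta1 * x + e with e ~ N(0, sigma^2),
    where a pair (x, y) is observed only if y > c."""
    total = 0.0
    for x, y in zip(xs, ys):
        if y <= c:
            raise ValueError("a truncated sample cannot contain y <= c")
        mu = beta0 + beta1 * x
        # Ordinary normal log-density of the observed response.
        log_density = math.log(STD_NORMAL.pdf((y - mu) / sigma) / sigma)
        # Log-probability that a response at this x survives truncation.
        log_survive = math.log(1.0 - STD_NORMAL.cdf((c - mu) / sigma))
        total += log_density - log_survive
    return total


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical regressor values
ys = [1.2, 2.1, 2.9, 4.3]  # hypothetical responses, all above c = 1
ll = truncated_regression_loglik(xs, ys, beta0=0.0, beta1=1.0, sigma=1.0, c=1.0)
print(round(ll, 2))  # -2.86
```

Under these assumptions, maximising this function over beta0, beta1, and sigma with a numerical optimiser would give maximum-likelihood estimates. Unlike ordinary least squares, the survival term depends on the regressor, which is how the model accounts for observations near the threshold being selectively missing.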
References
- ISBN 0-19-920613-9.
- ISBN 0-8039-5710-6.
- JSTOR 2346749.
- About.com. Retrieved 2008-03-22.
- JSTOR 1914031.
- Heckman, James (1976). "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models". Annals of Economic and Social Measurement. 5 (4): 475–492.
- S2CID 255455365.
- S2CID 120113700.
- S2CID 55496460.