Controlling for a variable
When estimating the effect of explanatory variables on an outcome by regression, the controlled-for variables are included as additional regression inputs so that their effects on the outcome can be separated from those of the explanatory variables of interest.[1]
A limitation of controlling for variables is that a causal model is needed to identify the important confounders (the backdoor criterion is used for this identification). Without such a model, a possible confounder might remain unnoticed. A related problem is that if a variable which is not a real confounder is controlled for, the control may in fact turn other variables (possibly ones not taken into account) into confounders when they were not confounders before. In other cases, controlling for a non-confounding variable may cause underestimation of the true causal effect of the explanatory variables on the outcome (e.g. when controlling for a mediator of the effect or its descendant).
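The mediator case can be made concrete with a small simulation. The sketch below uses synthetic data with invented effect sizes: X affects Y both directly and through a mediator M, so regressing Y on X alone recovers the total effect, while also controlling for M leaves only the direct effect.

```python
# Minimal illustration (synthetic data): controlling for a mediator
# understates the total causal effect of X on Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
m = 0.8 * x + rng.normal(size=n)             # mediator: caused by X
y = 0.5 * x + 0.6 * m + rng.normal(size=n)   # total effect of X on Y: 0.5 + 0.8*0.6 = 0.98

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(y, x)[1])     # ~0.98: the total causal effect
print(ols(y, x, m)[1])  # ~0.50: only the direct effect survives the control
```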
Experiments
Experiments attempt to assess the effect of manipulating one or more independent variables on one or more dependent variables. To ensure the measured effect is not influenced by external factors, other variables must be held constant. The variables made to remain constant during an experiment are referred to as control variables.
For example, if an outdoor experiment were to be conducted to compare how different wing designs of a paper plane (the independent variable) affect how far it can fly (the dependent variable), one would want to conduct every trial under the same weather conditions, so that the weather does not affect the comparison. In this case the control variables include factors such as wind speed and direction.
In controlled experiments of medical treatment options on humans, researchers randomly assign individuals to a treatment group or control group. This is done to reduce the confounding effect of irrelevant variables that are not being studied, such as the placebo effect.
Observational studies
In an observational study, researchers have no control over the values of the independent variables, such as who receives the treatment. Instead, they must control for variables using statistics.
Observational studies are used when controlled experiments may be unethical or impractical. For instance, if a researcher wished to study the effect of unemployment (the independent variable) on health (the dependent variable), it would be considered unethical to randomly assign some participants to be unemployed. Instead, the researcher must compare people whose employment status arose naturally, and such people may differ in other (extraneous) ways that also affect health.
In this context the extraneous variables can be controlled for by using multiple regression. The regression includes as independent variables not only the explanatory variable whose effect on the dependent variable is being studied, but also the potential confounders, thus avoiding omitted variable bias.
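As a sketch of how this looks in practice (synthetic data; the statsmodels package and the variable names are assumptions of this example, not taken from the sources), consider the unemployment-and-health setting above with age acting as a confounder:

```python
# Synthetic illustration: age affects both the chance of being unemployed
# and health, so it confounds the unemployment-health relationship.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
age = rng.uniform(20, 65, size=n)
unemployed = (rng.random(n) < 0.05 + 0.004 * (age - 20)).astype(float)
health = 70 - 0.3 * age - 5.0 * unemployed + rng.normal(scale=5, size=n)

naive = sm.OLS(health, sm.add_constant(unemployed)).fit()
controlled = sm.OLS(health, sm.add_constant(np.column_stack([unemployed, age]))).fit()

print(naive.params[1])       # more negative than -5: absorbs part of the age effect
print(controlled.params[1])  # ~ -5.0: age is held constant by the regression
```

The second regression separates the effect of unemployment from that of age precisely because age appears as an additional input, which is what "controlling for" means here.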
OLS regressions and control variables
The simplest examples of control variables in regression analysis come from Ordinary Least Squares (OLS) estimators. The OLS framework assumes the following:
- Linear relationship - OLS statistical models are linear, so the relationship between the explanatory variables and the mean of Y must be linear.
- Homoscedasticity - This requires homogeneity of variances, that is, the error terms must have equal (or at least similar) variance across observations.
- Independence/no autocorrelation - The error term of one observation must not be influenced by the error terms of other observations.
- Normality of errors - The errors are jointly normal and uncorrelated, i.e. the error terms form an independently and identically distributed (i.i.d.) set. This implies that the unobservables of different groups or observations are independent.
- No multicollinearity - Independent variables must not be highly correlated with each other. In matrix notation, the design matrix must have full column rank, i.e. X′X must be invertible.
Accordingly, a control variable can be interpreted as a linear explanatory variable that affects the mean value of Y (Assumption 1), but which is not the primary variable of investigation, and which also satisfies the other assumptions above.[4]
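A minimal sketch in matrix form (synthetic data with invented coefficients) shows both the OLS estimator and why the full-rank condition in the last assumption matters:

```python
# OLS in matrix notation: beta = (X'X)^{-1} X'y, which requires X'X invertible.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
x1 = rng.normal(size=n)                      # explanatory variable of interest
x2 = rng.normal(size=n)                      # control variable
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # ~ [1.0, 2.0, 3.0]

# A perfectly collinear extra column (here 2*x1) makes X'X singular,
# violating the no-multicollinearity assumption:
X_bad = np.column_stack([X, 2.0 * x1])
print(np.linalg.matrix_rank(X_bad.T @ X_bad))  # 3, not 4: X'X is not invertible
```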
Example
Consider a study about whether getting older affects someone's life satisfaction. (Some researchers perceive a "u-shape": life satisfaction appears to decline first and then rise after middle age.[5]) To identify the control variables needed here, one could ask what other variables determine not only someone's life satisfaction but also their age. Many other variables determine life satisfaction. But no other variable determines how old someone is (as long as they remain alive). (All people keep getting older, at the same rate, no matter what their other characteristics.) So, no control variables are needed here.[6]
To determine the needed control variables, it can be useful to construct a directed acyclic graph.[3]
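As a sketch of how such a graph can be checked mechanically (the causal model here, with confounder Z and mediator M, is hypothetical, and the NetworkX library is an assumption of this example), one can delete the edges leaving the explanatory variable and test whether the proposed controls d-separate it from the outcome, which is the backdoor criterion mentioned earlier:

```python
# Backdoor check on a toy DAG using NetworkX (newer releases provide
# nx.is_d_separator; older ones call it nx.d_separated).
import networkx as nx

# Hypothetical model: Z -> X, Z -> Y (Z confounds), X -> M -> Y (M mediates).
g = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")])

# Remove the edges out of X, leaving only the backdoor paths into X.
backdoor = g.copy()
backdoor.remove_edge("X", "M")

print(nx.is_d_separator(backdoor, {"X"}, {"Y"}, set()))  # False: Z is an open confounder
print(nx.is_d_separator(backdoor, {"X"}, {"Y"}, {"Z"}))  # True: controlling for Z suffices
```

Note that the mediator M is not in the control set: as discussed above, controlling for it would understate the total effect.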
References
- ^ Frost, Jim. "A Tribute to Regression Analysis | Minitab". Retrieved 2015-08-04.
- ^ S2CID 11155639.
- ^ ISBN 978-0-241-24263-6.
- ^ OCLC 1225621417.
- ^ PMID 18316146.
- .
Further reading
- Freedman, David; Pisani, Robert; Purves, Roger (2007). Statistics. W. W. Norton & Company. ISBN 978-0393929720.