Conditional logistic regression: Difference between revisions

Revision as of 21:12, 17 February 2021

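Conditional logistic regression is an extension of logistic regression that allows one to take into account stratification and matching. It was devised in 1978 by Norman Breslow, Nicholas Day, Katherine Halvorsen, Ross L. Prentice and C. Sabai.^[1]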

It is the most flexible and general procedure for matched data.

Motivation

Observational studies use stratification or matching to control for confounding. Several tests for matched data predate conditional logistic regression (see Related tests). However, they did not allow for the analysis of continuous predictors with arbitrary stratum size. These procedures also lack the flexibility of conditional logistic regression, in particular the ability to control for covariates.

Logistic regression can take stratification into account by having a different constant term for each stratum. Let $Y_{i\ell }\in \{0,1\}$ denote the label (e.g. case status) of the $\ell$-th observation of the $i$-th stratum and $X_{i\ell }\in \mathbb {R} ^{p}$ the values of the corresponding predictors. Then, the likelihood of one observation is

\mathbb {P} (Y_{i\ell }=1|X_{i\ell })={\frac {\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i\ell })}{1+\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i\ell })}}

where $\alpha _{i}$ is the constant term for the $i$-th stratum. While this works satisfactorily for a limited number of strata, pathological behavior occurs when the strata are small. When the strata are pairs, the number of parameters grows with the number of observations $N$ (it equals ${\frac {N}{2}}+p$). The asymptotic results on which maximum likelihood estimation is based are therefore not valid and the estimation is biased. In fact, it can be shown that the unconditional analysis of matched-pair data yields an estimate of the odds ratio that is the square of the correct, conditional one.^[2]
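The squared odds ratio can be illustrated in the simplest setting. The sketch below assumes a single binary predictor and matched pairs; the symbols $n_{10}$ (pairs with an exposed case and an unexposed control), $n_{01}$ (the reverse) and $\sigma(z)=1/(1+e^{-z})$ are notation introduced here for illustration and do not appear elsewhere in the article. Pairs in which both members have the same exposure contribute only a constant.

```latex
% Illustration under the assumptions stated above (binary predictor, 1:1 pairs).
\begin{aligned}
\text{Unconditional, maximizing over } \alpha_i:\quad
&\max_{\alpha_i}\ \sigma(\alpha_i+\beta)\,\bigl(1-\sigma(\alpha_i)\bigr)=\sigma(\beta/2)^{2}
\quad\text{at } \alpha_i=-\beta/2,\\
&\ell_{\text{prof}}(\beta)=2n_{10}\log\sigma(\beta/2)+2n_{01}\log\sigma(-\beta/2)+\text{const}
\;\Rightarrow\; e^{\hat\beta_{\text{unc}}}=\left(\frac{n_{10}}{n_{01}}\right)^{2},\\[6pt]
\text{Conditional:}\quad
&\ell_{\text{cond}}(\beta)=n_{10}\log\sigma(\beta)+n_{01}\log\sigma(-\beta)
\;\Rightarrow\; e^{\hat\beta_{\text{cond}}}=\frac{n_{10}}{n_{01}}.
\end{aligned}
```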

Conditional likelihood

The conditional likelihood approach deals with the above pathological behavior by conditioning on the number of cases in each stratum, which eliminates the need to estimate the strata parameters. In the case where the strata are pairs, where the first observation is a case and the second is a control, this can be seen as follows:

{\begin{aligned}&\mathbb {P} (Y_{i1}=1,Y_{i2}=0|X_{i1},X_{i2},Y_{i1}+Y_{i2}=1)\\&={\frac {\mathbb {P} (Y_{i1}=1|X_{i1})\mathbb {P} (Y_{i2}=0|X_{i2})}{\mathbb {P} (Y_{i1}=1|X_{i1})\mathbb {P} (Y_{i2}=0|X_{i2})+\mathbb {P} (Y_{i1}=0|X_{i1})\mathbb {P} (Y_{i2}=1|X_{i2})}}\\[6pt]\ &={\frac {{\frac {\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i1})}{1+\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i1})}}\times {\frac {1}{1+\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i2})}}}{{\frac {\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i1})}{1+\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i1})}}\times {\frac {1}{1+\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i2})}}+{\frac {1}{1+\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i1})}}\times {\frac {\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i2})}{1+\exp(\alpha _{i}+{\boldsymbol {\beta }}^{\top }X_{i2})}}}}\\[6pt]\ &={\frac {\exp({\boldsymbol {\beta }}^{\top }X_{i1})}{\exp({\boldsymbol {\beta }}^{\top }X_{i1})+\exp({\boldsymbol {\beta }}^{\top }X_{i2})}}.\\[6pt]\end{aligned}}
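The cancellation of $\alpha _{i}$ can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names and the toy values of the predictors and coefficients are hypothetical and chosen only for illustration.

```python
import math


def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(beta, x):
    """Inner product of two equal-length sequences."""
    return sum(b * v for b, v in zip(beta, x))


def pair_conditional_prob(alpha, beta, x_case, x_control):
    """P(Y1=1, Y2=0 | X1, X2, Y1+Y2=1), built from the unconditional model
    with stratum intercept alpha (second line of the derivation)."""
    p_case = logistic(alpha + dot(beta, x_case))
    p_ctrl = logistic(alpha + dot(beta, x_control))
    numerator = p_case * (1.0 - p_ctrl)
    denominator = numerator + (1.0 - p_case) * p_ctrl
    return numerator / denominator


def pair_closed_form(beta, x_case, x_control):
    """Last line of the derivation: the intercept no longer appears."""
    e_case = math.exp(dot(beta, x_case))
    e_ctrl = math.exp(dot(beta, x_control))
    return e_case / (e_case + e_ctrl)


# Hypothetical toy values, for illustration only.
beta = [0.8, -0.5]
x_case = [1.0, 2.0]
x_control = [0.0, 1.5]

closed = pair_closed_form(beta, x_case, x_control)
values = [pair_conditional_prob(a, beta, x_case, x_control) for a in (-3.0, 0.0, 2.5)]
```

Every entry of `values` agrees with `closed` up to floating-point error, whatever intercept is used.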

With similar computations, the conditional likelihood of a stratum of size $m$, with the first $k$ observations being the cases, is

\mathbb {P} (Y_{ij}=1{\text{ for }}j\leq k,Y_{ij}=0{\text{ for }}k<j\leq m|X_{i1},...,X_{im},\sum _{j=1}^{m}Y_{ij}=k)={\frac {\exp(\sum _{j=1}^{k}{\boldsymbol {\beta }}^{\top }X_{ij})}{\sum _{J\in {\mathcal {C}}_{k}^{m}}\exp(\sum _{j\in J}{\boldsymbol {\beta }}^{\top }X_{ij})}},

where ${\mathcal {C}}_{k}^{m}$ is the set of all subsets of size $k$ of the set $\{1,\ldots ,m\}$.

The full conditional log likelihood is then simply the sum of the log likelihoods for each stratum. The estimator is then defined as the ${\boldsymbol {\beta }}$ that maximizes the conditional log likelihood.
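The stratum likelihood and the resulting estimator can be sketched directly from the formula above. This is an illustrative sketch under stated assumptions, not how production software computes the estimate: the denominator is enumerated by brute force over all size-$k$ subsets (practical only for small strata), the maximizer handles a single scalar coefficient by ternary search and relies on the log likelihood being concave, and all function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def stratum_log_lik(beta, X, y):
    """Conditional log likelihood of one stratum.

    X is a list of predictor vectors, y the matching list of 0/1 labels.
    """
    scores = [sum(b * v for b, v in zip(beta, row)) for row in X]
    k = sum(y)
    numerator = sum(s for s, label in zip(scores, y) if label == 1)
    # Denominator: sum over every subset J of size k of exp(sum of scores in J).
    terms = [sum(scores[j] for j in J) for J in combinations(range(len(y)), k)]
    top = max(terms)  # log-sum-exp shift for numerical stability
    log_denominator = top + math.log(sum(math.exp(t - top) for t in terms))
    return numerator - log_denominator


def conditional_log_lik(beta, strata):
    """Sum of the stratum log likelihoods."""
    return sum(stratum_log_lik(beta, X, y) for X, y in strata)


def maximize_scalar_beta(strata, lo=-10.0, hi=10.0, iterations=200):
    """Ternary search for a single coefficient (assumes a concave objective
    whose maximum lies inside [lo, hi])."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if conditional_log_lik([m1], strata) < conditional_log_lik([m2], strata):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


# Hypothetical toy data: 30 pairs with an exposed case and an unexposed
# control, and 10 pairs with the reverse pattern (the case is listed first).
strata = [([[1.0], [0.0]], [1, 0])] * 30 + [([[0.0], [1.0]], [1, 0])] * 10
beta_hat = maximize_scalar_beta(strata)
```

For this toy data the maximizer lies at the logarithm of the ratio of discordant pairs, $\log(30/10)$, so `math.exp(beta_hat)` is close to 3.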

Implementation

Conditional logistic regression is available in R as the function clogit in the survival package. It is in the survival package because the log likelihood of a conditional logistic model is the same as the log likelihood of a Cox model with a particular data structure.^[3]
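The correspondence with the Cox model can be illustrated for a stratum containing a single case. The sketch below assumes one particular data structure: every member of the stratum is given the same dummy follow-up time, the case is treated as an event and the controls as censored, so the risk set at the event time is the whole stratum. The function names and toy values are hypothetical; strata with several cases would produce tied event times, which this simple partial likelihood does not handle.

```python
import math


def cox_partial_log_lik(beta, times, events, X):
    """Cox partial log likelihood for one stratum (no tied event times)."""
    scores = [sum(b * v for b, v in zip(beta, row)) for row in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: everyone still under observation at time t_i.
            risk = [scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
            total += scores[i] - math.log(sum(math.exp(s) for s in risk))
    return total


def one_case_conditional_log_lik(beta, X, y):
    """Conditional logistic log likelihood of a stratum with exactly one case."""
    scores = [sum(b * v for b, v in zip(beta, row)) for row in X]
    case = y.index(1)
    return scores[case] - math.log(sum(math.exp(s) for s in scores))


# Hypothetical stratum: one case (second row) and three controls.
X = [[0.5, 1.0], [1.2, 0.0], [-0.3, 1.0], [0.0, 0.0]]
y = [0, 1, 0, 0]
beta = [0.7, -0.4]

times = [1.0] * len(y)  # same dummy time for every member of the stratum
cox_value = cox_partial_log_lik(beta, times, y, X)
clr_value = one_case_conditional_log_lik(beta, X, y)
```

With this layout `cox_value` and `clr_value` coincide up to floating-point error.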

Related tests

The paired difference test makes it possible to test the association between a binary outcome and a continuous predictor while taking pairing into account.
The Cochran–Mantel–Haenszel test makes it possible to test the association between a binary outcome and a binary predictor while taking into account stratification with arbitrary stratum size. When its conditions of application are met, it is identical to the conditional logistic regression score test.^[4]
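As an illustration of the second test, the Cochran–Mantel–Haenszel statistic can be computed from one 2×2 table per stratum. This is a minimal sketch without continuity correction; the function name, the table layout and the pair counts are assumptions made for the example. When every stratum is a matched pair, the statistic reduces to $(n_{10}-n_{01})^{2}/(n_{10}+n_{01})$, where $n_{10}$ and $n_{01}$ count the two kinds of discordant pairs.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel chi-square statistic, no continuity correction.

    Each table is (a, b, c, d) with
      a = exposed cases,   b = exposed controls,
      c = unexposed cases, d = unexposed controls.
    """
    deviation = 0.0
    variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two subjects carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        deviation += a - row1 * col1 / n  # observed minus expected exposed cases
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    return deviation * deviation / variance


# Hypothetical matched-pair data: counts of each pair type.
n10, n01 = 30, 10  # discordant pairs (case exposed only / control exposed only)
n11, n00 = 25, 35  # concordant pairs (both exposed / both unexposed)
tables = (
    [(1, 0, 0, 1)] * n10
    + [(0, 1, 1, 0)] * n01
    + [(1, 1, 0, 0)] * n11
    + [(0, 0, 1, 1)] * n00
)

statistic = cmh_statistic(tables)
paired_form = (n10 - n01) ** 2 / (n10 + n01)
```

The concordant pairs contribute nothing to either sum, so `statistic` equals `paired_form`.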

Notes

^ PMID 727199.

^ Breslow, N.E.; Day, N.E. (1980). Statistical Methods in Cancer Research. Volume 1-The Analysis of Case-Control Studies. Lyon, France: IARC. pp. 249–251. Archived from the original on 2016-12-26. Retrieved 2016-11-04.

^ Lumley, Thomas. "R documentation Conditional logistic regression". Retrieved November 3, 2016.

^ doi:10.2307/2530253.

External Links

Weisstein, Eric W. "Conditional Logit Regression." From MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/ConditionalLogitRegression.html


@@ Line 39: / Line 39: @@
 {{reflist}}
+== External Links ==
+* Weisstein, Eric W. "Conditional Logit Regression." From ''MathWorld''--A Wolfram Web Resource. https://mathworld.wolfram.com/ConditionalLogitRegression.html
 [[Category:Logistic regression]]