Observed information

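In statistics, the observed information, or observed Fisher information, is the negative of the second derivative (the Hessian matrix) of the log-likelihood (the logarithm of the likelihood function). It is a sample-based version of the Fisher information.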

Definition

Suppose we observe random variables $X_{1},\ldots ,X_{n}$, independent and identically distributed with density $f(X|\theta )$, where $\theta$ is a (possibly unknown) vector. Then the log-likelihood of the parameters $\theta$ given the data $X_{1},\ldots ,X_{n}$ is

$\ell (\theta |X_{1},\ldots ,X_{n})=\sum _{i=1}^{n}\log f(X_{i}|\theta ).$

We define the observed information matrix at $\theta ^{*}$ as

${\mathcal {J}}(\theta ^{*})=-\left.\nabla \nabla ^{\top }\ell (\theta )\right|_{\theta =\theta ^{*}}=-\left.\left({\begin{array}{cccc}{\tfrac {\partial ^{2}}{\partial \theta _{1}^{2}}}&{\tfrac {\partial ^{2}}{\partial \theta _{1}\partial \theta _{2}}}&\cdots &{\tfrac {\partial ^{2}}{\partial \theta _{1}\partial \theta _{p}}}\\{\tfrac {\partial ^{2}}{\partial \theta _{2}\partial \theta _{1}}}&{\tfrac {\partial ^{2}}{\partial \theta _{2}^{2}}}&\cdots &{\tfrac {\partial ^{2}}{\partial \theta _{2}\partial \theta _{p}}}\\\vdots &\vdots &\ddots &\vdots \\{\tfrac {\partial ^{2}}{\partial \theta _{p}\partial \theta _{1}}}&{\tfrac {\partial ^{2}}{\partial \theta _{p}\partial \theta _{2}}}&\cdots &{\tfrac {\partial ^{2}}{\partial \theta _{p}^{2}}}\\\end{array}}\right)\ell (\theta )\right|_{\theta =\theta ^{*}}$

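Since the inverse of the information matrix is the asymptotic covariance matrix of the corresponding maximum-likelihood estimator, the observed information is often evaluated at the maximum-likelihood estimate for the purpose of significance testing or confidence-interval construction. The invariance property of maximum-likelihood estimators allows the observed information matrix to be evaluated before being inverted.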
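As an illustration of the definition, the following minimal sketch approximates the observed information matrix by central finite differences of the log-likelihood and evaluates it at the maximum-likelihood estimate. The normal model with parameters $(\mu ,\sigma )$, the sample values, the step size `h` and all function names are assumptions chosen for the example; they do not come from a cited source.

```python
import math


def log_likelihood(theta, data):
    """Log-likelihood of an i.i.d. normal sample; theta = (mu, sigma)."""
    mu, sigma = theta
    n = len(data)
    squares = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2 * math.pi)
            - squares / (2 * sigma ** 2))


def observed_information(loglik, theta, data, h=1e-4):
    """Negative Hessian of loglik at theta, by central finite differences."""
    p = len(theta)
    info = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            def shifted(si, sj):
                # Log-likelihood with coordinates i and j moved by +/- h.
                t = list(theta)
                t[i] += si * h
                t[j] += sj * h
                return loglik(t, data)

            second = (shifted(1, 1) - shifted(1, -1)
                      - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
            info[i][j] = -second
    return info


# Hypothetical sample, used only for illustration.
data = [4.9, 5.3, 4.4, 6.1, 5.8, 5.0, 4.7, 5.6]
n = len(data)

# Closed-form maximum-likelihood estimates for the normal model.
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

J = observed_information(log_likelihood, (mu_hat, sigma_hat), data)

# Invert the 2x2 matrix to approximate the covariance of the estimates.
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
cov = [[J[1][1] / det, -J[0][1] / det],
       [-J[1][0] / det, J[0][0] / det]]
std_errors = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]

print(J)
print(std_errors)
```

For this model the result can be checked by hand: at the maximum-likelihood estimate the matrix is diagonal, with entries $n/{\hat {\sigma }}^{2}$ and $2n/{\hat {\sigma }}^{2}$, so the approximate standard error of ${\hat {\mu }}$ is ${\hat {\sigma }}/{\sqrt {n}}$.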

Alternative definition

Andrew Gelman, David Dunson and Donald Rubin^[2] define observed information instead in terms of the parameters' posterior probability, $p(\theta |y)$ :

$I(\theta )=-{\frac {d^{2}}{d\theta ^{2}}}\log p(\theta |y)$
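As a worked illustration of this definition (not taken from the cited text), suppose the posterior happens to be normal, $\theta |y\sim N(\mu _{n},\tau _{n}^{2})$. The symbols $\mu _{n}$ and $\tau _{n}$ are assumptions introduced here for the posterior mean and standard deviation.

```latex
\log p(\theta \mid y) = -\frac{(\theta - \mu_n)^2}{2\tau_n^2} + \text{const}
\quad\Longrightarrow\quad
I(\theta) = -\frac{d^2}{d\theta^2}\log p(\theta \mid y) = \frac{1}{\tau_n^2}
```

Under this assumption the observed information does not depend on $\theta$ and equals the posterior precision.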

Fisher information

The Fisher information ${\mathcal {I}}(\theta )$ is the expected value of the observed information given a single observation $X$ distributed according to the hypothetical model with parameter $\theta$ :

${\mathcal {I}}(\theta )=\mathrm {E} ({\mathcal {J}}(\theta )).$
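The relationship can be checked numerically. The sketch below assumes a Cauchy location model with unit scale, chosen only because its single-observation observed information, $2(1-u^{2})/(1+u^{2})^{2}$ with $u=x-\theta$, varies with the observation while its Fisher information is the constant $1/2$. The seed, the number of draws and the function name are illustrative assumptions.

```python
import math
import random


def observed_information_cauchy(x, theta):
    """Negative second derivative of log f(x | theta) for a unit-scale Cauchy."""
    u = x - theta
    return 2.0 * (1.0 - u * u) / (1.0 + u * u) ** 2


rng = random.Random(12345)   # fixed seed so the run is reproducible
theta = 0.0                  # hypothetical true location
draws = 200_000

total = 0.0
for _ in range(draws):
    # Inverse-CDF sampling of a Cauchy variate centred at theta.
    x = theta + math.tan(math.pi * (rng.random() - 0.5))
    total += observed_information_cauchy(x, theta)

monte_carlo_fisher = total / draws   # sample average of J(theta)
exact_fisher = 0.5                   # known value for this model

print(monte_carlo_fisher, exact_fisher)
```

The average of the single-observation observed information over many simulated draws should settle close to the exact Fisher information, up to Monte Carlo error.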

Comparison with the expected information

The comparison between the observed information and the expected information remains an active area of research and debate. Efron and Hinkley provided a frequentist justification for preferring the observed information to the expected information when employing normal approximations to the distribution of the maximum-likelihood estimator in one-parameter families in the presence of an ancillary statistic that affects the precision of the MLE.^[3] Lindsay and Li showed that the observed information matrix gives the minimum mean squared error as an approximation of the true information if an error term of $O(n^{-3/2})$ is ignored.^[4] In Lindsay and Li's case, the expected information matrix still requires evaluation at the obtained ML estimates, introducing randomness.

However, when the construction of confidence intervals is the primary focus, the expected information has been reported to outperform its observed counterpart. Yuan and Spall showed that the expected information outperforms the observed information for confidence-interval constructions of scalar parameters in the mean squared error sense.^[5] This finding was later generalized to multiparameter cases, although the claim was weakened to the expected information matrix performing at least as well as the observed information matrix.^[6]
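The difference between the two quantities discussed in this section can be made concrete with a small simulation. The sketch below assumes a unit-scale Cauchy location model, for which the expected information of $n$ observations is the constant $n/2$, while the observed information evaluated at the maximum-likelihood estimate changes from sample to sample. The crude grid-refinement search around the sample median, the seed and all names are illustrative assumptions; the code does not reproduce any result from the cited papers.

```python
import math
import random
import statistics


def loglik(theta, xs):
    """Cauchy location log-likelihood, up to an additive constant."""
    return -sum(math.log(1.0 + (x - theta) ** 2) for x in xs)


def observed_info(theta, xs):
    """Negative second derivative of the log-likelihood at theta."""
    return sum(2.0 * (1.0 - (x - theta) ** 2) / (1.0 + (x - theta) ** 2) ** 2
               for x in xs)


def cauchy_mle(xs, half_width=3.0, rounds=5, points=201):
    """Approximate MLE by repeated grid refinement around the median.

    Assumption: the maximiser lies within half_width of the sample median.
    """
    centre = statistics.median(xs)
    for _ in range(rounds):
        step = 2.0 * half_width / (points - 1)
        grid = [centre - half_width + k * step for k in range(points)]
        centre = max(grid, key=lambda t: loglik(t, xs))
        half_width = 2.0 * step   # shrink the window around the best point
    return centre


rng = random.Random(0)
n = 10
expected_info = n / 2.0   # does not depend on the sample

observed = []
for _ in range(5):
    xs = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
    theta_hat = cauchy_mle(xs)
    observed.append(observed_info(theta_hat, xs))

print(expected_info)
print(observed)
```

Each simulated sample yields a different observed information, whereas the expected information stays fixed, so an interval based on the former adapts its width to the sample at hand.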

References

[1] ISBN 0-19-920613-9.

[2] Gelman, Andrew; Carlin, John; Stern, Hal; Dunson, David; Vehtari, Aki; Rubin, Donald (2014). Bayesian Data Analysis (3rd ed.). p. 84.

[3] MR 0521817.

[4] doi:10.1214/aos/1069362393.

[5] S2CID 220888731.

[6] S2CID 233332868.
