The problem
Given a sample \(\mathbf{x}\in\mathcal{X}\) from some \(F\in\mathcal{F}=\{F(x\mid \theta):\theta\in \Theta\}\), determine a plausible value for \(\theta\) (or for some function \(\psi(\theta)\)) from \(\mathbf{x}\).
The procedure
Select a statistic \(T(\mathbf{X})\) such that \(\Theta\subset T(\mathcal{X})\). Call \(T(\mathbf{X})\) an estimator and use any observed value \(T(\mathbf{x})\) as an estimate of \(\theta\).
Pending issues
Some properties of estimators
Finite sample properties
Minimal sufficiency
Nice, but . . . it can be impossible.
Bias
\[Bias_{\psi(\theta)}[T\mid\theta]=E[T\mid\theta] - \psi(\theta)\]
Efficiency
\[\begin{split} MSE_{\psi(\theta)}[T\mid\theta] & = E[(T -\psi(\theta))^2\mid\theta] \\ & = Var[T\mid\theta]+Bias^2_{\psi(\theta)}[T\mid\theta]\end{split}\]
The MSE represents a compromise between:
accuracy (measured by \(Bias_{\psi(\theta)}[T\mid\theta]\)) and
precision (measured by \(Var[T\mid\theta]\)).
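The decomposition above follows by adding and subtracting \(E[T\mid\theta]\) inside the square; the cross term has zero expectation:
\[E[(T-\psi(\theta))^2\mid\theta]=E[(T-E[T\mid\theta])^2\mid\theta]+\left(E[T\mid\theta]-\psi(\theta)\right)^2=Var[T\mid\theta]+Bias^2_{\psi(\theta)}[T\mid\theta].\]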
The relative efficiency of \(T\) and \(U\):
\[e(T,U\mid\theta)=\frac{MSE_{\psi(\theta)}[T\mid\theta]}{MSE_{\psi(\theta)}[U\mid\theta]}\]
Asymptotic properties
Consistency
\(T\) is consistent for \(\psi(\theta)\iff\left\{T_n\right\}_{n\in\mathbb{N}}\xrightarrow{\;P\;}\psi(\theta)\) as \(n\rightarrow +\infty\), i.e., the sequence of estimators converges in probability to \(\psi(\theta)\).
Theorem If \(\displaystyle{\lim_{n\rightarrow +\infty}Bias_{\psi(\theta)}[T_n\mid\theta]=\lim_{n\rightarrow +\infty}Var[T_n\mid\theta]=0}\) then \(T\) is consistent for \(\psi(\theta)\).
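Sketch of the proof: applying Markov's inequality to \((T_n-\psi(\theta))^2\), for every \(\varepsilon>0\),
\[P\left(|T_n-\psi(\theta)|>\varepsilon\mid\theta\right)\leq \frac{MSE_{\psi(\theta)}[T_n\mid\theta]}{\varepsilon^2}=\frac{Var[T_n\mid\theta]+Bias^2_{\psi(\theta)}[T_n\mid\theta]}{\varepsilon^2}\xrightarrow{n\rightarrow +\infty}0.\]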
Asymptotic efficiency
Asymptotic normality
Compare \(S^2_n\) and \(S^2_{n-1}\) as estimators of the parameter \(\sigma^2\) from \[\mathcal{F}=\{N(\mu,\sigma^2):\mu\in \mathbb{R},\; \sigma^2\in\mathbb{R}^+\}\] according to some of their properties.
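A possible solution sketch, using the fact that \((n-1)S^2_{n-1}/\sigma^2\sim\chi^2_{n-1}\) under this model:
\[Bias_{\sigma^2}[S^2_{n-1}\mid\theta]=0,\qquad Bias_{\sigma^2}[S^2_{n}\mid\theta]=-\frac{\sigma^2}{n},\]
\[MSE_{\sigma^2}[S^2_{n-1}\mid\theta]=\frac{2\sigma^4}{n-1},\qquad MSE_{\sigma^2}[S^2_{n}\mid\theta]=\frac{(2n-1)\sigma^4}{n^2}<\frac{2\sigma^4}{n-1},\]
so \(S^2_{n-1}\) is unbiased, \(S^2_n\) has uniformly smaller MSE, and both are consistent.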
There is no such thing as an overall best estimator!
To establish different criteria we will:
choose some property to compare estimators (the MSE);
restrict the search to some suitable class of estimators or models.
\[\mathcal{U}\left(\psi(\theta)\right)=\left\{T(\mathbf{X}): E[T(\mathbf{X})\mid\theta]=\psi(\theta),\,\forall \theta \in \Theta\right\}\]
is the class of unbiased estimators of \(\psi(\theta)\).
\[\mathcal{LU}\left(\psi(\theta)\right)=\left\{T(\mathbf{X})\in \mathcal{U}\left(\psi(\theta)\right) : T(\mathbf{X})=\sum_{i=1}^n{a_iX_i} \right\}\]
is the class of linear unbiased estimators of \(\psi(\theta)\).
Best linear unbiased estimators
An estimator \(T\) is said to be the best linear unbiased estimator (BLUE) of \(\psi(\theta)\) if \(T\in\mathcal{LU}\left(\psi(\theta)\right)\) and \(Var[T\mid\theta]\leq Var[W\mid\theta],\;\forall\theta\in\Theta,\) \(\forall W\in\mathcal{LU}\left(\psi(\theta)\right)\).
Note To find a BLUE we need to solve a constrained optimization problem.
Let \(X_1,\ldots,X_n\) be uncorrelated variables with common and finite mean \(\mu\) and variance \(\sigma^2\). Find the BLUE of \(\mu\).
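A sketch of the standard argument: for \(T=\sum_{i=1}^n a_iX_i\), unbiasedness forces \(\sum_{i=1}^n a_i=1\), and minimizing \(Var[T\mid\theta]=\sigma^2\sum_{i=1}^n a_i^2\) under this constraint (e.g., with a Lagrange multiplier) gives \(a_i=1/n\), so
\[T=\bar{X},\qquad Var[\bar{X}\mid\theta]=\frac{\sigma^2}{n}.\]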
Let \((X_1,\ldots,X_n)\) be a random sample from \[\mathcal{F}=\{U(\theta-1/2,\theta+1/2):\theta\in \mathbb{R}\}.\] Find the BLUE of \(\theta\) and compare it with \(C=\frac{X_{(1)}+X_{(n)}}{2}\).
Note: \(Cov[X_{(1)},X_{(n)}]=\frac{1}{(n+1)^2(n+2)}\)
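A sketch of the comparison, writing \(X_i=\theta-1/2+U_i\) with \(U_i\sim U(0,1)\) and using the Beta moments of uniform order statistics: the BLUE is \(\bar{X}\), \(C\) is also unbiased, and
\[Var[\bar{X}\mid\theta]=\frac{1}{12n},\qquad Var[C\mid\theta]=\frac{1}{2(n+1)(n+2)},\]
so for \(n>2\) the estimator \(C\) (which is not of the form \(\sum a_iX_i\) with fixed coefficients) beats the BLUE, illustrating the cost of restricting the search to linear estimators.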
Best unbiased estimators
Hereafter we will consider only regular models.
Theorem If \(T\) is an estimator of \(\psi(\theta)\) with a differentiable bias \(b(\theta)\), then \[MSE_{\psi(\theta)}[T\mid\theta]\geq \frac{[\psi'(\theta)+b'(\theta)]^2}{nI(\theta)}+b^2(\theta).\]
Fréchet-Cramér-Rao inequality Let \(T\) be an estimator in \(\mathcal{U}\left(\psi(\theta)\right)\), where \(\psi\) is a differentiable function. Then \[Var[T\mid\theta]\geq \frac{[\psi'(\theta)]^2}{nI(\theta)}=R(\psi(\theta)).\]
Note We will call \(R(\psi(\theta))\) the FCR lower bound.
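For example, in the Bernoulli model \(\{Ber(\theta):\theta\in ]0,1[\}\) we have \(I(\theta)=\frac{1}{\theta(1-\theta)}\), so for \(\psi(\theta)=\theta\)
\[R(\theta)=\frac{1}{nI(\theta)}=\frac{\theta(1-\theta)}{n}=Var[\bar{X}\mid\theta],\]
and the FCR lower bound is attained by \(\bar{X}\).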
An estimator \(T\) is said to be the best unbiased estimator (BUE) of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and \(Var[T\mid\theta]=R(\psi(\theta)),\;\forall\theta\in\Theta\).
Note An estimator \(T\) is said to be the asymptotically best unbiased estimator of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and
\[\frac{Var[T\mid\theta]}{R(\psi(\theta))}\mathrel{\mathop{\xrightarrow{n\rightarrow +\infty}}}1,\;\forall\theta\in\Theta.\]
When is it possible to attain the FCR lower bound?
Theorem Let \(T\) be an estimator in \(\mathcal{U}\left(\psi(\theta)\right)\). \(T\) is the BUE of \(\psi(\theta)\) if and only if
\[S(\mathbf{X}\mid\theta)=g(\theta)\left[T(\mathbf{X})-\psi(\theta)\right],\]
for some function \(g\).
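For instance, in the Poisson model \(\{Poi(\lambda):\lambda\in \mathbb{R}^+\}\) the score has exactly this form:
\[S(\mathbf{X}\mid\lambda)=\sum_{i=1}^n\left(\frac{X_i}{\lambda}-1\right)=\frac{n}{\lambda}\left(\bar{X}-\lambda\right),\]
so \(\bar{X}\) is the BUE of \(\lambda\), with \(g(\lambda)=n/\lambda\) and \(\psi(\lambda)=\lambda\).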
Corollary Let \(T\) be an estimator of \(\psi(\theta)\) with a differentiable bias \(b(\theta)\). Then \(MSE_{\psi(\theta)}[T\mid\theta]\) attains its lower bound if and only if
\[S(\mathbf{X}\mid\theta)=g(\theta)\left[T(\mathbf{X})-\left(\psi(\theta)+b(\theta)\right)\right],\]
for some function \(g\).
Corollary The FCR lower bound is attainable for some \(\psi(\theta)\) if and only if \(T\) is a sufficient statistic of a uniparametric exponential family.
Note
{BUE} \(\subset\) {sufficient statistics}
\(\not\exists\) one-dimensional sufficient statistic \(\implies\not\exists\) BUE
Let \((X_1,\ldots,X_n)\) be a random sample from \[\mathcal{F}=\{Exp(\lambda):\lambda\in \mathbb{R}^+\}.\] Is there a BUE of \(\lambda\)? For which parametric functions does a BUE exist?
Let \((X_1,\ldots,X_n)\) be a random sample from a mixture of the distributions \(Exp(1/\theta)\) and \(Gamma(2,1/\theta)\) with weights \(\frac{1}{\theta+1}\) and \(\frac{\theta}{\theta+1}\). Find the BUE of \[\psi(\theta)=\frac{(3+2\theta)(2+\theta)}{\theta+1}.\]
Uniform minimum variance unbiased estimators
An estimator \(T\) is said to be the uniform minimum variance unbiased estimator (UMVUE) of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and \[Var[T\mid\theta]\leq Var[W\mid\theta],\;\forall W\in\mathcal{U}\left(\psi(\theta)\right),\;\forall\theta\in\Theta.\]
Rao-Blackwell’s theorem Let \(T\) be a sufficient statistic for \(\theta\), \(W\in\mathcal{U}\left(\psi(\theta)\right)\) and \(U=E[W\mid T]\). Then,
\(E[U\mid\theta]=\psi(\theta)\);
\(Var[U\mid\theta]\leq Var[W\mid\theta],\;\forall\theta\in\Theta\).
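As an illustration, in the Bernoulli model take \(W=X_1\in\mathcal{U}(\theta)\) and the sufficient statistic \(T=\sum_{i=1}^n X_i\). Then
\[U=E[X_1\mid T]=P(X_1=1\mid T)=\frac{T}{n}=\bar{X},\]
which is unbiased with \(Var[\bar{X}\mid\theta]=\frac{\theta(1-\theta)}{n}\leq \theta(1-\theta)=Var[X_1\mid\theta]\).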
Notes
If the UMVUE exists it must be a function of a sufficient statistic;
Rao-Blackwell’s theorem does not provide the UMVUE. However . . .
Lehmann-Scheffé’s theorem If a model admits a complete sufficient statistic \(T\) and there is an unbiased estimator for \(\psi(\theta)\), then there is a unique UMVUE for \(\psi(\theta)\) that is a function of \(T\).
So, we have two possible strategies to find the UMVUE:
Apply Rao-Blackwell’s theorem using an unbiased estimator and a complete sufficient statistic;
Directly find an unbiased function of a complete sufficient statistic.
Let \((X_1,\ldots,X_n)\) be a random sample from
\[\mathcal{F}=\{Ber(\theta):\theta\in ]0,1[\}.\]
Find the UMVUE of \(\theta^2\).
Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{Exp(\lambda):\lambda\in \mathbb{R}^+\}\), with \(n>2\).
Let \(U\) be the UMVUE of \(\lambda\) and consider the class of estimators of \(\lambda\) defined by \(\frac{k}{n-1}U\), with \(k\in\mathbb{N}\). In this class, find the estimator with uniformly minimum MSE. What does this say about the UMVU criterion?
Determine the UMVUE of \(\frac{1}{\lambda^2}\) and show that it is the asymptotically best unbiased estimator.
Summary
For non-regular models: \[UMVUE \longrightarrow BLUE\]
Without a complete sufficient statistic it is usually difficult to find a UMVUE.
In simple situations, some ingenuity combined with the previous criteria can provide good estimators;
For more complex models we need more methodical ways of estimating parameters.
Method of moments
For a random sample \((X_1,\ldots,X_n)\) from \(\mathcal{F}=\{F(x\mid \theta):\theta=(\theta_1,\ldots,\theta_k)\}\), equate the first \(k\) (at least) sample moments to the corresponding population moments,
\[ M_r=\frac{\sum_{i=1}^n{X_i^r}}{n}=g_r(\theta) =E[X^r]=\mu_r,\;r=1,\ldots,k.\]
Solving this system of equations for \(\theta\), we obtain the method of moments estimators
\[\hat{\theta}_r=h_r(M_1,\ldots,M_k),\;r=1,\ldots,k.\]
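For example, in the \(N(\mu,\sigma^2)\) model, \(\mu_1=\mu\) and \(\mu_2=\sigma^2+\mu^2\), so the method of moments gives
\[\hat{\mu}=M_1=\bar{X},\qquad \hat{\sigma}^2=M_2-M_1^2=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2=S^2_n.\]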
The properties of these estimators can be derived from the properties of the sample moments which are:
Unbiased and consistent estimators of the population moments
\(E[M_r\mid\theta] = \mu_r\)
\(Var[M_r\mid\theta] = \frac{\mu_{2r}-\mu_r^2}{n}\)
Asymptotically normal
Using the CLT,
\[\sqrt{n}(M_r-\mu_r)\stackrel{D}{\longrightarrow}N(0,\mu_{2r}-\mu_r^2)\]
M-estimators
Estimators of the form \[\hat{\theta}=\arg\min_{\theta\in\Theta}\sum_{i=1}^n{g(X_i,\theta)}\] are called M-estimators of \(\theta\) (the “M” stands for “Maximum likelihood type”).
Note The function \(g\) may be chosen to provide estimators with desirable properties, in particular, regarding robustness.
Particular cases
Least squares estimation in linear models where \(g\) is defined as the square of a residual, such as
\[g(Y_i,\beta)=\left(Y_i-(\beta_0+\beta_1x_i)\right)^2,\]
in a simple linear regression model.
Maximum likelihood estimation with
\[g(X_i,\theta)=-\log f(X_i\mid\theta).\]
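A minimal numerical sketch of the robustness remark above (not from the notes; the Huber loss and the fixed tuning constant \(k=1.345\) are illustrative choices): a Huber-type M-estimator of location resists an outlier that strongly pulls the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber_g(u, k=1.345):
    # Huber's loss: quadratic near zero, linear in the tails (robust to outliers)
    return np.where(np.abs(u) <= k, 0.5 * u**2, k * np.abs(u) - 0.5 * k**2)

def m_estimate(x, k=1.345):
    # M-estimator of location: argmin over theta of sum_i g(x_i - theta)
    objective = lambda theta: np.sum(huber_g(x - theta, k))
    return minimize_scalar(objective).x

x = np.array([1.1, 0.8, 1.3, 0.9, 1.0, 12.0])  # sample with one gross outlier
print(np.mean(x))      # sample mean, strongly pulled by the outlier
print(m_estimate(x))   # Huber M-estimate, close to the bulk of the data
```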
Maximum likelihood estimation
Any \(\hat{\theta}\in\Theta\) such that \(L(\hat{\theta}\mid \mathbf{X})\geq L(\theta\mid \mathbf{X}),\;\forall\theta\in\Theta,\) is a maximum likelihood estimate (MLE) of \(\theta\).
If the likelihood function is differentiable then \(\hat{\theta}_{ML}\) may be any solution of \(S(\mathbf{X}\mid\theta)=0\) such that \(\left.\frac{\partial S(\mathbf{X}\mid\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_{ML}}<0\).
Two possible exceptions must not be forgotten:
the global maximum can lie on the boundary of \(\Theta\);
the global maximum can occur at a point where the likelihood function is not differentiable.
Find the MLE of \(\theta\) based on a random sample \((X_1,\ldots,X_n)\) from each of the following models:
\(\{Ber(\theta):\theta\in ]0,1[\}\);
\(\{U(\theta-1/2,\theta+1/2):\theta\in \mathbb{R}\}\).
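Solution sketches: in the Bernoulli model the score equation
\[S(\mathbf{X}\mid\theta)=\frac{\sum_{i=1}^n X_i}{\theta}-\frac{n-\sum_{i=1}^n X_i}{1-\theta}=0\]
gives \(\hat{\theta}_{ML}=\bar{X}\); in the uniform model the likelihood equals \(1\) for \(\theta\in[X_{(n)}-1/2,\,X_{(1)}+1/2]\) and \(0\) otherwise, so every point of that interval is an MLE of \(\theta\).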
Note
The MLE may not exist and may not be unique.
Numerical methods are usually required.
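For instance (a sketch, assuming a \(Gamma(\alpha,1)\) model, where the score equation for the shape parameter has no closed-form solution), the MLE can be obtained by numerical optimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

rng = np.random.default_rng(0)
x = gamma.rvs(a=2.5, size=200, random_state=rng)  # simulated sample, true shape 2.5

def neg_log_lik(alpha):
    # negative log-likelihood of the Gamma(alpha, scale=1) model
    return -np.sum(gamma.logpdf(x, a=alpha))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print("MLE of alpha:", res.x)
```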
Sufficiency
If \(T\) is a sufficient statistic, can we claim that the MLE is a function of \(T\)?
For the uniform model in the last exercise,
\[T=\sin^2(X_{(2)})\left(X_{(n)}-1/2\right)+\cos^2(X_{(2)})\left(X_{(1)}+1/2\right)\]
is an MLE of \(\theta\) that is not a function of the sufficient statistic \((X_{(1)},X_{(n)})\).
Efficiency
In a regular model, if the BUE exists then it must be an MLE.
Invariance
For any \(g:\Theta\subset \mathbb{R}^k\rightarrow \mathbb{R}^p\) with \(p\leq k\) we have
\[\hat{g}_{ML}(\theta)=g(\hat{\theta}_{ML}).\]
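For example, in the \(N(\mu,\sigma^2)\) model the MLE of \(\sigma=\sqrt{\sigma^2}\) follows directly from the MLE of \(\sigma^2\):
\[\hat{\sigma}_{ML}=\sqrt{\hat{\sigma}^2_{ML}}=\sqrt{\frac{1}{n}\sum_{i=1}^n\left(X_i-\bar{X}\right)^2}.\]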
Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{Poi(\lambda):\lambda\in \mathbb{R}^+\}\). Show that the UMVUE of \(P(X>0\mid\lambda)\) exists for all \(n>1\) but that it is not the BUE.
Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{N(0,\sigma^2):\sigma^2\in \mathbb{R}^+\}\). Find the UMVUE of \(\sigma\) and check if it is also the BUE.
Based on a random sample of size \(n\) from \(\mathcal{F}=\{N(\mu,\sigma^2):\mu\in\mathbb{R},\;\sigma^2\in \mathbb{R}^+\}\) we want to estimate the relative precision measured by the square of the reciprocal of the coefficient of variation. Find the MLE and the UMVUE of that measure.