2 Point estimation

2.1 Estimators and estimates

The problem

Given a sample \(\mathbf{x}\in\mathcal{X}\) from some \(F\in\mathcal{F}=\{F(x\mid \theta):\theta\in \Theta\}\) determine a plausible value for \(\theta\) (or some function \(\psi(\theta)\)) from \(\mathbf{x}\).

The procedure

Select a statistic \(T(\mathbf{X})\) such that \(\Theta\subset T(\mathcal{X})\). Call \(T(\mathbf{X})\) an estimator and use any observed value \(T(\mathbf{x})\) as an estimate of \(\theta\).

Pending issues

Some properties of estimators

Finite sample properties

  1. Minimal sufficiency

    Nice, but . . . it can be impossible.

  2. Bias

    \[Bias_{\psi(\theta)}[T\mid\theta]=E[T\mid\theta] - \psi(\theta)\]

  3. Efficiency

    \[\begin{split} MSE_{\psi(\theta)}[T\mid\theta] & = E[(T -\psi(\theta))^2\mid\theta] \\ & = Var[T\mid\theta]+Bias^2_{\psi(\theta)}[T\mid\theta]\end{split}\]

The MSE represents a compromise between the variance and the (squared) bias of the estimator.

The relative efficiency of \(T\) and \(U\):

\[e(T,U\mid\theta)=\frac{MSE_{\psi(\theta)}[T\mid\theta]}{MSE_{\psi(\theta)}[U\mid\theta]}\]

Asymptotic properties

  1. Consistency

    \(T\) is consistent for \(\psi(\theta)\iff\left\{T_n\right\}_{n\in\mathbb{N}} \xrightarrow[n\rightarrow +\infty]{P}\psi(\theta)\)

    Theorem If \(\displaystyle{\lim_{n\rightarrow +\infty}Bias_{\psi(\theta)}[T_n\mid\theta]=\lim_{n\rightarrow +\infty}Var[T_n\mid\theta]=0}\) then \(T\) is consistent for \(\psi(\theta)\).

  2. Asymptotic efficiency

  3. Asymptotic normality

Compare \(S^2_n\) and \(S^2_{n-1}\) as estimators of the parameter \(\sigma^2\) from \[\mathcal{F}=\{N(\mu,\sigma^2):\mu\in \mathbb{R},\; \sigma^2\in\mathbb{R}^+\}\] according to some of their properties.
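A quick Monte Carlo sketch of that comparison is given below (an illustration only, assuming a standard normal sample with \(\sigma^2=1\); the sample size, number of replications and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma2 = 10, 100_000, 1.0          # sample size, replications, true variance

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_n = x.var(axis=1, ddof=0)                # divisor n
s2_n1 = x.var(axis=1, ddof=1)               # divisor n - 1

for name, est in [("S2_n", s2_n), ("S2_n-1", s2_n1)]:
    bias = est.mean() - sigma2
    mse = ((est - sigma2) ** 2).mean()
    print(f"{name}: bias={bias:+.4f}  var={est.var():.4f}  MSE={mse:.4f}")

# relative efficiency e(S2_n, S2_n-1 | sigma2): values below 1 favour S2_n in MSE
print("e =", ((s2_n - sigma2) ** 2).mean() / ((s2_n1 - sigma2) ** 2).mean())
```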

2.2 The search for the best estimator

There is no such thing as an overall best estimator!

To establish different criteria we will:

  1. choose some property to compare estimators (the MSE);

  2. restrict the search to some suitable class of estimators or models.

\[\mathcal{U}\left(\psi(\theta)\right)=\left\{T(\mathbf{X}): E[T(\mathbf{X})\mid\theta]=\psi(\theta),\,\forall \theta \in \Theta\right\}\]

is the class of unbiased estimators of \(\psi(\theta)\).

\[\mathcal{LU}\left(\psi(\theta)\right)=\left\{T(\mathbf{X})\in \mathcal{U}\left(\psi(\theta)\right) : T(\mathbf{X})=\sum_{i=1}^n{a_iX_i} \right\}\]

is the class of linear unbiased estimators of \(\psi(\theta)\).

Best linear unbiased estimators

An estimator \(T\) is said to be the best linear unbiased estimator (BLUE) of \(\psi(\theta)\) if \(T\in\mathcal{LU}\left(\psi(\theta)\right)\) and \(Var[T\mid\theta]\leq Var[W\mid\theta],\;\forall\theta\in\Theta,\) \(\forall W\in\mathcal{LU}\left(\psi(\theta)\right)\).

Note To find a BLUE we need to solve a constrained optimization problem.
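For instance, with uncorrelated \(X_i\) of common mean \(\mu\) and variance \(\sigma^2\) (the setting of the next exercise), the problem reads

\[\min_{a_1,\ldots,a_n}\;\sigma^2\sum_{i=1}^n{a_i^2}\quad\text{subject to}\quad\sum_{i=1}^n{a_i}=1,\]

which can be handled with a Lagrange multiplier for the unbiasedness constraint.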

Let \(X_1,\ldots,X_n\) be uncorrelated variables with common and finite mean \(\mu\) and variance \(\sigma^2\). Find the BLUE of \(\mu\).

Let \((X_1,\ldots,X_n)\) be a random sample from \[\mathcal{F}=\{U(\theta-1/2,\theta+1/2):\theta\in \mathbb{R}\}.\] Find the BLUE of \(\theta\) and compare it with \(C=\frac{X_{(1)}+X_{(n)}}{2}\).

Note: \(Cov[X_{(1)},X_{(n)}]=\frac{1}{(n+1)^2(n+2)}\)
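A Monte Carlo sketch of this comparison (assuming the BLUE from the previous exercise turns out to be the sample mean \(\bar{X}\); the value \(\theta=0\), the sample size and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, theta = 20, 100_000, 0.0

x = rng.uniform(theta - 0.5, theta + 0.5, size=(reps, n))
xbar = x.mean(axis=1)                           # candidate BLUE
c = (x.min(axis=1) + x.max(axis=1)) / 2         # midrange C

mse_xbar = ((xbar - theta) ** 2).mean()
mse_c = ((c - theta) ** 2).mean()
print(f"MSE(xbar)={mse_xbar:.6f}  MSE(C)={mse_c:.6f}  e(xbar, C)={mse_xbar / mse_c:.2f}")
# both estimators are unbiased, so MSE = Var:
# Var[xbar] = 1/(12n) and Var[C] = 1/(2(n+1)(n+2))
```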

Best unbiased estimators

Hereafter we will consider only regular models.

Theorem If \(T\) is an estimator of \(\psi(\theta)\) with a differentiable bias \(b(\theta)\) then \[MSE_{\psi(\theta)}[T\mid\theta]\geq \frac{[\psi'(\theta)+b'(\theta)]^2}{nI(\theta)}+b^2(\theta).\]

Fréchet-Cramér-Rao inequality Let \(T\) be an estimator in \(\mathcal{U}\left(\psi(\theta)\right)\), where \(\psi\) is a differentiable function. Then \[Var[T\mid\theta]\geq \frac{[\psi'(\theta)]^2}{nI(\theta)}=R(\psi(\theta)).\]

Note We will call \(R(\psi(\theta))\) the FCR lower bound.

An estimator \(T\) is said to be the best unbiased estimator (BUE) of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and \(Var[T\mid\theta]=R(\psi(\theta)),\;\forall\theta\in\Theta\).

Note An estimator \(T\) is said to be the asymptotically best unbiased estimator of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and

\[\frac{Var[T\mid\theta]}{R(\psi(\theta))}\mathrel{\mathop{\xrightarrow{n\rightarrow +\infty}}}1,\;\forall\theta\in\Theta.\]

When is it possible to attain the FCR lower bound?

Theorem Let \(T\) be an estimator in \(\mathcal{U}\left(\psi(\theta)\right)\). \(T\) is the BUE of \(\psi(\theta)\) if and only if

\[S(\mathbf{X}\mid\theta)=g(\theta)\left[T(\mathbf{X})-\psi(\theta)\right],\]

for some function \(g\).
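For instance, in the Poisson model \(\{Poi(\lambda):\lambda\in\mathbb{R}^+\}\) the score of a random sample factorizes as

\[S(\mathbf{X}\mid\lambda)=\sum_{i=1}^n{\left(\frac{X_i}{\lambda}-1\right)}=\frac{n}{\lambda}\left(\bar{X}-\lambda\right),\]

so \(\bar{X}\) is the BUE of \(\lambda\), with variance equal to the FCR lower bound \(\lambda/n\).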

Corollary Let \(T\) be an estimator of \(\psi(\theta)\) with a differentiable bias \(b(\theta).\) The \(MSE_{\psi(\theta)}[T\mid\theta]\) equals its lower bound if and only if

\[S(\mathbf{X}\mid\theta)=g(\theta)\left[T(\mathbf{X})-\left(\psi(\theta)+b(\theta)\right)\right],\]

for some function \(g\).

Corollary The FCR lower bound is attainable for some \(\psi(\theta)\) if and only if \(T\) is a sufficient statistic of a uniparametric exponential family.

Note

  • {BUE} \(\subset\) {sufficient statistics}

  • \(\not\exists\) one-dimensional sufficient statistic \(\implies\not\exists\) BUE

Let \((X_1,\ldots,X_n)\) be a random sample from \[\mathcal{F}=\{Exp(\lambda):\lambda\in \mathbb{R}^+\}.\] Is there a BUE of \(\lambda\)? For which parametric functions does a BUE exist?

Let \((X_1,\ldots,X_n)\) be a random sample from a mixture of the distributions \(Exp(1/\theta)\) and \(Gamma(2,1/\theta)\) with weights \(\frac{1}{\theta+1}\) and \(\frac{\theta}{\theta+1}\). Find the BUE of \[\psi(\theta)=\frac{(3+2\theta)(2+\theta)}{\theta+1}.\]

Uniform minimum variance unbiased estimators

An estimator \(T\) is said to be the uniform minimum variance unbiased estimator (UMVUE) of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and \[Var[T\mid\theta]\leq Var[W\mid\theta],\;\forall W\in\mathcal{U}\left(\psi(\theta)\right),\;\forall\theta\in\Theta.\]

Rao-Blackwell’s theorem Let \(T\) be a sufficient statistic for \(\theta\), \(W\in\mathcal{U}\left(\psi(\theta)\right)\) and \(U=E[W\mid T]\). Then,

  1. \(E[U\mid\theta]=\psi(\theta)\);

  2. \(Var[U\mid\theta]\leq Var[W\mid\theta],\;\forall\theta\in\Theta\).

Notes

Lehmann-Scheffé’s theorem If a model admits a complete sufficient statistic \(T\) and there is an unbiased estimator for \(\psi(\theta)\), then there is a unique UMVUE for \(\psi(\theta)\) that is a function of \(T\).

So, we have two possible strategies to find the UMVUE:

  1. Apply Rao-Blackwell’s theorem using an unbiased estimator and a complete sufficient statistic;

  2. Directly find an unbiased function of a complete sufficient statistic.
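A small simulation sketch of strategy 1 in a Bernoulli model: starting from the crude unbiased estimator \(W=X_1\) and conditioning on the sufficient statistic \(T=\sum_{i=1}^n{X_i}\) gives \(U=E[W\mid T]=T/n=\bar{X}\) (the parameter value, sample size and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, theta = 15, 100_000, 0.3

x = rng.binomial(1, theta, size=(reps, n))
w = x[:, 0]            # crude unbiased estimator W = X_1
u = x.mean(axis=1)     # Rao-Blackwellized estimator U = E[W | T] = T/n

print(f"E[W]={w.mean():.4f}  E[U]={u.mean():.4f}  (both close to theta = {theta})")
print(f"Var[W]={w.var():.4f}  Var[U]={u.var():.4f}  (Var[U] <= Var[W])")
```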

Let \((X_1,\ldots,X_n)\) be a random sample from

\[\mathcal{F}=\{Ber(\theta):\theta\in ]0,1[\}.\]

Find the UMVUE of \(\theta^2\).

Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{Exp(\lambda):\lambda\in \mathbb{R}^+\}\), with \(n>2\).

  1. Let \(U\) be the UMVUE of \(\lambda\) and consider the class of estimators of \(\lambda\) defined by \(\frac{k}{n-1}U\), with \(k\in\mathbb{N}\). In this class, find the estimator with uniform minimum MSE. What does this say about the UMVU criterion?

  2. Determine the UMVUE of \(\frac{1}{\lambda^2}\) and show that it is the asymptotically best unbiased estimator.

Summary

2.3 Methods of finding estimators

Method of moments

For a random sample \((X_1,\ldots,X_n)\) from \(\mathcal{F}=\{F(x\mid \theta):\theta=(\theta_1,\ldots,\theta_k)\}\), equate the first \(k\) (at least) sample moments to the corresponding population moments,

\[ M_r=\frac{\sum_{i=1}^n{X_i^r}}{n}=g_r(\theta) =E[X^r]=\mu_r,\;r=1,\ldots,k.\]

Solving this system of equations for \(\theta\) we find the method of moments estimators

\[\hat{\theta}_r=h_r(M_1,\ldots,M_k),\;r=1,\ldots,k.\]
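A minimal sketch for a \(Gamma(\alpha,\beta)\) sample (shape \(\alpha\), scale \(\beta\)), where \(\mu_1=\alpha\beta\) and \(\mu_2=\alpha\beta^2+\alpha^2\beta^2\), so the system solves to \(\hat{\beta}=(M_2-M_1^2)/M_1\) and \(\hat{\alpha}=M_1/\hat{\beta}\) (the parameter values below are arbitrary):

```python
import numpy as np

def gamma_mom(x):
    """Method of moments for Gamma(shape alpha, scale beta)."""
    m1 = np.mean(x)           # first sample moment
    m2 = np.mean(x ** 2)      # second sample moment
    beta_hat = (m2 - m1 ** 2) / m1
    alpha_hat = m1 / beta_hat
    return alpha_hat, beta_hat

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.0, scale=1.5, size=5_000)
print(gamma_mom(x))           # should be close to (2.0, 1.5)
```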

The properties of these estimators can be derived from the properties of the sample moments which are:

  1. Unbiased and consistent estimators of the population moments

    \(E[M_r\mid\theta] = \mu_r\)

    \(Var[M_r\mid\theta] = \frac{\mu_{2r}-\mu_r^2}{n}\)

  2. Asymptotically normal

    Using the CLT,

    \[\sqrt{n}(M_r-\mu_r)\stackrel{D}{\longrightarrow}N(0,\mu_{2r}-\mu_r^2)\]

M-estimators

The solutions of \[\hat{\theta}=\arg\min_{\theta\in\Theta}\sum_{i=1}^n{g(X_i,\theta)}\] are called M-estimators of \(\theta\) (the M stands for “Maximum likelihood type”).

Note The function \(g\) may be chosen to provide estimators with desirable properties, in particular, regarding robustness.

Particular cases

  1. Least squares estimation in linear models where \(g\) is defined as the square of a residual, such as

    \[g(Y_i,\beta)=\left(Y_i-(\beta_0+\beta_1x_i)\right)^2,\]

    in a simple linear regression model.

  2. Maximum likelihood estimation with

    \[g(X_i,\theta)=-\log f(X_i\mid\theta).\]
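As an illustration of the robustness remark above, a sketch of a third particular case: a location M-estimator with the Huber loss in place of the square (the tuning constant \(k=1.345\) is the usual choice; the data and its outlier below are made up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber(r, k=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    return np.where(np.abs(r) <= k, 0.5 * r ** 2, k * (np.abs(r) - 0.5 * k))

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 50), [25.0]])   # one gross outlier

m_est = minimize_scalar(lambda t: huber(x - t).sum(),
                        bounds=(x.min(), x.max()), method="bounded").x
print(f"mean={x.mean():.2f}  median={np.median(x):.2f}  Huber M-estimate={m_est:.2f}")
```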

Maximum likelihood estimation

\(\hat{\theta}\in\Theta : L(\hat{\theta}\mid \mathbf{X})\geq L(\theta\mid \mathbf{X}),\;\forall\theta\in\Theta\) is the maximum likelihood estimate of \(\theta\).

If the likelihood function is differentiable then \(\hat{\theta}_{ML}\) may be any solution of \(S(\mathbf{X}\mid\theta)=0\) such that \(\left.\frac{\partial S(\mathbf{X}\mid\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_{ML}}<0\).

Two possible exceptions cannot be forgotten:

  1. the global maximum can be on the boundary of \(\Theta\);

  2. the global maximum can occur at a point where the likelihood function has no derivative.

Find the MLE of \(\theta\) based on a random sample \((X_1,\ldots,X_n)\) from each of the following models:

  1. \(\{Ber(\theta):\theta\in ]0,1[\}\);

  2. \(\{U(\theta-1/2,\theta+1/2):\theta\in \mathbb{R}\}\).

Note

  1. The MLE may not exist and may not be unique.

  2. Numerical methods are usually required.
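A minimal sketch of such a numerical MLE, minimizing the negative log-likelihood of a \(Gamma(\alpha,\beta)\) sample (shape \(\alpha\), scale \(\beta\); the model, starting values and seed are chosen only for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_loglik(params, x):
    """Negative Gamma log-likelihood (shape alpha, scale beta)."""
    alpha, beta = params
    if alpha <= 0 or beta <= 0:
        return np.inf
    return -np.sum((alpha - 1) * np.log(x) - x / beta
                   - gammaln(alpha) - alpha * np.log(beta))

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=1.5, size=1_000)

res = minimize(neg_loglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print(res.x)   # MLE of (alpha, beta), close to (2.0, 1.5)
```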

Sufficiency

If \(T\) is a sufficient statistic, can we claim that the MLE is a function of \(T\)?

For the uniform model in the last exercise,

\[T=\sin^2\!\left(X_{(2)}\right)\left(X_{(n)}-1/2\right)+\cos^2\!\left(X_{(2)}\right)\left(X_{(1)}+1/2\right)\]

is an MLE of \(\theta\) that is not a function of the sufficient statistic \((X_{(1)},X_{(n)})\).

Efficiency

In a regular model, if the BUE exists then it must be an MLE.

Invariance

For any \(g:\Theta\subset \mathbb{R}^k\rightarrow \mathbb{R}^p\) with \(p\leq k\) we have

\[\hat{g}_{ML}(\theta)=g(\hat{\theta}_{ML}).\]
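For instance, in the exponential model \(\{Exp(\lambda):\lambda\in\mathbb{R}^+\}\) (rate parametrization, so that \(\hat{\lambda}_{ML}=1/\bar{X}\)), invariance gives the MLE of the mean \(g(\lambda)=1/\lambda\) at once:

\[\hat{g}_{ML}(\lambda)=\frac{1}{\hat{\lambda}_{ML}}=\bar{X}.\]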

  1. Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{Poi(\lambda):\lambda\in \mathbb{R}^+\}\). Show that the UMVUE of \(P(X>0\mid\lambda)\) exists for all \(n>1\) but that it is not the BUE.

  2. Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{N(0,\sigma^2):\sigma^2\in \mathbb{R}^+\}\). Find the UMVUE of \(\sigma\) and check if it is also the BUE.

  3. Based on a random sample of size \(n\) from \(\mathcal{F}=\{N(\mu,\sigma^2):\mu\in\mathbb{R},\;\sigma^2\in \mathbb{R}^+\}\) we want to estimate the relative precision measured by the square of the reciprocal of the coefficient of variation. Find the MLE and the UMVUE of that measure.
