The problem
Given a sample \(\mathbf{x}\in\mathcal{X}\) from some \(F\in\mathcal{F}=\{F(x\mid \theta):\theta\in \Theta\}\), determine a plausible value for \(\theta\) (or for some function \(\psi(\theta)\)) from \(\mathbf{x}\).
The procedure
Select a statistic \(T(\mathbf{X})\) such that \(\Theta\subset T(\mathcal{X})\). Call \(T(\mathbf{X})\) an estimator and use any observed value \(T(\mathbf{x})\) as an estimate of \(\theta\).
Pending issues
Some properties of estimators
Finite sample properties
Minimal sufficiency
Nice, but . . . it can be impossible.
Bias
\[Bias_{\psi(\theta)}[T\mid\theta]=E[T\mid\theta] - \psi(\theta)\]
Efficiency
\[\begin{split} MSE_{\psi(\theta)}[T\mid\theta] & = E[(T -\psi(\theta))^2\mid\theta] \\ & = Var[T\mid\theta]+Bias^2_{\psi(\theta)}[T\mid\theta]\end{split}\]
The MSE represents a compromise between:
accuracy (measured by \(Bias_{\psi(\theta)}[T\mid\theta]\)) and
precision (measured by \(Var[T\mid\theta]\)).
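The decomposition above follows by adding and subtracting \(E[T\mid\theta]\) inside the square; the cross term has zero expectation:
\[E[(T-\psi(\theta))^2\mid\theta]=E[(T-E[T\mid\theta])^2\mid\theta]+\left(E[T\mid\theta]-\psi(\theta)\right)^2=Var[T\mid\theta]+Bias^2_{\psi(\theta)}[T\mid\theta].\]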
The relative efficiency of \(T\) and \(U\):
\[e(T,U\mid\theta)=\frac{MSE_{\psi(\theta)}[T\mid\theta]}{MSE_{\psi(\theta)}[U\mid\theta]}\]
Asymptotic properties
Consistency
\(T\) is consistent for \(\psi(\theta)\iff\left\{T_n\right\}_{n\in\mathbb{N}}\xrightarrow{\;P\;}\psi(\theta)\) as \(n\rightarrow +\infty\), i.e., the sequence of estimators converges in probability to \(\psi(\theta)\).
Theorem If \(\displaystyle{\lim_{n\rightarrow +\infty}Bias_{\psi(\theta)}[T_n\mid\theta]=\lim_{n\rightarrow +\infty}Var[T_n\mid\theta]=0}\) then \(T\) is consistent for \(\psi(\theta)\).
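Sketch of the proof: applying Markov's inequality to \((T_n-\psi(\theta))^2\), for every \(\varepsilon>0\),
\[P\left(|T_n-\psi(\theta)|>\varepsilon\mid\theta\right)\leq \frac{MSE_{\psi(\theta)}[T_n\mid\theta]}{\varepsilon^2}=\frac{Var[T_n\mid\theta]+Bias^2_{\psi(\theta)}[T_n\mid\theta]}{\varepsilon^2}\xrightarrow{n\rightarrow +\infty}0.\]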
Asymptotic efficiency
Asymptotic normality
Compare \(S^2_n\) and \(S^2_{n-1}\) as estimators of the parameter \(\sigma^2\) from \[\mathcal{F}=\{N(\mu,\sigma^2):\mu\in \mathbb{R},\; \sigma^2\in\mathbb{R}^+\}\] according to some of their properties.
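A possible solution sketch, using the fact that \((n-1)S^2_{n-1}/\sigma^2\sim\chi^2_{n-1}\) under this model:
\[Bias_{\sigma^2}[S^2_{n-1}\mid\theta]=0,\qquad Bias_{\sigma^2}[S^2_{n}\mid\theta]=-\frac{\sigma^2}{n},\]
\[MSE_{\sigma^2}[S^2_{n-1}\mid\theta]=\frac{2\sigma^4}{n-1},\qquad MSE_{\sigma^2}[S^2_{n}\mid\theta]=\frac{(2n-1)\sigma^4}{n^2}<\frac{2\sigma^4}{n-1},\]
so \(S^2_{n-1}\) is unbiased, \(S^2_n\) has uniformly smaller MSE, and both are consistent.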
There is no such thing as an overall best estimator!
To establish different criteria we will:
choose some property to compare estimators (the MSE);
restrict the search to some suitable class of estimators or models.
\[\mathcal{U}\left(\psi(\theta)\right)=\left\{T(\mathbf{X}): E[T(\mathbf{X})\mid\theta]=\psi(\theta),\,\forall \theta \in \Theta\right\}\]
is the class of unbiased estimators of \(\psi(\theta)\).
\[\mathcal{LU}\left(\psi(\theta)\right)=\left\{T(\mathbf{X})\in \mathcal{U}\left(\psi(\theta)\right) : T(\mathbf{X})=\sum_{i=1}^n{a_iX_i} \right\}\]
is the class of linear unbiased estimators of \(\psi(\theta)\).
Best linear unbiased estimators
An estimator \(T\) is said to be the best linear unbiased estimator (BLUE) of \(\psi(\theta)\) if \(T\in\mathcal{LU}\left(\psi(\theta)\right)\) and \(Var[T\mid\theta]\leq Var[W\mid\theta],\;\forall\theta\in\Theta,\) \(\forall W\in\mathcal{LU}\left(\psi(\theta)\right)\).
Note To find a BLUE we need to solve a constrained optimization problem.
Let \(X_1,\ldots,X_n\) be uncorrelated variables with common and finite mean \(\mu\) and variance \(\sigma^2\). Find the BLUE of \(\mu\).
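A sketch of the standard argument: for \(T=\sum_{i=1}^n a_iX_i\), unbiasedness forces \(\sum_{i=1}^n a_i=1\), and minimizing \(Var[T\mid\theta]=\sigma^2\sum_{i=1}^n a_i^2\) under this constraint (e.g., with a Lagrange multiplier) gives \(a_i=1/n\), so
\[T=\bar{X},\qquad Var[\bar{X}\mid\theta]=\frac{\sigma^2}{n}.\]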
Let \((X_1,\ldots,X_n)\) be a random sample from \[\mathcal{F}=\{U(\theta-1/2,\theta+1/2):\theta\in \mathbb{R}\}.\] Find the BLUE of \(\theta\) and compare it with \(C=\frac{X_{(1)}+X_{(n)}}{2}\).
Note: \(Cov[X_{(1)},X_{(n)}]=\frac{1}{(n+1)^2(n+2)}\)
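A sketch of the comparison, writing \(X_i=\theta-1/2+U_i\) with \(U_i\sim U(0,1)\) and using the Beta moments of uniform order statistics: the BLUE is \(\bar{X}\), \(C\) is also unbiased, and
\[Var[\bar{X}\mid\theta]=\frac{1}{12n},\qquad Var[C\mid\theta]=\frac{1}{2(n+1)(n+2)},\]
so for \(n>2\) the estimator \(C\) (which is not of the form \(\sum a_iX_i\) with fixed coefficients) beats the BLUE, illustrating the cost of restricting the search to linear estimators.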
Best unbiased estimators
Hereafter we will consider only regular models.
Theorem If \(T\) is an estimator of \(\psi(\theta)\) with a differentiable bias \(b(\theta)\), then \[MSE_{\psi(\theta)}[T\mid\theta]\geq \frac{[\psi'(\theta)+b'(\theta)]^2}{nI(\theta)}+b^2(\theta).\]
Fréchet-Cramér-Rao inequality Let \(T\) be an estimator in \(\mathcal{U}\left(\psi(\theta)\right)\), where \(\psi\) is a differentiable function. Then \[Var[T\mid\theta]\geq \frac{[\psi'(\theta)]^2}{nI(\theta)}=R(\psi(\theta)).\]
Note We will call \(R(\psi(\theta))\) the FCR lower bound.
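For example, in the Bernoulli model \(\{Ber(\theta):\theta\in ]0,1[\}\) we have \(I(\theta)=\frac{1}{\theta(1-\theta)}\), so for \(\psi(\theta)=\theta\)
\[R(\theta)=\frac{1}{nI(\theta)}=\frac{\theta(1-\theta)}{n}=Var[\bar{X}\mid\theta],\]
and the FCR lower bound is attained by \(\bar{X}\).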
An estimator \(T\) is said to be the best unbiased estimator (BUE) of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and \(Var[T\mid\theta]=R(\psi(\theta)),\;\forall\theta\in\Theta\).
Note An estimator \(T\) is said to be the asymptotically best unbiased estimator of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and
\[\frac{Var[T\mid\theta]}{R(\psi(\theta))}\mathrel{\mathop{\xrightarrow{n\rightarrow +\infty}}}1,\;\forall\theta\in\Theta.\]
When is it possible to attain the FCR lower bound?
Theorem Let \(T\) be an estimator in \(\mathcal{U}\left(\psi(\theta)\right)\). \(T\) is the BUE of \(\psi(\theta)\) if and only if
\[S(\mathbf{X}\mid\theta)=g(\theta)\left[T(\mathbf{X})-\psi(\theta)\right],\]
for some function \(g\).
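For instance, in the Poisson model \(\{Poi(\lambda):\lambda\in \mathbb{R}^+\}\) the score has exactly this form:
\[S(\mathbf{X}\mid\lambda)=\sum_{i=1}^n\left(\frac{X_i}{\lambda}-1\right)=\frac{n}{\lambda}\left(\bar{X}-\lambda\right),\]
so \(\bar{X}\) is the BUE of \(\lambda\), with \(g(\lambda)=n/\lambda\) and \(\psi(\lambda)=\lambda\).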
Corollary Let \(T\) be an estimator of \(\psi(\theta)\) with a differentiable bias \(b(\theta)\). Then \(MSE_{\psi(\theta)}[T\mid\theta]\) attains its lower bound if and only if
\[S(\mathbf{X}\mid\theta)=g(\theta)\left[T(\mathbf{X})-\left(\psi(\theta)+b(\theta)\right)\right],\]
for some function \(g\).
Corollary The FCR lower bound is attainable for some \(\psi(\theta)\) if and only if \(T\) is a sufficient statistic of a uniparametric exponential family.
Note
{BUE} \(\subset\) {sufficient statistics}
\(\not\exists\) one-dimensional sufficient statistic \(\implies\not\exists\) BUE
Let \((X_1,\ldots,X_n)\) be a random sample from \[\mathcal{F}=\{Exp(\lambda):\lambda\in \mathbb{R}^+\}.\] Is there a BUE of \(\lambda\)? For which parametric functions does a BUE exist?
Let \((X_1,\ldots,X_n)\) be a random sample from a mixture of the distributions \(Exp(1/\theta)\) and \(Gamma(2,1/\theta)\) with weights \(\frac{1}{\theta+1}\) and \(\frac{\theta}{\theta+1}\). Find the BUE of \[\psi(\theta)=\frac{(3+2\theta)(2+\theta)}{\theta+1}.\]
Uniform minimum variance unbiased estimators
An estimator \(T\) is said to be the uniform minimum variance unbiased estimator (UMVUE) of \(\psi(\theta)\) if \(T\in\mathcal{U}\left(\psi(\theta)\right)\) and \[Var[T\mid\theta]\leq Var[W\mid\theta],\;\forall W\in\mathcal{U}\left(\psi(\theta)\right),\;\forall\theta\in\Theta.\]
Rao-Blackwell’s theorem Let \(T\) be a sufficient statistic for \(\theta\), \(W\in\mathcal{U}\left(\psi(\theta)\right)\) and \(U=E[W\mid T]\). Then,
\(E[U\mid\theta]=\psi(\theta)\);
\(Var[U\mid\theta]\leq Var[W\mid\theta],\;\forall\theta\in\Theta\).
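As an illustration, in the Bernoulli model take \(W=X_1\in\mathcal{U}(\theta)\) and the sufficient statistic \(T=\sum_{i=1}^n X_i\). Then
\[U=E[X_1\mid T]=P(X_1=1\mid T)=\frac{T}{n}=\bar{X},\]
which is unbiased with \(Var[\bar{X}\mid\theta]=\frac{\theta(1-\theta)}{n}\leq \theta(1-\theta)=Var[X_1\mid\theta]\).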
Notes
If the UMVUE exists it must be a function of a sufficient statistic;
Rao-Blackwell’s theorem does not provide the UMVUE. However . . .
Lehmann-Scheffé’s theorem If a model admits a complete sufficient statistic \(T\) and there is an unbiased estimator for \(\psi(\theta)\), then there is a unique UMVUE for \(\psi(\theta)\) that is a function of \(T\).
So, we have two possible strategies to find the UMVUE:
Apply Rao-Blackwell’s theorem using an unbiased estimator and a complete sufficient statistic;
Directly find an unbiased function of a complete sufficient statistic.
Let \((X_1,\ldots,X_n)\) be a random sample from
\[\mathcal{F}=\{Ber(\theta):\theta\in ]0,1[\}.\]
Find the UMVUE of \(\theta^2\).
Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{Exp(\lambda):\lambda\in \mathbb{R}^+\}\), with \(n>2\).
Let \(U\) be the UMVUE of \(\lambda\) and consider the class of estimators of \(\lambda\) defined by \(\frac{k}{n-1}U\), with \(k\in\mathbb{N}\). In this class, find the estimator with uniformly minimum MSE. What does this say about the UMVU criterion?
Determine the UMVUE of \(\frac{1}{\lambda^2}\) and show that it is the asymptotically best unbiased estimator.
Summary
For non-regular models: \[UMVUE \longrightarrow BLUE\]
Without a complete sufficient statistic it is usually difficult to find a UMVUE.
In simple situations, some ingenuity combined with the previous criteria can provide good estimators;
For more complex models we need more methodical ways of estimating parameters.
Method of moments
For a random sample \((X_1,\ldots,X_n)\) from \(\mathcal{F}=\{F(x\mid \theta):\theta=(\theta_1,\ldots,\theta_k)\}\), equate the first \(k\) (at least) sample moments to the corresponding population moments,
\[ M_r=\frac{\sum_{i=1}^n{X_i^r}}{n}=g_r(\theta) =E[X^r]=\mu_r,\;r=1,\ldots,k.\]
Solving this system of equations for \(\theta\), we obtain the method of moments estimators
\[\hat{\theta}_r=h_r(M_1,\ldots,M_k),\;r=1,\ldots,k.\]
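For example, in the \(N(\mu,\sigma^2)\) model, \(\mu_1=\mu\) and \(\mu_2=\sigma^2+\mu^2\), so the method of moments gives
\[\hat{\mu}=M_1=\bar{X},\qquad \hat{\sigma}^2=M_2-M_1^2=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2=S^2_n.\]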
The properties of these estimators can be derived from the properties of the sample moments which are:
Unbiased and consistent estimators of the population moments
\(E[M_r\mid\theta] = \mu_r\)
\(Var[M_r\mid\theta] = \frac{\mu_{2r}-\mu_r^2}{n}\)
Asymptotically normal
Using the CLT,
\[\sqrt{n}(M_r-\mu_r)\stackrel{D}{\longrightarrow}N(0,\mu_{2r}-\mu_r^2)\]
M-estimators
Estimators of the form \[\hat{\theta}=\arg\min_{\theta\in\Theta}\sum_{i=1}^n{g(X_i,\theta)}\] are called M-estimators of \(\theta\) (the “M” stands for “Maximum likelihood type”).
Note The function \(g\) may be chosen to provide estimators with desirable properties, in particular, regarding robustness.
Particular cases
Least squares estimation in linear models where \(g\) is defined as the square of a residual, such as
\[g(Y_i,\beta)=\left(Y_i-(\beta_0+\beta_1x_i)\right)^2,\]
in a simple linear regression model.
Maximum likelihood estimation with
\[g(X_i,\theta)=-\log f(X_i\mid\theta).\]
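A minimal numerical sketch of the robustness remark above (not from the notes; the Huber loss and the fixed tuning constant \(k=1.345\) are illustrative choices): a Huber-type M-estimator of location resists an outlier that strongly pulls the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def huber_g(u, k=1.345):
    # Huber's loss: quadratic near zero, linear in the tails (robust to outliers)
    return np.where(np.abs(u) <= k, 0.5 * u**2, k * np.abs(u) - 0.5 * k**2)

def m_estimate(x, k=1.345):
    # M-estimator of location: argmin over theta of sum_i g(x_i - theta)
    objective = lambda theta: np.sum(huber_g(x - theta, k))
    return minimize_scalar(objective).x

x = np.array([1.1, 0.8, 1.3, 0.9, 1.0, 12.0])  # sample with one gross outlier
print(np.mean(x))      # sample mean, strongly pulled by the outlier
print(m_estimate(x))   # Huber M-estimate, close to the bulk of the data
```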
Maximum likelihood estimation
Any \(\hat{\theta}\in\Theta\) such that \(L(\hat{\theta}\mid \mathbf{X})\geq L(\theta\mid \mathbf{X}),\;\forall\theta\in\Theta,\) is a maximum likelihood estimate (MLE) of \(\theta\).
If the likelihood function is differentiable then \(\hat{\theta}_{ML}\) may be any solution of \(S(\mathbf{X}\mid\theta)=0\) such that \(\left.\frac{\partial S(\mathbf{X}\mid\theta)}{\partial\theta}\right|_{\theta=\hat{\theta}_{ML}}<0\).
Two possible exceptions must not be forgotten:
the global maximum can lie on the boundary of \(\Theta\);
the global maximum can occur at a point where the likelihood function is not differentiable.
Find the MLE of \(\theta\) based on a random sample \((X_1,\ldots,X_n)\) from each of the following models:
\(\{Ber(\theta):\theta\in ]0,1[\}\);
\(\{U(\theta-1/2,\theta+1/2):\theta\in \mathbb{R}\}\).
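Solution sketches: in the Bernoulli model the score equation
\[S(\mathbf{X}\mid\theta)=\frac{\sum_{i=1}^n X_i}{\theta}-\frac{n-\sum_{i=1}^n X_i}{1-\theta}=0\]
gives \(\hat{\theta}_{ML}=\bar{X}\); in the uniform model the likelihood equals \(1\) for \(\theta\in[X_{(n)}-1/2,\,X_{(1)}+1/2]\) and \(0\) otherwise, so every point of that interval is an MLE of \(\theta\).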
Note
The MLE may not exist and may not be unique.
Numerical methods are usually required.
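For instance (a sketch, assuming a \(Gamma(\alpha,1)\) model, where the score equation for the shape parameter has no closed-form solution), the MLE can be obtained by numerical optimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

rng = np.random.default_rng(0)
x = gamma.rvs(a=2.5, size=200, random_state=rng)  # simulated sample, true shape 2.5

def neg_log_lik(alpha):
    # negative log-likelihood of the Gamma(alpha, scale=1) model
    return -np.sum(gamma.logpdf(x, a=alpha))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print("MLE of alpha:", res.x)
```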
Sufficiency
If \(T\) is a sufficient statistic, can we claim that the MLE is a function of \(T\)?
For the uniform model in the last exercise,
\[T=\sin^2(X_{(2)})\left(X_{(n)}-1/2\right)+\cos^2(X_{(2)})\left(X_{(1)}+1/2\right)\]
is an MLE of \(\theta\) that is not a function of the sufficient statistic \((X_{(1)},X_{(n)})\).
Efficiency
In a regular model, if the BUE exists then it must be an MLE.
Invariance
For any \(g:\Theta\subset \mathbb{R}^k\rightarrow \mathbb{R}^p\) with \(p\leq k\) we have
\[\hat{g}_{ML}(\theta)=g(\hat{\theta}_{ML}).\]
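For example, in the \(N(\mu,\sigma^2)\) model the MLE of \(\sigma=\sqrt{\sigma^2}\) follows directly from the MLE of \(\sigma^2\):
\[\hat{\sigma}_{ML}=\sqrt{\hat{\sigma}^2_{ML}}=\sqrt{\frac{1}{n}\sum_{i=1}^n\left(X_i-\bar{X}\right)^2}.\]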
Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{Poi(\lambda):\lambda\in \mathbb{R}^+\}\). Show that the UMVUE of \(P(X>0\mid\lambda)\) exists for all \(n>1\) but that it is not the BUE.
Let \((X_1,\ldots,X_n)\) be a random sample from \(\mathcal{F}=\{N(0,\sigma^2):\sigma^2\in \mathbb{R}^+\}\). Find the UMVUE of \(\sigma\) and check if it is also the BUE.
Based on a random sample of size \(n\) from \(\mathcal{F}=\{N(\mu,\sigma^2):\mu\in\mathbb{R},\;\sigma^2\in \mathbb{R}^+\}\) we want to estimate the relative precision measured by the square of the reciprocal of the coefficient of variation. Find the MLE and the UMVUE of that measure.