Linear models

1st semester 2025/2026

1 Linear models

\(\def\bs#1{\boldsymbol{#1}} \def\b#1{\mathbf{#1}} \DeclareMathOperator{\diag}{diag} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\r}{r} \DeclareMathOperator{\det}{det} \DeclareMathOperator{\logit}{\text{logit}}\)

1.1 Introduction

What is a statistical model?

Why are there so many statistical models?

\(+\) time, space, text, images, sounds, …

Our context

\(Y\) – a univariate quantitative response variable

\({\bf x}=(x_1,\ldots,x_k)'\) – any number of error-free predictors, covariates or explanatory variables

Some important issues

What is a linear model?

Basic (additive) modelling idea:

\[Y = f(\mathbf{x}, \boldsymbol{\beta}) + E\]

where \(E\) is called the random error.

A linear model is characterized by:

  1. the choice of a linear function of \(\bs{\beta}\):

    \(f(\mathbf{x}, \boldsymbol{\beta})=\mathbf{x}' \boldsymbol{\beta}\) – the linear predictor

  2. a set of assumptions regarding the random error:

    \(E\sim N(0, \sigma^2)\), uncorrelated \(\forall \mathbf{x}\) (usually)

Which of the following models are linear?

  1. \(Y=\beta_0 + \beta_1 x + \beta_2 x^2 + E\)

  2. \(\log Y=\beta_0 + \beta_1 \log x + E\)

  3. \(Y=\beta_0 x^{\beta_1} + E\)

  4. \(Y=\beta_0 x^{\beta_1} E\)
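Here, "linear" refers to the parameters \(\bs{\beta}\), not to \(x\): models 1 and 2 are linear (model 2 with \(\log Y\) and \(\log x\) playing the roles of response and predictor); model 3 is intrinsically nonlinear because of the additive error; model 4 is not linear as written, but becomes linear after taking logarithms,

\[\log Y = \log\beta_0 + \beta_1 \log x + \log E,\]

as revisited in Section 1.3.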

Alternative formulation

\[Y\sim F(\boldsymbol{\theta})\]

where \(F\) belongs to the exponential family of distributions

and let \(g(E[Y])=\mathbf{x}' \boldsymbol{\beta}\), where \(g\) is called the link function

\[\begin{cases} F\ \text{gaussian} \\ g= \text{identity function} \end{cases} \implies \text{Linear model}\]

Other choices for bounded or discrete responses:

\(\leadsto\) Generalized linear models
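For instance, one common choice for a binary response is

\[\begin{cases} F\ \text{binomial} \\ g= \logit, \ \text{with}\ \logit(E[Y]) = \log\dfrac{E[Y]}{1-E[Y]} \end{cases} \implies \text{Logistic regression}\]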

Why linear models?

1.2 Simple linear regression model – a review

Data: \((y_i,x_i)\), \(i=1,\ldots,n\)

\[Y_i=\beta_0+\beta_1x_i+E_i,\, i=1,\ldots,n\]

\(E_i\) – random error associated with \(Y_i=(Y\mid x=x_i)\)

OLS estimation

\[S(\beta_0,\beta_1)=\sum_{i=1}^{n}{E_i^2}=\sum_{i=1}^{n}{(Y_i-\beta_0-\beta_1x_i)^2}\]

\[(\hat{\beta}_0,\hat{\beta}_1)_{OLS}=\underset{\boldsymbol{\beta}\in \mathbb{R}^2}{\arg\min}\ S(\beta_0,\beta_1)\]

\[\left\lbrace\begin{array}{l} \frac{\partial S}{\partial \beta_0}=0\\ \frac{\partial S}{\partial \beta_1}=0\\ \end{array}\right. \iff \left\lbrace\begin{array}{lcl} \beta_0 n+\beta_1 \sum\limits_{i = 1}^n{x_i} &=& \sum\limits_{i = 1}^n {Y_i }\\ \beta_0 \sum\limits_{i = 1}^n{x_i}+\beta_1 \sum\limits_{i = 1}^n{x_i^2} &=& \sum\limits_{i = 1}^n {x_i Y_i}\\ \end{array}\right.\]

\[\iff \left\lbrace\begin{array}{lcl} \hat {\beta }_1 &=& \frac{\sum\limits_{i = 1}^n {x_i Y_i} - n\bar {x}\bar {Y}}{\sum\limits_{i = 1}^n {x_i^2 } - n\bar {x}^2}\\ \hat{\beta }_0 &=& \bar{Y} - \hat{\beta }_1 \bar{x}\\ \end{array}\right.\]

A unique solution exists if and only if \[\begin{vmatrix} n & \sum\limits_{i = 1}^n{x_i} \\ \sum\limits_{i = 1}^n{x_i} & \sum\limits_{i = 1}^n{x_i^2} \end{vmatrix}=n \sum_{i = 1}^n\left(x_i-\bar{x}\right)^2\neq 0,\]

that is, if the sample contains at least two different values of \(x\).

When the solution exists, it is indeed a minimum since the Hessian matrix of \(S\) is \[2\times\begin{pmatrix} n & \sum\limits_{i = 1}^n{x_i} \\ \sum\limits_{i = 1}^n{x_i} & \sum\limits_{i = 1}^n{x_i^2} \end{pmatrix}\]

and its determinant is \(4n \sum\limits_{i = 1}^n{(x_i-\bar{x})^2}>0\).
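As a minimal numerical sketch of these closed-form estimates (the data, the true values \(\beta_0=2\), \(\beta_1=0.5\), and the seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)  # simulated data: beta0=2, beta1=0.5

# Closed-form OLS solution from the normal equations
xbar, ybar = x.mean(), y.mean()
beta1_hat = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)
beta0_hat = ybar - beta1_hat * xbar
print(beta0_hat, beta1_hat)  # should be close to 2 and 0.5
```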

How can we trust this estimator?

Under the Gauss-Markov conditions:

  1. \(E[E_i]=0\) \(\iff E[Y_i]=\beta_0+\beta_1x_i\)

  2. \(Var[E_i]=\sigma^2\) \(\iff Var[Y_i]=\sigma^2\)

  3. \(E_i\)’s uncorrelated \(\iff\) \(Y_i\)’s uncorrelated

it can be shown that the OLS estimator is the Best Linear Unbiased Estimator (BLUE) for \(\boldsymbol{\beta}\).

Alternative

ML estimation under the Gaussian assumption

\[E_i\sim N\left(0,\sigma^2\right)\iff Y_i\sim N\left(\beta_0+\beta_1 x_i,\sigma^2\right)\]

\[\mathcal{L}\left(\beta_0,\beta_1,\sigma^2\mid \mathbf{y}\right)=\prod_{i=1}^{n}{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left(y_i-(\beta_0+\beta_1 x_i)\right)^2}}\]

Somewhat surprisingly, the OLS and ML estimators for \(\beta_0\) and \(\beta_1\) coincide and

\[\hat{\sigma}_{ML}^2=\frac{\sum_{i=1}^{n}{\left(Y_i-(\hat{\beta}_0+\hat{\beta}_1 x_i)\right)^2}}{n}\]
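A quick numerical check of this coincidence, minimizing the negative log-likelihood directly (a sketch with simulated data; parameterizing via \(\log\sigma\) merely keeps the variance positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)  # simulated data

def negloglik(theta):
    # Gaussian negative log-likelihood; theta = (beta0, beta1, log sigma)
    b0, b1, log_s = theta
    s2 = np.exp(2 * log_s)
    resid = y - b0 - b1 * x
    return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * np.sum(resid**2) / s2

ml = minimize(negloglik, x0=[0.0, 0.0, 0.0]).x[:2]

xbar = x.mean()
b1_ols = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)
ols = (y.mean() - b1_ols * xbar, b1_ols)
print(ml, ols)  # identical up to optimizer tolerance
```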

Some notation

\(SXX = \sum\limits_{i=1}^n (x_i-\bar{x})^2 = \sum\limits_{i=1}^n x_i^2-n \bar{x}^2 = \sum\limits_{i=1}^n x_i(x_i-\bar{x})\)

\(\begin{split} SXY & = \sum_{i=1}^n (x_i-\bar{x})(Y_i-\bar{Y}) = \\ & = \sum_{i=1}^n x_i(Y_i-\bar{Y}) = \\ & = \sum_{i=1}^n Y_i(x_i-\bar{x}) = \\ & = \sum_{i=1}^n x_iY_i - n \bar{x} \bar{Y} \end{split}\)

Properties of the estimators

\[\hat{\beta}_1 = \frac{SXY}{SXX} = \frac{\sum_{i=1}^n Y_i(x_i-\bar{x})}{SXX} = \sum_{i=1}^n k_i Y_i\] with \(k_i = \dfrac{x_i-\bar{x}}{SXX}\)

\(\therefore \hat{\beta}_1\) is a linear estimator

\(E[\hat{\beta}_1] = \sum_{i=1}^n k_i E[Y_i] = \ldots = \beta_1\)

\(\therefore \hat{\beta}_1\) is an unbiased estimator of \(\beta_1\)

\(Var[\hat{\beta}_1] = \sum_{i=1}^n k_i^2 Var[Y_i] = \sigma^2\sum_{i=1}^n k_i^2 = \ldots = \dfrac{\sigma^2}{SXX}\)

\(\therefore \hat{\beta}_1 \sim N\left(\beta_1, \dfrac{\sigma^2}{SXX}\right)\)

\[\hat{\beta}_0 = \bar{Y} - \hat{\beta }_1 \bar{x} = \ldots = \sum_{i=1}^n \left(\frac{1}{n} - \bar{x}k_i \right) Y_i\]

\(\therefore \hat{\beta}_0\) is also a linear estimator and \(E[\hat{\beta}_0] = \ldots = \beta_0\)

\(\begin{split} Var[\hat{\beta}_0] & = Var[\bar{Y} - \hat{\beta }_1 \bar{x}] = \\ & = Var[\bar{Y}] + \bar{x}^2 Var[\hat{\beta }_1] - 2 \bar{x} Cov[\bar{Y}, \hat{\beta }_1] = \\ & = \frac{\sigma^2}{n} + \bar{x}^2 \frac{\sigma^2}{SXX} - 0 = \\ & = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SXX} \right) \end{split}\)

since \(Cov[\bar{Y}, \hat{\beta}_1] = \frac{\sigma^2}{n}\sum_{i=1}^n k_i = 0\), as \(\sum_{i=1}^n k_i = \frac{\sum_{i=1}^n (x_i-\bar{x})}{SXX} = 0\).

\(\therefore \hat{\beta}_0 \sim N\left(\beta_0, \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SXX} \right) \right)\)

Show that, in general, \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are not independent since

\[Cov[\hat{\beta}_0, \hat{\beta}_1] = -\frac{\bar{x}\sigma^2}{SXX}\]

Regarding the estimator of \(\sigma^2\), \[\hat{\sigma}^2_{ML}=\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{n} = \frac{SSE}{n}\]

it can be shown that \(E[SSE]=(n-2)\sigma^2 \implies E[\hat{\sigma}^2_{ML}] = \dfrac{n-2}{n}\sigma^2\)

\(\therefore \hat{\sigma}^2=\dfrac{n}{n-2}\hat{\sigma}^2_{ML}=\frac{SSE}{n-2}\) is an unbiased estimator of \(\sigma^2\).
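A small Monte Carlo sketch of these facts (fixed design, simulated errors, arbitrary true values): the empirical covariance of \((\hat{\beta}_0,\hat{\beta}_1)\) should match \(-\bar{x}\sigma^2/SXX\) and the mean of \(SSE/(n-2)\) should match \(\sigma^2\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta0, beta1, sigma = 30, 1.0, 2.0, 1.5
x = np.linspace(0, 5, n)            # fixed design
xbar = x.mean()
SXX = np.sum((x - xbar) ** 2)

b0s, b1s, s2s = [], [], []
for _ in range(10000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    b1 = np.sum((x - xbar) * y) / SXX
    b0 = y.mean() - b1 * xbar
    sse = np.sum((y - b0 - b1 * x) ** 2)
    b0s.append(b0)
    b1s.append(b1)
    s2s.append(sse / (n - 2))

print(np.cov(b0s, b1s)[0, 1], -xbar * sigma**2 / SXX)  # empirical vs theoretical
print(np.mean(s2s), sigma**2)                          # unbiasedness of SSE/(n-2)
```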

Now, Cochran’s theorem states that:

  1. \(\frac{SSE}{\sigma^2}= \frac{(n-2)\hat{\sigma}^2}{\sigma^2}\sim\chi_{(n-2)}^2\)

  2. \((\hat{\beta}_0, \hat{\beta}_1)\) are jointly independent of \(\hat{\sigma}^2\)

All previous results lead to:

\[\frac{\hat{\beta}_1-\beta_1}{\hat{\sigma}\sqrt{\frac{1}{SXX}}} = \frac{\hat{\beta}_1-\beta_1}{se(\hat{\beta}_1)} \sim t_{(n-2)}\]

and

\[\frac{\hat{\beta}_0-\beta_0}{\hat{\sigma}\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SXX}}} = \frac{\hat{\beta}_0-\beta_0}{se(\hat{\beta}_0)} \sim t_{(n-2)}\]

which allow us to carry out hypothesis tests and compute confidence intervals for \(\beta_0\) and \(\beta_1\).
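For example, a 95% confidence interval for \(\beta_1\) based on the first pivot (a sketch with simulated data; scipy.stats.t supplies the Student's \(t\) quantile):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 40
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 2.0, n)  # simulated data

xbar = x.mean()
SXX = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * y) / SXX
b0 = y.mean() - b1 * xbar
sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

se_b1 = sigma_hat / np.sqrt(SXX)
t975 = stats.t.ppf(0.975, df=n - 2)
print(f"95% CI for beta1: ({b1 - t975*se_b1:.3f}, {b1 + t975*se_b1:.3f})")
```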

Inferences for the mean response

\[E[Y|x_0] = \beta_0 + \beta_1 x_0\]

  1. \(\hat{\beta}_0 + \hat{\beta}_1 x_0\) is an unbiased estimator for \(E[Y|x_0]\)

  2. \(Var[\hat{\beta}_0 + \hat{\beta}_1 x_0]= \sigma^2\left(\frac{1}{n} + \frac{(x_0-\bar{x})^2}{SXX}\right)\)

\(\hat{\beta}_0\) and \(\hat{\beta}_1\) are usually correlated but they are jointly gaussian. Therefore:

\[\frac{\hat{\beta}_0+ \hat{\beta}_1 x_0 - E[Y|x_0]}{\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{SXX}}} \sim t_{(n-2)}\]
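The same setup gives a confidence interval for the mean response at a chosen \(x_0\) (again a self-contained sketch; \(x_0=5\) is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 40
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 2.0, n)  # simulated data

xbar = x.mean()
SXX = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * y) / SXX
b0 = y.mean() - b1 * xbar
sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x0 = 5.0
fit = b0 + b1 * x0
se_fit = sigma_hat * np.sqrt(1 / n + (x0 - xbar) ** 2 / SXX)
t975 = stats.t.ppf(0.975, df=n - 2)
print(f"95% CI for E[Y|x0]: ({fit - t975*se_fit:.3f}, {fit + t975*se_fit:.3f})")
```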

Prediction of a new observation

The goal is to predict \(Y_u\), an unobserved response value at \(x=x_u\), independent of the observed data.

To do that, we need to account for two sources of uncertainty:

  1. the uncertainty in the mean of the distribution of \(Y_u\);

  2. the variation within that distribution.

Again, the idea is to use some sort of standardized r.v. \[\frac{Y_u-E[Y_u\mid x]}{sd(pred)}\]

Since \(E[Y_u\mid x]\) is unknown, we must use its estimate \(\hat{Y}_u = \hat{\beta}_0 + \hat{\beta}_1 x_u\).

As we have seen \(E[Y_u - \hat{Y}_u] = 0\) and

\(\begin{split} Var[Y_u - \hat{Y}_u] & = Var[Y_u] + Var[\hat{Y}_u] = \\ & = \sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x_u-\bar{x})^2}{SXX}\right) = \\ & = \sigma^2\left(1 +\frac{1}{n} + \frac{(x_u-\bar{x})^2}{SXX}\right) \end{split}\)

where the first equality holds because \(Y_u\) is independent of \(\hat{Y}_u\).

Finally, it can be shown that

\[\frac{Y_u-\hat{Y}_u}{se(pred)}\sim t_{(n-2)}\]

where \(se(pred)=\hat{\sigma}\sqrt{1 +\frac{1}{n} + \frac{(x_u-\bar{x})^2}{SXX}}\).
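In code, the prediction interval differs from the mean-response interval only in the extra "\(1+\)" term of the standard error (sketch with simulated data; \(x_u=8\) is arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 40
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 2.0, n)  # simulated data

xbar = x.mean()
SXX = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * y) / SXX
b0 = y.mean() - b1 * xbar
sigma_hat = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

xu = 8.0
yhat_u = b0 + b1 * xu
se_pred = sigma_hat * np.sqrt(1 + 1 / n + (xu - xbar) ** 2 / SXX)  # extra "1 +"
t975 = stats.t.ppf(0.975, df=n - 2)
print(f"95% PI for Y_u: ({yhat_u - t975*se_pred:.2f}, {yhat_u + t975*se_pred:.2f})")
```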

Note

Inferences and predictions drawn from linear models are only valid within the original data domain.

Extrapolations should be made with caution!

1.3 The general linear model

\[y_i = \beta_0 + \sum_{j=1}^{p-1}\beta_j x_{i,j} + e_i,\,i=1,\ldots,n\]

\[\underset{n\times 1}{\b{y}} = \underset{n\times p}{\b{X}}\quad \underset{p\times 1}{\bs{\beta}}+\underset{n\times 1}{\b{e}}\]

\[\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}=\begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p-1}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n,1} & \cdots & x_{n,p-1} \end{pmatrix} \begin{pmatrix}\beta_0\\ \vdots\\ \beta_{p-1}\end{pmatrix}+ \begin{pmatrix}e_1\\ \vdots\\ e_n\end{pmatrix}\]

\[\b{y} = \b{X} \bs{\beta} + \b{e}\]

\(\b{y}\ (n\times 1)\) – vector of observations of a quantitative response variable or some transformation of it

\(\b{X}\ (n\times p)\) – matrix of model terms (values of covariates or functions of those) – the model specification matrix or design matrix

\(\bs{\beta}\ (p\times 1)\) – vector of parameters

\(\b{e}\ (n\times 1)\) – vector of random components with \(\b{e}\sim N_n(\b{0}, \sigma^2\b{I})\)

Remarks

  1. The number of observations (\(n\)) must be greater than the number of parameters \((p+1)\).
  2. If \(r(\b{X}) = p\), the model is said to be a full rank model (regression model). Otherwise, the model is said to be an incomplete rank model (analysis of variance model).
  3. Variants of this model include the cases where \(\bs{\beta}\) or \(\b{X}\) are treated as random variables (Bayesian models and correlation models, respectively).
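A minimal sketch of fitting this matrix form by least squares (simulated full rank design with arbitrary true coefficients; numpy.linalg.lstsq also copes with incomplete rank \(\b{X}\)):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
beta = np.array([1.0, 2.0, -0.5])                           # arbitrary true values
y = X @ beta + rng.normal(0, 1.0, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares solution
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```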

Multiple linear regression (1st order model)

\[y_i = \beta_0 + \sum_{j=1}^{p-1}\beta_j x_{i,j} + e_i,\,i=1,\ldots,n\]

\[\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}=\begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p-1}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n,1} & \cdots & x_{n,p-1} \end{pmatrix} \begin{pmatrix}\beta_0\\ \vdots\\ \beta_{p-1}\end{pmatrix}+ \begin{pmatrix}e_1\\ \vdots\\ e_n\end{pmatrix}\]

Linear regression by transformation

\[y_i = \beta_0 x_{i}^{\beta_1} e_i,\,i=1,\ldots,n\] \[\Updownarrow\] \[\log y_i = \log\beta_0 + \beta_1 \log x_{i} + \log e_i,\,i=1,\ldots,n\]

\[\begin{pmatrix}\log y_1\\ \vdots\\ \log y_n\end{pmatrix}=\begin{pmatrix} 1 & \log x_{1} \\ \vdots & \vdots \\ 1 & \log x_{n} \end{pmatrix} \begin{pmatrix}\log \beta_0\\ \beta_{1}\end{pmatrix}+ \begin{pmatrix}\log e_1\\ \vdots\\ \log e_n\end{pmatrix}\]
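A sketch of this fit (multiplicative lognormal errors so that \(\log e_i\) is Gaussian; the true values \(\beta_0=3\), \(\beta_1=1.5\) are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 60
x = rng.uniform(1, 10, n)
e = rng.lognormal(0, 0.1, n)     # positive multiplicative errors: log e ~ N(0, 0.1^2)
y = 3.0 * x**1.5 * e             # beta0 = 3, beta1 = 1.5

X = np.column_stack([np.ones(n), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.exp(coef[0]), coef[1])  # back-transformed beta0, then beta1
```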

Polynomial regression

\[y_i = \sum_{j=0}^{p-1}\beta_j x_{i}^j + e_i,\,i=1,\ldots,n\]

\[\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}=\begin{pmatrix} 1 & x_{1} & x_{1}^2 & \cdots & x_1^{p-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n} & x_{n}^2 & \cdots & x_n^{p-1} \end{pmatrix} \begin{pmatrix}\beta_0\\ \vdots\\ \beta_{p-1}\end{pmatrix}+ \begin{pmatrix}e_1\\ \vdots\\ e_n\end{pmatrix}\]
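The design matrix here is a Vandermonde matrix, e.g. (sketch for a quadratic fit with arbitrary true coefficients):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 50
x = rng.uniform(-2, 2, n)
y = 1.0 - x + 0.5 * x**2 + rng.normal(0, 0.3, n)  # simulated quadratic data

X = np.vander(x, N=3, increasing=True)            # columns 1, x, x^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # close to [1.0, -1.0, 0.5]
```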

An incomplete rank design

Results from an experiment to compare yields \(Y\) (measured as dried weight of plants) obtained for a control group \((x_1=1)\) and a treatment group \((x_2=1)\), where \(x_1\) and \(x_2\) are binary variables.

Control \((x_1=1)\): 4.17 5.58 5.18 6.11 4.50 4.61 5.17 4.53 5.33 5.14
Treatment \((x_2=1)\): 6.31 5.12 5.54 5.50 5.37 5.29 4.92 6.15 5.80 5.26

Notation: \(y_{i,j}\) is the dried weight of the \(j\)-th plant belonging to the group \(i\), \(j=1,\ldots,n_i\), \(i=1,2\).

\[y_{i,j} = \beta_{0}+ \beta_{1}x_{1,j} + \beta_{2}x_{2,j} + e_{i,j}\] \[\Updownarrow\] \[y_{i,j} = \beta_{0}+ \beta_{i} + e_{i,j}\]

\[\begin{pmatrix} y_{1,1} \\ \vdots \\ y_{1,n_1} \\ y_{2,1} \\ \vdots \\ y_{2,n_2} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \end{pmatrix} + \begin{pmatrix} e_{1,1} \\ \vdots \\ e_{1,n_1} \\ e_{2,1} \\ \vdots \\ e_{2,n_2} \end{pmatrix}\]
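A quick check with the data above that this design matrix indeed has incomplete rank, since the two indicator columns sum to the intercept column:

```python
import numpy as np

ctl = [4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14]  # x1 = 1
trt = [6.31, 5.12, 5.54, 5.50, 5.37, 5.29, 4.92, 6.15, 5.80, 5.26]  # x2 = 1
n1, n2 = len(ctl), len(trt)

X = np.column_stack([
    np.ones(n1 + n2),                  # intercept
    np.r_[np.ones(n1), np.zeros(n2)],  # x1: control indicator
    np.r_[np.zeros(n1), np.ones(n2)],  # x2: treatment indicator
])
print(np.linalg.matrix_rank(X))  # 2 < p = 3, because x1 + x2 = intercept column
```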
