\(\def\bs#1{\boldsymbol{#1}} \def\b#1{\mathbf{#1}} \DeclareMathOperator{\diag}{diag} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\r}{r} \DeclareMathOperator{\det}{det} \DeclareMathOperator{\logit}{\text{logit}}\)
What is a statistical model?
A probabilistic model that establishes mathematical relationships between some statistical variables
A way to embody a set of assumptions about the observation of data
Why are there so many statistical models?
\(+\) time, space, text, images, sounds, …
Our context
\(Y\) – a univariate quantitative response variable
\({\bf x}=(x_1,\ldots,x_k)'\) – any number of error-free predictors, covariates or explanatory variables
Some important issues
How strong is the relationship between each of those covariates and the response variable?
How accurately can we predict future or past unobserved values of \(Y\)?
What is a linear model?
Basic (additive) modelling idea:
\[Y = f(\mathbf{x}, \boldsymbol{\beta}) + E\]
where \(E\) is called the random error.
A linear model is characterized by:
the choice of a linear function of \(\bs{\beta}\):
\(f(\mathbf{x}, \boldsymbol{\beta})=\mathbf{x}' \boldsymbol{\beta}\) – the linear predictor
a set of assumptions regarding the random error:
\(E\sim N(0, \sigma^2)\), uncorrelated \(\forall \mathbf{x}\) (usually)
Which of the following models are linear?
\(Y=\beta_0 + \beta_1 x + \beta_2 x^2 + E\)
\(\log Y=\beta_0 + \beta_1 \log x + E\)
\(Y=\beta_0 x^{\beta_1} + E\)
\(Y=\beta_0 x^{\beta_1} E\)
Alternative formulation
\[Y\sim F(\boldsymbol{\theta})\]
where \(F\) belongs to the exponential family of distributions
and let \(g(E[Y])=\mathbf{x}' \boldsymbol{\beta}\), where \(g\) is called the link function
\[\begin{cases} F\ \text{gaussian} \\ g= \text{identity function} \end{cases} \implies \text{Linear model}\]
Other choices for bounded or discrete responses:
Bernoulli (Logistic regression)
Poisson/Negative binomial
Gamma
Beta
\(\leadsto\) Generalized linear models
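As a quick numerical illustration (a minimal sketch on synthetic data, assuming the statsmodels package is available; not part of the original slides), the same linear predictor \(\mathbf{x}'\boldsymbol{\beta}\) can drive a Gaussian response through the identity link or a Bernoulli response through the logit link:

```python
# Minimal sketch (synthetic data; assumes statsmodels is installed): the same
# linear predictor x'beta enters two models through different link functions --
# identity for a Gaussian response, logit for a Bernoulli response.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200)
X = sm.add_constant(x)                      # design matrix with intercept column
eta = 0.5 + 1.2 * x                         # linear predictor x'beta

# Gaussian response with identity link -> ordinary linear model
y_gauss = eta + rng.normal(0, 1, size=200)
lm = sm.GLM(y_gauss, X, family=sm.families.Gaussian()).fit()

# Bernoulli response with logit link -> logistic regression
p = 1 / (1 + np.exp(-eta))
y_bin = rng.binomial(1, p)
logit_fit = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()

print(lm.params, logit_fit.params)          # both should be close to (0.5, 1.2)
```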
Why linear models?
Key analytical tool in the investigation of competing causes for some phenomenon
Some of the simplest models that we know the most about
Building block for more complex models (GLMs, mixed models, machine learning algorithms, …)
Data: \((y_i,x_i)\), \(i=1,\ldots,n\)
\[Y_i=\beta_0+\beta_1x_i+E_i,\, i=1,\ldots,n\]
\(E_i\) – random error associated with \(Y_i=(Y\mid x=x_i)\)
OLS estimation
\[S(\beta_0,\beta_1)=\sum_{i=1}^{n}{E_i^2}=\sum_{i=1}^{n}{(Y_i-\beta_0-\beta_1x_i)^2}\]
\[(\hat{\beta}_0,\hat{\beta}_1)_{OLS}=\arg\min\limits_{\boldsymbol{\beta}\in \mathbb{R}^2} S(\beta_0,\beta_1)\]
\[\left\lbrace\begin{array}{l} \frac{\partial S}{\partial \beta_0}=0\\ \frac{\partial S}{\partial \beta_1}=0\\ \end{array}\right. \iff \left\lbrace\begin{array}{lcl} \beta_0 n+\beta_1 \sum\limits_{i = 1}^n{x_i} &=& \sum\limits_{i = 1}^n {Y_i }\\ \beta_0 \sum\limits_{i = 1}^n{x_i}+\beta_1 \sum\limits_{i = 1}^n{x_i^2} &=& \sum\limits_{i = 1}^n {x_i Y_i}\\ \end{array}\right.\]
\[\iff \left\lbrace\begin{array}{lcl} \hat {\beta }_1 &=& \frac{\sum\limits_{i = 1}^n {x_i Y_i} - n\bar {x}\bar {Y}}{\sum\limits_{i = 1}^n {x_i^2 } - n\bar {x}^2}\\ \hat{\beta }_0 &=& \bar{Y} - \hat{\beta }_1 \bar{x}\\ \end{array}\right.\]
A unique solution exists if and only if \[\begin{vmatrix} n & \sum\limits_{i = 1}^n{x_i} \\ \sum\limits_{i = 1}^n{x_i} & \sum\limits_{i = 1}^n{x_i^2} \end{vmatrix}=n \sum_{i = 1}^n\left(x_i-\bar{x}\right)^2\neq 0,\]
that is, if the sample contains at least two different values of \(x\).
When the solution exists, it is indeed a minimum since the Hessian matrix of \(S\) is \[2\times\begin{pmatrix} n & \sum\limits_{i = 1}^n{x_i} \\ \sum\limits_{i = 1}^n{x_i} & \sum\limits_{i = 1}^n{x_i^2} \end{pmatrix}\]
and its determinant is \(4n \sum\limits_{i = 1}^n{(x_i-\bar{x})^2}>0\).
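To make the closed-form solution concrete, here is a minimal sketch on simulated data (the true values \(\beta_0=2\), \(\beta_1=0.7\) are arbitrary), cross-checked against numpy's generic least-squares solver:

```python
# Minimal sketch on simulated data: compute the closed-form OLS estimates and
# cross-check them against numpy's generic least-squares solver.
import numpy as np

rng = np.random.default_rng(123)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.7 * x + rng.normal(0, 1.5, size=n)    # true beta0 = 2, beta1 = 0.7

beta1_hat = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
beta0_hat = y.mean() - beta1_hat * x.mean()

X = np.column_stack([np.ones(n), x])              # rows (1, x_i)
beta_check, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta0_hat, beta1_hat)                       # close to (2.0, 0.7)
print(beta_check)                                 # same values
```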
How can we trust this estimator?
Under the Gauss-Markov conditions:
\(E[E_i]=0\) \(\iff E[Y_i]=\beta_0+\beta_1x_i\)
\(Var[E_i]=\sigma^2\) \(\iff Var[Y_i]=\sigma^2\)
\(E_i\)’s uncorrelated \(\iff\) \(Y_i\)’s uncorrelated
it can be shown that the OLS estimator is the Best Linear Unbiased Estimator (BLUE) for \(\boldsymbol{\beta}\).
Alternative
ML estimation under the Gaussian assumption
\[E_i\sim N\left(0,\sigma^2\right)\iff Y_i\sim N\left(\beta_0+\beta_1 x_i,\sigma^2\right)\]
\[\mathcal{L}\left(\beta_0,\beta_1,\sigma^2\mid \mathbf{y}\right)=\prod_{i=1}^{n}{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left(y_i-(\beta_0+\beta_1 x_i)\right)^2}}\]
Somewhat surprisingly, the OLS and ML estimators for \(\beta_0\) and \(\beta_1\) coincide and
\[\hat{\sigma}_{ML}^2=\frac{\sum_{i=1}^{n}{\left(Y_i-(\hat{\beta}_0+\hat{\beta}_1 x_i)\right)^2}}{n}\]
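A small sketch (simulated data; scipy assumed available) confirming numerically that maximizing the Gaussian likelihood reproduces the OLS slope and intercept, while \(\hat{\sigma}^2_{ML}\) divides the SSE by \(n\):

```python
# Sketch on simulated data (scipy assumed): numerically maximizing the Gaussian
# likelihood reproduces the OLS slope/intercept, and sigma^2_ML = SSE / n.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 80
x = rng.uniform(0, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.8, size=n)

def negloglik(theta):
    b0, b1, log_sigma = theta
    sigma2 = np.exp(2 * log_sigma)                # sigma parametrized on the log scale
    resid = y - (b0 + b1 * x)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(resid**2) / sigma2

fit = minimize(negloglik, x0=np.zeros(3))
b0_ml, b1_ml, log_sigma_ml = fit.x

b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0_ols = y.mean() - b1_ols * x.mean()
sse = np.sum((y - (b0_ols + b1_ols * x))**2)

print(b0_ml, b1_ml, np.exp(2 * log_sigma_ml))     # ML estimates
print(b0_ols, b1_ols, sse / n)                    # same line, and sigma^2_ML = SSE/n
```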
Some notation
\(SXX = \sum\limits_{i=1}^n (x_i-\bar{x})^2 = \sum\limits_{i=1}^n x_i^2-n \bar{x}^2 = \sum\limits_{i=1}^n x_i(x_i-\bar{x})\)
\(\begin{split} SXY & = \sum_{i=1}^n (x_i-\bar{x})(Y_i-\bar{Y}) = \\ & = \sum_{i=1}^n x_i(Y_i-\bar{Y}) = \\ & = \sum_{i=1}^n Y_i(x_i-\bar{x}) = \\ & = \sum_{i=1}^n x_iY_i - n \bar{x} \bar{Y} \end{split}\)
Properties of the estimators
\[\hat{\beta}_1 = \frac{SXY}{SXX} = \frac{\sum_{i=1}^n Y_i(x_i-\bar{x})}{SXX} = \sum_{i=1}^n k_i Y_i\] with \(k_i = \dfrac{x_i-\bar{x}}{SXX}\)
\(\therefore \hat{\beta}_1\) is a linear estimator
\(E[\hat{\beta}_1] = \sum_{i=1}^n k_i E[Y_i] = \ldots = \beta_1\)
\(\therefore \hat{\beta}_1\) is an unbiased estimator of \(\beta_1\)
\(Var[\hat{\beta}_1] = \sum_{i=1}^n k_i^2 Var[Y_i] = \sigma^2\sum_{i=1}^n k_i^2 = \ldots = \dfrac{\sigma^2}{SXX}\)
\(\therefore \hat{\beta}_1 \sim N\left(\beta_1, \dfrac{\sigma^2}{SXX}\right)\)
\[\hat{\beta}_0 = \bar{Y} - \hat{\beta }_1 \bar{x} = \ldots = \sum_{i=1}^n \left(\frac{1}{n} - \bar{x}k_i \right) Y_i\]
\(\therefore \hat{\beta}_0\) is also a linear estimator and \(E[\hat{\beta}_0] = \ldots = \beta_0\)
\(\begin{split} Var[\hat{\beta}_0] & = Var[\bar{Y} - \hat{\beta }_1 \bar{x}] = \\ & = Var[\bar{Y}] + \bar{x}^2 Var[\hat{\beta }_1] - 2 \bar{x} Cov[\bar{Y}, \hat{\beta }_1] = \\ & = \frac{\sigma^2}{n} + \bar{x}^2\, \frac{\sigma^2}{SXX} = \\ & = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SXX} \right) \end{split}\)
since \(Cov[\bar{Y}, \hat{\beta }_1] = \dfrac{\sigma^2}{n}\sum\limits_{i=1}^n k_i = 0\).
\(\therefore \hat{\beta}_0 \sim N\left(\beta_0, \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{SXX} \right) \right)\)
Show that, in general, \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are not independent since
\[Cov[\hat{\beta}_0, \hat{\beta}_1] = -\frac{\bar{x}\sigma^2}{SXX}\]
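These sampling properties can be checked by simulation; the sketch below (arbitrary true values, a fixed design reused across replicates) compares the empirical variances and covariance of \((\hat{\beta}_0,\hat{\beta}_1)\) with the expressions above:

```python
# Simulation sketch (arbitrary true values): empirical variances/covariance of
# (beta0_hat, beta1_hat) against the theoretical expressions above.
import numpy as np

rng = np.random.default_rng(42)
n, sigma, beta0, beta1 = 30, 1.0, 1.0, 0.5
x = rng.uniform(0, 10, size=n)                    # fixed design, reused in every replicate
sxx = np.sum((x - x.mean())**2)

estimates = []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * y) / sxx
    b0 = y.mean() - b1 * x.mean()
    estimates.append((b0, b1))
estimates = np.array(estimates)

print(np.cov(estimates.T))                        # empirical 2 x 2 covariance matrix
print(sigma**2 * (1/n + x.mean()**2 / sxx),       # Var[beta0_hat]
      sigma**2 / sxx,                             # Var[beta1_hat]
      -x.mean() * sigma**2 / sxx)                 # Cov[beta0_hat, beta1_hat]
```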
Regarding the estimator of \(\sigma^2\), \[\hat{\sigma}^2_{ML}=\frac{\sum_{i=1}^n(Y_i-\hat{Y}_i)^2}{n} = \frac{SSE}{n}\]
it can be shown that \(E[SSE]=(n-2)\sigma^2 \implies E[\hat{\sigma}^2_{ML}] = \dfrac{n-2}{n}\sigma^2\)
\(\therefore \hat{\sigma}^2=\dfrac{n}{n-2}\hat{\sigma}^2_{ML}=\frac{SSE}{n-2}\) is an unbiased estimator of \(\sigma^2\).
Now, Cochran’s theorem states that:
\(\frac{SSE}{\sigma^2}= \frac{(n-2)\hat{\sigma}^2}{\sigma^2}\sim\chi_{(n-2)}^2\)
\((\hat{\beta}_0, \hat{\beta}_1)\) are jointly independent of \(\hat{\sigma}^2\)
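A quick simulation sketch (illustrative values only) of the first statement: across replicates, \(SSE/\sigma^2\) has mean close to \(n-2\) and variance close to \(2(n-2)\), as a \(\chi^2_{(n-2)}\) variable should:

```python
# Simulation sketch (illustrative values): SSE / sigma^2 across replicates has
# mean approx n - 2 and variance approx 2(n - 2), as for a chi-square_(n-2) variable.
import numpy as np

rng = np.random.default_rng(9)
n, sigma = 25, 1.5
x = rng.uniform(0, 10, size=n)
sxx = np.sum((x - x.mean())**2)

vals = []
for _ in range(10_000):
    y = 1.0 + 0.5 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * y) / sxx
    b0 = y.mean() - b1 * x.mean()
    vals.append(np.sum((y - (b0 + b1 * x))**2) / sigma**2)

print(np.mean(vals), np.var(vals))                # approx 23 and 46 here (n = 25)
```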
All previous results lead to:
\[\frac{\hat{\beta}_1-\beta_1}{\hat{\sigma}\sqrt{\frac{1}{SXX}}} = \frac{\hat{\beta}_1-\beta_1}{se(\hat{\beta}_1)} \sim t_{(n-2)}\]
and
\[\frac{\hat{\beta}_0-\beta_0}{\hat{\sigma}\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SXX}}} = \frac{\hat{\beta}_0-\beta_0}{se(\hat{\beta}_0)} \sim t_{(n-2)}\]
that allow us to carry out hypothesis tests and compute confidence intervals for \(\beta_0\) and \(\beta_1\).
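Putting the pieces together, here is a sketch (simulated data; scipy only supplies the \(t\) quantile) that computes the standard errors, the \(t\) statistic for \(H_0:\beta_1=0\) and 95% confidence intervals directly from the formulas above:

```python
# Sketch on simulated data: standard errors, the t statistic for H0: beta1 = 0
# and 95% confidence intervals, built directly from the formulas above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 40
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, size=n)

sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * y) / sxx
b0 = y.mean() - b1 * x.mean()
sigma2_hat = np.sum((y - (b0 + b1 * x))**2) / (n - 2)    # unbiased estimator of sigma^2

se_b1 = np.sqrt(sigma2_hat / sxx)
se_b0 = np.sqrt(sigma2_hat * (1/n + x.mean()**2 / sxx))
t_b1 = b1 / se_b1                                        # test statistic for H0: beta1 = 0
tq = stats.t.ppf(0.975, df=n - 2)                        # 97.5% quantile of t_(n-2)

print("beta1:", b1, "t =", t_b1, "95% CI:", (b1 - tq * se_b1, b1 + tq * se_b1))
print("beta0:", b0, "95% CI:", (b0 - tq * se_b0, b0 + tq * se_b0))
```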
Inferences for the mean response
\[E[Y|x_0] = \beta_0 + \beta_1 x_0\]
\(\hat{\beta}_0 + \hat{\beta}_1 x_0\) is an unbiased estimator for \(E[Y|x_0]\)
\(Var[\hat{\beta}_0 + \hat{\beta}_1 x_0]= \sigma^2\left(\frac{1}{n} + \frac{(x_0-\bar{x})^2}{SXX}\right)\)
\(\hat{\beta}_0\) and \(\hat{\beta}_1\) are usually correlated but they are jointly gaussian. Therefore:
\[\frac{\hat{\beta}_0+ \hat{\beta}_1 x_0 - E[Y|x_0]}{\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{SXX}}} \sim t_{(n-2)}\]
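A short sketch (same kind of simulated data; the value \(x_0=4\) is purely illustrative) of a 95% confidence interval for \(E[Y|x_0]\):

```python
# Sketch (simulated data; x0 = 4 is arbitrary): a 95% confidence interval for
# the mean response E[Y | x0], using the variance expression above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 40
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, size=n)

sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * y) / sxx
b0 = y.mean() - b1 * x.mean()
sigma2_hat = np.sum((y - (b0 + b1 * x))**2) / (n - 2)

x0 = 4.0
fit0 = b0 + b1 * x0                               # point estimate of E[Y | x0]
se_mean = np.sqrt(sigma2_hat * (1/n + (x0 - x.mean())**2 / sxx))
tq = stats.t.ppf(0.975, df=n - 2)
print(fit0, (fit0 - tq * se_mean, fit0 + tq * se_mean))
```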
Prediction of a new observation
The goal is to predict \(Y_u\), an unobserved response value for \(x=x_u\), independent of the observed data.
To do that we need to consider two sources of uncertainty:
the uncertainty in the mean of the distribution of \(Y_u\);
the variation within that distribution.
Again, the idea is to use a standardized random variable of the form \[\frac{Y_u-E[Y_u\mid x]}{sd(pred)}\]
Since \(E[Y_u\mid x]\) is unknown, we must use its estimate \(\hat{Y}_u = \hat{\beta}_0 + \hat{\beta}_1 x_u\).
As we have seen \(E[Y_u - \hat{Y}_u] = 0\) and
\(\begin{split} Var[Y_u - \hat{Y}_u] & = Var[Y_u] + Var[\hat{Y}_u] = \\ & = \sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x_u-\bar{x})^2}{SXX}\right) = \\ & = \sigma^2\left(1 +\frac{1}{n} + \frac{(x_u-\bar{x})^2}{SXX}\right) \end{split}\)
Finally, it can be shown that
\[\frac{Y_u-\hat{Y}_u}{se(pred)}\sim t_{(n-2)}\]
where \(se(pred)=\hat{\sigma}\sqrt{1 +\frac{1}{n} + \frac{(x_u-\bar{x})^2}{SXX}}\).
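The corresponding sketch for a 95% prediction interval at \(x_u\) (again on simulated data, with an arbitrary \(x_u=4\)); note the extra \(1+\) term, which makes it wider than the interval for the mean response:

```python
# Sketch (simulated data; x_u = 4 is arbitrary): a 95% prediction interval for
# a new observation -- the extra "1 +" term makes it wider than the mean interval.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 40
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, size=n)

sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * y) / sxx
b0 = y.mean() - b1 * x.mean()
sigma2_hat = np.sum((y - (b0 + b1 * x))**2) / (n - 2)

x_u = 4.0
y_hat_u = b0 + b1 * x_u                           # point prediction
se_pred = np.sqrt(sigma2_hat * (1 + 1/n + (x_u - x.mean())**2 / sxx))
tq = stats.t.ppf(0.975, df=n - 2)
print((y_hat_u - tq * se_pred, y_hat_u + tq * se_pred))
```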
Note
Inferences and predictions drawn from linear models are only valid within the original data domain.
Extrapolations should be made with caution!
\[y_i = \beta_0 + \sum_{j=1}^{p-1}\beta_j x_{i,j} + e_i,\,i=1,\ldots,n\]
\[\underset{n\times 1}{\b{y}} = \underset{n\times p}{\b{X}}\quad \underset{p\times 1}{\bs{\beta}}+\underset{n\times 1}{\b{e}}\]
\[\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}=\begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p-1}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n,1} & \cdots & x_{n,p-1} \end{pmatrix} \begin{pmatrix}\beta_0\\ \vdots\\ \beta_{p-1}\end{pmatrix}+ \begin{pmatrix}e_1\\ \vdots\\ e_n\end{pmatrix}\]
\[\b{y} = \b{X} \bs{\beta} + \b{e}\]
\(\b{y}\ (n\times 1)\) – vector of observations of a quantitative response variable or some transformation of it
\(\b{X}\ (n\times p)\) – matrix of model terms (values of covariates or functions of those) – the model specification matrix or design matrix
\(\bs{\beta}\ (p\times 1)\) – vector of parameters
\(\b{e}\ (n\times 1)\) – vector of random components with \(\b{e}\sim N_n(\b{0}, \sigma^2\b{I})\)
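A minimal sketch of the matrix formulation on synthetic data (two covariates, arbitrary coefficients): build \(\b{X}\) explicitly and obtain \(\hat{\bs{\beta}}\) by least squares.

```python
# Minimal sketch of the matrix formulation (synthetic data, two covariates):
# build X explicitly and obtain beta_hat by least squares.
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])         # n x p design matrix, p = 3
beta = np.array([1.0, -0.5, 2.0])                 # arbitrary true coefficients
y = X @ beta + rng.normal(0, 1, size=n)           # y = X beta + e

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves the least-squares problem
print(beta_hat)                                   # close to (1.0, -0.5, 2.0)
```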
Remarks
Multiple linear regression (1st order model)
\[y_i = \beta_0 + \sum_{j=1}^{p-1}\beta_j x_{i,j} + e_i,\,i=1,\ldots,n\]
\[\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}=\begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p-1}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n,1} & \cdots & x_{n,p-1} \end{pmatrix} \begin{pmatrix}\beta_0\\ \vdots\\ \beta_{p-1}\end{pmatrix}+ \begin{pmatrix}e_1\\ \vdots\\ e_n\end{pmatrix}\]
Linear regression by transformation
\[y_i = \beta_0 x_{i}^{\beta_1} e_i,\,i=1,\ldots,n\] \[\Updownarrow\] \[\log y_i = \log\beta_0 + \beta_1 \log x_{i} + \log e_i,\,i=1,\ldots,n\]
\[\begin{pmatrix}\log y_1\\ \vdots\\ \log y_n\end{pmatrix}=\begin{pmatrix} 1 & \log x_{1} \\ \vdots & \vdots \\ 1 & \log x_{n} \end{pmatrix} \begin{pmatrix}\log \beta_0\\ \beta_{1}\end{pmatrix}+ \begin{pmatrix}\log e_1\\ \vdots\\ \log e_n\end{pmatrix}\]
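A sketch of this linearization on simulated data (positive multiplicative errors, true values \(\beta_0=2\), \(\beta_1=1.5\) chosen for illustration): fitting on the log scale recovers \(\log\beta_0\) and \(\beta_1\).

```python
# Sketch of the linearization (simulated data, positive multiplicative errors,
# true beta0 = 2 and beta1 = 1.5 chosen for illustration).
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(1, 10, size=n)
e = np.exp(rng.normal(0, 0.1, size=n))            # log e_i ~ N(0, 0.1^2)
y = 2.0 * x**1.5 * e

X = np.column_stack([np.ones(n), np.log(x)])      # regress log y on log x
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
print(np.exp(coef[0]), coef[1])                   # close to (2.0, 1.5)
```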
Polynomial regression
\[y_i = \sum_{j=0}^{p-1}\beta_j x_{i}^j + e_i,\,i=1,\ldots,n\]
\[\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}=\begin{pmatrix} 1 & x_{1} & x_{1}^2 & \cdots & x_1^{p-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{n} & x_{n}^2 & \cdots & x_n^{p-1} \end{pmatrix} \begin{pmatrix}\beta_0\\ \vdots\\ \beta_{p-1}\end{pmatrix}+ \begin{pmatrix}e_1\\ \vdots\\ e_n\end{pmatrix}\]
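A sketch showing that polynomial regression is still a linear model: the design matrix simply collects powers of \(x\) (here built with np.vander on simulated data).

```python
# Sketch: polynomial regression is still a linear model -- the design matrix
# just collects powers of x (simulated data, true coefficients (1.0, -0.5, 0.8)).
import numpy as np

rng = np.random.default_rng(11)
n = 60
x = rng.uniform(-2, 2, size=n)
y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(0, 0.3, size=n)

X = np.vander(x, N=3, increasing=True)            # columns: 1, x, x^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                                   # close to (1.0, -0.5, 0.8)
```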
An incomplete rank design
Results from an experiment comparing yields \(Y\) (measured as dried weight of plants) obtained for a control group \((x_1=1)\) and a treatment group \((x_2=1)\), where \(x_1\) and \(x_2\) are binary variables.
| Group | \(y_{i,1}\) | \(y_{i,2}\) | \(y_{i,3}\) | \(y_{i,4}\) | \(y_{i,5}\) | \(y_{i,6}\) | \(y_{i,7}\) | \(y_{i,8}\) | \(y_{i,9}\) | \(y_{i,10}\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Control (\(x_1=1\)) | 4.17 | 5.58 | 5.18 | 6.11 | 4.50 | 4.61 | 5.17 | 4.53 | 5.33 | 5.14 |
| Treatment (\(x_2=1\)) | 6.31 | 5.12 | 5.54 | 5.50 | 5.37 | 5.29 | 4.92 | 6.15 | 5.80 | 5.26 |
Notation: \(y_{i,j}\) is the dried weight of the \(j\)-th plant belonging to the group \(i\), \(j=1,\ldots,n_i\), \(i=1,2\).
\[y_{i,j} = \beta_{0}+ \beta_{1}x_{1,i} + \beta_{2}x_{2,i} + e_{i,j}\] \[\Updownarrow\] \[y_{i,j} = \beta_{0}+ \beta_{i} + e_{i,j}\]
\[\begin{pmatrix} y_{1,1} \\ \vdots \\ y_{1,n_1} \\ y_{2,1} \\ \vdots \\ y_{2,n_2} \end{pmatrix} = \begin{pmatrix} 1&1 &0 \\ \vdots & \vdots &\vdots \\ 1&1 &0 \\ 1&0 &1 \\ \vdots &\vdots &\vdots \\ 1 &0 &1 \end{pmatrix} \begin{pmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \end{pmatrix} + \begin{pmatrix} e_{1,1} \\ \vdots \\ e_{1,n_1} \\ e_{2,1} \\ \vdots \\ e_{2,n_2} \end{pmatrix}\]
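A short numerical check of the rank deficiency (group sizes \(n_1=n_2=10\) as in the example): the three columns of \(\b{X}\) are linearly dependent, since the two indicator columns sum to the intercept column.

```python
# Numerical check of the rank deficiency (n1 = n2 = 10 as in the example): the
# two indicator columns sum to the intercept column, so X has rank 2 < p = 3.
import numpy as np

n1 = n2 = 10
X = np.column_stack([
    np.ones(n1 + n2),                             # intercept
    np.r_[np.ones(n1), np.zeros(n2)],             # x1: control indicator
    np.r_[np.zeros(n1), np.ones(n2)],             # x2: treatment indicator
])
print(X.shape, np.linalg.matrix_rank(X))          # (20, 3) but rank 2
print(np.linalg.matrix_rank(X.T @ X))             # X'X is singular as well
```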