A bridge to somewhere

An exploration in Linear Models

Authors

Paulo Soares et al.

Published

November 9, 2025

\(\def\bs#1{\boldsymbol{#1}} \def\b#1{\mathbf{#1}}\)

1 Introduction

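Throughout, we work with the normal linear model, where \(\b{y}\) is the response vector of length \(n\), \(\b{X}\) is the design matrix, \(\bs{\beta}\) is the vector of regression coefficients and \(\sigma^2\) is the error variance:

\[\b{y} \sim N_n \left(\b{X} \bs{\beta}, \sigma^2\b{I}\right)\]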
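To make the model concrete, here is a minimal simulation sketch. The sample size, the design matrix, and the values in `beta` and `sigma_true` are arbitrary choices for illustration and have nothing to do with the diabetes data.

```r
set.seed(1)  # make the simulated draws reproducible

n <- 100
X <- cbind(1, rnorm(n), rnorm(n))   # design matrix with an intercept column
beta <- c(2, -1, 0.5)               # illustrative "true" coefficients
sigma_true <- 1.5                   # illustrative error standard deviation

# Draw y from N_n(X beta, sigma^2 I): independent normal errors around X beta
y <- as.vector(X %*% beta) + rnorm(n, sd = sigma_true)

# X already contains the intercept column, so drop lm()'s own intercept
fit <- lm(y ~ X - 1)
coef(fit)    # least-squares estimates of beta
sigma(fit)   # residual standard error, an estimate of sigma
```

With a sample of this size, the estimates should land near the values used to generate the data, although they will not match them exactly.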

We will keep using the dataset introduced in Lab#1, which was described as follows:

Ten baseline variables, AGE (in years), SEX, body mass index (BMI), average blood pressure (BP), and six blood serum measurements (S1 to S6) were obtained for each of 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline (Y). The data is available in the file diabetes.txt.
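A minimal sketch for loading the file, assuming that diabetes.txt sits in the working directory, is whitespace- or tab-delimited, and has a header row with the variable names. None of this is stated in the description above, so adjust the arguments to the actual file. The helper `read_diabetes()` is a hypothetical name introduced only for this example.

```r
# Hypothetical helper: reads a delimited text file that has a header row.
# Assumption: columns are separated by whitespace or tabs.
read_diabetes <- function(path = "diabetes.txt") {
  read.table(path, header = TRUE)
}

# Only attempt to read when the file is actually present
if (file.exists("diabetes.txt")) {
  diabetes <- read_diabetes()
  str(diabetes)  # the description suggests 442 rows: ten covariates plus Y
}
```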

1.1 The specification of linear models

In R, models are specified in a compact symbolic form. From the documentation:

The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.
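The formula syntax can be inspected without any data. The sketch below uses hypothetical variable names `y`, `a` and `b` and lists the terms that R derives from a formula. It also uses the `*` operator, which is not mentioned in the passage quoted above but is standard R shorthand for main effects plus their interaction.

```r
# Hypothetical variable names; a formula can be examined before any data exist
f1 <- y ~ a + b + a:b   # two main effects and their interaction, written out
f2 <- y ~ a * b         # shorthand that expands to the same terms

labels1 <- attr(terms(f1), "term.labels")
labels2 <- attr(terms(f2), "term.labels")

print(labels1)                       # the terms of the linear predictor
print(identical(labels1, labels2))   # do both formulas specify the same model?
```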

Define the following models:

  1. a first-order model with all covariates except SEX;

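    library(ggplot2)  # provides ggplot(), geom_point() and aes()

    # Simulated demo data: y equals x plus standard normal noise
    dataset <- data.frame(x = 1:10, y = 1:10 + rnorm(10))
    dataset
    Table 1: Simple demo R table

    # Scatter plot of the simulated data
    ggplot(dataset) + geom_point(aes(x, y), col = "tomato")
    Figure 1: Simple demo R plot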

    See Figure 1 for an illustration.
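One possible way to write model 1 is sketched below. In a formula, `.` stands for every column of the data other than the response, and `- SEX` removes that term. So that the example runs without the file, it uses a placeholder data frame `mock` filled with random numbers under the variable names from the description. These are not the real measurements, and `mock` and `model1` are names chosen only for this example.

```r
set.seed(42)  # make the placeholder data reproducible

n <- 50
covariates <- c("AGE", "SEX", "BMI", "BP", paste0("S", 1:6))

# Placeholder data: random numbers standing in for the real measurements
mock <- as.data.frame(matrix(rnorm(n * length(covariates)), nrow = n,
                             dimnames = list(NULL, covariates)))
mock$Y <- rnorm(n)

# First-order model with all covariates except SEX
model1 <- lm(Y ~ . - SEX, data = mock)
names(coef(model1))  # intercept plus the nine remaining covariates
```

With the real data, the same formula would be passed to `lm()` with the diabetes data frame in place of `mock`.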

2 Description of the data

3 Methods

4 Results

5 Conclusion

Appendix I