
Linear models
Project assignment 2025/26
1 Overview
This project uses data from the US National Bridge Inventory, maintained by the Federal Highway Administration (FHWA), part of the US Department of Transportation. The dataset describes bridges in a single state and includes a numerical measure of each bridge's condition (Condition), derived from the ratings given to the bridge deck, superstructure and foundations in the most recent inspection.
| Variable | Description |
|---|---|
| Structure_id | Identification key |
| Urban | Whether the bridge is in an urban or rural area |
| Year | The year the bridge was built |
| Lanes_on | Number of traffic lanes on the bridge |
| AverageDaily | The average daily traffic (number of vehicles) |
| Historic | Whether the bridge is historic |
| Material | The dominant material the bridge is made from |
| Spans | Number of spans of the bridge |
| Length | The length of the bridge (m) |
| Width | The width of the bridge (m) |
| Trucks_percent | The percentage of traffic made up of trucks |
You are required to propose one or more linear regression models with the following goals:
to analyse how the covariates explain the bridge condition;
to predict the response variable (Condition) for an additional set of bridges.
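As an illustration of the two goals above, the following is a minimal sketch in R that fits a linear model and predicts Condition for bridges not used in the fit. It runs on synthetic stand-in data: the simulated values, the two Material levels, the train/new split and the choice of covariates are all assumptions made only so the example runs, and they say nothing about the real data files or about which model is appropriate.

```r
# Synthetic stand-in data (assumption): the real data files are sent by e-mail.
set.seed(1)
n <- 200
bridges <- data.frame(
  Year     = sample(1950:2020, n, replace = TRUE),
  Length   = runif(n, min = 10, max = 300),
  Material = factor(sample(c("Concrete", "Steel"), n, replace = TRUE))
)

# Made-up relationship between covariates and response, for illustration only.
bridges$Condition <- 5 + 0.03 * (bridges$Year - 1950) -
  0.002 * bridges$Length + rnorm(n, sd = 0.5)

# Hypothetical split: 150 bridges to fit the model, 50 "additional" bridges
# for which Condition is treated as unknown.
train       <- bridges[1:150, ]
new_bridges <- bridges[151:200, c("Year", "Length", "Material")]

# Goal 1: analyse how the covariates explain the response.
fit <- lm(Condition ~ Year + Length + Material, data = train)
print(summary(fit))  # estimates, standard errors, R-squared

# Goal 2: predict the response for the additional bridges.
pred <- predict(fit, newdata = new_bridges)
print(head(pred))
```

The same `lm()` and `predict()` calls apply once the synthetic data frame is replaced by the data files; categorical variables such as Urban, Historic and Material would typically be stored as factors before fitting.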
2 Getting started
Students should form groups of 3 (exceptionally 2, with my agreement).
One member of each group should inform me of the group's composition.
All members of the group will receive the data files by e-mail.
Go ahead with your analysis.
3 Final report
Your team should write a report describing the problem and the proposed solution.
All required computations should be done in R and the report should be written as an R Quarto document with no local external dependencies other than the data files and R packages. The compiled HTML file (with the source document embedded) is the only file to be delivered, and it must be delivered by the project deadline. This file should produce no more than 20 pages when printed to an A4 PDF file. You can use this bare bones template as a starting point.
The code in the source file should produce a data frame named predictions with the following structure:
4 Assessment
Each team will receive a mark between 0 and 20, distributed as follows:
Report (0–17)
The report will be graded using a weighted mean of partial grades with the following weights:
| Item | Weight |
|---|---|
| Introduction | 5% |
| Exploratory data analysis | 15% |
| Modelling | 40% |
| Diagnostics | 25% |
| Results and conclusions | 10% |
| Instructions compliance and overall evaluation | 5% |
Prediction score (0–3)
The highest and the lowest RMSE obtained will receive the grades \(g_{\min}\) and \(g_{\max}\), respectively, with \(g_{\min}, g_{\max}\in [0, 3]\); all other results will be linearly interpolated in the interval \([g_{\min}, g_{\max}]\).
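One possible reading of this rule can be sketched in R as follows. The function names `rmse` and `prediction_grade` are hypothetical, the values of \(g_{\min}\) and \(g_{\max}\) and the RMSE values in the usage line are made up for illustration, and the handling of the case where all teams obtain the same RMSE is an assumption not covered by the text.

```r
# Root mean squared error between observed and predicted values.
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Linear interpolation of grades: the lowest RMSE maps to g_max,
# the highest RMSE maps to g_min, everything else lies in between.
prediction_grade <- function(rmse_values, g_min, g_max) {
  r_best  <- min(rmse_values)
  r_worst <- max(rmse_values)
  if (r_worst == r_best) {
    # Assumption: if every team has the same RMSE, all receive g_max.
    return(rep(g_max, length(rmse_values)))
  }
  g_max - (rmse_values - r_best) / (r_worst - r_best) * (g_max - g_min)
}

# Hypothetical RMSE values for three teams, hypothetical g_min and g_max.
grades <- prediction_grade(c(1, 2, 3), g_min = 1, g_max = 3)
print(grades)  # 3 2 1
```

Under this reading, a team whose RMSE lies halfway between the best and the worst RMSE receives the grade halfway between \(g_{\max}\) and \(g_{\min}\).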
5 Important date
Final report deadline: Friday, December 19, 2025 (23:59, Europe/Lisbon, GMT+00:00)