IST
Linear models

Project assignment 2025/26

Author

Paulo Soares

Published

November 10, 2025

1 Overview

This project looks at data from the US National Bridge Inventory, maintained by the Federal Highway Administration (FHWA), part of the US Department of Transportation. The dataset contains information about bridges in a single state and includes a numerical measure of each bridge's condition (Condition), derived from the ratings given to the bridge deck, superstructure and foundations in the most recent inspection.

Variable          Description
Structure_id      Identification key
Urban             Whether the bridge is in an urban or rural area
Year              The year the bridge was built
Lanes_on          Number of traffic lanes on the bridge
AverageDaily      The average daily traffic (number of vehicles)
Historic          Whether the bridge is historic
Material          The dominant material the bridge is made from
Spans             Number of spans of the bridge
Length            The length of the bridge (m)
Width             The width of the bridge (m)
Trucks_percent    The percentage of traffic made up of trucks

You are required to propose one or more linear regression models with the following goals:

  1. to analyse how the covariates can explain the bridge condition;

  2. to predict the response variable (Condition) for an additional set of bridges.
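As a rough illustration of these two goals, the following R sketch fits a linear model and evaluates its predictions on held-out rows. It is only a sketch under stated assumptions: the data are simulated (the real files are distributed by e-mail), the levels of Material and the model formula are hypothetical, and only a few of the covariates listed above are used.

```r
# Simulated stand-in for the bridge data; values and Material levels are
# hypothetical and serve only to make the example self-contained.
set.seed(1)
n <- 200
bridges <- data.frame(
  Year     = sample(1920:2020, n, replace = TRUE),
  Length   = rexp(n, rate = 1 / 50),
  Material = factor(sample(c("Concrete", "Steel", "Timber"), n, replace = TRUE))
)
bridges$Condition <- 5 + 0.03 * (bridges$Year - 1970) -
  0.2 * log(bridges$Length) + rnorm(n, sd = 0.8)

# Hold out 50 rows to mimic the additional set of bridges.
test_rows <- sample(n, 50)
train <- bridges[-test_rows, ]
test  <- bridges[test_rows, ]

# Goal 1: explain Condition through the covariates.
fit <- lm(Condition ~ Year + log(Length) + Material, data = train)
summary(fit)

# Goal 2: predict Condition for unseen bridges and compute the RMSE.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$Condition - pred)^2))
rmse
```

In the actual project, the held-out rows would be replaced by the additional set of bridges, for which Condition is presumably not provided.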

2 Getting started

  1. Students should create groups of size 3 (exceptionally 2, with my agreement).

  2. One member of the group should inform me of the group's composition.

  3. All members of the group will receive the data files by e-mail.

  4. Go ahead with your analysis.

3 Final report

Your team should write a report describing the problem and the proposed solution.

All required computations should be done in R, and the report should be written as an R Quarto document with no local external dependencies other than the data files and R packages. The compiled HTML file (with the source document embedded) is the only file to be delivered, and it must be submitted by the project deadline. When printed to an A4 PDF file, it should produce no more than 20 pages. You can use this bare bones template as a starting point.

The code in the source file should produce a data frame named predictions with the following structure:

4 Assessment

Each team will receive a mark between 0 and 20, distributed as follows:

  • Report (0–17)

    The report will be graded using a weighted mean of partial grades with the following weights:

    Item                                              Weight
    Introduction                                      5%
    Exploratory data analysis                         15%
    Modelling                                         40%
    Diagnostics                                       25%
    Results and conclusions                           10%
    Instructions compliance and overall evaluation    5%

  • Prediction score (0–3)

    The highest and the lowest RMSE obtained will be given grades \(g_{\min}\) and \(g_{\max}\in [0, 3]\), respectively; all other results will be linearly interpolated in the interval \([g_{\min}, g_{\max}]\).
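
    As an illustration of this rule, assuming the interpolation is linear in the RMSE itself (the symbols \(r_i\), \(r_{\min}\) and \(r_{\max}\) are introduced here and are not part of the original statement), a team whose predictions have RMSE \(r_i\) would receive

    \[
    g_i = g_{\max} - (g_{\max} - g_{\min})\,\frac{r_i - r_{\min}}{r_{\max} - r_{\min}},
    \qquad
    r_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left(y_j - \hat{y}_j\right)^2},
    \]

    where \(r_{\min}\) and \(r_{\max}\) are the lowest and the highest RMSE among all teams, \(y_j\) is the observed Condition of bridge \(j\) in the additional set, \(\hat{y}_j\) is the team's prediction and \(n\) is the number of bridges in that set. Under this reading, \(r_i = r_{\min}\) gives \(g_i = g_{\max}\) and \(r_i = r_{\max}\) gives \(g_i = g_{\min}\).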

5 Important date

Final report deadline: Friday, December 19, 2025 (23:59, Europe/Lisbon, GMT+00:00)