Linear Models

Lab#5 – ANOVA models

Author

Paulo Soares

Published

December 15, 2025

1 Do tattoos and how you dress influence whether someone will help you?

An experiment was designed to investigate whether dress style and the presence of visible tattoos, among other factors, influence how long a person will interact with a stranger asking for directions. The data are available in the file tattoos.csv.

1.1 Read and explore the data

library(dplyr)

url <- "https://web.tecnico.ulisboa.pt/paulo.soares/aml/data/tattoos.csv"
att <- read.csv(url)

str(att)
'data.frame':   80 obs. of  6 variables:
 $ dress    : chr  "casual" "casual" "casual" "casual" ...
 $ tattoo   : chr  "vis" "not" "vis" "not" ...
 $ time     : int  10 51 31 75 132 112 13 122 7 140 ...
 $ gender   : chr  "Female" "Male" "Female" "Male" ...
 $ ethnicity: chr  "White" "White" "White" "African American" ...
 $ age      : chr  "appears under 40" "appears over 40" "appears under 40" "appears under 40" ...
att <- att |> mutate(across(where(is.character), as.factor))

summary(att)
    dress    tattoo        time          gender              ethnicity 
 casual:40   not:40   Min.   :  2.0   Female:35   African American:14  
 prof  :40   vis:40   1st Qu.:  9.5   Male  :45   Other           : 8  
                      Median : 35.0               White           :58  
                      Mean   : 46.1                                    
                      3rd Qu.: 65.2                                    
                      Max.   :166.0                                    
               age    
 appears over 40 :39  
 appears under 40:41  
                      
                      
                      
                      
plot.design(att)

with(att, {
  table(dress, tattoo)
})
        tattoo
dress    not vis
  casual  20  20
  prof    20  20
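
As a rough illustration of what plot.design() displays by default (the mean of the response at each level of each factor), the sketch below computes those per-level means by hand. The data frame `toy` and its values are hypothetical and are not taken from tattoos.csv.

```r
# Hypothetical toy data (NOT the tattoos.csv values), one row per cell
toy <- data.frame(
  dress  = factor(c("casual", "casual", "prof", "prof")),
  tattoo = factor(c("not", "vis", "not", "vis")),
  time   = c(30, 70, 50, 30)
)

# Names of the factor columns
factors <- names(toy)[sapply(toy, is.factor)]

# Mean response at each level of each factor: the quantities that
# plot.design(toy) draws on a common vertical axis
level_means <- lapply(toy[factors], function(f) tapply(toy$time, f, mean))
level_means
```

Levels whose means sit far apart on that plot point to factors worth a closer look, although marginal means alone cannot reveal an interaction.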

1.2 Single factor analysis

Perform separate ANOVA analyses for the factors dress and ethnicity. Can we have the same confidence in the results from both models?

fit <- aov(time ~ dress, data = att)
fit
Call:
   aov(formula = time ~ dress, data = att)

Terms:
                 dress Residuals
Sum of Squares    2880    143494
Deg. of Freedom      1        78

Residual standard error: 42.89
Estimated effects may be unbalanced
summary(fit)
            Df Sum Sq Mean Sq F value Pr(>F)
dress        1   2880    2880    1.57   0.21
Residuals   78 143494    1840               
fit <- aov(time ~ ethnicity, data = att)
fit
Call:
   aov(formula = time ~ ethnicity, data = att)

Terms:
                ethnicity Residuals
Sum of Squares        519    145855
Deg. of Freedom         2        77

Residual standard error: 43.52
Estimated effects may be unbalanced
summary(fit)
            Df Sum Sq Mean Sq F value Pr(>F)
ethnicity    2    519     259    0.14   0.87
Residuals   77 145855    1894               
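
One possible way to approach the question above is to recompute the F tests from the printed sums of squares and then compare how precisely each group mean is estimated. The sketch below uses only numbers copied from the output above (sums of squares, degrees of freedom, residual standard errors and the group sizes from summary(att)); the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the printed ANOVA tables
ss_dress <- 2880; df_dress <- 1; ss_res_dress <- 143494; df_res_dress <- 78
ss_eth   <- 519;  df_eth   <- 2; ss_res_eth   <- 145855; df_res_eth   <- 77

# F = (between-group mean square) / (residual mean square)
f_dress <- (ss_dress / df_dress) / (ss_res_dress / df_res_dress)
f_eth   <- (ss_eth / df_eth) / (ss_res_eth / df_res_eth)

# p-values are upper-tail probabilities of the F distribution
p_dress <- pf(f_dress, df_dress, df_res_dress, lower.tail = FALSE)
p_eth   <- pf(f_eth, df_eth, df_res_eth, lower.tail = FALSE)

# Group sizes as reported by summary(att)
n_dress <- c(casual = 40, prof = 40)
n_eth   <- c("African American" = 14, Other = 8, White = 58)

# Approximate standard error of each group mean: residual SE / sqrt(n)
se_dress <- 42.89 / sqrt(n_dress)
se_eth   <- 43.52 / sqrt(n_eth)

round(c(f_dress = f_dress, p_dress = p_dress, f_eth = f_eth, p_eth = p_eth), 2)
round(se_dress, 1)
round(se_eth, 1)
```

The dress groups are equal in size, so both means are estimated with the same precision. The ethnicity groups range from 8 to 58 observations, so the mean of the smallest group is much less precisely estimated, and the F test is more sensitive to unequal group variances when group sizes differ this much.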

1.3 Two-way analysis

  1. Consider now an ANOVA model including the factors dress and tattoo. Start by showing that we have a balanced design.

    att |> summarize(n = n(), cell_mean = mean(time), .by = c(dress, tattoo))
       dress tattoo  n cell_mean
    1 casual    vis 20     67.60
    2 casual    not 20     36.55
    3   prof    not 20     47.50
    4   prof    vis 20     32.65
    with(att, {
      interaction.plot(dress, tattoo, time)
    })

    fit <- lm(time ~ dress * tattoo, data = att)
    summary(fit)
    
    Call:
    lm(formula = time ~ dress * tattoo, data = att)
    
    Residuals:
       Min     1Q Median     3Q    Max 
    -59.60 -29.65  -6.53  18.96 122.35 
    
    Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
    (Intercept)            36.55       9.31    3.93  0.00019 ***
    dressprof              10.95      13.16    0.83  0.40802    
    tattoovis              31.05      13.16    2.36  0.02088 *  
    dressprof:tattoovis   -45.90      18.61   -2.47  0.01592 *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 41.6 on 76 degrees of freedom
    Multiple R-squared:  0.101, Adjusted R-squared:  0.0651 
    F-statistic: 2.83 on 3 and 76 DF,  p-value: 0.0438
    anova(fit)
    Analysis of Variance Table
    
    Response: time
                 Df Sum Sq Mean Sq F value Pr(>F)  
    dress         1   2880    2880    1.66  0.201  
    tattoo        1   1312    1312    0.76  0.387  
    dress:tattoo  1  10534   10534    6.08  0.016 *
    Residuals    76 131647    1732                 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    # Order of factors
    fit <- aov(time ~ tattoo * dress, data = att)
    summary(fit)
                 Df Sum Sq Mean Sq F value Pr(>F)  
    tattoo        1   1312    1312    0.76  0.387  
    dress         1   2880    2880    1.66  0.201  
    tattoo:dress  1  10534   10534    6.08  0.016 *
    Residuals    76 131647    1732                 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    par(mfrow = c(2, 2))
    plot(fit)

    par(mfrow = c(1, 1))
    
    ci <- TukeyHSD(fit)
    plot(ci)

    ci
      Tukey multiple comparisons of means
        95% family-wise confidence level
    
    Fit: aov(formula = time ~ tattoo * dress, data = att)
    
    $tattoo
            diff    lwr   upr  p adj
    vis-not  8.1 -10.44 26.64 0.3868
    
    $dress
                diff    lwr   upr  p adj
    prof-casual  -12 -30.54 6.535 0.2012
    
    $`tattoo:dress`
                            diff     lwr     upr  p adj
    vis:casual-not:casual  31.05  -3.522 65.6221 0.0939
    not:prof-not:casual    10.95 -23.622 45.5221 0.8391
    vis:prof-not:casual    -3.90 -38.472 30.6721 0.9909
    not:prof-vis:casual   -20.10 -54.672 14.4721 0.4265
    vis:prof-vis:casual   -34.95 -69.522 -0.3779 0.0466
    vis:prof-not:prof     -14.85 -49.422 19.7221 0.6733
    Recommendation

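    There is mild evidence that, if you have visible tattoos, dressing casually gets you more attention from strangers when asking for directions: among people with visible tattoos, the mean interaction time is about 35 seconds lower with professional dress than with casual dress (Tukey-adjusted p = 0.047).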
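
    Because the design is balanced, the sums of squares in the ANOVA table can be rebuilt from the four cell means alone. The following is a minimal sketch of that decomposition, using only the cell means and cell size printed by summarize() above; the object names are illustrative.

    ```r
    # Cell means and common cell size, copied from the summarize() output
    n_cell <- 20
    cell_means <- matrix(
      c(36.55, 67.60,   # casual: not, vis
        47.50, 32.65),  # prof:   not, vis
      nrow = 2, byrow = TRUE,
      dimnames = list(dress = c("casual", "prof"), tattoo = c("not", "vis"))
    )

    # With equal cell sizes, marginal and grand means are plain averages
    grand        <- mean(cell_means)
    dress_means  <- rowMeans(cell_means)
    tattoo_means <- colMeans(cell_means)

    # Main-effect sums of squares (each marginal mean averages 2 cells of 20)
    ss_dress  <- n_cell * ncol(cell_means) * sum((dress_means - grand)^2)
    ss_tattoo <- n_cell * nrow(cell_means) * sum((tattoo_means - grand)^2)

    # Interaction: what is left of each cell mean after removing both main effects
    inter    <- sweep(sweep(cell_means, 1, dress_means), 2, tattoo_means) + grand
    ss_inter <- n_cell * sum(inter^2)

    # Interaction coefficient of lm() under the default treatment contrasts
    coef_inter <- cell_means["prof", "vis"] - cell_means["prof", "not"] -
      cell_means["casual", "vis"] + cell_means["casual", "not"]

    round(c(dress = ss_dress, tattoo = ss_tattoo, interaction = ss_inter))
    coef_inter
    ```

    The three sums of squares agree with the rows of anova(fit), and coef_inter agrees with the dressprof:tattoovis estimate. This shortcut relies on the equal cell sizes and does not carry over to unbalanced designs.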

  2. Fit a second ANOVA model with the factors dress and ethnicity. Check that we now have an unbalanced design and explore some consequences of that.

    att |> summarize(n = n(), cell_mean = mean(time), .by = c(dress, ethnicity))
       dress        ethnicity  n cell_mean
    1 casual            White 27     60.07
    2 casual African American  9     45.89
    3 casual            Other  4     12.00
    4   prof            Other  4     66.50
    5   prof            White 31     34.19
    6   prof African American  5     55.40
    with(att, {
      interaction.plot(dress, ethnicity, time)
    })

    fit <- lm(time ~ dress * ethnicity, data = att)
    summary(fit)
    
    Call:
    lm(formula = time ~ dress * ethnicity, data = att)
    
    Residuals:
       Min     1Q Median     3Q    Max 
     -63.5  -30.2   -9.3   19.1  120.8 
    
    Coefficients:
                             Estimate Std. Error t value Pr(>|t|)   
    (Intercept)                 45.89      13.97    3.29   0.0016 **
    dressprof                    9.51      23.37    0.41   0.6853   
    ethnicityOther             -33.89      25.18   -1.35   0.1825   
    ethnicityWhite              14.19      16.13    0.88   0.3820   
    dressprof:ethnicityOther    44.99      37.74    1.19   0.2371   
    dressprof:ethnicityWhite   -35.39      25.85   -1.37   0.1751   
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 41.9 on 74 degrees of freedom
    Multiple R-squared:  0.112, Adjusted R-squared:  0.0522 
    F-statistic: 1.87 on 5 and 74 DF,  p-value: 0.11
    anova(fit)
    Analysis of Variance Table
    
    Response: time
                    Df Sum Sq Mean Sq F value Pr(>F)  
    dress            1   2880    2880    1.64  0.204  
    ethnicity        2    424     212    0.12  0.887  
    dress:ethnicity  2  13112    6556    3.73  0.029 *
    Residuals       74 129958    1756                 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    # Order of factors
    fit <- aov(time ~ ethnicity * dress, data = att)
    summary(fit)
                    Df Sum Sq Mean Sq F value Pr(>F)  
    ethnicity        2    519     259    0.15  0.863  
    dress            1   2785    2785    1.59  0.212  
    ethnicity:dress  2  13112    6556    3.73  0.029 *
    Residuals       74 129958    1756                 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Warning

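    With unbalanced data, the sequential sums of squares reported by the basic R functions anova() and summary() depend on the order in which the factors enter the model (compare the dress and ethnicity rows in the two tables above), so the tests for the main effects can be unreliable.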
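
The order dependence can be reproduced on simulated data. The sketch below is only an illustration: the factors `a` and `b`, the cell counts, the effect sizes and the helper `seq_ss()` are all hypothetical and unrelated to the tattoos data. It compares the sequential sum of squares of `a` when entered first or second, for an unbalanced and a balanced layout, and shows drop1() as one base R option whose tests do not depend on the order of the terms in an additive model.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical 2 x 2 layout; expand.grid() returns factor columns
cells <- expand.grid(a = c("a1", "a2"), b = c("b1", "b2"))

# Simulate a response with additive effects of a and b (assumed values)
simulate <- function(counts) {
  d <- cells[rep(seq_len(nrow(cells)), times = counts), ]
  d$y <- rnorm(nrow(d), mean = 10 + 2 * (d$a == "a2") + 3 * (d$b == "b2"))
  d
}

unbal <- simulate(c(12, 3, 4, 11))  # unequal cell counts
bal   <- simulate(c(5, 5, 5, 5))    # equal cell counts

# Hypothetical helper: sequential sum of squares of one term
seq_ss <- function(formula, data, term) {
  anova(lm(formula, data = data))[term, "Sum Sq"]
}

# Unbalanced: the sum of squares for a changes with its position
unbal_a_first  <- seq_ss(y ~ a + b, unbal, "a")
unbal_a_second <- seq_ss(y ~ b + a, unbal, "a")

# Balanced: the sum of squares for a is the same in both orders
bal_a_first  <- seq_ss(y ~ a + b, bal, "a")
bal_a_second <- seq_ss(y ~ b + a, bal, "a")

c(unbal_a_first = unbal_a_first, unbal_a_second = unbal_a_second)
c(bal_a_first = bal_a_first, bal_a_second = bal_a_second)

# drop1() tests each term adjusted for the other, whatever the order
d1 <- drop1(lm(y ~ a + b, data = unbal), test = "F")
d1
```

In the unbalanced case the term entered first absorbs part of the variation it shares with the other factor, which is why the two orderings disagree. The drop1() row for `a` matches the sum of squares obtained when `a` is entered last.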