Modeling two-level longitudinal data

BMLR Chapter 9

Author: Tyler George

Affiliation: Cornell College

STA 363 Spring 2025 Block 8

Setup

library(tidyverse)
library(tidymodels)
library(GGally)
library(knitr)
library(patchwork)
library(viridis)
library(ggfortify)
library(kableExtra)
library(lme4)
library(broom.mixed)

Learning goals

  • Describe general process for fitting and comparing multilevel models

  • Fit and interpret multilevel models for longitudinal data

  • Compare multilevel models

  • Conduct inference for random effects

  • Conduct inference for fixed effects

Data: Charter schools in MN

The data set charter-long.csv contains standardized test scores and demographic information for schools in Minneapolis, MN from 2008 to 2010. The data were collected by the Minnesota Department of Education. Understanding the effectiveness of charter schools is of particular interest, since they often incorporate unique methods of instruction and learning that differ from public schools.

  • MathAvgScore: Average MCA-II score for all 6th grade students in a school (response variable)
  • urban: urban (1) or rural (0) school location
  • charter: charter school (1) or a non-charter public school (0)
  • schPctfree: proportion of students who receive free or reduced lunches in a school (based on 2010 figures).
  • year08: Years since 2008

Data

charter <- read_csv("data/charter-long.csv")
Rows: 1854 Columns: 9
── Column specification ──────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): schoolid, schoolName
dbl (7): urban, charter, schPctnonw, schPctsped, schPctfree, year08, MathAvgScore

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
charter |>
  select(schoolName, year08, urban, charter, schPctfree, MathAvgScore) |>
  slice(1:3, 1852:1854) |>
  kable(digits = 3)
schoolName year08 urban charter schPctfree MathAvgScore
RIPPLESIDE ELEMENTARY 0 0 0 0.363 652.8
RIPPLESIDE ELEMENTARY 1 0 0 0.363 656.6
RIPPLESIDE ELEMENTARY 2 0 0 0.363 652.6
RICHARD ALLEN MATH&SCIENCE ACADEMY 0 1 1 0.545 NA
RICHARD ALLEN MATH&SCIENCE ACADEMY 1 1 1 0.545 NA
RICHARD ALLEN MATH&SCIENCE ACADEMY 2 1 1 0.545 631.2

Assess missingness (1/2)

Missing data is common in longitudinal data. Before starting the analysis, it is important to understand the missing data patterns. Use the skim function from the skimr R package to get a quick view of the missingness.

library(skimr)
charter |> skim() |> select(skim_variable, n_missing, complete_rate)
# A tibble: 9 × 3
  skim_variable n_missing complete_rate
  <chr>             <int>         <dbl>
1 schoolid              0         1    
2 schoolName            0         1    
3 urban                 0         1    
4 charter               0         1    
5 schPctnonw            0         1    
6 schPctsped            0         1    
7 schPctfree            0         1    
8 year08                0         1    
9 MathAvgScore        121         0.935

Assess missingness (2/2)

Another option is the plot_missing() function from the DataExplorer package, which creates a plot of the missingness by variable.

library(DataExplorer)
plot_missing(charter)

Closer look at missing

  • Question: If we are going to fit a multilevel model by schoolid, what types of missing data will be particularly problematic?

. . .

MathAvgScore0_miss MathAvgScore1_miss MathAvgScore2_miss n
0 0 0 540
0 0 1 6
0 1 0 4
0 1 1 7
1 0 0 25
1 0 1 1
1 1 0 35
charter |>
  select(schoolid, schoolName, year08, MathAvgScore) |>
  pivot_wider(id_cols = c(schoolid, schoolName), names_from = year08,
              names_prefix = "MathAvgScore.", values_from = MathAvgScore) |> 
  mutate(MathAvgScore0_miss = if_else(is.na(MathAvgScore.0), 1, 0),
         MathAvgScore1_miss = if_else(is.na(MathAvgScore.1), 1, 0),
         MathAvgScore2_miss = if_else(is.na(MathAvgScore.2), 1, 0)) |> 
  count(MathAvgScore0_miss, MathAvgScore1_miss, MathAvgScore2_miss) |> 
  kable()

Dealing with missing data

  • Complete case analysis: Only include schools with complete data for all three years. This would remove 12.6% of observations in this data.

  • Last observation carried forward: Keep only the last observation from each group (school) and conduct an analysis treating the observations as independent.

  • Impute missing observations: “Fill in” values of missing observations using the typical observed trends from groups with similar covariates.

  • Apply multilevel methods: Estimate patterns using available data recognizing that trends for groups with complete data are more precise than for those with fewer measurements. This is under the condition that the probability of missingness does not depend on unobserved predictors or the response.

. . .

Question: What is an advantage of each method? What is a disadvantage?

Strategy for building multilevel models

  • Conduct exploratory data analysis for Level One and Level Two variables.

  • Fit model with no covariates to assess variability at each level.

  • Create Level One models. Start with a single term, then add terms as needed.

  • Create Level Two models, starting with the equation for the intercept term. Start with a single term, then add terms as needed.

  • Begin with the full set of variance components, then remove variance terms as needed.

. . .

See BMLR Section 8.6 for alternate model-building strategies.

Exploratory data analysis

Given the longitudinal structure of the data, we are able to answer questions at two levels:

  • Level One (within school): How did average math scores for a given school change over time?

  • Level Two (between schools): What is the effect of school-specific covariates on the average math scores in 2008 and the rate of change from 2008 to 2010?

. . .

We can conduct exploratory data analysis at both levels, e.g.,

  • Univariate and bivariate EDA

  • Lattice plots

  • Spaghetti plots

  • Let’s do some EDA together.

See BMLR Section 9.3 for full exploratory data analysis.

Unconditional means model

Start with the unconditional means model, a model with no covariates at any level. This is also called the random intercepts model.

Level One : \(Y_{ij} = a_{i} + \epsilon_{ij}, \hspace{5mm} \epsilon_{ij} \sim N(0, \sigma^2)\)

Level Two: \(a_i = \alpha_0 + u_i, \hspace{5mm} u_{i} \sim N(0, \sigma^2_u)\)



Question: Write the composite model.
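For reference, substituting the Level Two equation into Level One gives the composite form of the unconditional means model:

\[Y_{ij} = \alpha_0 + u_i + \epsilon_{ij}, \hspace{5mm} u_{i} \sim N(0, \sigma^2_u), \hspace{5mm} \epsilon_{ij} \sim N(0, \sigma^2)\]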

Intraclass correlation (1/2)

The intraclass correlation is the proportion of the total variability that is between groups:

\[\hat{\rho} = \frac{\text{Between variability}}{\text{Total variability}} = \frac{\hat{\sigma}_u^2}{\hat{\sigma}^2_u + \hat{\sigma}^2}\]

  • What is the meaning of \(\hat{\rho}\) close to 0?

  • What is the meaning of \(\hat{\rho}\) close to 1?

. . .

Question: Fit the unconditional means model and calculate the intraclass correlation.
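One possible sketch of the fit, using lmer() from lme4 (already loaded above); the object name model_means is ours:

model_means <- lmer(MathAvgScore ~ 1 + (1 | schoolid), data = charter)

# Extract the estimated variance components and compute the ICC
vc <- as.data.frame(VarCorr(model_means))
sigma2_u <- vc$vcov[vc$grp == "schoolid"]  # between-school variance
sigma2   <- vc$vcov[vc$grp == "Residual"]  # within-school variance
sigma2_u / (sigma2_u + sigma2)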

Intraclass correlation (2/2)

The intraclass correlation is the proportion of the total variability that is between groups.

\(\hat{\rho} = 0.798\). This means…

  • About 79.8% of the variability in math scores can be attributed to differences between schools (school-to-school variability). About 20.2% of the variability can be attributed to changes over time.

  • The average correlation between any two responses from the same school is about 0.798.

  • The effective sample size (number of independent pieces of information available for modeling) is closer to the number of schools \((\rho \text{ close to 1})\) than the number of observations \((\rho\text{ close to 0})\).

Unconditional growth model

The next step in model building is the unconditional growth model, a model with Level One predictors but no Level Two predictors.

Level One: \(Y_{ij} = a_i + b_iYear08_{ij} + \epsilon_{ij}\)

Level Two:

  • \(a_i = \alpha_0 + u_i\)
  • \(b_i = \beta_0 + v_i\)

. . .

Questions:

  • Write the composite model.
  • What can we learn from this model?
  • Fit the unconditional growth model.
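A minimal sketch of the fit (the object name model_growth is ours and is reused in the Pseudo \(R^2\) sketch later):

model_growth <- lmer(MathAvgScore ~ year08 + (year08 | schoolid), data = charter)
summary(model_growth)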

Pseudo \(R^2\)

We can use Pseudo \(R^2\) to quantify changes in variance components between two models.

  • Note: This should only be used when the definition of the variance component is the same between the two models

. . .

\[\text{Pseudo }R^2 = \frac{\hat{\sigma}^2(\text{Model 1}) - \hat{\sigma}^2(\text{Model 2})}{\hat{\sigma}^2(\text{Model 1})}\]

. . .

Question: Calculate the \(\text{Pseudo }R^2\) to estimate the change of within school variance between the unconditional means and unconditional growth models.
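A sketch of the calculation, using the model_means and model_growth objects from the earlier sketches; for a merMod fit, sigma() returns the estimated within-school (residual) standard deviation:

sigma2_means  <- sigma(model_means)^2    # within-school variance, unconditional means
sigma2_growth <- sigma(model_growth)^2   # within-school variance, unconditional growth
(sigma2_means - sigma2_growth) / sigma2_means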

Model with school-level covariates

Fit a model with school-level covariates that takes the following form:

Level One

\[\begin{equation*} Y_{ij}= a_{i} + b_{i}Year08_{ij} + \epsilon_{ij} \end{equation*}\]

Level Two

\[\begin{align*} a_{i} & = \alpha_{0} + \alpha_{1}Charter_i + \alpha_{2}urban_i + \alpha_{3}schpctfree_i + u_{i} \\ b_{i} & = \beta_{0} + \beta_{1}Charter_i + \beta_{2}urban_i + v_{i} \end{align*}\]

Questions

  • Write out the composite model.
  • Fit the model in R (see the sketch after this list).
  • Use the model to describe how the average math scores differed between charter and non-charter schools.
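One way to fit the model (a sketch; the object name model_covs is ours, and this is the same specification as the full_model defined later in these slides):

model_covs <- lmer(MathAvgScore ~ charter + urban + schPctfree + year08 +
                     charter:year08 + urban:year08 +
                     (year08 | schoolid), data = charter)
tidy(model_covs, effects = "fixed") |>
  kable(digits = 3)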

Consider a simpler model

Would a model without the effects \(v_i\) and \(\rho_{uv}\) be preferable?

. . .

Level One

\[\begin{equation*} Y_{ij}= a_{i} + b_{i}Year08_{ij} + \epsilon_{ij} \end{equation*}\]

Level Two

\[\begin{align*} a_{i} & = \alpha_{0} + \alpha_{1}Charter_i + \alpha_{2}urban_i + \alpha_{3}schpctfree_i + u_{i} \\ b_{i} & = \beta_{0} + \beta_{1}Charter_i + \beta_{2}urban_i \end{align*}\]


. . .

In this model, the effect of year is the same for all schools with a given combination of Charter and urban.

Compare two models

Full model

\[\begin{aligned}Y_{ij} = [&\alpha_{0} + \alpha_{1}Charter_i + \alpha_{2}Urban_i + \alpha_{3}schpctfree_i \\ &+ \beta_0Year08_{ij} + \beta_1Charter_i:Year08_{ij} + \beta_2Urban_i:Year08_{ij}] \\&+ [u_i + v_iYear08_{ij} + \epsilon_{ij}]\end{aligned}\]

where

\[\left[ \begin{array}{c} u_{i} \\ v_{i} \end{array} \right] \sim N \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \left[ \begin{array}{cc} \sigma_{u}^{2} & \sigma_{uv} \\ \sigma_{uv} & \sigma_{v}^{2} \end{array} \right] \right) \hspace{2mm} \text{ and }\hspace{2mm} \epsilon_{ij} \sim N(0, \sigma^2)\]

. . .

Estimated fixed effects: \(\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_0, \beta_1, \beta_2\)

Variance components to estimate: \(\sigma, \sigma_u, \sigma_v, \rho_{uv}\) (Note: \(\sigma_{uv} = \rho_{uv}\sigma_u\sigma_v\))

Compare two models

Null Model (simplified variance structure)

\[\begin{aligned}Y_{ij} = [&\alpha_{0} + \alpha_{1}Charter_i + \alpha_{2}Urban_i + \alpha_{3}schpctfree_i \\ &+ \beta_0Year08_{ij} + \beta_1Charter_i:Year08_{ij} + \beta_2Urban_i:Year08_{ij}] \\&+ [u_i + \epsilon_{ij}]\end{aligned}\]

where

\[u_i \sim N(0, \sigma^2_u) \hspace{2mm} \text{ and }\hspace{2mm} \epsilon_{ij} \sim N(0, \sigma^2)\]

. . .

Estimated fixed effects: \(\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_0, \beta_1, \beta_2\)

Variance components to estimate: \(\sigma, \sigma_u\)

Full and reduced models

full_model <- lmer(MathAvgScore ~ charter + urban + schPctfree + 
                     charter:year08 + urban:year08 + year08 + 
                     (year08 | schoolid), REML = TRUE, data = charter)

reduced_model <- lmer(MathAvgScore ~ charter + urban + schPctfree + 
                        charter:year08 + urban:year08 + year08 +
                        (1 | schoolid), REML = TRUE, data = charter)

. . .

Hypotheses

\[\begin{aligned}&H_0: \sigma_v = \rho_{uv} = 0\\ &H_a: \text{at least one of the parameters is not equal to 0}\end{aligned}\]

Note: \(\rho_{uv} \neq 0 \Rightarrow \sigma_v \neq 0\)

Issues with the drop-in-deviance test

anova(full_model, reduced_model, test = "Chisq") |>
  kable(digits = 3)
refitting model(s) with ML (instead of REML)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
reduced_model 9 9952.992 10002.11 -4967.496 9934.992 NA NA NA
full_model 11 9953.793 10013.83 -4965.897 9931.793 3.199 2 0.202
  • The \(\chi^2\) test is conservative, i.e., the p-values are larger than they should be, when testing random effects at the boundary (e.g., \(\sigma_v^2 = 0\)) or parameters with bounded ranges (e.g., \(\rho_{uv}\))

  • If you observe small p-values, you can feel relatively certain the tested effects are statistically significant

  • Use bootstrap methods to obtain more accurate p-values

Parametric bootstrapping

  • Bootstrapping is from the phrase “pulling oneself up by one’s bootstraps”

  • Accomplishing a difficult task without any outside help

  • Task: conduct inference for model parameters (fixed and random effects) using only the sample data

Parametric bootstrapping for likelihood ratio test

1️⃣ Fit the null (reduced) model to obtain the fixed effects and variance components (parametric part).

2️⃣ Use the estimated fixed effects and variance components to generate a new set of response values with the same sample size and associated covariates for each observation as the original data (bootstrap part).

3️⃣ Fit the full and null models to the newly generated data.

4️⃣ Compute the likelihood ratio test statistic comparing the models from the previous step.

5️⃣ Repeat steps 2 - 4 many times (~ 1000).

Parametric bootstrapping for likelihood ratio test

  • 6️⃣ Create a histogram of the likelihood ratio statistics to get the distribution of the likelihood ratio statistic under the null hypothesis.

  • 7️⃣ Get the p-value by calculating the proportion of bootstrapped test statistics greater than the observed statistic.

  • Let’s calculate the bootstrapped p-value for the likelihood ratio test statistic testing whether \(v_i\) and \(\rho_{uv}\) can be removed from the model.
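A rough sketch of these steps by hand, using simulate() and refit() from lme4 and the full_model and reduced_model fits from these slides; refitting both models ~1000 times can take several minutes, and packages such as pbkrtest automate this kind of comparison:

set.seed(363)
full_ml    <- refitML(full_model)      # compare ML (not REML) fits in a LRT
reduced_ml <- refitML(reduced_model)
obs_lrt <- as.numeric(2 * (logLik(full_ml) - logLik(reduced_ml)))

boot_lrt <- replicate(1000, {
  y_star <- simulate(reduced_ml)[[1]]  # generate responses under the null model
  as.numeric(2 * (logLik(refit(full_ml, y_star)) - logLik(refit(reduced_ml, y_star))))
})

mean(boot_lrt >= obs_lrt)              # bootstrapped p-value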

LRT using \(\chi^2\) and parametric bootstrap

Likelihood ratio test using \(\chi^2\) distribution

refitting model(s) with ML (instead of REML)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
reduced_model 9 9952.992 10002.11 -4967.496 9934.992 NA NA NA
full_model 11 9953.793 10013.83 -4965.897 9931.793 3.199 2 0.202

Likelihood ratio test using parametric bootstrap

term npar AIC BIC logLik deviance statistic df Pr_boot..Chisq.
m0 9 9952.992 10002.11 -4967.496 9934.992 NA NA NA
mA 11 9953.793 10013.83 -4965.897 9931.793 3.199347 2 0.144

Inference for fixed effects (1/2)

  • The output for multilevel models fit with lmer() does not contain p-values for the coefficients of the fixed effects

  • The exact distribution of the test statistic under the null hypothesis (no fixed effect) is unknown, because the exact degrees of freedom are unknown

    • Finding suitable approximations is an area of ongoing research
  • In the tidy() function from broom.mixed, you can set conf.int = TRUE to get confidence intervals for the fixed effects (see the example after this list).

  • We can look up how this works in the package's vignette. Let's look up the package!
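For example, a sketch using the reduced_model fit from earlier:

tidy(reduced_model, effects = "fixed", conf.int = TRUE, conf.level = 0.95) |>
  kable(digits = 3)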

Inference for fixed effects (2/2)

  • We can use likelihood ratio test with an approximate \(\chi^2\) distribution to test these effects, since we’re not testing on the boundary and fixed effects do not have limited ranges
    • Some research suggests the resulting p-values are somewhat too low, but the approximation is generally pretty good
    • Can also calculate the p-values using the parametric bootstrap approach
  • Let's test whether schPctfree should be included in the current model using a likelihood ratio test with the \(\chi^2\) distribution and the parametric bootstrap.
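A sketch of the \(\chi^2\) version of the test, assuming the current model is reduced_model (the simpler variance structure); the object name model_no_free is ours, and anova() refits both models with ML before comparing. The parametric bootstrap version follows the same steps as before, simulating responses from the model without schPctfree:

model_no_free <- lmer(MathAvgScore ~ charter + urban + year08 +
                        charter:year08 + urban:year08 +
                        (1 | schoolid), data = charter)
anova(reduced_model, model_no_free, test = "Chisq") |>
  kable(digits = 3)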

Methods to conduct inference for individual coefficients

  • Use the t-value \(\big(\frac{estimate}{std.error}\big)\) in the model output
    • General rule: coefficients with |t-value| > 2 are considered statistically significant, i.e., different from 0
  • The confidence intervals given by tidy() are based on approximations that are not guaranteed to be accurate
    • It is better to use multiple approaches and see whether they agree.
  • Calculate confidence intervals using nonparametric bootstrapping

Nonparametric bootstrapping

1️⃣   Take a sample of size \(n\) (the size of the original data), with replacement; this is called case resampling.

2️⃣   Fit the model to obtain estimates of the coefficients.

3️⃣   Repeat steps 1 - 2 many times (~ 1000) to obtain the bootstrap distribution.

4️⃣   Get the 95% confidence interval for each coefficient by taking the middle 95% of its bootstrap distribution.
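A rough sketch of case resampling, assuming schools (the Level Two units) are the resampled cases; the object names are ours, 1000 refits of lmer() is slow, and convergence warnings on some resamples are common:

set.seed(363)
school_ids <- unique(charter$schoolid)

case_boot_fixef <- map_dfr(1:1000, function(b) {
  # resample whole schools with replacement; give each draw its own id so a
  # school sampled more than once is treated as distinct schools
  resampled <- tibble(boot_id = seq_along(school_ids),
                      schoolid = sample(school_ids, replace = TRUE)) |>
    left_join(charter, by = "schoolid", relationship = "many-to-many")
  fit <- lmer(MathAvgScore ~ charter + urban + schPctfree + year08 +
                charter:year08 + urban:year08 + (1 | boot_id),
              data = resampled)
  as_tibble_row(fixef(fit))
})

# Middle 95% of each coefficient's bootstrap distribution
case_boot_fixef |>
  pivot_longer(everything(), names_to = "term", values_to = "estimate") |>
  group_by(term) |>
  summarise(lower = quantile(estimate, 0.025), upper = quantile(estimate, 0.975))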

Bootstrapped CI for coefficients

confint(reduced_model, method = "boot", level = 0.95, oldNames = FALSE) |>
  kable(digits = 3)
Computing bootstrap confidence intervals ...
2.5 % 97.5 %
sd_(Intercept)|schoolid 4.264 4.829
sigma 2.870 3.110
(Intercept) 659.315 661.415
charter -4.374 -1.395
urban -1.958 -0.218
schPctfree -19.523 -16.289
year08 1.259 1.784
charter:year08 0.397 1.639
urban:year08 -0.900 -0.193

Summary: Model comparisons

Methods to compare models with different fixed effects

  • Likelihood ratio tests based on \(\chi^2\) distribution
  • Likelihood ratio test based on parametric bootstrapped p-values
  • AIC or BIC

. . .

Methods to compare models with different variance components

  • Likelihood ratio test based on parametric bootstrapped p-values
  • AIC or BIC

Summary: Understanding the model

Methods to understand individual coefficients

  • Likelihood ratio tests
  • Bootstrap confidence intervals
  • Pseudo \(R^2\) for variance components (if meaning is unchanged between models)

. . .

Methods to understand data structure

  • Calculate intraclass correlation coefficient using unconditional means model

Acknowledgements

The content in the slides is from BMLR Chapter 9.