Homework 8

Chapter 9: Two-level Longitudinal Data

Setup

Each of your assignments will begin with the following steps.

Go to our RStudio Server at http://turing.cornellcollege.edu:8787/
Create a new R project in your homework folder on the server, inside the course folder you created, and give it a sensible name such as homework_8.
Create a new Quarto document and give it a sensible name such as hw8.
In the YAML header, add the following (add what you don’t have). The embed-resources option will make your final rendered HTML file self-contained.

---
title: "Document title"
author: "my name"
format:
  html:
    embed-resources: true
---

Instructions

Be sure to include the relevant R code as well as full sentences answering each of the questions (e.g., if I ask for the average, you can output the answer in R but also write a full sentence with the answer). Be sure to save your files frequently!

Data for the homework will be in the STA363_inst_files -> data folder.

Exercises

All problems are from the main textbook, Beyond Multiple Linear Regression (abbreviated BMLR) by Paul Roback and Julie Legler, Chapters 1-9. It is freely available online.

Use the numbering on the left. The codes are for instructor use (Ex: C1).

(C1-5) Walker-Barnes and Mason (2001) describe “Ethnic differences in the effect of parenting on gang involvement and gang delinquency: a longitudinal, hierarchical linear modeling perspective”. In this study, 300 ninth graders from one high school in an urban southeastern city were assessed at the beginning of the school year about their gang activity, the gang activity of their peers, behavior of their parents, and their ethnic and cultural heritage. Then, information about their gang activity was collected at 7 additional occasions during the school year.

For this study:

Give the observational units at Level One and Level Two
List potential explanatory variables at both Level One and Level Two.

Describe the difference between the wide and long formats for longitudinal data in this study.
Describe scenarios or research questions in which a lattice plot would be more informative than a spaghetti plot, and other scenarios or research questions in which a spaghetti plot would be preferable to a lattice plot.
Walker-Barnes and Mason summarize their analytic approach in the following way, where HLM = hierarchical linear models, a synonym for multilevel models:

The first series [of analyses] tested whether there was overall change and/or significant individual variability in gang [activity] over time, regardless of parenting behavior, peer behavior, or ethnic and cultural heritage. Second, given the well documented relation between peer and adolescent behavior . . . HLM analyses were conducted examining the effect of peer gang [activity] on [initial gang activity and] changes in gang [activity] over time. Finally, four pairs of analyses were conducted examining the role of each of the four parenting variables on [initial gang activity and] changes in gang [activity].

The last series of analyses controlled for peer gang activity and ethnic and cultural heritage, in addition to examining interactions between parenting and ethnic and cultural heritage.

Although the authors examined four parenting behaviors—behavioral control, lax control, psychological control, and parental warmth—they did so one at a time, using four separate multilevel models. Based on their description, write out a sample model from each of the three steps in the series. For each model, (a) write out the two-level model for predicting gang activity, (b) write out the corresponding composite model, and (c) determine how many model parameters (fixed effects and variance components) must be estimated.
Table 1 shows a portion of Table 2: Results of Hierarchical Linear Modeling Analyses Modeling Gang Involvement from Walker-Barnes and Mason (2001). Provide interpretations of significant coefficients in context.

A portion of Table 2: Results of Hierarchical Linear Modeling Analyses Modeling Gang Involvement from Walker-Barnes and Mason (2001).
Predictor	Coefficient	SE
Intercept (initial status)
Base (intercept for predicting intercept term)	-.219	.160
Peer behavior	.252**	.026
Black ethnicity	.671*	.289
White/Other ethnicity	.149	.252
Parenting	.076	.050
Black ethnicity X parenting	-.161+	.088
White/Other ethnicity X parenting	-.026	.082
Slope (change)
Base (intercept for predicting slope term)	.028	.030
Peer behavior	-.011*	.005
Black ethnicity	-.132*	.054
White/Other ethnicity	-.059	.046
Parenting	-.015+	.009
Black ethnicity X parenting	.048**	.017
White/Other ethnicity X parenting	.016	.015
Table 1: These columns focus on the parenting behavior of psychological control.
The table reports values for coefficients in the final model with all variables entered. * p<.05; ** p<.01; + p<.10
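
As a notational aid for the model-writing and table-interpretation exercises above, the following is a generic sketch of a two-level growth model with a single Level Two covariate and its composite form, written in BMLR-style notation. The covariate $X_i$, the time variable $t_{ij}$, and the symbol choices are placeholders assumed for illustration; this is not one of the models fitted by Walker-Barnes and Mason (2001).

```latex
% Generic sketch: Y_{ij} is the response for subject i at occasion j,
% t_{ij} is a (hypothetical) time variable, X_i a (hypothetical) Level Two covariate.
\begin{align*}
\text{Level One:} \quad & Y_{ij} = a_i + b_i t_{ij} + \epsilon_{ij} \\
\text{Level Two:} \quad & a_i = \alpha_0 + \alpha_1 X_i + u_i \\
                        & b_i = \beta_0 + \beta_1 X_i + v_i \\
\text{Composite:} \quad & Y_{ij} = [\alpha_0 + \alpha_1 X_i + \beta_0 t_{ij} + \beta_1 X_i t_{ij}]
                          + [u_i + v_i t_{ij} + \epsilon_{ij}]
\end{align*}
% Error assumptions
\[
\epsilon_{ij} \sim N(0, \sigma^2), \qquad
\begin{pmatrix} u_i \\ v_i \end{pmatrix} \sim
N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_u^2 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix} \right)
\]
```

In the layout of Table 1, rows under "Intercept (initial status)" play the role of the $\alpha$ terms and rows under "Slope (change)" play the role of the $\beta$ terms. This sketch has 4 fixed effects ($\alpha_0, \alpha_1, \beta_0, \beta_1$) and 4 variance components ($\sigma^2, \sigma_u^2, \sigma_v^2, \sigma_{uv}$), for 8 parameters in total; each additional Level Two predictor entered in both equations adds two fixed effects.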

(C6) Differences exist in both sets of boxplots in Figure 9.12. What do these differences imply for multilevel modeling?
(C7) What implications do the scatterplots in Figures 9.14 (b) and (c) have for multilevel modeling? What implications does the boxplot in Figure 9.14 (a) have?
(C8) What are the implications of Figure 9.15 for multilevel modeling?
(C11) In Chapter 8 Model B is called the “random slopes and intercepts model”, while in this chapter Model B is called the “unconditional growth model”. Are these models essentially the same or systematically different? Explain.
(C12) In Section 9.5.2, why don’t we examine the pseudo R-squared value for Level Two?
(G1) Curran (1997) collected data on 82 adolescents at three time points starting at age 14 to assess factors that affect teen drinking behavior. Key variables in the data set alcohol.csv (accessed via Singer, 2003) are as follows:
- id = numerical identifier for subject
- age = 14, 15, or 16
- coa = 1 if the teen is a child of an alcoholic parent; 0 otherwise
- male = 1 if male; 0 if female
- peer = a measure of peer alcohol use, taken when each subject was 14. This is the square root of the sum of two 6-point items about the proportion of friends who drink occasionally or regularly.
- alcuse = the primary response. Four items—(a) drank beer or wine, (b) drank hard liquor, (c) 5 or more drinks in a row, and (d) got drunk—were each scored on an 8-point scale, from 0=“not at all” to 7=“every day”. Then alcuse is the square root of the sum of these four items.

Primary research questions included: Do trajectories of alcohol use differ by parental alcoholism? Do trajectories of alcohol use differ by peer alcohol use?

a. Identify Level One and Level Two predictors.
b. Perform a quick EDA. What can you say about the shape of alcuse, and the relationship between alcuse and coa, male, and peer? Appeal to plots and summary statistics in making your statements.
c. Generate a plot as in Figure 9.4 with alcohol use over time for all 82 subjects. Comment.
d. Generate three spaghetti plots with loess fits similar to Figure 9.7 (one for coa, one for male, and one after creating a binary variable from peer). Comment on what you can conclude from each plot.
e. Fit a linear trend to the data from each of the 82 subjects using age as the time variable. Generate histograms as in Figure 9.10 showing the results of these 82 linear regression lines, and generate pairs of boxplots as in Figure 9.12 for coa and male. No commentary necessary. [Hint: to produce Figure 9.12, you will need a data frame with one observation per subject.]
f. Repeat (e) using centered age (age14 = age - 14) as the time variable. Also generate a pair of scatterplots as in Figure 9.14 for peer alcohol use. Comment on trends you observe in these plots. [Hint: after forming age14, append it to your current data frame.]
g. Discuss similarities and differences between (e) and (f). Why does using age14 as the time variable make more sense in this example?
h. (Model A) Run an unconditional means model. Report and interpret the intraclass correlation coefficient.
i. (Model B) Run an unconditional growth model with age14 as the time variable at Level One. Report and interpret estimated fixed effects, using proper notation. Also report and interpret a pseudo R-squared value.
j. (Model C) Build upon the unconditional growth model by adding the effects of having an alcoholic parent and peer alcohol use in both Level Two equations. Report and interpret all estimated fixed effects, using proper notation.
k. (Model D) Remove the child of an alcoholic indicator variable as a predictor of slope in Model C (it will still be a predictor of intercept). Write out Model D as both a two-level and a composite model using proper notation (including error distributions); how many parameters (fixed effects and variance components) must be estimated? Compare Model D to Model C using an appropriate method and state a conclusion.
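
For reference when reporting Models A and B, here is a sketch of the standard notation, assuming BMLR Chapter 9 conventions with $Y_{ij}$ denoting alcuse for subject $i$ at occasion $j$. It gives general formulas only; the hatted quantities stand for whatever estimates your own fitted models return, and the labels (A) and (B) are shorthand introduced here for the two models.

```latex
% Model A: unconditional means model (no predictors at either level)
\begin{align*}
Y_{ij} &= a_i + \epsilon_{ij}, & \epsilon_{ij} &\sim N(0, \sigma^2) \\
a_i &= \alpha_0 + u_i, & u_i &\sim N(0, \sigma_u^2)
\end{align*}
% Intraclass correlation coefficient from Model A
\[
\hat{\rho} = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{\sigma}^2}
\]
% Model B: unconditional growth model (time at Level One, nothing at Level Two);
% u_i and v_i are bivariate normal with variances \sigma_u^2, \sigma_v^2
% and covariance \sigma_{uv}
\begin{align*}
Y_{ij} &= a_i + b_i \,\text{age14}_{ij} + \epsilon_{ij} \\
a_i &= \alpha_0 + u_i \\
b_i &= \beta_0 + v_i
\end{align*}
% Level One pseudo R-squared comparing Model B to Model A
\[
\text{Pseudo } R^2_{L1} =
\frac{\hat{\sigma}^2(\text{A}) - \hat{\sigma}^2(\text{B})}{\hat{\sigma}^2(\text{A})}
\]
```

The intraclass correlation is the estimated share of total variability in alcuse that is attributable to differences between subjects, and the pseudo R-squared is the proportional reduction in within-subject variance after adding linear time to the model.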

Submission

When you are finished with your homework, be sure to Render the final document. Once rendered, you can download and submit your file as follows:

Find the .html file in your Files pane (on the bottom right of the screen).
Click the check box next to the file.
Click the blue gear above and then click “Export” to download.
Submit your final HTML document to the respective assignment on Moodle.