---
title: "Document title"
author: "my name"
format:
  html:
    embed-resources: true
---
Homework 8
Chapter 9: Two-level Longitudinal Data
Setup
Each of your assignments will begin with the following steps.
Go to our RStudio Server at http://turing.cornellcollege.edu:8787/
Create a new R project inside your homework folder on the server, give it a sensible name such as homework_8, and keep that project in the course folder you created.
Create a new Quarto document and give it a sensible name such as hw8.
In the YAML header, add the fields shown at the top of this page (add what you don’t have). The embed-resources component will make your final rendered html self-contained.
Instructions
Be sure to include the relevant R code as well as full sentences answering each of the questions (e.g., if I ask for the average, you can output the answer in R but should also write a full sentence with the answer). Be sure to save your files frequently!
Data for the homework will be in the STA363_inst_files -> data folder.
Exercises
All problems are from the main textbook, Beyond Multiple Linear Regression by Paul Roback and Julie Legler (abbreviated BMLR), Chapters 1-9. It is freely available online.
Use the numbering on the left. The codes are for instructor use (Ex: C1).
- (C1-5) Walker-Barnes and Mason (2001) describe “Ethnic differences in the effect of parenting on gang involvement and gang delinquency: a longitudinal, hierarchical linear modeling perspective”. In this study, 300 ninth graders from one high school in an urban southeastern city were assessed at the beginning of the school year about their gang activity, the gang activity of their peers, behavior of their parents, and their ethnic and cultural heritage. Then, information about their gang activity was collected at 7 additional occasions during the school year.
- For this study:
- Give the observational units at Level One and Level Two.
- List potential explanatory variables at both Level One and Level Two.
Describe the difference between the wide and long formats for longitudinal data in this study.
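As a rough illustration of the two layouts, the sketch below builds a tiny wide data frame and reshapes it to long format with base R's `reshape()`. The data values, the column names (`gang_t0`, `gang_t1`, `gang_t2`, `occasion`), and the use of only three occasions are invented for illustration; they are not from the study.

```r
# Hypothetical wide-format data: one row per student, one column per occasion.
# All values and column names are invented for illustration.
wide <- data.frame(
  id      = 1:3,
  gang_t0 = c(0, 1, 0),
  gang_t1 = c(1, 1, 0),
  gang_t2 = c(1, 2, 0)
)

# Long format: one row per student per occasion.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("gang_t0", "gang_t1", "gang_t2"),
  v.names   = "gang",
  timevar   = "occasion",
  times     = 0:2,
  idvar     = "id"
)

# Sort so each student's occasions appear together, in time order.
long <- long[order(long$id, long$occasion), ]
rownames(long) <- NULL

print(wide)
print(long)
```

In the wide layout each student is a single row, while the long layout repeats the student identifier once per occasion, which is the shape multilevel modeling functions generally expect.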
Describe scenarios or research questions in which a lattice plot would be more informative than a spaghetti plot, and other scenarios or research questions in which a spaghetti plot would be preferable to a lattice plot.
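To make the contrast concrete, here is a minimal sketch that draws both kinds of plot from simulated data. The data, the variable names (`y`, `occasion`, `id`), and the choice of the `lattice` package are assumptions made for illustration; the textbook's figures may be drawn with a different package, and `lattice` is used here only because it ships with standard R installations.

```r
library(lattice)

# Simulated long-format data (invented values): 12 subjects, 4 occasions each.
set.seed(1)
n_subj <- 12
long <- expand.grid(occasion = 0:3, id = seq_len(n_subj))
subj_intercept <- rnorm(n_subj, mean = 2, sd = 1)
subj_slope <- rnorm(n_subj, mean = 0.3, sd = 0.3)
long$y <- subj_intercept[long$id] + subj_slope[long$id] * long$occasion +
  rnorm(nrow(long), sd = 0.4)

# Lattice plot: one panel per subject, so individual trajectories are easy to inspect.
lattice_plot <- xyplot(y ~ occasion | factor(id), data = long, type = "b")

# Spaghetti plot: all subjects overlaid in one panel, so overall trends stand out.
spaghetti_plot <- xyplot(y ~ occasion, groups = id, data = long, type = "l")

print(lattice_plot)
print(spaghetti_plot)
```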
Walker-Barnes and Mason summarize their analytic approach in the following way, where HLM = hierarchical linear models, a synonym for multilevel models:
The first series [of analyses] tested whether there was overall change and/or significant individual variability in gang [activity] over time, regardless of parenting behavior, peer behavior, or ethnic and cultural heritage. Second, given the well documented relation between peer and adolescent behavior . . . HLM analyses were conducted examining the effect of peer gang [activity] on [initial gang activity and] changes in gang [activity] over time. Finally, four pairs of analyses were conducted examining the role of each of the four parenting variables on [initial gang activity and] changes in gang [activity].
The last series of analyses controlled for peer gang activity and ethnic and cultural heritage, in addition to examining interactions between parenting and ethnic and cultural heritage.
Although the authors examined four parenting behaviors—behavioral control, lax control, psychological control, and parental warmth—they did so one at a time, using four separate multilevel models. Based on their description, write out a sample model from each of the three steps in the series. For each model, (a) write out the two-level model for predicting gang activity, (b) write out the corresponding composite model, and (c) determine how many model parameters (fixed effects and variance components) must be estimated.
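As a hedged illustration of the notation involved, here is what a model for the first step (overall change and individual variability, with no predictors) could look like. It assumes gang activity $Y_{ij}$ for student $i$ at occasion $j$ is linear in a time variable $\text{Time}_{ij}$; the symbols follow the general style of BMLR Chapter 9, and the linear-in-time form is an assumption rather than something the quoted description states.

$$
\begin{aligned}
\text{Level One:} \quad & Y_{ij} = a_i + b_i \,\text{Time}_{ij} + \epsilon_{ij} \\
\text{Level Two:} \quad & a_i = \alpha_0 + u_i \\
                        & b_i = \beta_0 + v_i \\
\text{Composite:} \quad & Y_{ij} = \alpha_0 + \beta_0 \,\text{Time}_{ij} + u_i + v_i \,\text{Time}_{ij} + \epsilon_{ij}
\end{aligned}
$$

Here $\epsilon_{ij} \sim N(0, \sigma^2)$, and $(u_i, v_i)$ is bivariate normal with mean zero, variances $\sigma_u^2$ and $\sigma_v^2$, and covariance $\sigma_{uv}$. Under these assumptions there are 2 fixed effects ($\alpha_0$, $\beta_0$) and 4 variance components ($\sigma^2$, $\sigma_u^2$, $\sigma_v^2$, $\sigma_{uv}$), for 6 parameters in total. Models for the later steps would add Level Two predictors to the equations for $a_i$ and $b_i$.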
Table 1 shows a portion of Table 2: Results of Hierarchical Linear Modeling Analyses Modeling Gang Involvement from Walker-Barnes and Mason (2001). Provide interpretations of significant coefficients in context.
| Predictor | Coefficient | SE |
|---|---|---|
| Intercept (initial status) | | |
| Base (intercept for predicting int term) | -.219 | .160 |
| Peer behavior | .252** | .026 |
| Black ethnicity | .671* | .289 |
| White/Other ethnicity | .149 | .252 |
| Parenting | .076 | .050 |
| Black ethnicity X parenting | -.161+ | .088 |
| White/Other ethnicity X parenting | -.026 | .082 |
| Slope (change) | | |
| Base (intercept for predicting slope term) | .028 | .030 |
| Peer behavior | -.011* | .005 |
| Black ethnicity | -.132* | .054 |
| White/Other ethnicity | -.059 | .046 |
| Parenting | -.015+ | .009 |
| Black ethnicity X parenting | .048** | .017 |
| White/Other ethnicity X parenting | .016 | .015 |

Table 1: These columns focus on the parenting behavior of psychological control. Table reports values for coefficients in the final model with all variables entered. Significance codes: * p<.05; ** p<.01; + p<.10
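One rough way to see which coefficients stand out is to divide each estimate by its standard error. The sketch below does this for a few rows copied from Table 1. Treating the ratio as an approximate z-statistic is an assumption, since the excerpt does not say which reference distribution the authors used, and the short row labels are made up for readability.

```r
# A few rows copied from Table 1; the labels in `term` are invented shorthand.
tab <- data.frame(
  term = c("intercept: peer behavior",
           "intercept: Black ethnicity",
           "slope: peer behavior",
           "slope: Black ethnicity",
           "slope: Black ethnicity x parenting"),
  coef = c(0.252, 0.671, -0.011, -0.132, 0.048),
  se   = c(0.026, 0.289, 0.005, 0.054, 0.017)
)

# Estimate divided by standard error, read as an approximate z-statistic (assumption).
tab$ratio <- tab$coef / tab$se

# Two-sided p-value under a standard normal reference distribution (assumption).
tab$approx_p <- 2 * pnorm(-abs(tab$ratio))

print(tab)
```

The approximate p-values need not match the published significance codes exactly, but they give a quick check on which terms deserve an interpretation in context.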
(C6) Differences exist in both sets of boxplots in Figure 9.12. What do these differences imply for multilevel modeling?
(C7) What implications do the scatterplots in Figures 9.14 (b) and (c) have for multilevel modeling? What implications does the boxplot in Figure 9.14 (a) have?
(C8) What are the implications of Figure 9.15 for multilevel modeling?
(C11) In Chapter 8 Model B is called the “random slopes and intercepts model”, while in this chapter Model B is called the “unconditional growth model”. Are these models essentially the same or systematically different? Explain.
(C12) In Section 9.5.2, why don’t we examine the pseudo R-squared value for Level Two?
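For reference, a Level One pseudo R-squared compares the estimated within-subject variance between two models. A sketch of the formula, writing $\hat{\sigma}^2$ for the estimated Level One (residual) variance and assuming Model A is the unconditional means model and Model B the unconditional growth model:

$$
\text{Pseudo } R^2_{L1} = \frac{\hat{\sigma}^2(\text{Model A}) - \hat{\sigma}^2(\text{Model B})}{\hat{\sigma}^2(\text{Model A})}
$$

It can be read as the proportional reduction in within-subject variance when moving from Model A to Model B.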
(G1) Curran (1997) collected data on 82 adolescents at three time points starting at age 14 to assess factors that affect teen drinking behavior. Key variables in the data set alcohol.csv (accessed via Singer, 2003) are as follows:
- id = numerical identifier for subject
- age = 14, 15, or 16
- coa = 1 if the teen is a child of an alcoholic parent; 0 otherwise
- male = 1 if male; 0 if female
- peer = a measure of peer alcohol use, taken when each subject was 14. This is the square root of the sum of two 6-point items about the proportion of friends who drink occasionally or regularly.
- alcuse = the primary response. Four items were each scored on an 8-point scale, from 0 = “not at all” to 7 = “every day”: (a) drank beer or wine, (b) drank hard liquor, (c) 5 or more drinks in a row, and (d) got drunk. Then alcuse is the square root of the sum of these four items.
Primary research questions included: Do trajectories of alcohol use differ by parental alcoholism? Do trajectories of alcohol use differ by peer alcohol use?
a. Identify Level One and Level Two predictors.
b. Perform a quick EDA. What can you say about the shape of alcuse, and the relationship between alcuse and coa, male, and peer? Appeal to plots and summary statistics in making your statements.
c. Generate a plot as in Figure 9.4 with alcohol use over time for all 82 subjects. Comment.
d. Generate three spaghetti plots with loess fits similar to Figure 9.7 (one for coa, one for male, and one after creating a binary variable from peer). Comment on what you can conclude from each plot.
e. Fit a linear trend to the data from each of the 82 subjects using age as the time variable. Generate histograms as in Figure 9.10 showing the results of these 82 linear regression lines, and generate pairs of boxplots as in Figure 9.12 for coa and male. No commentary necessary. [Hint: to produce Figure 9.12, you will need a data frame with one observation per subject.]
f. Repeat (e) using centered age (age14 = age - 14) as the time variable. Also generate a pair of scatterplots as in Figure 9.14 for peer alcohol use. Comment on trends you observe in these plots. [Hint: after forming age14, append it to your current data frame.]
g. Discuss similarities and differences between (e) and (f). Why does using age14 as the time variable make more sense in this example?
h. (Model A) Run an unconditional means model. Report and interpret the intraclass correlation coefficient.
i. (Model B) Run an unconditional growth model with age14 as the time variable at Level One. Report and interpret estimated fixed effects, using proper notation. Also report and interpret a pseudo R-squared value.
j. (Model C) Build upon the unconditional growth model by adding the effects of having an alcoholic parent and peer alcohol use in both Level Two equations. Report and interpret all estimated fixed effects, using proper notation.
k. (Model D) Remove the child of an alcoholic indicator variable as a predictor of slope in Model C (it will still be a predictor of intercept). Write out Model D as both a two-level and a composite model using proper notation (including error distributions); how many parameters (fixed effects and variance components) must be estimated? Compare Model D to Model C using an appropriate method and state a conclusion.
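The mechanics behind centered age, the per-subject linear fits, and Model A can be sketched on simulated data. Everything below is an illustration under stated assumptions: the data are invented stand-ins for alcohol.csv (so the numbers mean nothing), the object names are hypothetical, and `lme()` from the `nlme` package is used only because it ships with standard R installations; the course may use a different package such as `lme4`.

```r
library(nlme)

# Simulated stand-in for alcohol.csv (all values invented): 82 subjects, ages 14-16.
set.seed(363)
n_subj <- 82
sim <- expand.grid(age = 14:16, id = seq_len(n_subj))
subj_int <- rnorm(n_subj, mean = 0.7, sd = 0.7)
subj_slope <- rnorm(n_subj, mean = 0.3, sd = 0.3)

# Centered age: 0 at the first measurement occasion (age 14).
sim$age14 <- sim$age - 14
sim$alcuse <- subj_int[sim$id] + subj_slope[sim$id] * sim$age14 +
  rnorm(nrow(sim), sd = 0.5)

# One linear fit per subject; the result has one row per subject.
fits <- t(sapply(split(sim, sim$id),
                 function(d) coef(lm(alcuse ~ age14, data = d))))
subject_fits <- data.frame(
  id        = as.integer(rownames(fits)),
  intercept = fits[, 1],
  slope     = fits[, 2]
)

# Model A: unconditional means model (random intercept only, no predictors).
model_a <- lme(alcuse ~ 1, random = ~ 1 | id, data = sim, method = "REML")

# Variance components: between-subject (intercept) and within-subject (residual).
vc <- VarCorr(model_a)
var_between <- as.numeric(vc["(Intercept)", "Variance"])
var_within <- model_a$sigma^2

# Intraclass correlation: share of total variance that lies between subjects.
icc <- var_between / (var_between + var_within)

print(head(subject_fits))
print(icc)
```

With `age14` as the time variable, each subject's fitted intercept refers to age 14, the first occasion actually observed, rather than to an extrapolated age of 0.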
Submission
When you are finished with your homework, be sure to Render the final document. Once rendered, download and submit your file as follows:
- Find the .html file in your Files pane (on the bottom right of the screen)
- Click the check box next to the file
- Click the blue gear above and then click “Export” to download
- Submit your final html document to the respective assignment on Moodle