---
title: "Document title"
author: "my name"
format:
  html:
    embed-resources: true
---
Homework 7
Ch 7: Correlated Data and Ch 8: Multilevel Models
Setup
Each of your assignments will begin with the following steps.
- Go to our RStudio Server at http://turing.cornellcollege.edu:8787/
- Create a new R project inside your homework folder on the server (within the course folder you created), and give it a sensible name such as homework_6.
- Create a new Quarto document and give it a sensible name such as hw6.
In the YAML header, add the following (add whatever you don’t already have): `title`, `author`, and, under `format: html`, `embed-resources: true`. The `embed-resources` option makes your final rendered HTML self-contained.
Instructions
Be sure to include the relevant R code as well as full sentences answering each of the questions (e.g., if I ask for the average, you can output the answer in R but should also write a full sentence stating the answer). Be sure to save your files frequently!
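As an illustration of this "code plus sentence" pattern, here is a minimal sketch. The `scores` vector is made up for this example and is not homework data.

```r
# Hypothetical values, for illustration only (not from any homework data set)
scores <- c(82, 90, 77, 95)

# Compute the average and print it so it appears in the rendered output
avg <- mean(scores)
avg
```

A full-sentence answer would then read something like: "The average score is 86."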
Data for the homework will be in the STA363_inst_files -> data folder.
Exercises
All problems are from the main textbook, Beyond Multiple Linear Regression by Paul Roback and Julie Legler (abbreviated BMLR), Chapters 1-9. It is freely available online.
Use the numbering on the left. The codes are for instructor use (Ex: C1).
Part 2: Chapter 8 - Multilevel Models
Exercise 2
- (C1,2,3) @Brown2004 describe “A Hierarchical Linear Model Approach for Assessing the Effects of House and Neighborhood Characteristics on Housing Prices”.
Based on the title of their paper: (a) give the observational units at Level One and Level Two, and (b) list potential explanatory variables at both Level One and Level Two.
In the preceding problem, why can’t we assume all houses in the data set are independent? What would be the potential implications to our analysis of assuming independence among houses?
In the preceding problem, for each of the sets of predictors listed below:
- write out the two-level model for predicting housing prices,
- write out the corresponding composite model, and
- determine how many model parameters (fixed effects and variance components) must be estimated.

The predictor sets are:
- Predictor set 1: Square footage, number of bedrooms
- Predictor set 2: Median neighborhood income, rating of neighborhood schools
- Predictor set 3: Square footage, number of bedrooms, age of house, median neighborhood housing price
- Predictor set 4: Square footage, median neighborhood income, rating of neighborhood schools, median neighborhood housing price
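As a reminder of the general form, here is a sketch with one generic Level One predictor \(X_{ij}\) and one generic Level Two predictor \(Z_{i}\). Both are placeholders and match none of the predictor sets above. The notation is assumed to follow BMLR Chapter 8, and the sketch assumes that both the intercept and the slope vary by group.

$$
\begin{aligned}
\text{Level One:} \quad & Y_{ij} = a_{i} + b_{i} X_{ij} + \epsilon_{ij} \\
\text{Level Two:} \quad & a_{i} = \alpha_{0} + \alpha_{1} Z_{i} + u_{i} \\
                        & b_{i} = \beta_{0} + \beta_{1} Z_{i} + v_{i} \\
\text{Composite:} \quad & Y_{ij} = \left[\alpha_{0} + \alpha_{1} Z_{i} + \beta_{0} X_{ij} + \beta_{1} Z_{i} X_{ij}\right] + \left[u_{i} + v_{i} X_{ij} + \epsilon_{ij}\right]
\end{aligned}
$$

Under these assumptions, with \(\epsilon_{ij} \sim N(0, \sigma^{2})\) and \((u_{i}, v_{i})\) bivariate normal with variances \(\sigma_{u}^{2}\), \(\sigma_{v}^{2}\) and correlation \(\rho_{uv}\), this generic model has 4 fixed effects (\(\alpha_{0}\), \(\alpha_{1}\), \(\beta_{0}\), \(\beta_{1}\)) and 4 variance components (\(\sigma^{2}\), \(\sigma_{u}^{2}\), \(\sigma_{v}^{2}\), \(\rho_{uv}\)), for 8 parameters in total.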
Exercise 3
- (C6) Why is the contour plot for multivariate normal density in Figure @ref(fig:contour-boundary)(b) tilted from southwest to northeast, but the contour plot in Figure @ref(fig:contour-boundary)(a) is not tilted?
Exercise 4
- (C8) Why is Model A (Section @ref(modela8) in 8.6.2) sometimes called the “unconditional means model”? Why is it also sometimes called the “random intercepts model”? Are these two labels consistent with each other?
Exercise 5
- (C9) Consider adding an indicator variable in Model B (Section @ref(randomslopeandint)) for Small Ensemble performances.
- Write out the two-level model for performance anxiety,
- Write out the corresponding composite model,
- Determine how many model parameters (fixed effects and variance components) must be estimated, and
- Explain how the interpretation for the coefficient in front of Large Ensembles would change.
Exercise 6
- (C10) Give a short rule in your own words describing when an interpretation of an estimated coefficient should “hold constant” another covariate or “set to 0” that covariate (see Section @ref(interp:modeld)).
Exercise 7
- (C14) Interpret other estimated parameters from Model F beyond those interpreted in Section @ref(modelf): \(\hat{\alpha}_{0}\), \(\hat{\alpha}_{2}\), \(\hat{\alpha}_{3}\), \(\hat{\beta}_{0}\), \(\hat{\gamma}_{0}\), \(\hat{\zeta}_{0}\), \(\hat{\rho}_{wx}\), \(\hat{\sigma}^{2}\), \(\hat{\sigma}_{u}^{2}\), and \(\hat{\sigma}_{z}^{2}\).
Exercise 8
- (O1) @Chapp2018 explored 2014 congressional candidates’ ambiguity on political issues in their paper, Going Vague: Ambiguity and Avoidance in Online Political Messaging. They hand-coded a random sample of 2012 congressional candidates’ websites, assigning an ambiguity score. A total of 870 websites from 2014 were then automatically scored using Wordscores, a program designed for political textual analysis. In their paper, they fit a multilevel model for candidates’ ambiguities with predictors at both the candidate and district levels. Some of their hypotheses include that:
- “when incumbents do hazard issue statements, these statements will be marked by a higher degree of clarity.” (Hypothesis 1b)
- “ideological distance [from district residents] will be associated with greater ambiguity.” (Hypothesis 2a)
- “controlling for ideological distance, ideological extremity [of the candidate] should correspond to less ambiguity.” (Hypothesis 2b)
- “more variance in attitudes [among district residents] will correspond to a higher degree of ambiguity in rhetoric” (Hypothesis 3a)
- “a more heterogeneous mix of subgroups [among district residents] will also correspond to a higher degree of ambiguity in rhetoric” (Hypothesis 3b)
The data are in `ambiguity.csv`. Variables of interest include:
- `ambiguity` = assigned ambiguity score. Higher scores indicate greater clarity (less ambiguity).
- `democrat` = 1 if a Democrat, 0 otherwise (Republican)
- `incumbent` = 1 if an incumbent, 0 otherwise
- `ideology` = a measure of the candidate’s left-right orientation. Higher (positive) scores indicate more conservative candidates and lower (negative) scores indicate more liberal candidates.
- `mismatch` = the distance between the candidate’s ideology and the district’s ideology (candidate ideology scores were regressed against district ideology scores; mismatch values represent the absolute value of the residual associated with each candidate)
- `distID` = the congressional district’s unique ID
- `distLean` = the district’s political leaning. Higher scores imply more conservative districts.
- `attHeterogeneity` = a measure of the variability of ideologies within the district. Higher scores imply more attitudinal heterogeneity among voters.
- `demHeterogeneity` = a measure of the demographic variability within the district. Higher scores imply more demographic heterogeneity among voters.
With this in mind, fit your own models to address these hypotheses from @Chapp2018. Be sure to use a two-level structure to account for variables at both the candidate and district levels.
Hints: (1) Be sure to conduct an EDA. (2) Build appropriate model(s) that allow you to test these hypotheses using the significance of the predictors.
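The workflow in these hints might be sketched as follows. This is only an illustration under stated assumptions: it uses the `lme4` package (an assumption about your setup) and a simulated stand-in data frame called `toy` (a hypothetical name) that borrows a few column names from the variable list above. It is not the real `ambiguity.csv`, and the chosen predictors are not a recommended final model.

```r
# Sketch only: simulated stand-in data, NOT the real ambiguity.csv
library(lme4)

set.seed(363)
n_dist   <- 40   # hypothetical number of districts
per_dist <- 3    # hypothetical number of candidates per district
n        <- n_dist * per_dist
distID   <- rep(seq_len(n_dist), each = per_dist)

# District-level (Level Two) variable: constant within a district
attHeterogeneity <- rep(rnorm(n_dist), each = per_dist)

# Candidate-level (Level One) variables
incumbent <- rbinom(n, 1, 0.3)
mismatch  <- abs(rnorm(n))

# Simulated response: district random intercept plus candidate-level noise
u <- rep(rnorm(n_dist, sd = 0.5), each = per_dist)
ambiguity <- 1 + 0.4 * incumbent - 0.3 * mismatch + u + rnorm(n)

toy <- data.frame(distID, ambiguity, incumbent, mismatch, attHeterogeneity)

# Quick EDA: numerical summary of the response and group means by incumbency
summary(toy$ambiguity)
aggregate(ambiguity ~ incumbent, data = toy, FUN = mean)

# Two-level model: candidates (Level One) nested in districts (Level Two),
# with a random intercept for each district
fit <- lmer(ambiguity ~ incumbent + mismatch + attHeterogeneity + (1 | distID),
            data = toy)
summary(fit)
```

With the real data you would replace `toy` with the data frame read from `ambiguity.csv` and choose predictors to match each hypothesis. Note that `summary()` on an `lmer` fit reports t-values rather than p-values.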
Submission
When you are finished with your homework, be sure to Render the final document. Once it has rendered, download and submit your file as follows:
- Find the .html file in your Files pane (at the bottom right of the screen)
- Click the check box next to the file
- Click the blue gear above and then click “Export” to download
- Submit your final HTML document to the respective assignment on Moodle