---
title: "GIVE YOUR EXAM A TITLE"
author: "PLACE YOUR NAME"
format:
  html:
    embed-resources: true
    code-fold: true
    code-summary: "Show the code"
---
STA 363 Fall 2024
Take-Home Final Exam Instructions
Due: Wednesday, September 18th, at 12 pm
Rules
Your solutions must be written up in a Quarto (qmd) file and then rendered into an HTML file called `exam-02.html`. This file must include your code, output, and write-up for each question. Please use the `head()` function when printing any result with more than 20 rows.

This exam is open-book, open-internet, and closed to other people. You may use any online or book-based resource you would like, but you must include citations for any code you use (directly or indirectly). You may not consult anyone about this exam other than the Professor. You may not post questions online or consult other students, even about a hypothetical question.
You will be required to upload the HTML file produced when you render your document. Technical difficulties are not an excuse for late work, so do not wait until the last minute. Verify that your HTML file includes all graphs and tables before uploading it to Moodle. If I have to ask you to re-render and resubmit, you will lose 5 points on the exam. Use the YAML header shown at the top of this document, which turns on the `embed-resources` option so that images are embedded in the HTML file.
Your analysis, outputs, and narratives, not your code, should answer the questions. I have enabled code folding so your code will stay collapsed in your rendered document.
The in-class exam is worth 50 points. The take-home is worth 100 points.
Data
A student research team at St. Olaf College contributed to the efforts of biologist Dr. Kathy Shea to investigate a rich data set concerning forestation in the surrounding land. Tubes were placed on trees in some locations, or transects, but not in others. Interest centers on whether tree growth is affected by the presence of tubes. The data are currently stored in long format in `treetube.csv`. Each row represents one tree in a given year. Key variables include:
- `id`: id of individual tree
- `transect`: The id of the transect housing the tree
- `tubex`: 1 if the tree had a tube, 0 if not
- `species`: The tree's species
- `year`: year of the observation
- `height`: The tree's height in meters
Questions
Complete an exploratory data analysis (EDA). The code provided should be useful, but it is not exhaustive, so more may be needed. You do not need to replicate any of the included EDA in your script. Please refer to the included plots by the figure numbers shown in their bottom-right captions.
Fit two-level models to predict tree height. Start with an unconditional means model with `species` as your grouping variable. Then build more appropriate two-level models, using your EDA to explain your variable choices; `species` should remain your grouping variable. Compare the models and choose the best one. Include a likelihood ratio test using the parametric bootstrap (use the provided `bootstrapAnova()` function), as well as other metrics.
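For reference, one common way to write the unconditional means model with `species` as the grouping variable is shown below, where $i$ indexes species and $j$ indexes observations within a species. The notation is an assumption and may differ from the course's conventions.

$$
\begin{aligned}
\text{Level one:}\quad & Y_{ij} = a_i + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^2) \\
\text{Level two:}\quad & a_i = \alpha_0 + u_i, \qquad u_i \sim N(0, \sigma_u^2) \\
\text{Composite:}\quad & Y_{ij} = \alpha_0 + u_i + \epsilon_{ij}, \qquad \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2}
\end{aligned}
$$

Here $\alpha_0$ is the grand mean across species, $\sigma_u^2$ is the between-species variance, $\sigma^2$ is the within-species variance, and $\rho$ is the intraclass correlation. More appropriate two-level models add predictors to these equations.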
Assess your final model. Check the model conditions and note any limitations on its use.
Interpret the coefficients of your chosen model. Interpret all fixed effects using bootstrap confidence intervals, and interpret the estimates of the variance components (no confidence interval is needed).
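A minimal sketch of the modeling workflow in these questions, using lme4 on simulated data rather than `treeTubes`. The simulated columns `species`, `tubex`, and response `y` are hypothetical stand-ins, and the single `tubex` predictor is only illustrative, not a recommended model. The sketch fits an unconditional means model and a model with one level-one predictor by maximum likelihood, computes the intraclass correlation, compares the models, draws residual plots, and computes percentile bootstrap confidence intervals for the fixed effects with `confint(..., method = "boot")`.

```r
# Sketch only: SIMULATED data stands in for treeTubes (hypothetical values).
library(lme4)

set.seed(363)
n_species <- 8                      # level-two units (grouping variable)
n_per     <- 60                     # level-one observations per species
species   <- factor(rep(paste0("sp", seq_len(n_species)), each = n_per))
tubex     <- rbinom(n_species * n_per, size = 1, prob = 0.3)
u         <- rnorm(n_species, sd = 0.5)          # species random intercepts
y         <- 1 + 0.3 * tubex + u[as.integer(species)] +
  rnorm(n_species * n_per, sd = 0.4)             # plus level-one error
sim <- data.frame(species, tubex, y)

# Unconditional means model and a model with a level-one predictor, both fit
# by maximum likelihood (REML = FALSE) so likelihoods, AIC, and BIC compare
m0 <- lmer(y ~ 1 + (1 | species), data = sim, REML = FALSE)
mA <- lmer(y ~ tubex + (1 | species), data = sim, REML = FALSE)

# Intraclass correlation: share of total variance that lies between species
vc0 <- as.data.frame(VarCorr(m0))
icc <- vc0$vcov[vc0$grp == "species"] / sum(vc0$vcov)

# Chi-square LRT with AIC/BIC; bootstrapAnova(mA, m0) from the setup code
# would replace the chi-square p-value with a parametric bootstrap p-value
comp <- anova(m0, mA)
print(comp)

# Model conditions: residuals vs fitted values, and normal Q-Q plots for
# level-one residuals and species-level random effects
plot(fitted(mA), resid(mA), xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)
qqnorm(resid(mA)); qqline(resid(mA))
re <- ranef(mA)$species[, 1]
qqnorm(re); qqline(re)

# Percentile intervals from a parametric bootstrap (bootMer under the hood);
# keep only the fixed-effect rows
set.seed(363)
ci <- confint(mA, method = "boot", nsim = 200, boot.type = "perc")
ci_fixed <- ci[names(fixef(mA)), , drop = FALSE]
print(ci_fixed)

# Variance component estimates for interpretation (no CI needed)
print(VarCorr(mA), comp = c("Variance", "Std.Dev."))
```

The sketch uses only 200 bootstrap samples to keep it quick; a final analysis would typically use more.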
Submission
When you are finished with your exam, be sure to render the final document. Once it is rendered, you can download your file as follows:
- Find the .html file in your Files pane (in the bottom right of the screen).
- Click the check box next to the file.
- Click the blue gear above and then click “Export” to download it.
- Submit your final HTML document to the exam submission area on Moodle.
Code
Setup and Function
library(tidyverse)
library(tidymodels)
library(lme4)         # lmer(); refit() is also needed by bootstrapAnova()
library(broom.mixed)
library(ggridges)
library(DataExplorer)
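# Parametric bootstrap likelihood ratio test comparing a larger model (mA)
# with a nested null model (m0)
bootstrapAnova <- function(mA, m0, B = 1000){
# One bootstrap draw: simulate a response from the null model, refit both
# models to it, and return the LRT (chi-square) statistic
oneBootstrap <- function(m0, mA){
d <- drop(simulate(m0))
m2 <- refit(mA, newresp = d)
m1 <- refit(m0, newresp = d)
return(anova(m2, m1)$Chisq[2])
}
# Null distribution of the LRT statistic
nulldist <- replicate(B, oneBootstrap(m0, mA))
# Observed LRT; replace the chi-square p-value with the bootstrap p-value
ret <- anova(mA, m0)
ret$"Pr(>Chisq)"[2] <- mean(ret$Chisq[2] < nulldist)
names(ret)[8] <- "Pr_boot(>Chisq)"
attr(ret, "heading") <- c(attr(ret, "heading")[1],
paste("Parametric bootstrap with", B, "samples."),
attr(ret, "heading")[-1])
attr(ret, "nulldist") <- nulldist
return(ret)
}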
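In `bootstrapAnova()`, the observed likelihood ratio statistic is compared with $B$ statistics obtained by simulating responses from the null model `m0` and refitting both models to each simulated response. Written out, the quantities the code computes are:

$$
\text{LRT}_{\text{obs}} = 2\left(\log L_A - \log L_0\right), \qquad
p_{\text{boot}} = \frac{1}{B} \sum_{b=1}^{B} I\left(\text{LRT}^{*}_{b} > \text{LRT}_{\text{obs}}\right)
$$

Here $L_A$ and $L_0$ are the maximized likelihoods of `mA` and `m0`, and $\text{LRT}^{*}_{b}$ is the statistic from the $b$-th simulated data set. This bootstrap p-value is what appears in the `Pr_boot(>Chisq)` column.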
Data
treeTubes <- read_csv("data/treetube_exam.csv") |> janitor::clean_names()
Rows: 4643 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): species
dbl (5): id, transect, tubex, year, height
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
EDA Code
introduce(treeTubes)
# A tibble: 1 × 9
rows columns discrete_columns continuous_columns all_missing_columns
<int> <int> <int> <int> <int>
1 4643 6 1 5 0
# ℹ 4 more variables: total_missing_values <int>, complete_rows <int>,
# total_observations <int>, memory_usage <dbl>
glimpse(treeTubes)
Rows: 4,643
Columns: 6
$ id <dbl> 147, 869, 338, 1125, 569, 1215, 337, 974, 136, 1875, 1288, 61…
$ transect <dbl> 5, 26, 9, 17, 22, 9, 9, 5, 5, 9, 14, 26, 1, 22, 5, 9, 22, 5, …
$ species <chr> "Bur Oak", "Red Maple", "Black Walnut", "Sugar Maple", "Black…
$ tubex <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ year <dbl> 1999, 1999, 1990, 2002, 2002, 1993, 1999, 2006, 1995, 2002, 2…
$ height <dbl> 1.940, 2.560, 0.235, 1.000, 2.150, 0.846, 1.560, 6.300, 0.363…
#transect/height
ggplot(treeTubes) +
geom_density(aes(x = height)) +
facet_wrap(~transect) +
labs(caption = "Figure 1")
ggplot(treeTubes) +
geom_boxplot(aes(x = as.factor(transect), y = height)) +
coord_flip()+
labs(caption = "Figure 2")
#tubex/height
ggplot(treeTubes) +
geom_density_ridges(aes(x = height, y = as.factor(tubex),fill=as.factor(tubex)))+
labs(caption = "Figure 3")
Picking joint bandwidth of 0.347
ggplot(treeTubes) +
geom_boxplot(aes(x = as.factor(tubex), y = height)) +
coord_flip()+
labs(caption = "Figure 4")
#year/height
ggplot(treeTubes) +
geom_density_ridges(aes(x = height, y = as.factor(year),fill=as.factor(year)))+
labs(caption = "Figure 5")
Picking joint bandwidth of 0.277
#species/height
ggplot(treeTubes) +
geom_density(aes(x = height)) +
facet_wrap(~species)+
labs(caption = "Figure 6")
# Repeat all with log height
ggplot(treeTubes) +
geom_density(aes(x = log(height))) +
facet_wrap(~transect)+
labs(caption = "Figure 7")
ggplot(treeTubes) +
geom_boxplot(aes(x = as.factor(transect), y=log(height))) +
coord_flip()+
labs(caption = "Figure 8")
ggplot(treeTubes) +
geom_density_ridges(aes(x = log(height), y = as.factor(tubex), fill = as.factor(tubex)))+
labs(caption = "Figure 9")
Picking joint bandwidth of 0.22
ggplot(treeTubes) +
geom_boxplot(aes(x = as.factor(tubex), y = log(height))) +
coord_flip()+
labs(caption = "Figure 10")
ggplot(treeTubes) +
geom_density_ridges(aes(x = log(height), y = as.factor(year),fill = as.factor(year)))+
labs(caption = "Figure 11")
Picking joint bandwidth of 0.13
ggplot(treeTubes) +
geom_density(aes(x = log(height))) +
facet_wrap(~species)+
labs(caption = "Figure 12")
ggplot(treeTubes) +
geom_density_ridges(aes(x = log(height), y = species,fill=species))+
labs(caption = "Figure 13")
Picking joint bandwidth of 0.283
plot_missing(treeTubes)
plot_correlation(treeTubes,type = "continuous")