Homework 2

Likelihoods

Setup

Each of your assignments will begin with the following steps.

1. Go to our RStudio Server at http://turing.cornellcollege.edu:8787/
2. Create a new R project inside your homework folder on the server (within the course folder you created), and give it a sensible name such as homework_2.
3. Create a new Quarto document and give it a sensible name such as hw2.
4. In the YAML header, add the following (adding any fields you don't already have). The embed-resources option makes your final rendered HTML self-contained.
```
---
title: "Document title"
author: "my name"
format:
  html:
    embed-resources: true
---
```

Instructions

Be sure to include the relevant R code as well as full sentences answering each of the questions (e.g., if I ask for the average, you can output the answer in R, but also write a full sentence stating the answer). Save your files frequently!

Data for the homework will be in the STA363_inst_files -> data folder.

Exercises

All problems are from the main textbook, Beyond Multiple Linear Regression by Paul Roback and Julie Legler (Chapters 1-9), abbreviated BMLR. It is freely available online.

Use the exercise numbering on the left (Exercise 1, Exercise 2, ...). The codes in parentheses (e.g., C1) are for instructor use.

Exercise 1

(C1) Suppose we plan to use data to estimate one parameter, \(p_B\).
- When using a likelihood to obtain an estimate for the parameter, which is preferred: a large or a small likelihood value? Why?
- The height of a likelihood curve is the probability of the data for the given parameter value. The horizontal axis represents different possible parameter values. Does the area under the likelihood curve for the interval from 0.25 to 0.75 equal the probability that the true probability of a boy is between 0.25 and 0.75?
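To make "height of the likelihood curve" concrete, here is a minimal Python sketch (the same steps translate directly to R) that evaluates the curve on a grid of candidate \(p_B\) values. It assumes independent births and borrows the Case 1 counts from Exercise 3 (30 boys, 20 girls) purely for illustration; the helper name `likelihood` and the grid spacing of 0.01 are choices made here, not part of BMLR.

```python
def likelihood(p_b: float, n_boys: int, n_girls: int) -> float:
    """Probability of one particular sequence of n_boys boys and n_girls girls,
    assuming independent births that are boys with probability p_b."""
    return p_b ** n_boys * (1 - p_b) ** n_girls


# Candidate parameter values along the horizontal axis: 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

# Height of the likelihood curve at each candidate (Case 1 counts: 30 boys, 20 girls).
heights = [likelihood(p, 30, 20) for p in grid]

# Grid search: the candidate with the tallest height.
best_height = max(heights)
best_p = grid[heights.index(best_height)]
print(f"grid maximum at p_B = {best_p:.2f}, likelihood = {best_height:.3e}")
```

Each height is the probability of one specific sequence of 50 births, so the values are very small; the grid maximum lands at 0.60.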

Exercise 2

(C2) Suppose the families with an "only child" were excluded from the Sex Conditional Model. How might the estimates for the three parameters be affected? Would it still be possible to perform a Likelihood Ratio Test to compare the Sex Unconditional and Sex Conditional Models? Why or why not?
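For reference when thinking about this question, here is a minimal Python sketch of the likelihood ratio test itself, assuming two nested models fit to the same data, with degrees of freedom equal to the difference in the number of parameters. The helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical (not from BMLR); the chi-square tail uses the closed-form series for integer degrees of freedom so that only the standard library is needed. In R, `pchisq(stat, df, lower.tail = FALSE)` gives the same tail probability.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        total, term = 0.0, 1.0
        for j in range(df // 2):
            if j > 0:
                term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int):
    """LRT for nested models fit to the SAME data.

    df = (number of parameters in the full model) - (number in the reduced model).
    Returns the test statistic and its chi-square p-value.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df)


# Sanity check against familiar 5% critical values: each should print about 0.05.
print(round(chi2_sf(3.841, 1), 3))
print(round(chi2_sf(5.991, 2), 3))
```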

Exercise 3

(G2) Case 3. In Case 1 we used hypothetical data with 30 boys and 20 girls. Case 2 was a much larger study with 600 boys and 400 girls. Consider Case 3, a hypothetical data set with 6000 boys and 4000 girls.
- Use the methods for Case 1 and Case 2 to determine the MLE for \(p_B\) for the Case 3 independence model. Compare your result to the MLEs for Cases 1 and 2.
- Describe how the graph of the log-likelihood for Case 3 would compare to the log-likelihood graphs for Cases 1 and 2.
- Compute the log-likelihood for Case 3. Why is it incorrect to perform an LRT comparing Cases 1, 2, and 3?
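For reference, the independence-model MLE can be derived directly. Writing \(n_B\) for the number of boys and \(n_G\) for the number of girls (notation introduced here, not from the assignment), and dropping the constant combinatorial factor in the same way the likelihood in Exercise 5 does, a sketch of the derivation is:

\[
\begin{aligned}
\textrm{Lik}(p_B) &= p_B^{n_B}(1-p_B)^{n_G} \\
\log \textrm{Lik}(p_B) &= n_B \log(p_B) + n_G \log(1-p_B) \\
\frac{d}{dp_B} \log \textrm{Lik}(p_B) &= \frac{n_B}{p_B} - \frac{n_G}{1-p_B} = 0
\quad \Longrightarrow \quad \hat{p}_B = \frac{n_B}{n_B + n_G}
\end{aligned}
\]

The second derivative, \(-n_B/p_B^2 - n_G/(1-p_B)^2\), is negative for every \(p_B\) in \((0, 1)\), so this critical point is a maximum.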

Exercise 4

(G3) Write out an expression for the likelihood of seeing our NLSY data (5,416 boys and 5,256 girls) if the true probability of a boy is:
1. \(p_B=0.5\)
2. \(p_B=0.45\)
3. \(p_B= 0.55\)
4. \(p_B= 0.5075\)
- Compute the value of the log-likelihood for each of the values of \(p_B\) above.
- Which of these four possibilities, \(p_B=0.45, p_B=0.5, p_B=0.55,\) or \(p_B=0.5075\) would be the best estimate of \(p_B\) given what we observed (our data)?
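One way to compare candidate values is to evaluate the log-likelihood at each and keep the largest. The minimal Python sketch below does this for the four candidates listed above, but, as an illustration only, it plugs in the Case 1 counts from Exercise 3 (30 boys, 20 girls) rather than the NLSY counts; the helper name `log_lik` is hypothetical. Working on the log scale matters here: for thousands of births the raw product \(p_B^{n}(1-p_B)^{m}\) underflows to 0.0 in double precision, while the log-likelihood stays finite and, because log is increasing, ranks the candidates in the same order.

```python
import math


def log_lik(p_b: float, n_boys: int, n_girls: int) -> float:
    """Log-likelihood of n_boys boys and n_girls girls under the independence model."""
    return n_boys * math.log(p_b) + n_girls * math.log(1 - p_b)


# The four candidate values listed in Exercise 4.
candidates = [0.45, 0.5, 0.55, 0.5075]

# Illustration only: Case 1 counts (30 boys, 20 girls), NOT the NLSY counts.
n_boys, n_girls = 30, 20
scores = {p: log_lik(p, n_boys, n_girls) for p in candidates}
for p, ll in scores.items():
    print(f"p_B = {p:<7} log-likelihood = {ll:.4f}")

# The candidate with the largest log-likelihood is the best of the four.
best = max(scores, key=scores.get)
print("largest log-likelihood at p_B =", best)
```

With these illustrative counts the best of the four candidates is 0.55, the one closest to the Case 1 MLE of 0.6.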

Exercise 5

(O2) The hot hand in basketball. Gilovich et al. (1985) wrote a controversial but compelling article claiming that there is no such thing as "the hot hand" in basketball. That is, there is no empirical evidence that shooters have stretches where they are more likely to make consecutive shots, and basketball shots are essentially independent events. One of the many ways they tested for evidence of a "hot hand" was to record sequences of shots for players under game conditions and determine whether players are more likely to make shots after made baskets than after misses. For instance, assume we recorded data from one player's first 5 three-point attempts in each of 5 games. We can assume games are independent, but we'll consider two models for shots within a game:
- No Hot Hand (1 parameter): \(p_B\) = probability of making a basket (thus \(1-p_B\) = probability of not making a basket).
- Hot Hand (2 parameters): \(p_B\) = probability of making a basket after a miss (or the first shot of a game); \(p_{B|B}\) = probability of making a basket after making the previous shot.
1. Fill out the table below: write out the contribution of each game to the likelihood for both models, along with the total likelihood for each model.
2. Given that, for the No Hot Hand model, \(\textrm{Lik}(p_B)=p_B^{10}(1-p_B)^{15}\) for the 5 games where we collected data, how do we know that 0.40 (the maximum likelihood estimate, or MLE, of \(p_B\)) is a better estimate than, say, 0.30?
3. Find the MLEs for the parameters in each model, and then use those MLEs to determine if there’s significant evidence that the hot hand exists.

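Data for Exercise 5 (Open-ended Exercise 2 in BMLR). B = made basket, M = missed basket.

| Game | First 5 shots | Likelihood (No Hot Hand) | Likelihood (Hot Hand) |
|------|---------------|--------------------------|-----------------------|
| 1 | BMMBB | | |
| 2 | MBMBM | | |
| 3 | MMBBB | | |
| 4 | BMMMB | | |
| 5 | MMMMM | | |
| Total | | | |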
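Filling in the table comes down to counting, for each game, which outcomes each model's parameters govern. The minimal Python sketch below (hypothetical helper `hot_hand_counts`, not from BMLR) tallies those counts for one game string. It follows the model definitions in this exercise: under No Hot Hand every B contributes \(p_B\) and every M contributes \(1-p_B\); under Hot Hand the first shot and any shot after a miss use \(p_B\), while a shot after a make uses \(p_{B|B}\). The sequence "BBMBM" is a made-up example, deliberately not one of the five games in the table.

```python
def hot_hand_counts(game: str) -> dict:
    """Tally the shot outcomes each model's likelihood needs for one game.

    game: a string of 'B' (made basket) and 'M' (missed basket), in shot order.
    """
    counts = {
        "makes": 0,                  # No Hot Hand: each contributes p_B
        "misses": 0,                 # No Hot Hand: each contributes 1 - p_B
        "B_after_miss_or_first": 0,  # Hot Hand: each contributes p_B
        "M_after_miss_or_first": 0,  # Hot Hand: each contributes 1 - p_B
        "B_after_B": 0,              # Hot Hand: each contributes p_{B|B}
        "M_after_B": 0,              # Hot Hand: each contributes 1 - p_{B|B}
    }
    previous = None  # there is no previous shot before the first shot of a game
    for shot in game:
        if shot not in ("B", "M"):
            raise ValueError(f"unexpected shot code: {shot!r}")
        counts["makes" if shot == "B" else "misses"] += 1
        if previous == "B":
            counts["B_after_B" if shot == "B" else "M_after_B"] += 1
        else:
            counts["B_after_miss_or_first" if shot == "B" else "M_after_miss_or_first"] += 1
        previous = shot
    return counts


# Hypothetical game, not from the table above.
print(hot_hand_counts("BBMBM"))
```

For "BBMBM" the No Hot Hand contribution would be \(p_B^{3}(1-p_B)^{2}\) and the Hot Hand contribution \(p_B^{2}\,p_{B|B}\,(1-p_{B|B})^{2}\). Because games are assumed independent, summing the counts over games gives the exponents in each model's total likelihood.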

Submission

When you are finished with your homework, be sure to render the final document with the Render button. Once it is rendered, download and submit your file as follows:

1. Find the .html file in your Files pane (on the bottom right of the screen).
2. Check the box next to the file.
3. Click the blue gear above the file list, then click "Export" to download.
4. Submit your final HTML document to the corresponding assignment on Moodle.