BMLR Chapter 2
Cornell College
STA 363 Spring 2025 Block 8
Describe the concept of a likelihood
Construct the likelihood for a simple model
Define the Maximum Likelihood Estimate (MLE) and use it to answer an analysis question
Identify three ways to calculate or approximate the MLE and apply these methods to find the MLE for a simple model
Use likelihoods to compare models (next week)
A likelihood is a function that tells us how likely we are to observe our data for a given parameter value (or values).
Unlike Ordinary Least Squares (OLS), likelihood-based methods do not require that the responses be independent, identically distributed, and normal (iidN)
They are not the same as probability functions
Probability function: Fixed parameter value(s) + input possible outcomes \(\Rightarrow\) probability of seeing the different outcomes given the parameter value(s)
Likelihood: Fixed data + input possible parameter values \(\Rightarrow\) probability of seeing the fixed data for each parameter value
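For a concrete sense of the distinction, R's dbinom function can play both roles. (A small sketch; note that dbinom includes the binomial coefficient, a constant the likelihood formulas below omit, which does not affect where the maximum occurs.)

```r
# Probability function: fix the parameter (p = 0.5) and vary the possible outcomes
dbinom(0:3, size = 3, prob = 0.5)
#> [1] 0.125 0.375 0.375 0.125

# Likelihood: fix the observed data (y = 2 of n = 3 fouls on the home team)
# and vary the parameter value
dbinom(2, size = 3, prob = c(0.3, 0.5, 0.7))
#> [1] 0.189 0.375 0.441
```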
The data set 04-refs.csv includes 30 randomly selected NCAA men's basketball games played in the 2009-2010 season.
We will focus on the variables foul1, foul2, and foul3, which indicate which team had a foul called on them for the 1st, 2nd, and 3rd fouls, respectively:
H: Foul was called on the home team
V: Foul was called on the visiting team
We are focusing on the first three fouls for this analysis, but this could easily be extended to include all fouls in a game.
(The data set was derived from basketball0910.csv, used in BMLR Section 11.2.)
game | date | visitor | hometeam | foul1 | foul2 | foul3 |
---|---|---|---|---|---|---|
166 | 20100126 | CLEM | BC | V | V | V |
224 | 20100224 | DEPAUL | CIN | H | H | V |
317 | 20100109 | MARQET | NOVA | H | H | H |
214 | 20100228 | MARQET | SETON | V | V | H |
278 | 20100128 | SETON | SFL | H | V | V |
We will treat the games as independent in this analysis.
Model 1 (Unconditional Model): What is the probability the referees call a foul on the home team, assuming foul calls within a game are independent?
Model 2 (Conditional Model):
Is there a tendency for the referees to call more fouls on the visiting team or home team?
Is there a tendency for referees to call a foul on the team that already has more fouls?
Ultimately we want to decide which model is better.
What is the probability the referees call a foul on the home team, assuming foul calls within a game are independent?
Let \(p_H\) be the probability the referees call a foul on the home team.
The likelihood for a single observation (a single game) is
\[Lik(p_H) = p_H^{y_i}(1 - p_H)^{n_i - y_i}\]
where \(y_i\) is the number of fouls called on the home team in game \(i\) and \(n_i\) is the number of fouls considered.
(In this example, we know \(n_i = 3\) for all observations.)
Example
For a single game where the first three fouls are \(H, H, V\), then
\[Lik(p_H) = p_H^{2}(1 - p_H)^{3 - 2} = p_H^{2}(1 - p_H)\]
Foul1 | Foul2 | Foul3 | Number of Games | Likelihood Contribution |
---|---|---|---|---|
H | H | H | 3 | \(p_H^3\) |
H | H | V | 2 | \(p_H^2(1 - p_H)\) |
H | V | H | 3 | \(p_H^2(1 - p_H)\) |
H | V | V | 7 | A |
V | H | H | 7 | B |
V | H | V | 1 | \(p_H(1 - p_H)^2\) |
V | V | H | 5 | \(p_H(1 - p_H)^2\) |
V | V | V | 2 | \((1 - p_H)^3\) |
Fill in A and B.
Because the observations (the games) are independent, the likelihood is
\[Lik(p_H) = \prod_{i=1}^{n}p_H^{y_i}(1 - p_H)^{3 - y_i}\]
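Because multiplying powers adds the exponents, this product collapses to a function of the total foul counts:

\[Lik(p_H) = p_H^{\sum_i y_i}(1 - p_H)^{90 - \sum_i y_i}\]

where \(\sum_i y_i\) is the total number of fouls called on the home team and \(90 = 30 \times 3\) is the total number of fouls in the data.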
We will use this function to find the maximum likelihood estimate (MLE). The MLE is the value between 0 and 1 where we are most likely to see the observed data.
What is the maximum likelihood estimate \(\hat{p}_H\)?
A. 0.489
B. 0.500
C. 0.511
D. 0.556
There are three primary ways to find the MLE:
✅ Approximate using a graph
✅ Numerical approximation
✅ Using calculus
Specify a finite set of possible values for \(p_H\) and calculate the likelihood for each value, as sketched below.
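For these data, 46 of the 90 fouls were called on the home team, so \(Lik(p_H) = p_H^{46}(1-p_H)^{44}\). A minimal R sketch of the grid-and-graph approach (the grid spacing of 0.001 is an arbitrary choice):

```r
# grid of candidate values for p_H (endpoints excluded so the likelihood is nonzero)
p_H <- seq(0.001, 0.999, by = 0.001)

# likelihood of the observed data (46 H fouls, 44 V fouls) at each candidate value
lik <- p_H^46 * (1 - p_H)^44

# graph the likelihood, then report the grid value that maximizes it
plot(p_H, lik, type = "l", xlab = "p_H", ylab = "Likelihood")
p_H[which.max(lik)]
#> [1] 0.511
```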
Find the MLE by taking the first derivative of the likelihood function, setting it equal to zero, and solving.
This can be tricky because of the Product Rule, so we can maximize the log(Likelihood) instead; the same value maximizes both the likelihood and the log(Likelihood).
Since calculus is not a prerequisite, we will forgo this quest.
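For completeness, the calculus route is short for Model 1 once we use the log-likelihood:

\[\frac{d}{dp_H}\Big[46\log(p_H) + 44\log(1 - p_H)\Big] = \frac{46}{p_H} - \frac{44}{1 - p_H} = 0 \quad \Rightarrow \quad \hat{p}_H = \frac{46}{90} \approx 0.511\]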
Is there a tendency for the referees to call more fouls on the visiting team or home team?
Is there a tendency for referees to call a foul on the team that already has more fouls?
Define new parameters:
\(p_{H|N}\): Probability referees call foul on home team given there are equal numbers of fouls on the home and visiting teams
\(p_{H|H Bias}\): Probability referees call foul on home team given there are more prior fouls on the home team
\(p_{H|V Bias}\): Probability referees call foul on home team given there are more prior fouls on the visiting team
Foul1 | Foul2 | Foul3 | Number of Games | Likelihood Contribution |
---|---|---|---|---|
H | H | H | 3 | \((p_{H\vert N})(p_{H\vert H Bias})(p_{H\vert H Bias}) = (p_{H\vert N})(p_{H\vert H Bias})^2\) |
H | H | V | 2 | \((p_{H\vert N})(p_{H\vert H Bias})(1 - p_{H\vert H Bias})\) |
H | V | H | 3 | \((p_{H\vert N})(1 - p_{H\vert H Bias})(p_{H\vert N}) = (p_{H\vert N})^2(1 - p_{H\vert H Bias})\) |
H | V | V | 7 | A |
V | H | H | 7 | B |
V | H | V | 1 | \((1 - p_{H\vert N})(p_{H\vert V Bias})(1 - p_{H\vert N}) = (1 - p_{H\vert N})^2(p_{H\vert V Bias})\) |
V | V | H | 5 | \((1 - p_{H\vert N})(1-p_{H\vert V Bias})(p_{H\vert V Bias})\) |
V | V | V | 2 | \(\begin{aligned}&(1 - p_{H\vert N})(1-p_{H\vert V Bias})(1-p_{H\vert V Bias})\\ &=(1 - p_{H\vert N})(1-p_{H\vert V Bias})^2\end{aligned}\) |
Fill in A and B.
\[\begin{aligned}Lik(p_{H| N}, p_{H|H Bias}, p_{H |V Bias}) &= [(p_{H| N})^{25}(1 - p_{H|N})^{23}(p_{H| H Bias})^8 \\ &(1 - p_{H| H Bias})^{12}(p_{H| V Bias})^{13}(1-p_{H|V Bias})^9]\end{aligned}\]
(Note: The exponents sum to 90, the total number of fouls in the data)
\[\begin{aligned}\log (Lik(p_{H| N}, p_{H|H Bias}, p_{H |V Bias})) &= 25 \log(p_{H| N}) + 23 \log(1 - p_{H|N}) \\ & + 8 \log(p_{H| H Bias}) + 12 \log(1 - p_{H| H Bias})\\ &+ 13 \log(p_{H| V Bias}) + 9 \log(1-p_{H|V Bias})\end{aligned}\]
Which statement describes the maximum likelihood estimates?
A. \(\hat{p}_H\) is greater than \(\hat{p}_{H\vert H Bias}\) and \(\hat{p}_{H \vert V Bias}\)
B. \(\hat{p}_{H\vert H Bias}\) is greater than \(\hat{p}_H\) and \(\hat{p}_{H \vert V Bias}\)
C. \(\hat{p}_{H\vert V Bias}\) is greater than \(\hat{p}_H\) and \(\hat{p}_{H \vert H Bias}\)
D. They are all approximately equal.
How does \(\hat{p}_H\) compare to \(\hat{p}_{H\vert H Bias}\)?
A. \(\hat{p}_H\) is greater than \(\hat{p}_{H\vert H Bias}\)
B. \(\hat{p}_{H\vert H Bias}\) is greater than \(\hat{p}_H\)
C. They are approximately equal.
Model 1 (Unconditional Model)
\[Lik(p_H) = p_H^{46}(1 - p_H)^{44}\]
Model 2 (Conditional Model)
\[\begin{aligned}Lik(p_{H| N}, p_{H|H Bias}, p_{H |V Bias}) &= [(p_{H| N})^{25}(1 - p_{H|N})^{23}(p_{H| H Bias})^8 \\ &(1 - p_{H| H Bias})^{12}(p_{H| V Bias})^{13}(1-p_{H|V Bias})^9]\end{aligned}\]
The maximum likelihood estimate (MLE) is the value between 0 and 1 where we are most likely to see the observed data.
Model 2 (Conditional Model)
The likelihood is
\[\begin{aligned}Lik(p_{H| N}, p_{H|H Bias}, p_{H |V Bias}) &= [(p_{H| N})^{25}(1 - p_{H|N})^{23}(p_{H| H Bias})^8 \\ &(1 - p_{H| H Bias})^{12}(p_{H| V Bias})^{13}(1-p_{H|V Bias})^9]\end{aligned}\]
The log-likelihood is
\[\begin{aligned}\log (Lik(p_{H| N}, p_{H|H Bias}, p_{H |V Bias})) &= 25 \log(p_{H| N}) + 23 \log(1 - p_{H|N}) \\ & + 8 \log(p_{H| H Bias}) + 12 \log(1 - p_{H| H Bias})\\ &+ 13 \log(p_{H| V Bias}) + 9 \log(1-p_{H|V Bias})\end{aligned}\]
We would like to find the MLEs for \(p_{H| N}, p_{H|H Bias}, \text{ and }p_{H |V Bias}\).
We can write a function and do a grid search to find the values that maximize the log-likelihood.
```r
library(dplyr)

maxloglik <- function(nvals){
  # nvals specifies the number of candidate values for each parameter
  phn <- seq(0, 1, length = nvals)
  phh <- seq(0, 1, length = nvals)
  phv <- seq(0, 1, length = nvals)

  # all combinations of the three parameter values
  loglik <- expand.grid(phn, phh, phv)
  colnames(loglik) <- c("phn", "phh", "phv")

  # evaluate the log-likelihood at each combination
  # (exponents are the foul counts from the table above)
  loglik <- loglik %>%
    mutate(loglik = log(phn^25 * (1 - phn)^23 * phh^8 * (1 - phh)^12 *
                          phv^13 * (1 - phv)^9))

  # return the combination with the largest log-likelihood
  loglik %>%
    arrange(desc(loglik)) %>%
    slice(1)
}
```
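For example, a grid of 101 values per parameter (step size 0.01; the grid size is our choice) already lands close to the maximizers:

```r
maxloglik(101)
# returns phn = 0.52, phh = 0.40, phv = 0.59, with loglik approximately -61.573
```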
We can also use R's optim function. optim differs from optimize in that it can optimize over multiple parameter values (the optimize function can only optimize over a single parameter value).
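To call optim, we need the log-likelihood written as a function of a single parameter vector, plus a vector of starting values. A minimal version of each (the starting values of 0.5 are our choice; any values strictly between 0 and 1 work):

```r
# log-likelihood for Model 2 as a function of par = c(phn, phh, phv)
loglik <- function(par) {
  25 * log(par[1]) + 23 * log(1 - par[1]) +
   8 * log(par[2]) + 12 * log(1 - par[2]) +
  13 * log(par[3]) +  9 * log(1 - par[3])
}

# starting values for the search
start_vals <- c(phn = 0.5, phh = 0.5, phv = 0.5)
```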
```r
# Use the optim function in R to find the values that maximize the log-likelihood
# set fnscale = -1 to maximize (the default is to minimize)
optim(par = start_vals, fn = loglik, control = list(fnscale = -1))
```
```
$par
      phn       phh       phv 
0.5208272 0.4000361 0.5909793 

$value
[1] -61.57319

$counts
function gradient 
      66       NA 

$convergence
[1] 0

$message
NULL
```
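In this output, `$par` holds the approximate MLEs and `convergence` equal to 0 indicates that optim converged. As a check, these values agree with the closed-form MLEs (the conditional sample proportions) that calculus would produce:

\[\hat{p}_{H|N} = \frac{25}{48} \approx 0.521, \qquad \hat{p}_{H|H\,Bias} = \frac{8}{20} = 0.40, \qquad \hat{p}_{H|V\,Bias} = \frac{13}{22} \approx 0.591\]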
Nested models
Non-nested models
Nested models: Models such that the parameters of the reduced model are a subset of the parameters of the larger model
Example:
\[\begin{aligned}&\text{Model A: }y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\\ &\text{Model B: }y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \epsilon\end{aligned}\]
Model A is nested in Model B. We could use likelihoods to test whether it is useful to add \(x_3\) and \(x_4\) to the model.
\[\begin{aligned}&H_0: \beta_3 = \beta_4 = 0 \\ &H_a: \text{ at least one }\beta_j \text{ is not equal to 0}\end{aligned}\]
Another way to think about nested models: the simpler model can be obtained from the larger model by equating some parameters with each other or by setting some parameters to constants.
Example:
\[\begin{aligned}&\text{Model 1: }p_H \\ &\text{Model 2: }p_{H| N}, p_{H| H Bias}, p_{H| V Bias}\end{aligned}\]
Model 1 is nested in Model 2. The parameters \(p_{H| N}\), \(p_{H|H Bias}\), and \(p_{H |V Bias}\) can be set equal to \(p_H\) to get Model 1.
\[\begin{aligned}&H_0: p_{H| N} = p_{H| H Bias} = p_{H| V Bias} = p_H \\ &H_a: \text{At least one of }p_{H| N}, p_{H| H Bias}, p_{H| V Bias} \text{ differs from the others}\end{aligned}\]
1️⃣ Find the MLEs for each model.
2️⃣ Plug the MLEs into the log-likelihood function for each model to get the maximum value of the log-likelihood for each model.
3️⃣ Find the difference in the maximum log-likelihoods
4️⃣ Use the Likelihood Ratio Test to determine if the difference is statistically significant
Find the MLEs for each model and plug them into the log-likelihood functions.
Model 1:
. . .
Find the difference in the log-likelihoods
Is the difference in the maximum log-likelihoods statistically significant?
Test statistic
\[\begin{aligned} LRT &= 2[\max\{\log(Lik(\text{larger model}))\} - \max\{\log(Lik(\text{reduced model}))\}]\\[10pt] &= 2\log\Bigg(\frac{\max\{Lik(\text{larger model})\}}{\max\{Lik(\text{reduced model})\}}\Bigg)\end{aligned}\]
LRT follows a \(\chi^2\) distribution where the degrees of freedom equal the difference in the number of parameters between the two models
The test statistic follows a \(\chi^2\) distribution with 2 degrees of freedom. Therefore, the p-value is \(P(\chi^2 > LRT)\).
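A sketch of the arithmetic for our example: plugging \(\hat{p}_H = 46/90\) into Model 1's log-likelihood gives \(46\log(46/90) + 44\log(44/90) \approx -62.36\), and the maximum for Model 2 is \(-61.57\) (the `$value` from optim above), so \(LRT = 2[-61.57 - (-62.36)] \approx 1.58\). The p-value can then be computed in R:

```r
# upper-tail probability from a chi-squared distribution with 2 df
pchisq(1.58, df = 2, lower.tail = FALSE)
#> [1] 0.4538446
```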
The p-value is very large, so we fail to reject \(H_0\). We do not have convincing evidence that the conditional model is an improvement over the unconditional model. Therefore, we can stick with the unconditional model.
\[AIC = -2(\text{max log-likelihood}) + 2p\]
where \(p\) is the number of parameters in the model. Smaller AIC indicates a better model, and AIC can also be used to compare non-nested models.
. . .
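For example, using the maximized log-likelihoods from the LRT computation (Model 1 has \(p = 1\) parameter; Model 2 has \(p = 3\)):

\[\begin{aligned}&\text{Model 1: } AIC = -2(-62.36) + 2(1) \approx 126.7\\ &\text{Model 2: } AIC = -2(-61.57) + 2(3) \approx 129.1\end{aligned}\]

Model 1 has the smaller AIC, agreeing with the conclusion from the LRT.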
Likelihoods help us answer the question of how likely we are to observe the data given different parameter values.
In this example we did not consider covariates; in practice, the parameters we want to estimate will look more like this:
\[p_H = \frac{e^{\beta_0 + \beta_1x_1 + \dots + \beta_px_p}}{1 + e^{\beta_0 + \beta_1x_1 + \dots + \beta_px_p}}\]
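This is the logistic regression model, and R's glm function finds the MLEs of the \(\beta\)s by maximizing a likelihood built exactly this way. A hypothetical sketch (the variables `fouls_home`, `fouls_visitor`, `x1`, and the data frame `games` are placeholders, not from our data set):

```r
# glm maximizes the likelihood over beta_0, beta_1, ...
# cbind(successes, failures) supplies y_i and n_i - y_i for each game
fit <- glm(cbind(fouls_home, fouls_visitor) ~ x1,
           family = binomial, data = games)
coef(fit)  # MLEs of the betas
```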
These slides are based on content in BMLR: Chapter 2 - Beyond Least Squares: Using Likelihoods
Initial versions of the slides are by Dr. Maria Tackett, Duke University