ISLR Chapter 7
Cornell College
STA 363 Spring 2025 Block 8
What have we used so far to deal with non-linear relationships?
Polynomials!
\[y_i = \beta_0 + \beta_1x_i + \beta_2x_i^2+\beta_3x_i^3 \dots + \beta_dx_i^d+\epsilon_i\]
\[\hat{f}(b) -\hat{f}(a) =\hat\beta_1(b-a) + \hat\beta_2(b^2-a^2)+\hat\beta_3(b^3-a^3)+\hat\beta_4(b^4-a^4)\]
How do you pick \(a\) and \(b\)?
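A minimal sketch of fitting such a polynomial in R (the data here are simulated for illustration; `raw = TRUE` makes `poly()` use the raw powers \(x, x^2, \dots\) from the equation above rather than its default orthogonal polynomials):

```r
# Hypothetical example: fit a degree-4 polynomial regression with lm().
set.seed(363)
x <- runif(100, 0, 100)
y <- 50 + 0.5 * x - 0.01 * x^2 + rnorm(100, sd = 2)
fit <- lm(y ~ poly(x, degree = 4, raw = TRUE))
length(coef(fit))  # beta_0 through beta_4: 5 coefficients
```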
Application Exercise
\[pop = \beta_0 + \beta_1age + \beta_2age^2 + \beta_3age^3 +\beta_4age^4+ \epsilon\]
Using the information below, write out the equation for the predicted change in population for a change in age from the 25th percentile (24.5) to the 75th percentile (73.5).
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1672.0854 | 64.5606 | 25.8995 | 0.0000 |
age | -10.6429 | 9.2268 | -1.1535 | 0.2516 |
I(age^2) | -1.1427 | 0.3857 | -2.9627 | 0.0039 |
I(age^3) | 0.0216 | 0.0059 | 3.6498 | 0.0004 |
I(age^4) | -0.0001 | 0.0000 | -3.6540 | 0.0004 |
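One way to check the answer numerically is to plug the printed estimates into the difference formula \(\hat{f}(b)-\hat{f}(a)\) from earlier. The result is only approximate because the table's coefficients are rounded (the `age^4` estimate, in particular, is printed to just one significant figure):

```r
# f-hat(b) - f-hat(a) = b1*(b - a) + b2*(b^2 - a^2) + b3*(b^3 - a^3) + b4*(b^4 - a^4)
beta <- c(-10.6429, -1.1427, 0.0216, -0.0001)  # rounded estimates from the table
a <- 24.5
b <- 73.5
delta <- sum(beta * (b^(1:4) - a^(1:4)))
delta  # approximate predicted change in population
```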
\[y_i = \beta_0 + \beta_1x_i + \beta_2x_i^2+\beta_3x_i^3 \dots + \beta_dx_i^d+\epsilon_i\]
Why pick the degree \(d\) before looking at the data?
Polynomials have notoriously bad tail behavior (so they can be bad for extrapolation)
What does this mean?
Another way to create a transformation is to cut the variable into distinct regions
\[C_1(X) = I(X < 35), C_2(X) = I(35\leq X<65), C_3(X) = I(X \geq 65)\]
What is the predicted value when \(age = 25\)?
\[C_1(X) = I(X < 15), C_2(X) = I(15\leq X<65), C_3(X) = I(X \geq 65)\]
What is the predicted value when \(age = 25\)?
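A sketch of the step-function idea in R (the ages are hypothetical): `cut()` with breaks at 35 and 65 creates the regions \(C_1, C_2, C_3\) above, and `lm()` on the resulting factor would estimate one mean per region.

```r
# Cutting at 35 and 65 with right = FALSE gives [.,35), [35,65), [65,.),
# matching the indicator definitions above.
age <- c(10, 25, 40, 50, 70, 80)  # hypothetical ages
region <- cut(age, breaks = c(-Inf, 35, 65, Inf), right = FALSE)
table(region)
# fit <- lm(pop ~ region)  # each region would get its own fitted mean
```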
Instead of a single polynomial in \(X\) over its whole domain, we can use different polynomials in regions defined by knots
\[y_i = \begin{cases}\beta_{01}+\beta_{11}x_i + \beta_{21}x^2_i+\beta_{31}x^3_i+\epsilon_i& \textrm{if } x_i < c\\ \beta_{02}+\beta_{12}x_i + \beta_{22}x_i^2 + \beta_{32}x_{i}^3+\epsilon_i&\textrm{if }x_i\geq c\end{cases}\]
What could go wrong here?
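What can go wrong: without any constraints, the two cubics need not agree at the knot, so the fitted function jumps there. A small simulated sketch (data and knot \(c = 50\) are hypothetical):

```r
# Fit separate cubics on each side of a knot at c = 50 and compare
# their predictions at the knot; in general they disagree.
set.seed(1)
x <- runif(200, 0, 100)
y <- sin(x / 15) + rnorm(200, sd = 0.3)
left  <- lm(y ~ poly(x, 3, raw = TRUE), subset = x < 50)
right <- lm(y ~ poly(x, 3, raw = TRUE), subset = x >= 50)
at_knot <- c(predict(left,  newdata = data.frame(x = 50)),
             predict(right, newdata = data.frame(x = 50)))
diff(at_knot)  # generally nonzero: the fit is discontinuous at the knot
```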
A linear spline with knots at \(\xi_k\), \(k = 1,\dots, K\) is a piecewise linear polynomial continuous at each knot
\[y_i = \beta_0 + \beta_1b_1(x_i)+\beta_2b_2(x_i)+\dots+\beta_{K+1}b_{K+1}(x_i)+\epsilon_i\]
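By analogy with the cubic truncated-power basis later in these slides, one choice of the \(K+1\) basis functions here is

\[b_1(x_i) = x_i,\qquad b_{k+1}(x_i) = (x_i-\xi_k)_+ = \begin{cases}x_i-\xi_k&\textrm{if }x_i>\xi_k\\0&\textrm{otherwise}\end{cases},\quad k = 1,\dots,K\]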
Application Exercise
Let’s create a data set to fit a linear spline with 2 knots: 35 and 65.
x |
---|
4 |
15 |
25 |
37 |
49 |
66 |
70 |
80 |
\(b_1(x)\) | \(b_2(x)\) | \(b_3(x)\) |
---|---|---|
4 | 0 | 0 |
15 | 0 | 0 |
25 | 0 | 0 |
37 | 2 | 0 |
49 | 14 | 0 |
66 | 31 | 1 |
70 | 35 | 5 |
80 | 45 | 15 |
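The basis columns in the table can be generated with `pmax()` (a sketch using base R; `x` is the column of values above):

```r
# Truncated linear basis for knots 35 and 65:
# b1 = x, b2 = (x - 35)_+, b3 = (x - 65)_+, matching the filled-in table.
x <- c(4, 15, 25, 37, 49, 66, 70, 80)
basis <- data.frame(
  b1 = x,
  b2 = pmax(x - 35, 0),
  b3 = pmax(x - 65, 0)
)
basis
```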
Application Exercise
Below is a linear regression model fit to include the 3 bases you just created with 2 knots: 35 and 65. Use the information here to draw the relationship between \(x\) and \(y\).
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.3 | 0.2 | -1.3 | 0.3 |
b1 | 2.0 | 0.0 | 231.3 | 0.0 |
b2 | -2.0 | 0.0 | -130.0 | 0.0 |
b3 | -3.0 | 0.0 | -116.5 | 0.0 |
\(b_1(x)\) | \(b_2(x)\) | \(b_3(x)\) |
---|---|---|
4 | 0 | 0 |
15 | 0 | 0 |
25 | 0 | 0 |
37 | 2 | 0 |
49 | 14 | 0 |
66 | 31 | 1 |
70 | 35 | 5 |
80 | 45 | 15 |
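One way to read the estimates: the slope in each region is the running sum of the \(b\) coefficients, so roughly \(2\) below 35, \(2 + (-2) = 0\) between 35 and 65, and \(0 + (-3) = -3\) above 65. A sketch that reconstructs the fitted line from the printed (rounded) estimates:

```r
# yhat = -0.3 + 2*b1 - 2*b2 - 3*b3, with b1 = x, b2 = (x-35)_+, b3 = (x-65)_+
f <- function(x) -0.3 + 2 * x - 2 * pmax(x - 35, 0) - 3 * pmax(x - 65, 0)
# Per-unit slopes in each region: 2, then 2 - 2 = 0, then 0 - 3 = -3
c(f(30) - f(29), f(50) - f(49), f(70) - f(69))
```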
A cubic spline with knots at \(\xi_k\), \(k = 1, \dots, K\) is a piecewise cubic polynomial with continuous derivatives up to order 2 at each knot.
Again we can represent this model with truncated power functions
\[y_i = \beta_0 + \beta_1b_1(x_i)+\beta_2b_2(x_i)+\dots+\beta_{K+3}b_{K+3}(x_i) + \epsilon_i\]
\[\begin{align}b_1(x_i)&=x_i\\b_2(x_i)&=x_i^2\\b_3(x_i)&=x_i^3\\b_{k+3}(x_i)&=(x_i-\xi_k)^3_+, k = 1,\dots,K\end{align}\]
where
\[(x_i-\xi_k)^{3}_+=\begin{cases}(x_i-\xi_k)^3&\textrm{if }x_i>\xi_k\\0&\textrm{otherwise}\end{cases}\]
Application Exercise
Let’s create a data set to fit a cubic spline with 2 knots: 35 and 65.
x |
---|
4 |
15 |
25 |
37 |
49 |
66 |
70 |
80 |
b1 | b2 | b3 | b4 | b5 |
---|---|---|---|---|
4 | 16 | 64 | 0 | 0 |
15 | 225 | 3375 | 0 | 0 |
25 | 625 | 15625 | 0 | 0 |
37 | 1369 | 50653 | 8 | 0 |
49 | 2401 | 117649 | 2744 | 0 |
66 | 4356 | 287496 | 29791 | 1 |
70 | 4900 | 343000 | 42875 | 125 |
80 | 6400 | 512000 | 91125 | 3375 |
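As with the linear spline, the cubic truncated-power basis can be built with `pmax()` (a base-R sketch reproducing the table above):

```r
# Cubic truncated-power basis for knots 35 and 65:
# b1 = x, b2 = x^2, b3 = x^3, b4 = (x - 35)^3_+, b5 = (x - 65)^3_+
x <- c(4, 15, 25, 37, 49, 66, 70, 80)
basis <- data.frame(
  b1 = x,
  b2 = x^2,
  b3 = x^3,
  b4 = pmax(x - 35, 0)^3,
  b5 = pmax(x - 65, 0)^3
)
basis
```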
library(tidyverse)  # tibble, ggplot2

# d holds the exercise data: y plus the basis columns b1-b5 built above
newdat <- tibble(
  b1 = -100:100,
  b2 = b1^2,
  b3 = b1^3,
  b4 = ifelse(b1 > 35, (b1 - 35)^3, 0),
  b5 = ifelse(b1 > 65, (b1 - 65)^3, 0)
)
p <- predict(lm(y ~ b1 + b2 + b3 + b4 + b5, data = d),
             newdata = newdat)
ggplot(newdat, aes(x = b1, y = p)) +
  geom_point() +
  geom_vline(xintercept = c(4, 80), lty = 2) +
  labs(x = "X",
       y = expression(hat(y)))
A natural cubic spline extrapolates linearly beyond the boundary knots
This adds 4 extra constraints and allows us to put more internal knots for the same degrees of freedom as a regular cubic spline
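A quick check of the linear-extrapolation property using `splines::ns()` (the data are simulated for this sketch; `ns()` ships with base R's splines package). Beyond the boundary knots a natural cubic spline is linear, so second differences of its predictions on an evenly spaced grid out there are essentially zero:

```r
library(splines)

set.seed(2)
x <- runif(200, 0, 100)
y <- sin(x / 15) + rnorm(200, sd = 0.3)
fit <- lm(y ~ ns(x, knots = c(35, 65)))  # boundary knots default to range(x)
grid <- data.frame(x = seq(120, 160, by = 1))  # well beyond the data
p <- predict(fit, newdata = grid)
max(abs(diff(p, differences = 2)))  # ~0: the extrapolation is linear
```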
# Compare natural-spline, cubic-spline, and linear-spline fits to d,
# predicting over the same grid (newdat) built above
da <- tibble(
  x = newdat$b1,
  ns = predict(lm(y ~ splines::ns(b1, knots = c(35, 65)), data = d),
               newdata = newdat),
  cubic = predict(lm(y ~ b1 + b2 + b3 + b4 + b5, data = d),
                  newdata = newdat),
  linear = predict(lm(y ~ b1 + ifelse(b1 > 35, b1 - 35, 0) +
                        ifelse(b1 > 65, b1 - 65, 0), data = d),
                   newdata = newdat)
) |>
  pivot_longer(ns:linear)
da |>
  filter(name != "linear") |>
  ggplot(aes(x = x, y = value, color = name)) +
  geom_point(alpha = 0.5) +
  geom_vline(xintercept = c(4, 80), lty = 2) +
  labs(x = "X",
       y = expression(hat(y)),
       color = "Spline")
Here is a comparison of a degree-14 polynomial and natural cubic spline (both have 15 degrees of freedom)
The content in these slides is from ISLR Chapter 7.