Cornell College
STA 364 Spring 2025 Block 5
\[y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_kx_{k,t} + \varepsilon_t.\]
That is, the coefficients measure the marginal effect of each predictor: the change in \(y_t\) resulting from a one-unit change in \(x_{j,t}\), holding all other predictors fixed.
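The output below comes from a simple regression of the quarterly change in US consumption on the change in income. A sketch of the call that produces it, using the us_change data set that appears later in this section:
library(fpp3)  # loads fable/fabletools and the us_change data
us_change |>
  model(TSLM(Consumption ~ Income)) |>
  report()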
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-2.58236 -0.27777 0.01862 0.32330 1.42229
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.54454 0.05403 10.079 < 2e-16 ***
Income 0.27183 0.04673 5.817 2.4e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5905 on 196 degrees of freedom
Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08
For forecasting purposes, we require the following assumptions:
\(\varepsilon_t\) have mean zero and are uncorrelated.
\(\varepsilon_t\) are uncorrelated with each \(x_{j,t}\).
It is useful to also have \(\varepsilon_t \sim \text{N}(0,\sigma^2)\) when producing prediction intervals or doing statistical tests.
\[\sum_{t=1}^T \varepsilon_t^2 = \sum_{t=1}^T (y_t - \beta_{0} - \beta_{1} x_{1,t} - \beta_{2} x_{2,t} - \cdots - \beta_{k} x_{k,t})^2\]
model(TSLM(y ~ x_1 + x_2 + ... + x_k))
fit_consMR <- us_change |>
model(lm = TSLM(Consumption ~ Income + Production + Unemployment + Savings))
report(fit_consMR)
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-0.90555 -0.15821 -0.03608 0.13618 1.15471
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.253105 0.034470 7.343 5.71e-12 ***
Income 0.740583 0.040115 18.461 < 2e-16 ***
Production 0.047173 0.023142 2.038 0.0429 *
Unemployment -0.174685 0.095511 -1.829 0.0689 .
Savings -0.052890 0.002924 -18.088 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3102 on 193 degrees of freedom
Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
F-statistic: 160 on 4 and 193 DF, p-value: < 2.22e-16
\[\hat{y}_t = \hat\beta_{0} + \hat\beta_{1} x_{1,t} + \hat\beta_{2} x_{2,t} + \cdots + \hat\beta_{k} x_{k,t}\]
augment(fit_consMR) |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Consumption, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
labs(y = NULL, title = "Percent change in US consumption expenditure") +
scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00")) +
guides(colour = guide_legend(title = NULL))
Coefficient of determination
\[R^2 = \frac{\sum(\hat{y}_{t} - \bar{y})^2}{\sum(y_{t}-\bar{y})^2}\]
Standard error of the regression
\[\hat{\sigma}_e=\sqrt{\frac{1}{T-k-1}\sum_{t=1}^{T}{e_t^2}}\]
where \(k\) is the number of predictors in the model.
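Both quantities are also available from glance() applied to the fitted model; a small sketch using the multiple-regression fit fit_consMR from above (sigma2 is the squared standard error of the regression, so its square root gives \(\hat\sigma_e\)):
# R-squared, adjusted R-squared, and the residual variance
glance(fit_consMR) |>
  select(r_squared, adj_r_squared, sigma2)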
For forecasting purposes, we require the following assumptions:
\(\varepsilon_t\) are uncorrelated and have zero mean.
\(\varepsilon_t\) are uncorrelated with each \(x_{j,t}\).
It is useful to also have \(\varepsilon_t \sim \text{N}(0,\sigma^2)\) when producing prediction intervals or doing statistical tests.
Useful for spotting outliers and whether the linear model was appropriate.
Scatterplot of residuals \(\varepsilon_t\) against each predictor \(x_{j,t}\).
Scatterplot of residuals against the fitted values \(\hat y_t\).
Expect to see scatterplots resembling a horizontal band with no values too far from the band and no patterns such as curvature or increasing spread.
If a plot of the residuals vs any predictor in the model shows a pattern, the relationship may be nonlinear and the model should be modified accordingly.
If a plot of the residuals vs a predictor not in the model shows a pattern, that predictor may need to be added to the model (possibly in a nonlinear form).
If a plot of the residuals vs the fitted values shows a pattern, the errors may be heteroscedastic. (Could try a transformation of the response.)
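A sketch of one such diagnostic plot, residuals against fitted values for the consumption model; curvature or a funnel shape here would indicate a problem:
# Scatterplot of residuals against fitted values
augment(fit_consMR) |>
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  labs(x = "Fitted values", y = "Residuals")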
Linear trend
\[x_t = t\]
If a categorical variable takes only two values (e.g., Yes or No), then an equivalent numerical variable can be constructed taking value 1 if yes and 0 if no. This is called a dummy variable.
Variable | dummy |
---|---|
Yes | 1 |
Yes | 1 |
No | 0 |
Yes | 1 |
No | 0 |
No | 0 |
Yes | 1 |
Yes | 1 |
No | 0 |
No | 0 |
If there are more than two categories, then the variable can be coded using several dummy variables (one fewer than the total number of categories).
Day | d1 | d2 | d3 | d4 |
---|---|---|---|---|
Monday | 1 | 0 | 0 | 0 |
Tuesday | 0 | 1 | 0 | 0 |
Wednesday | 0 | 0 | 1 | 0 |
Thursday | 0 | 0 | 0 | 1 |
Friday | 0 | 0 | 0 | 0 |
Monday | 1 | 0 | 0 | 0 |
Tuesday | 0 | 1 | 0 | 0 |
Wednesday | 0 | 0 | 1 | 0 |
Thursday | 0 | 0 | 0 | 1 |
Friday | 0 | 0 | 0 | 0 |
Using one dummy for each category gives too many dummy variables!
The regression will then be singular and inestimable.
Either omit the constant, or omit the dummy for one category.
The coefficients of the dummies are relative to the omitted category.
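In practice you rarely construct these dummies by hand: the season() special in TSLM() generates one dummy per season and omits the first, so the intercept plays the role of the omitted category. A minimal sketch, assuming a generic quarterly tsibble y_ts with response y:
# season() creates seasonal dummy variables relative to season 1,
# avoiding the dummy variable trap automatically
y_ts |>
  model(TSLM(y ~ trend() + season()))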
Seasonal dummies
Outliers
Public holidays
Regression model
\[y_t = \beta_0 + \beta_1 t + \beta_2d_{2,t} + \beta_3 d_{3,t} + \beta_4 d_{4,t} + \varepsilon_t\]
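The output below comes from fitting this model to quarterly Australian beer production; a sketch of the code, assuming recent_production is aus_production restricted to 1992 onwards (as in fpp3):
# Quarterly beer production since 1992
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
fit_beer <- recent_production |>
  model(TSLM(Beer ~ trend() + season()))
report(fit_beer)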
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-42.9029 -7.5995 -0.4594 7.9908 21.7895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 441.80044 3.73353 118.333 < 2e-16 ***
trend() -0.34027 0.06657 -5.111 2.73e-06 ***
season()year2 -34.65973 3.96832 -8.734 9.10e-13 ***
season()year3 -17.82164 4.02249 -4.430 3.45e-05 ***
season()year4 72.79641 4.02305 18.095 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16
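Forecasts from this model simply extend the fitted trend and seasonal pattern; a sketch, continuing the fit_beer object assumed above:
# Forecast the next two years (8 quarters) and plot against the data
fit_beer |>
  forecast(h = 8) |>
  autoplot(recent_production)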
Spikes
Steps
Change of slope
For monthly data
Lagged values of a predictor.
Example: \(x\) is advertising which has a delayed effect
\[\begin{align*} x_{1} &= \text{advertising for previous month;} \\ x_{2} &= \text{advertising for two months previously;} \\ & \vdots \\ x_{m} &= \text{advertising for $m$ months previously.} \end{align*}\]
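One way to include such lagged predictors is to create them explicitly with lag() before fitting; a sketch with hypothetical data my_data and hypothetical columns Sales and AdBudget:
# Add lagged advertising columns, then regress sales on them
my_data |>
  mutate(
    AdLag1 = lag(AdBudget, 1),  # advertising one month previously
    AdLag2 = lag(AdBudget, 2)   # advertising two months previously
  ) |>
  filter(!is.na(AdLag2)) |>     # drop initial rows without a full lag history
  model(TSLM(Sales ~ AdLag1 + AdLag2))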
Periodic seasonality can be handled using pairs of Fourier terms:
\[s_{k}(t) = \sin\left(\frac{2\pi k t}{m}\right)\qquad c_{k}(t) = \cos\left(\frac{2\pi k t}{m}\right)\]
\[y_t = a + bt + \sum_{k=1}^K \left[\alpha_k s_k(t) + \beta_k c_k(t)\right] + \varepsilon_t\] where \(m\) is the seasonal period.
TSLM(y ~ trend() + fourier(K))
fourier_beer <- recent_production |> model(TSLM(Beer ~ trend() + fourier(K = 2)))
report(fourier_beer)
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-42.9029 -7.5995 -0.4594 7.9908 21.7895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 446.87920 2.87321 155.533 < 2e-16 ***
trend() -0.34027 0.06657 -5.111 2.73e-06 ***
fourier(K = 2)C1_4 8.91082 2.01125 4.430 3.45e-05 ***
fourier(K = 2)S1_4 -53.72807 2.01125 -26.714 < 2e-16 ***
fourier(K = 2)C2_4 -13.98958 1.42256 -9.834 9.26e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16
fit <- aus_cafe |>
model(
K1 = TSLM(log(Turnover) ~ trend() + fourier(K = 1)),
K2 = TSLM(log(Turnover) ~ trend() + fourier(K = 2)),
K3 = TSLM(log(Turnover) ~ trend() + fourier(K = 3)),
K4 = TSLM(log(Turnover) ~ trend() + fourier(K = 4)),
K5 = TSLM(log(Turnover) ~ trend() + fourier(K = 5)),
K6 = TSLM(log(Turnover) ~ trend() + fourier(K = 6))
)
Periodic seasonality can be handled using pairs of Fourier terms: \[s_{k}(t) = \sin\left(\frac{2\pi k t}{m}\right)\qquad c_{k}(t) = \cos\left(\frac{2\pi k t}{m}\right)\] \[y_t = a + bt + \sum_{k=1}^K \left[\alpha_k s_k(t) + \beta_k c_k(t)\right] + \varepsilon_t\]
Computer output for regression will always give the \(R^2\) value. This is a useful summary of the model.
\[R^2 = \frac{\sum(\hat{y}_t - \bar{y})^2}{\sum(y_t-\bar{y})^2}\]
However …
\(R^2\) does not allow for “degrees of freedom”.
Adding any variable tends to increase the value of \(R^2\), even if that variable is irrelevant.
To overcome this problem, we can use adjusted \(R^2\):
\[\bar{R}^2 = 1-(1-R^2)\frac{T-1}{T-k-1}\]
where \(k\) is the number of predictors and \(T\) is the number of observations.
Note
Maximizing \(\bar{R}^2\) is equivalent to minimizing \(\hat\sigma^2\).
\[\hat{\sigma}^2 = \frac{1}{T-k-1}\sum_{t=1}^T e_t^2\]
\[\text{AIC} = -2\log(L) + 2(k+2)\]
where \(L\) is the likelihood and \(k\) is the number of predictors in the model.
For small values of \(T\), the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed.
\[\text{AIC}_{\text{C}} = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3}\]
As with the AIC, the AIC\(_{\text{C}}\) should be minimized.
\[\text{BIC} = -2\log(L) + (k+2)\log(T)\]
where \(L\) is the likelihood and \(k\) is the number of predictors in the model.
BIC penalizes terms more heavily than AIC.
Also called SBIC and SC.
Minimizing BIC is asymptotically equivalent to leave-\(v\)-out cross-validation when \(v = T[1-1/(\log(T)-1)]\).
fit <- aus_cafe |>
model(
K1 = TSLM(log(Turnover) ~ trend() + fourier(K = 1)),
K2 = TSLM(log(Turnover) ~ trend() + fourier(K = 2)),
K3 = TSLM(log(Turnover) ~ trend() + fourier(K = 3)),
K4 = TSLM(log(Turnover) ~ trend() + fourier(K = 4)),
K5 = TSLM(log(Turnover) ~ trend() + fourier(K = 5)),
K6 = TSLM(log(Turnover) ~ trend() + fourier(K = 6))
)
glance(fit) |> select(.model, r_squared, adj_r_squared, CV, AICc)
# A tibble: 6 × 5
.model r_squared adj_r_squared CV AICc
<chr> <dbl> <dbl> <dbl> <dbl>
1 K1 0.962 0.962 0.00238 -1085.
2 K2 0.966 0.965 0.00220 -1099.
3 K3 0.976 0.975 0.00157 -1160.
4 K4 0.980 0.979 0.00138 -1183.
5 K5 0.985 0.984 0.00104 -1234.
6 K6 0.985 0.984 0.00105 -1232.
fit_consBest <- us_change |>
model(
TSLM(Consumption ~ Income + Savings + Unemployment)
)
future_scenarios <- scenarios(
Increase = new_data(us_change, 4) |>
mutate(Income = 1, Savings = 0.5, Unemployment = 0),
Decrease = new_data(us_change, 4) |>
mutate(Income = -1, Savings = -0.5, Unemployment = 0),
names_to = "Scenario"
)
fc <- forecast(fit_consBest, new_data = future_scenarios)
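A sketch of plotting the two scenarios against the historical data (the title text is illustrative):
# Overlay the scenario forecasts on the historical consumption series
us_change |>
  autoplot(Consumption) +
  autolayer(fc) +
  labs(title = "US consumption: scenario forecasts", y = "% change")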
\[y_{t+h}=\beta_0+\beta_1x_{1,t}+\dots+\beta_kx_{k,t}+\varepsilon_{t+h}\]
A log-log functional form
\[\log y=\beta_0+\beta_1 \log x +\varepsilon\]
where \(\beta_1\) is interpreted as an elasticity (the average percentage change in \(y\) resulting from a \(1\%\) increase in \(x\)).
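In TSLM() the log-log form can be written directly in the model formula; a minimal sketch with hypothetical data my_data and hypothetical variables y and x:
# The coefficient on log(x) is the elasticity of y with respect to x
my_data |>
  model(TSLM(log(y) ~ log(x)))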
\[y=f(x) +\varepsilon\] where \(f\) is a non-linear function.
\[\begin{align*} x_{2} = (x-c)_+ &= \left\{ \begin{array}{ll} 0 & \text{if } x < c\\ x-c & \text{if } x \ge c. \end{array}\right. \end{align*}\]
\[x_1=x~~~x_2=(x-c_1)_+~~~\ldots~~~x_k=(x-c_{k-1})_+\]
where \(c_1,\ldots,c_{k-1}\) are knots.
Warning
These give a better in-sample fit, but forecasting outside the range of the historical data is even more unreliable.
Piecewise linear trend with bend at \(\tau\)
\[\begin{align*} x_{1,t} &= t \\ x_{2,t} &= \left\{ \begin{array}{ll} 0 & t <\tau\\ (t-\tau) & t \ge \tau \end{array}\right. \end{align*}\]
Quadratic or higher order trend
\[x_{1,t} =t,\quad x_{2,t}=t^2,\quad \dots\]
Warning
NOT RECOMMENDED
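A piecewise linear trend is usually the safer alternative; in fable it can be specified with knots inside the trend() special. A minimal sketch, assuming a hypothetical annual tsibble y_ts with response y and a bend at 1980:
# Piecewise linear trend: the slope is allowed to change at the knot (1980)
y_ts |>
  model(
    linear    = TSLM(y ~ trend()),
    piecewise = TSLM(y ~ trend(knots = 1980))
  )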
When \(x\) is useful for predicting \(y\), it is not necessarily causing \(y\).
e.g., predict number of swimmers \(y\) using number of ice-creams sold \(x\).
Correlations are useful for forecasting, even when there is no causality.
Better models usually involve causal relationships (e.g., temperature \(x\) and people \(z\) to predict swimmers \(y\)).
In regression analysis, multicollinearity occurs when two or more predictors are highly correlated with each other, or when a linear combination of some predictors is highly correlated with another linear combination of predictors.
If multicollinearity exists, the individual coefficient estimates can be unstable with large standard errors, although forecasts are usually still reliable provided the future values of the predictors lie within their historical ranges.