Chapter 7 Time Series Linear Models

Tyler George

Cornell College
STA 364 Spring 2025 Block 5

Setup

library(fpp3)

Multiple regression and forecasting

\[y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_kx_{k,t} + \varepsilon_t.\]

  • \(y_t\) is the variable we want to predict: the “response” variable
  • Each \(x_{j,t}\) is numerical and is called a “predictor”. They are usually assumed to be known for all past and future times.
  • The coefficients \(\beta_1,\dots,\beta_k\) measure the effect of each predictor after taking account of the effect of all other predictors in the model.

That is, the coefficients measure the marginal effects of the predictors.

  • \(\varepsilon_t\) is a white noise error term

Example: US consumption expenditure

us_change |> autoplot(Consumption) +
  geom_line(aes(y = Income, color = "Income")) +
  geom_line(aes(y = Consumption, color = "Consumption")) +
  scale_color_manual(values = c("Income" = "blue", "Consumption" = "red")) +
  labs(y = "% change", color = "Legend")

Example: US consumption expenditure

us_change |> ggplot(aes(x=Income, y=Consumption)) +
  labs(y = "Consumption (quarterly % change)",
       x = "Income (quarterly % change)") +
  geom_point() + geom_smooth(method="lm", se=FALSE)

Example: US consumption expenditure

fit_cons <- us_change |>
  model(lm = TSLM(Consumption ~ Income))
report(fit_cons)
Series: Consumption 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-2.58236 -0.27777  0.01862  0.32330  1.42229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.54454    0.05403  10.079  < 2e-16 ***
Income       0.27183    0.04673   5.817  2.4e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5905 on 196 degrees of freedom
Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08

Example: US consumption expenditure

us_change |>
  gather("Measure", "Change", Consumption, Income, Production, Savings, Unemployment) |>
  ggplot(aes(x = Quarter, y = Change, colour = Measure)) +
  geom_line() +
  facet_grid(vars(Measure), scales = "free_y") +
  labs(y = "") +
  guides(colour = "none")

Example: US consumption expenditure

us_change |>
  as_tibble() |>
  select(-Quarter) |>
  GGally::ggpairs()

Assumptions for the linear model

For forecasting purposes, we require the following assumptions:

  • \(\varepsilon_t\) have mean zero and are uncorrelated.

  • \(\varepsilon_t\) are uncorrelated with each \(x_{j,t}\).

It is useful to also have \(\varepsilon_t \sim \text{N}(0,\sigma^2)\) when producing prediction intervals or doing statistical tests.

Least squares estimation

  • In practice we need to estimate the coefficients: \(\beta_0,\beta_1, \dots, \beta_k\).

\[\sum_{t=1}^T \varepsilon_t^2 = \sum_{t=1}^T (y_t - \beta_{0} - \beta_{1} x_{1,t} - \beta_{2} x_{2,t} - \cdots - \beta_{k} x_{k,t})^2\]

model(TSLM(y ~ x_1 + x_2 + ... + x_k))

  • Estimated coefficients: \(\hat\beta_0, \dots, \hat\beta_k\)

Example: US consumption expenditure

fit_consMR <- us_change |>
  model(lm = TSLM(Consumption ~ Income + Production + Unemployment + Savings))
report(fit_consMR)
Series: Consumption 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-0.90555 -0.15821 -0.03608  0.13618  1.15471 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.253105   0.034470   7.343 5.71e-12 ***
Income        0.740583   0.040115  18.461  < 2e-16 ***
Production    0.047173   0.023142   2.038   0.0429 *  
Unemployment -0.174685   0.095511  -1.829   0.0689 .  
Savings      -0.052890   0.002924 -18.088  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3102 on 193 degrees of freedom
Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
F-statistic:   160 on 4 and 193 DF, p-value: < 2.22e-16

Fitted values

\[\hat{y}_t = \hat\beta_{0} + \hat\beta_{1} x_{1,t} + \hat\beta_{2} x_{2,t} + \cdots + \hat\beta_{k} x_{k,t}\]

augment(fit_consMR) |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Consumption, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(y = NULL, title = "Percent change in US consumption expenditure") +
  scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00")) +
  guides(colour = guide_legend(title = NULL))

Example: US consumption expenditure

augment(fit_consMR) |>
  ggplot(aes(y = .fitted, x = Consumption)) +
  geom_point() +
  labs(
    y = "Fitted (predicted values)",
    x = "Data (actual values)",
    title = "Percentage change in US consumption expenditure"
  ) +
  geom_abline(intercept = 0, slope = 1)

Goodness of fit

Coefficient of determination

\[R^2 = \frac{\sum(\hat{y}_{t} - \bar{y})^2}{\sum(y_{t}-\bar{y})^2}\]

Standard error of the regression

\[\hat{\sigma}_e=\sqrt{\frac{1}{T-k-1}\sum_{t=1}^{T}{e_t^2}}\]

where \(k\) is the number of predictors in the model.
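
Both quantities can be read off the fitted model object. A minimal sketch, assuming fit_consMR from the earlier slide is still available (sigma2 in glance() is the residual variance, so its square root is the standard error of the regression):

glance(fit_consMR) |>
  transmute(.model, r_squared, adj_r_squared, sigma = sqrt(sigma2))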

Multiple regression and forecasting

For forecasting purposes, we require the following assumptions:

  • \(\varepsilon_t\) are uncorrelated and zero mean

  • \(\varepsilon_t\) are uncorrelated with each \(x_{j,t}\).

It is useful to also have \(\varepsilon_t \sim \text{N}(0,\sigma^2)\) when producing prediction intervals or doing statistical tests.

Residual plots

Useful for spotting outliers and whether the linear model was appropriate.

  • Scatterplot of residuals \(e_t\) against each predictor \(x_{j,t}\).

  • Scatterplot of residuals against the fitted values \(\hat y_t\) (a code sketch of both plots follows this list).

  • Expect to see scatterplots resembling a horizontal band with no values too far from the band and no patterns such as curvature or increasing spread.
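
One way to produce both plots with the multiple regression fit from earlier (a sketch following fpp3 conventions; it assumes fit_consMR is still available):

# Residuals against each predictor: join the residuals back onto us_change,
# reshape to long form, and facet by predictor.
us_change |>
  left_join(residuals(fit_consMR), by = "Quarter") |>
  pivot_longer(Income:Unemployment, names_to = "regressor", values_to = "x") |>
  ggplot(aes(x = x, y = .resid)) +
  geom_point() +
  facet_wrap(. ~ regressor, scales = "free_x") +
  labs(y = "Residuals", x = "")

# Residuals against the fitted values.
augment(fit_consMR) |>
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  labs(x = "Fitted values", y = "Residuals")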

Residual patterns

  • If a plot of the residuals against any predictor in the model shows a pattern, then the relationship may be nonlinear.

  • If a plot of the residuals against any predictor not in the model shows a pattern, then that predictor may need to be added to the model.

  • If a plot of the residuals against the fitted values shows a pattern, then there may be heteroscedasticity in the errors. (A transformation of the forecast variable could help.)

Trend

Linear trend

\[x_t = t\]

  • \(t=1,2,\dots,T\)
  • Strong assumption that trend will continue.

Dummy variables

If a categorical variable takes only two values (e.g., yes or no), then an equivalent numerical variable can be constructed, taking the value 1 for yes and 0 for no. This is called a dummy variable.

Variable dummy
Yes 1
Yes 1
No 0
Yes 1
No 0
No 0
Yes 1
Yes 1
No 0
No 0

Dummy variables

If there are more than two categories, then the variable can be coded using several dummy variables (one fewer than the total number of categories), as in the table below; a small base-R illustration follows the table.

Day d1 d2 d3 d4
Monday 1 0 0 0
Tuesday 0 1 0 0
Wednesday 0 0 1 0
Thursday 0 0 0 1
Friday 0 0 0 0
Monday 1 0 0 0
Tuesday 0 1 0 0
Wednesday 0 0 1 0
Thursday 0 0 0 1
Friday 0 0 0 0
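
A minimal base-R illustration of this coding (the days vector below is made up for the example): a five-level factor expands to an intercept plus four dummy columns, with the omitted level (here Friday) as the reference.

days <- factor(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
               levels = c("Friday", "Monday", "Tuesday", "Wednesday", "Thursday"))
model.matrix(~ days)   # (Intercept) plus daysMonday, ..., daysThursday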

Beware of the dummy variable trap!

  • Using one dummy for each category gives too many dummy variables!

  • The regression will then be singular and inestimable.

  • Either omit the constant, or omit the dummy for one category.

  • The coefficients of the dummies are relative to the omitted category.

Uses of dummy variables

Seasonal dummies

  • For quarterly data: use 3 dummies
  • For monthly data: use 11 dummies
  • For daily data: use 6 dummies
  • What to do with weekly data? (The seasonal period is about 52.18, so seasonal dummies are impractical; Fourier terms, covered later, are the usual approach.)

Outliers

  • If there is an outlier, you can use a dummy variable to remove its effect.

Public holidays

  • For daily data: if it is a public holiday, dummy=1, otherwise dummy=0.

Beer production revisited (1/6)

recent_production <- aus_production |> filter(year(Quarter) >= 1992)
recent_production |> autoplot(Beer) +
  labs(y = "Megalitres", title = "Australian quarterly beer production")

Regression model

\[y_t = \beta_0 + \beta_1 t + \beta_2d_{2,t} + \beta_3 d_{3,t} + \beta_4 d_{4,t} + \varepsilon_t\]

  • \(d_{i,t} = 1\) if \(t\) is quarter \(i\) and 0 otherwise.

Beer production revisited (2/6)

fit_beer <- recent_production |> model(TSLM(Beer ~ trend() + season()))
report(fit_beer)
Series: Beer 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-42.9029  -7.5995  -0.4594   7.9908  21.7895 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   441.80044    3.73353 118.333  < 2e-16 ***
trend()        -0.34027    0.06657  -5.111 2.73e-06 ***
season()year2 -34.65973    3.96832  -8.734 9.10e-13 ***
season()year3 -17.82164    4.02249  -4.430 3.45e-05 ***
season()year4  72.79641    4.02305  18.095  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16

Beer production revisited (3/6)

augment(fit_beer) |>
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Beer, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  labs(y = "Megalitres", title = "Australian quarterly beer production") +
  scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00"))

Beer production revisited (4/6)

augment(fit_beer) |>
  ggplot(aes(x = Beer, y = .fitted, colour = factor(quarter(Quarter)))) +
  geom_point() +
  labs(y = "Fitted", x = "Actual values", title = "Quarterly beer production") +
  scale_colour_brewer(palette = "Dark2", name = "Quarter") +
  geom_abline(intercept = 0, slope = 1)

Beer production revisited (5/6)

fit_beer |> gg_tsresiduals()

Beer production revisited (6/6)

fit_beer |>
  forecast() |>
  autoplot(recent_production)

Intervention variables

Spikes

  • Equivalent to a dummy variable for handling an outlier.

Steps

  • Variable takes value 0 before the intervention and 1 afterwards.

Change of slope

  • Variable takes value 0 before the intervention and values \(1,2,3,\dots\) afterwards.
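
A toy sketch of constructing all three intervention variables by hand (the series and the intervention time tau = 6 below are made up; fpp3 from the Setup slide is assumed to be loaded):

toy <- tsibble(t = 1:10, y = rnorm(10), index = t)
tau <- 6
toy |>
  mutate(
    spike = as.integer(t == tau),   # dummy for a single outlier
    step  = as.integer(t >= tau),   # 0 before the intervention, 1 afterwards
    ramp  = pmax(0, t - tau)        # change of slope: 0 before, then 1, 2, 3, ...
  )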

Holidays

For monthly data

  • Christmas: always in December so part of monthly seasonal effect
  • Easter: use a dummy variable \(v_t=1\) if any part of Easter is in that month, \(v_t=0\) otherwise.
  • Ramadan and Chinese New Year can be handled similarly.

Distributed lags

Lagged values of a predictor.

Example: \(x\) is advertising which has a delayed effect

\[\begin{align*} x_{1} &= \text{advertising for previous month;} \\ x_{2} &= \text{advertising for two months previously;} \\ & \vdots \\ x_{m} &= \text{advertising for $m$ months previously.} \end{align*}\]
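
A sketch of fitting such a model with TSLM by building the lags explicitly (my_data, Sales, and Adverts are hypothetical names; the leading NA rows created by lag() may need to be dropped before fitting):

my_data |>
  mutate(
    Adverts_lag1 = lag(Adverts, 1),   # advertising one month earlier
    Adverts_lag2 = lag(Adverts, 2)    # advertising two months earlier
  ) |>
  model(TSLM(Sales ~ Adverts + Adverts_lag1 + Adverts_lag2))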

Fourier series

Periodic seasonality can be handled using pairs of Fourier terms:

\[s_{k}(t) = \sin\left(\frac{2\pi k t}{m}\right)\qquad c_{k}(t) = \cos\left(\frac{2\pi k t}{m}\right)\]

\[y_t = a + bt + \sum_{k=1}^K \left[\alpha_k s_k(t) + \beta_k c_k(t)\right] + \varepsilon_t\] where \(m\) is the seasonal period.

  • Every periodic function can be approximated by sums of sin and cos terms for large enough \(K\).
  • Choose \(K\) by minimizing AICc.
  • Called “harmonic regression”

TSLM(y ~ trend() + fourier(K = k))

Harmonic regression: beer production

fourier_beer <- recent_production |> model(TSLM(Beer ~ trend() + fourier(K = 2)))
report(fourier_beer)
Series: Beer 
Model: TSLM 

Residuals:
     Min       1Q   Median       3Q      Max 
-42.9029  -7.5995  -0.4594   7.9908  21.7895 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        446.87920    2.87321 155.533  < 2e-16 ***
trend()             -0.34027    0.06657  -5.111 2.73e-06 ***
fourier(K = 2)C1_4   8.91082    2.01125   4.430 3.45e-05 ***
fourier(K = 2)S1_4 -53.72807    2.01125 -26.714  < 2e-16 ***
fourier(K = 2)C2_4 -13.98958    1.42256  -9.834 9.26e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16

Harmonic regression: eating-out expenditure

aus_cafe <- aus_retail |>
  filter(Industry == "Cafes, restaurants and takeaway food services",
         year(Month) %in% 2004:2018) |>
  summarise(Turnover = sum(Turnover))
aus_cafe |> autoplot(Turnover)

Harmonic regression: eating-out expenditure (1/7)

fit <- aus_cafe |>
  model(
    K1 = TSLM(log(Turnover) ~ trend() + fourier(K = 1)),
    K2 = TSLM(log(Turnover) ~ trend() + fourier(K = 2)),
    K3 = TSLM(log(Turnover) ~ trend() + fourier(K = 3)),
    K4 = TSLM(log(Turnover) ~ trend() + fourier(K = 4)),
    K5 = TSLM(log(Turnover) ~ trend() + fourier(K = 5)),
    K6 = TSLM(log(Turnover) ~ trend() + fourier(K = 6))
  )

Harmonic regression: eating-out expenditure (2/7)–(7/7)

[Figures only: one plot per slide of the models fitted above.]

Fourier series

Periodic seasonality can be handled using pairs of Fourier terms: \[s_{k}(t) = \sin\left(\frac{2\pi k t}{m}\right)\qquad c_{k}(t) = \cos\left(\frac{2\pi k t}{m}\right)\] \[y_t = a + bt + \sum_{k=1}^K \left[\alpha_k s_k(t) + \beta_k c_k(t)\right] + \varepsilon_t\]

  • Every periodic function can be approximated by sums of sin and cos terms for large enough \(K\).
  • \(K \le m/2\)
  • \(m\) can be non-integer
  • Particularly useful for large \(m\).

Comparing regression models

Computer output for regression will always give the \(R^2\) value. This is a useful summary of the model.

  • It is equal to the square of the correlation between \(y\) and \(\hat y\).
  • It is often called the “coefficient of determination”.
  • It can also be calculated as follows:

\[R^2 = \frac{\sum(\hat{y}_t - \bar{y})^2}{\sum(y_t-\bar{y})^2}\]

  • It is the proportion of variance accounted for (explained) by the predictors.
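
A quick check of the first bullet, assuming fit_consMR from the earlier consumption example (the squared correlation should match the Multiple R-squared of about 0.768 reported by report()):

augment(fit_consMR) |>
  as_tibble() |>
  summarise(r_squared = cor(Consumption, .fitted)^2)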

Comparing regression models

However …

  • \(R^2\) does not allow for “degrees of freedom”.

  • Adding any variable tends to increase the value of \(R^2\), even if that variable is irrelevant.

To overcome this problem, we can use adjusted \(R^2\):

\[\bar{R}^2 = 1-(1-R^2)\frac{T-1}{T-k-1}\]

where \(k=\) no. predictors and \(T=\) no. observations.

Note

Maximizing \(\bar{R}^2\) is equivalent to minimizing \(\hat\sigma^2\), since \(\bar{R}^2 = 1 - \hat\sigma^2/s_y^2\), where \(s_y^2 = \sum_t (y_t-\bar{y})^2/(T-1)\) does not depend on the model.

\[\hat{\sigma}^2 = \frac{1}{T-k-1}\sum_{t=1}^T e_t^2\]

where \(e_t\) are the residuals.

Akaike’s Information Criterion

\[\text{AIC} = -2\log(L) + 2(k+2)\]

where \(L\) is the likelihood and \(k\) is the number of predictors in the model.

  • AIC penalizes terms more heavily than \(\bar{R}^2\).
  • Minimizing the AIC is asymptotically equivalent to minimizing MSE via leave-one-out cross-validation (for any linear regression).

Corrected AIC

For small values of \(T\), the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed.

\[\text{AIC}_{\text{C}} = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3}\]

As with the AIC, the AIC\(_{\text{C}}\) should be minimized.

Bayesian Information Criterion

\[\text{BIC} = -2\log(L) + (k+2)\log(T)\]

where \(L\) is the likelihood and \(k\) is the number of predictors in the model.

  • BIC penalizes terms more heavily than AIC

  • Also called SBIC and SC.

  • Minimizing BIC is asymptotically equivalent to leave-\(v\)-out cross-validation when \(v = T[1-1/(\log(T)-1)]\).
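
All of these criteria are returned by glance(); a sketch comparing the harmonic regression models fitted earlier (assuming the mable fit from the eating-out example):

glance(fit) |>
  select(.model, AIC, AICc, BIC, CV) |>
  arrange(AICc)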

Harmonic regression: eating-out expenditure again (1/8)

aus_cafe <- aus_retail |>
  filter(Industry == "Cafes, restaurants and takeaway food services",
         year(Month) %in% 2004:2018) |>
  summarise(Turnover = sum(Turnover))
aus_cafe |> autoplot(Turnover)

Harmonic regression: eating-out expenditure again (2/8)

fit <- aus_cafe |>
  model(
    K1 = TSLM(log(Turnover) ~ trend() + fourier(K = 1)),
    K2 = TSLM(log(Turnover) ~ trend() + fourier(K = 2)),
    K3 = TSLM(log(Turnover) ~ trend() + fourier(K = 3)),
    K4 = TSLM(log(Turnover) ~ trend() + fourier(K = 4)),
    K5 = TSLM(log(Turnover) ~ trend() + fourier(K = 5)),
    K6 = TSLM(log(Turnover) ~ trend() + fourier(K = 6))
  )
glance(fit) |> select(.model, r_squared, adj_r_squared, CV, AICc)
# A tibble: 6 × 5
  .model r_squared adj_r_squared      CV   AICc
  <chr>      <dbl>         <dbl>   <dbl>  <dbl>
1 K1         0.962         0.962 0.00238 -1085.
2 K2         0.966         0.965 0.00220 -1099.
3 K3         0.976         0.975 0.00157 -1160.
4 K4         0.980         0.979 0.00138 -1183.
5 K5         0.985         0.984 0.00104 -1234.
6 K6         0.985         0.984 0.00105 -1232.

Harmonic regression: eating-out expenditure again (3/8)–(8/8)

[Figures only: one plot per slide of the models fitted above.]

Ex-ante versus ex-post forecasts

  • Ex ante forecasts are made using only information available in advance.
    • require forecasts of predictors
  • Ex post forecasts are made using later information on the predictors.
    • useful for studying behavior of forecasting models.
  • Trend, seasonal, and calendar variables are all known in advance, so they don’t need to be forecast.

Beer production

recent_production <- aus_production |> filter(year(Quarter) >= 1992)
recent_production |> model(TSLM(Beer ~ trend() + season())) |> 
   forecast() |> autoplot(recent_production)

Scenario based forecasting

  • Assumes possible scenarios for the predictor variables
  • Prediction intervals for scenario-based forecasts do not include the uncertainty associated with the future values of the predictor variables; they treat the scenario values as known.

US Consumption

fit_consBest <- us_change |>
  model(
    TSLM(Consumption ~ Income + Savings + Unemployment)
  )

future_scenarios <- scenarios(
  Increase = new_data(us_change, 4) |>
    mutate(Income = 1, Savings = 0.5, Unemployment = 0),
  Decrease = new_data(us_change, 4) |>
    mutate(Income = -1, Savings = -0.5, Unemployment = 0),
  names_to = "Scenario"
)

fc <- forecast(fit_consBest, new_data = future_scenarios)

US Consumption

us_change |> autoplot(Consumption) +
  autolayer(fc) +
  labs(title = "US consumption", y = "% change")

Building a predictive regression model

  • If getting forecasts of predictors is difficult, you can use lagged predictors instead.

\[y_{t+h}=\beta_0+\beta_1x_{1,t}+\dots+\beta_kx_{k,t}+\varepsilon_{t+h}\]

  • A different model is fitted for each forecast horizon \(h\); a sketch for \(h = 4\) follows.
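
A sketch for \(h = 4\) quarters using us_change (the lead()-based construction here is one possible implementation, not taken from the slides):

us_change |>
  mutate(Consumption_ahead = lead(Consumption, 4)) |>   # y_{t+4}
  filter(!is.na(Consumption_ahead)) |>                  # drop the last 4 rows
  model(TSLM(Consumption_ahead ~ Income + Savings + Unemployment))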

Nonlinear regression

A log-log functional form

\[\log y=\beta_0+\beta_1 \log x +\varepsilon\]

where \(\beta_1\) is interpreted as an elasticity (the average percentage change in \(y\) resulting from a \(1\%\) increase in \(x\)).

  • Alternative specifications: log-linear, linear-log.
  • Use \(\log(x+1)\) if \(x\) contains zeros; a TSLM sketch follows.
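
In TSLM the transformations go straight into the formula; a minimal sketch (my_data, y, and x are placeholders):

my_data |> model(TSLM(log(y) ~ log(x)))
# If x contains zeros, one option:
my_data |> model(TSLM(log(y) ~ log(x + 1)))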

Piecewise linear and regression splines (1/2)

\[y=f(x) +\varepsilon\] where \(f\) is a non-linear function.

  • For piecewise linear let \(x_1=x\) and

\[\begin{align*} x_{2} = (x-c)_+ &= \left\{ \begin{array}{ll} 0 & \text{if } x < c\\ x-c & \text{if } x \ge c. \end{array}\right. \end{align*}\]

Piecewise linear and regression splines (2/2)

  • In general, linear regression splines use

\[x_1=x~~~x_2=(x-c_1)_+~~~\ldots~~~x_k=(x-c_{k-1})_+\]

where \(c_1,\ldots,c_{k-1}\) are knots.

  • Need to select knots: can be difficult and arbitrary.
  • Automatic knot selection algorithms are very slow.
  • Using piecewise cubics achieves a smoother result.

Warning

A spline gives a better fit to the historical data, but forecasting outside the range of that data is even more unreliable.

Example: Boston marathon winning times (1/4)

marathon <- boston_marathon |>
  filter(Event == "Men's open division") |>
  select(-Event) |>
  mutate(Minutes = as.numeric(Time) / 60)
marathon |> autoplot(Minutes) + labs(y = "Winning times in minutes")

Example: Boston marathon winning times (2/4)

fit_trends <- marathon |>
  model(
    # Linear trend
    linear = TSLM(Minutes ~ trend()),
    # Exponential trend
    exponential = TSLM(log(Minutes) ~ trend()),
    # Piecewise linear trend
    piecewise = TSLM(Minutes ~ trend(knots = c(1940, 1980)))
  )
fit_trends
# A mable: 1 x 3
   linear exponential piecewise
  <model>     <model>   <model>
1  <TSLM>      <TSLM>    <TSLM>

Example: Boston marathon winning times (3/4)

fit_trends |>
  forecast(h = 10) |>
  autoplot(marathon)
fc_trends <- fit_trends |> forecast(h = 10)
marathon |>
  autoplot(Minutes) +
  geom_line(
    data = fitted(fit_trends),
    aes(y = .fitted, colour = .model)
  ) +
  autolayer(fc_trends, alpha = 0.5, level = 95) +
  labs(
    y = "Minutes",
    title = "Boston marathon winning times"
  )

Example: Boston marathon winning times (4/4)

fit_trends |>
  select(piecewise) |>
  gg_tsresiduals()

Correlation is not causation

  • When \(x\) is useful for predicting \(y\), it is not necessarily causing \(y\).

  • e.g., predict number of swimmers \(y\) using number of ice-creams sold \(x\).

  • Correlations are useful for forecasting, even when there is no causality.

  • Better models usually involve causal relationships (e.g., temperature \(x\) and people \(z\) to predict swimmers \(y\)).

Multicollinearity

In regression analysis, multicollinearity occurs when:

  • Two predictors are highly correlated (i.e., the correlation between them is close to \(\pm1\)).
  • A linear combination of some of the predictors is highly correlated with another predictor.
  • A linear combination of one subset of predictors is highly correlated with a linear combination of another subset of predictors.

Multicollinearity

If multicollinearity exists

  • the numerical estimates of coefficients may be wrong (worse in Excel than in a statistics package)
  • don’t rely on the \(p\)-values to determine significance.
  • there is no problem with model predictions provided the predictors used for forecasting are within the range used for fitting.
  • omitting variables can help.
  • combining variables can help.