Chapter 8 ETS

Tyler George

Cornell College
STA 364 Spring 2025 Block 5

Setup

library(patchwork) #only used for notes
library(gganimate) # only used for notes
library(purrr)
library(rlang)
library(magick)
library(fpp3)

Exponential smoothing

Historical perspective

Developed in the 1950s and 1960s as methods (algorithms) to produce point forecasts.
Combine a “level”, “trend” (slope) and “seasonal” component to describe a time series.
The rate of change of the components are controlled by “smoothing parameters”: \(\alpha\), \(\beta\) and \(\gamma\) respectively.
Need to choose best values for the smoothing parameters (and initial states).
Equivalent ETS state space models developed in the 1990s and 2000s.

Big idea: control the rate of change (1/2)

\(\alpha\) controls the flexibility of the level (Level - the average value for a specific time period)

If \(\alpha = 0\), the level never updates (mean)
If \(\alpha = 1\), the level updates completely (naive)

\(\beta\) controls the flexibility of the trend

If \(\beta = 0\), the trend is linear
If \(\beta = 1\), the trend changes suddenly every observation

Big idea: control the rate of change (2/2)

\(\gamma\) controls the flexibility of the seasonality

If \(\gamma = 0\), the seasonality is fixed (seasonal means)
If \(\gamma = 1\), the seasonality updates completely (seasonal naive)

A model for levels, trends, and seasonalities

We want a model that captures the level (\(\ell_t\)), trend (\(b_t\)) and seasonality (\(s_t\)).

How do we combine these elements?

Additively? \[ y_t = \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t \]

Multiplicatively? \[ y_t = \ell_{t-1}b_{t-1}s_{t-m}(1 + \varepsilon_t) \]

Perhaps a mix of both? \[ y_t = (\ell_{t-1} + b_{t-1}) s_{t-m} + \varepsilon_t \]

How do the level, trend and seasonal components evolve over time?

ETS models

General notation

General notation		E T S	ExponenTial Smoothing
	\(\nearrow\)	\(\uparrow\)	\(\nwarrow\)
Error		Trend	Season

Error: Additive (A) or multiplicative (M)

Trend: None (N), additive (A), multiplicative (M), or damped (\(A_d\) or \(M_d\)).

Seasonality: None (N), additive (A) or multiplicative (M)

Simple exponential smoothing

Simple methods

Time series \(y_1, y_2, \dots, y_T\).

Naive (Random walk forecasts)

\[\hat{y}_{T+h|T} = y_T\]

Average forecasts

\[ \hat{y}_{T+h|T} = \frac{1}{T} \sum_{t=1}^T y_t \]

Want something in between these methods.
Most recent data should have more weight.

Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially.

Simple Exponential Smoothing

Forecast equation

\[ \hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots \]

where \(0 \le \alpha \le 1\).

Weights assigned to observations for:

Observation	\(\alpha = 0.2\)	\(\alpha = 0.4\)	\(\alpha = 0.6\)	\(\alpha = 0.8\)
\(y_{T}\)	0.2	0.4	0.6	0.8
\(y_{T-1}\)	0.16	0.24	0.24	0.16
\(y_{T-2}\)	0.128	0.144	0.096	0.032
\(y_{T-3}\)	0.1024	0.0864	0.0384	0.0064
\(y_{T-4}\)	\((0.2)(0.8)^4\)	\((0.4)(0.6)^4\)	\((0.6)(0.4)^4\)	\((0.8)(0.2)^4\)
\(y_{T-5}\)	\((0.2)(0.8)^5\)	\((0.4)(0.6)^5\)	\((0.6)(0.4)^5\)	\((0.8)(0.2)^5\)

Simple Exponential Smoothing

algeria_economy <- global_economy |>
  filter(Country == "Algeria")

Simple Exponential Smoothing

Component form

\[\begin{align*} \text{Forecast equation} && \hat{y}_{t+h|t} &= \ell_{t} \\ \text{Smoothing equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1} \end{align*}\]

\(\ell_t\) is the level (or the smoothed value) of the series at time t.
\(\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1}\)
The process has to start somewhere, so we let the first fitted value at time 1 be denoted by \(\ell_0\) (which we will have to estimate) . . .

Simple Exponential Smoothing

Iterate to get exponentially weighted moving average form.

Weighted average form

\[ \hat{y}_{T+1|T} = \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T \ell_{0} \]

Optimizing smoothing parameters

Need to choose best values for \(\alpha\) and \(\ell_0\).
Similarly to regression, choose optimal parameters by minimizing SSE:

\[ \text{SSE}=\sum_{t=1}^T(y_t - \hat{y}{t}{t-1})^2. \]

Unlike regression there is no closed form solution — use numerical optimization.

For Algerian Exports example:
- \(\hat\alpha = 0.8400\)
- \(\hat\ell_0 = 39.54\)

Simple Exponential Smoothing

Models and methods

Methods

Algorithms that return point forecasts.

Models

Generate same point forecasts but can also generate forecast distributions.
A stochastic (or random) data generating process that can generate an entire forecast distribution.
Allow for “proper” model selection.

ETS(A,N,N): SES with additive errors

Component form

\[\begin{align*} \text{Forecast equation} && \hat{y}_{t+h|t} &= \ell_{t} \\ \text{Smoothing equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1} \end{align*}\]

Forecast error: \(e_t = y_t - \hat{y}_{t|t-1} = y_t - \ell_{t-1}\).

Error correction form

\[\begin{align*} y_t &= \ell_{t-1} + e_t \\ \ell_{t} &= \ell_{t-1} + \alpha (y_t - \ell_{t-1}) \\ &= \ell_{t-1} + \alpha e_t \end{align*}\]

Specify probability distribution for \(e_t\), we assume \(e_t = \varepsilon_t \sim \text{NID}(0, \sigma^2)\).

ETS(A,N,N): SES with additive errors

\[\begin{align*} \text{Measurement equation} && y_t &= \ell_{t-1} + \varepsilon_t \\ \text{State equation} && \ell_t &= \ell_{t-1} + \alpha \varepsilon_t \end{align*}\]

where \(\varepsilon_t \sim \text{NID}(0, \sigma^2)\).

“Innovations” or “single source of error” because equations have the same error process, \(\varepsilon_t\).
Measurement equation: relationship between observations and states.
State equation(s): evolution of the state(s) through time.

ETS(M,N,N): SES with multiplicative errors

Specify relative errors \(\varepsilon_t = \frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}} \sim \text{NID}(0, \sigma^2)\)
Substituting \(\hat{y}_{t|t-1} = \ell_{t-1}\) gives:
- \(y_t = \ell_{t-1} + \ell_{t-1} \varepsilon_t\)
- \(e_t = y_t - \hat{y}_{t|t-1} = \ell_{t-1} \varepsilon_t\) . . .

ETS(M,N,N): SES with multiplicative errors

Measurement equation

\[\begin{align*} \text{Measurement equation} && y_t &= \ell_{t-1} (1 + \varepsilon_t) \\ \text{State equation} && \ell_t &= \ell_{t-1} (1 + \alpha \varepsilon_t) \end{align*}\]

Models with additive and multiplicative errors with the same parameters generate the same point forecasts but different prediction intervals.

ETS(A,N,N): Specifying the model

ETS(y ~ error("A") + trend("N") + season("N"))

By default, an optimal value for \(\alpha\) and \(\ell_0\) is used.

\(\alpha\) can be chosen manually in trend().

trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))

Example: Algerian Exports (1/4)

algeria_economy <- global_economy |>
  filter(Country == "Algeria")
fit <- algeria_economy |>
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
report(fit)

Series: Exports 
Model: ETS(A,N,N) 
  Smoothing parameters:
    alpha = 0.8399875 

  Initial states:
   l[0]
 39.539

  sigma^2:  35.6301

     AIC     AICc      BIC 
446.7154 447.1599 452.8968

Example: Algerian Exports (2/4)

components(fit) |> autoplot()

Example: Algerian Exports (3/4)

components(fit) |>
  left_join(fitted(fit), by = c("Country", ".model", "Year"))

# A dable: 59 x 7 [1Y]
# Key:     Country, .model [1]
# :        Exports = lag(level, 1) + remainder
   Country .model  Year Exports level remainder .fitted
   <fct>   <chr>  <dbl>   <dbl> <dbl>     <dbl>   <dbl>
 1 Algeria ANN     1959    NA    39.5    NA        NA  
 2 Algeria ANN     1960    39.0  39.1    -0.496    39.5
 3 Algeria ANN     1961    46.2  45.1     7.12     39.1
 4 Algeria ANN     1962    19.8  23.8   -25.3      45.1
 5 Algeria ANN     1963    24.7  24.6     0.841    23.8
 6 Algeria ANN     1964    25.1  25.0     0.534    24.6
 7 Algeria ANN     1965    22.6  23.0    -2.39     25.0
 8 Algeria ANN     1966    26.0  25.5     3.00     23.0
 9 Algeria ANN     1967    23.4  23.8    -2.07     25.5
10 Algeria ANN     1968    23.1  23.2    -0.630    23.8
# ℹ 49 more rows

Example: Algerian Exports (4/4)

fit |>
  forecast(h = 5) |>
  autoplot(algeria_economy) +
  labs(y = "% of GDP", title = "Exports: Algeria")

Models with trend

Holt’s linear trend

Component form

\[\begin{align*} \text{Forecast} && \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ \text{Trend} && b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1} \end{align*}\]

Two smoothing parameters \(\alpha\) and \(\beta^*\) (\(0 \le \alpha, \beta^* \le 1\)).
\(\ell_t\) level: weighted average between \(y_t\) and one-step ahead forecast for time \(t\), \((\ell_{t-1} + b_{t-1} = \hat{y}_{t|t-1})\)
\(b_t\) slope: weighted average of \((\ell_{t} - \ell_{t-1})\) and \(b_{t-1}\), current and previous estimate of slope.
Choose \(\alpha, \beta^*, \ell_0, b_0\) to minimize SSE.

ETS(A,A,N)

Holt’s linear method with additive errors.

Assume \(\varepsilon_t = y_t - \ell_{t-1} - b_{t-1} \sim \text{NID}(0, \sigma^2)\).
Substituting into the error correction equations for Holt’s linear method:

\[\begin{align*} y_t &= \ell_{t-1} + b_{t-1} + \varepsilon_t \\ \ell_t &= \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t \\ b_t &= b_{t-1} + \alpha \beta^* \varepsilon_t \end{align*}\]

For simplicity, set \(\beta = \alpha \beta^*\).

Exponential smoothing: trend/slope

ETS(M,A,N)

Holt’s linear method with multiplicative errors.

Assume \(\varepsilon_t=\frac{y_t-(\ell_{t-1}+b_{t-1})}{(\ell_{t-1}+b_{t-1})}\)
Following a similar approach as above, the innovations state space model underlying Holt’s linear method with multiplicative errors is specified as

\[\begin{align*} y_t&=(\ell_{t-1}+b_{t-1})(1+\varepsilon_t)\\ \ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\ b_t&=b_{t-1}+\beta(\ell_{t-1}+b_{t-1}) \varepsilon_t \end{align*}\]

where again \(\beta=\alpha \beta^*\) and \(\varepsilon_t \sim \text{NID}(0,\sigma^2)\).

ETS(A,A,N): Specifying the model

ETS(y ~ error("A") + trend("A") + season("N"))

By default, optimal values for \(\beta\) and \(b_0\) are used.

\(\beta\) can be chosen manually in trend().

trend("A", beta = 0.004)
trend("A", beta_range = c(0, 0.1))

Example: Australian population (1/4)

aus_economy <- global_economy |> filter(Code == "AUS") |>
  mutate(Pop = Population / 1e6)
fit <- aus_economy |>
  model(AAN = ETS(Pop ~ error("A") + trend("A") + season("N")))
report(fit)

Series: Pop 
Model: ETS(A,A,N) 
  Smoothing parameters:
    alpha = 0.9999 
    beta  = 0.3266366 

  Initial states:
     l[0]      b[0]
 10.05414 0.2224818

  sigma^2:  0.0041

      AIC      AICc       BIC 
-76.98569 -75.83184 -66.68347

Example: Australian population (2/4)

components(fit) |> autoplot()

Example: Australian population (3/4)

components(fit) |>
  left_join(fitted(fit), by = c("Country", ".model", "Year"))

# A dable: 59 x 8 [1Y]
# Key:     Country, .model [1]
# :        Pop = lag(level, 1) + lag(slope, 1) + remainder
   Country   .model  Year   Pop level slope remainder .fitted
   <fct>     <chr>  <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl>
 1 Australia AAN     1959  NA    10.1 0.222 NA           NA  
 2 Australia AAN     1960  10.3  10.3 0.222 -0.000145    10.3
 3 Australia AAN     1961  10.5  10.5 0.217 -0.0159      10.5
 4 Australia AAN     1962  10.7  10.7 0.231  0.0418      10.7
 5 Australia AAN     1963  11.0  11.0 0.223 -0.0229      11.0
 6 Australia AAN     1964  11.2  11.2 0.221 -0.00641     11.2
 7 Australia AAN     1965  11.4  11.4 0.221 -0.000314    11.4
 8 Australia AAN     1966  11.7  11.7 0.235  0.0418      11.6
 9 Australia AAN     1967  11.8  11.8 0.206 -0.0869      11.9
10 Australia AAN     1968  12.0  12.0 0.208  0.00350     12.0
# ℹ 49 more rows

Example: Australian population (4/4)

fit |>
  forecast(h = 10) |>
  autoplot(aus_economy) +
  labs(y = "Millions", title = "Population: Australia")

Damped trend method

Component form

\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi + \phi^2 + \dots + \phi^{h})b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \end{align*}\]

Damping parameter \(0<\phi<1\).
If \(\phi=1\), identical to Holt’s linear trend.
As \(h\rightarrow\infty\), \(\hat{y}{T+h}{T}\rightarrow \ell_T+\phi b_T/(1-\phi)\).
Short-run forecasts trended, long-run forecasts constant.

Your turn

Write down the model for ETS(A,\(A_d\),N)

Example: Australian population ct. (1/3)

aus_economy |>
  model(holt = ETS(Pop ~ error("A") + trend("Ad") + season("N"))) |>
  forecast(h = 20) |>
  autoplot(aus_economy)

Example: Australian population ct. (2/3)

fit <- aus_economy |>
  filter(Year <= 2010) |>
  model(
    ses = ETS(Pop ~ error("A") + trend("N") + season("N")),
    holt = ETS(Pop ~ error("A") + trend("A") + season("N")),
    damped = ETS(Pop ~ error("A") + trend("Ad") + season("N"))
  )

tidy(fit)
accuracy(fit)

Example: Australian population ct. (3/3)

term	SES	Linear trend	Damped trend
\(\alpha\)	1.00	1.00	1.00
\(\beta^*\)		0.30	0.40
\(\phi\)			0.98
NA		0.22	0.25
NA	10.28	10.05	10.04
Training RMSE	0.24	0.06	0.07
Test RMSE	1.63	0.15	0.21
Test MASE	6.18	0.55	0.75
Test MAPE	6.09	0.55	0.74
Test MAE	1.45	0.13	0.18

Your turn

prices contains the price of a dozen eggs in the United States from 1900–1993

Use SES and Holt’s method (with and without damping) to forecast “future” data.

Tip

Use h=100 so you can clearly see the differences between the options when plotting the forecasts.

Which method gives the best training RMSE?
Are these RMSE values comparable?
Do the residuals from the best fitting method look like white noise?

Models with seasonality

Holt-Winters additive method

Holt and Winters extended Holt’s method to capture seasonality.

Component form

\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma (y_{t} - \ell_{t-1} - b_{t-1}) + (1 - \gamma)s_{t-m} \end{align*}\]

\(k=\) integer part of \((h-1)/m\). Ensures estimates from the final year are used for forecasting.
Parameters: \(0\le \alpha\le 1\), \(0\le \beta^*\le 1\), \(0\le \gamma\le 1-\alpha\) and \(m=\) period of seasonality (e.g. \(m=4\) for quarterly data).

Holt-Winters additive method

Seasonal component is usually expressed as \(s_{t} = \gamma^* (y_{t}-\ell_{t})+ (1-\gamma^*)s_{t-m}.\)
Substitute in for \(\ell_t\): \(s_{t} = \gamma^*(1-\alpha) (y_{t}-\ell_{t-1}-b_{t-1})+ [1-\gamma^*(1-\alpha)]s_{t-m}\)
We set \(\gamma=\gamma^*(1-\alpha)\).
The usual parameter restriction is \(0\le\gamma^*\le1\), which translates to \(0\le\gamma\le(1-\alpha)\).

Exponential smoothing: seasonality

ETS(A,A,A)

Holt-Winters additive method with additive errors.

\[\begin{array}{ll} \text{Forecast equation} & \hat{y}_{t+h|t} = \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \text{Observation equation} & y_t = \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t \\ \text{State equations} & \ell_t = \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t \\ & b_t = b_{t-1} + \beta \varepsilon_t \\ & s_t = s_{t-m} + \gamma \varepsilon_t \end{array}\]

Forecast errors: \(\varepsilon_{t} = y_t - \hat{y}_{t|t-1}\)
\(k\) is integer part of \((h-1)/m\).

Your turn

Write down the model for ETS(A,N,A)

Holt-Winters multiplicative method

Seasonal variations change in proportion to the level of the series.

Component form

\[\begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*}\]

\(k\) is integer part of \((h-1)/m\).
Additive method: \(s_t\) in absolute terms — within each year \(\sum_i s_i \approx 0\).
Multiplicative method: \(s_t\) in relative terms — within each year \(\sum_i s_i \approx m\).

ETS(M,A,M)

Holt-Winters multiplicative method with multiplicative errors.

\[\begin{align*} \text{Forecast equation} && \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t}) s_{t+h-m(k+1)} \\ \text{Observation equation} && y_t &= (\ell_{t-1} + b_{t-1})s_{t-m}(1 + \varepsilon_t) \\ \text{State equations} && \ell_t &= (\ell_{t-1} + b_{t-1})(1 + \alpha \varepsilon_t) \\ && b_t &= b_{t-1} + \beta(\ell_{t-1} + b_{t-1}) \varepsilon_t \\ && s_t &= s_{t-m}(1 + \gamma \varepsilon_t) \end{align*}\]

Forecast errors: \(\varepsilon_{t} = (y_t - \hat{y}_{t|t-1})/\hat{y}_{t|t-1}\)
\(k\) is integer part of \((h-1)/m\).

Example: Australian holiday tourism (1/2)

aus_holidays <- tourism |>
  filter(Purpose == "Holiday") |>
  summarise(Trips = sum(Trips))
fit <- aus_holidays |>
  model(
    additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
  )
fc <- fit |> forecast()

Example: Australian holiday tourism (2/2)

fc |>
  autoplot(aus_holidays, level = NULL) +
  labs(y = "Thousands", title = "Overnight trips")

Estimated components (1/2)

components(fit)

# A dable: 168 x 7 [1Q]
# Key:     .model [2]
# :        Trips = lag(level, 1) + lag(slope, 1) + lag(season, 4) + remainder
   .model   Quarter  Trips level slope season remainder
   <chr>      <qtr>  <dbl> <dbl> <dbl>  <dbl>     <dbl>
 1 additive 1997 Q1    NA    NA   NA    1512.      NA  
 2 additive 1997 Q2    NA    NA   NA    -290.      NA  
 3 additive 1997 Q3    NA    NA   NA    -684.      NA  
 4 additive 1997 Q4    NA  9899. -37.4  -538.      NA  
 5 additive 1998 Q1 11806. 9964. -24.5  1512.     433. 
 6 additive 1998 Q2  9276. 9851. -35.6  -290.    -374. 
 7 additive 1998 Q3  8642. 9700. -50.2  -684.    -489. 
 8 additive 1998 Q4  9300. 9694. -44.6  -538.     188. 
 9 additive 1999 Q1 11172. 9652. -44.3  1512.      10.7
10 additive 1999 Q2  9608. 9676. -35.6  -290.     290. 
# ℹ 158 more rows

Estimated components (2/2)

Figure 1

Holt-Winters damped method

Often the single most accurate forecasting method for seasonal data:

\[\begin{align*} \hat{y}_{t+h|t} &= [\ell_{t} + (\phi + \phi^2 + \dots + \phi^{h})b_{t}]s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \left(\frac{y_{t}}{s_{t-m}}\right) + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \\ s_{t} &= \gamma \left(\frac{y_{t}}{(\ell_{t-1} + \phi b_{t-1})}\right) + (1 - \gamma)s_{t-m} \end{align*}\]

Your turn

Apply Holt-Winters’ multiplicative method to the Gas data from aus_production.

Why is multiplicative seasonality necessary here?
Experiment with making the trend damped.
Check that the residuals from the best method look like white noise.

Holt-Winters with daily data

sth_cross_ped <- pedestrian |>
  filter(
    Date >= "2016-07-01",
    Sensor == "Southern Cross Station"
  ) |>
  index_by(Date) |>
  summarise(Count = sum(Count) / 1000)
sth_cross_ped |>
  filter(Date <= "2016-07-31") |>
  model(
    hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
  ) |>
  forecast(h = "2 weeks") |>
  autoplot(sth_cross_ped |> filter(Date <= "2016-08-14")) +
  labs(
    title = "Daily traffic: Southern Cross",
    y = "Pedestrians ('000)"
  )

Holt-Winters with daily data

Innovations state space models

Exponential smoothing methods

Trend Component	Seasonal	Component
	N (None)	A (Additive)	M (Multiplicative)
N (None)	(N,N)	(N,A)	(N,M)
A (Additive)	(A,N)	(A,A)	(A,M)
\(A_d\) (Additive damped)	(\(A_d\),N)	(\(A_d\),A)	(\(A_d\),M)

Exponential smoothing methods

Method	Description
(N,N)	Simple exponential smoothing
(A,N)	Holt’s linear method
(\(A_d\),N)	Additive damped trend method
(A,A)	Additive Holt-Winters’ method
(A,M)	Multiplicative Holt-Winters’ method
(\(A_d\),M)	Damped multiplicative Holt-Winters’ method

ETS models

Additive Error

Trend Component	Seasonal	Component
	N (None)	A (Additive)	M (Multiplicative)
N (None)	A,N,N	A,N,A	A,N,M
A (Additive)	A,A,N	A,A,A	A,A,M
\(A_d\) (Additive damped)	A,\(A_d\),N	A,\(A_d\),A	A,\(A_d\),M

Multiplicative Error

Trend Component	Seasonal	Component
	N (None)	A (Additive)	M (Multiplicative)
N (None)	M,N,N	M,N,A	M,N,M
A (Additive)	M,A,N	M,A,A	M,A,M
\(A_d\) (Additive damped)	M,\(A_d\),N	M,\(A_d\),A	M,\(A_d\),M

Additive error models

Multiplicative error models

Estimating ETS models

Smoothing parameters \(\alpha\), \(\beta\), \(\gamma\) and \(\phi\), and the initial states \(\ell_0\), \(b_0\), \(s_0,s_{-1},\dots,s_{-m+1}\) are estimated by maximizing the “likelihood” = the probability of the data arising from the specified model.
For models with additive errors equivalent to minimizing SSE.
For models with multiplicative errors, not equivalent to minimizing SSE.

Innovations state space models

Estimation

\[L^*(\theta,\textbf{x}_0) = T\log\!\left(\sum_{t=1}^T \varepsilon^2_t\!\right) + 2\sum_{t=1}^T \log|k(\textbf{x}_{t-1})|\]

\[= -2\log(\text{Likelihood}) + \text{constant}\]

Estimate parameters \(\theta = (\alpha,\beta,\gamma,\phi)\) and initial states \(\textbf{x}_0 = (\ell_0,b_0,s_0,s_{-1},\dots,s_{-m+1})\) by minimizing \(L^*\).

Parameter restrictions

Usual region

Traditional restrictions in the methods \(0< \alpha,\beta^*,\gamma^*,\phi<1\)(equations interpreted as weighted averages).
In models we set \(\beta=\alpha\beta^*\) and \(\gamma=(1-\alpha)\gamma^*\).
Therefore \(0< \alpha <1\), \(0 < \beta < \alpha\) and \(0< \gamma < 1-\alpha\).
\(0.8<\phi<0.98\) — to prevent numerical difficulties.

Admissible region

To prevent observations in the distant past having a continuing effect on current forecasts.
Usually (but not always) less restrictive than traditional region.
For example for ETS(A,N,N): traditional* \(0< \alpha <1\) while admissible \(0< \alpha <2\).

Model selection

Akaike’s Information Criterion

\[AIC = -2\log(\text{L}) + 2k\]

where \(L\) is the likelihood and \(k\) is the number of parameters initial states estimated in the model.

Corrected AIC

\[\text{AIC}_{\text{c}} = \text{AIC} + \frac{2k(k+1)}{T-k-1}\]

which is the AIC corrected (for small sample bias).

Bayesian Information Criterion

\[\text{BIC} = \text{AIC} + k[\log(T)-2].\]

Automatic forecasting

From Hyndman et al. (IJF, 2002):

Apply each model that is appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).
Select best method using AICc:
Produce forecasts using best method.
Obtain forecast intervals using underlying state space model.

Note

Method performed very well in M3 competition.

Example: National populations (1/2)

fit <- global_economy |>
  mutate(Pop = Population / 1e6) |>
  model(ets = ETS(Pop))
fit

# A mable: 263 x 2
# Key:     Country [263]
   Country                      ets
   <fct>                    <model>
 1 Afghanistan         <ETS(A,A,N)>
 2 Albania             <ETS(M,A,N)>
 3 Algeria             <ETS(M,A,N)>
 4 American Samoa      <ETS(M,A,N)>
 5 Andorra             <ETS(M,A,N)>
 6 Angola              <ETS(M,A,N)>
 7 Antigua and Barbuda <ETS(M,A,N)>
 8 Arab World          <ETS(M,A,N)>
 9 Argentina           <ETS(A,A,N)>
10 Armenia             <ETS(M,A,N)>
# ℹ 253 more rows

Example: National populations (2/2)

fit |>
  forecast(h = 5)

# A fable: 1,315 x 5 [1Y]
# Key:     Country, .model [263]
   Country  .model  Year             Pop
   <fct>    <chr>  <dbl>          <dist>
 1 Afghani… ets     2018    N(36, 0.012)
 2 Afghani… ets     2019    N(37, 0.059)
 3 Afghani… ets     2020     N(38, 0.16)
 4 Afghani… ets     2021     N(39, 0.35)
 5 Afghani… ets     2022     N(40, 0.64)
 6 Albania  ets     2018 N(2.9, 0.00012)
 7 Albania  ets     2019   N(2.9, 6e-04)
 8 Albania  ets     2020  N(2.9, 0.0017)
 9 Albania  ets     2021  N(2.9, 0.0036)
10 Albania  ets     2022  N(2.9, 0.0066)
# ℹ 1,305 more rows
# ℹ 1 more variable: .mean <dbl>

Example: Australian holiday tourism (1/9)

holidays <- tourism |>
  filter(Purpose == "Holiday")
fit <- holidays |> model(ets = ETS(Trips))
fit

# A mable: 76 x 4
# Key:     Region, State, Purpose [76]
   Region                     State              Purpose          ets
   <chr>                      <chr>              <chr>        <model>
 1 Adelaide                   South Australia    Holiday <ETS(A,N,A)>
 2 Adelaide Hills             South Australia    Holiday <ETS(A,A,N)>
 3 Alice Springs              Northern Territory Holiday <ETS(M,N,A)>
 4 Australia's Coral Coast    Western Australia  Holiday <ETS(M,N,A)>
 5 Australia's Golden Outback Western Australia  Holiday <ETS(M,N,M)>
 6 Australia's North West     Western Australia  Holiday <ETS(A,N,A)>
 7 Australia's South West     Western Australia  Holiday <ETS(M,N,M)>
 8 Ballarat                   Victoria           Holiday <ETS(M,N,A)>
 9 Barkly                     Northern Territory Holiday <ETS(A,N,A)>
10 Barossa                    South Australia    Holiday <ETS(A,N,N)>
# ℹ 66 more rows

Example: Australian holiday tourism (2/9)

fit |>
  filter(Region == "Snowy Mountains") |>
  report()

Series: Trips 
Model: ETS(M,N,A) 
  Smoothing parameters:
    alpha = 0.1571013 
    gamma = 0.0001000991 

  Initial states:
     l[0]      s[0]    s[-1]     s[-2]     s[-3]
 141.6782 -60.95904 130.8567 -42.23776 -27.65986

  sigma^2:  0.0388

     AIC     AICc      BIC 
852.0452 853.6008 868.7194

Example: Australian holiday tourism (3/9)

fit |>
  filter(Region == "Snowy Mountains") |>
  components(fit)

# A dable: 84 x 9 [1Q]
# Key:     Region, State, Purpose, .model [1]
# :        Trips = (lag(level, 1) + lag(season, 4)) * (1 + remainder)
   Region          State           Purpose .model Quarter Trips level season remainder
   <chr>           <chr>           <chr>   <chr>    <qtr> <dbl> <dbl>  <dbl>     <dbl>
 1 Snowy Mountains New South Wales Holiday ets    1997 Q1  NA     NA   -27.7   NA     
 2 Snowy Mountains New South Wales Holiday ets    1997 Q2  NA     NA   -42.2   NA     
 3 Snowy Mountains New South Wales Holiday ets    1997 Q3  NA     NA   131.    NA     
 4 Snowy Mountains New South Wales Holiday ets    1997 Q4  NA    142.  -61.0   NA     
 5 Snowy Mountains New South Wales Holiday ets    1998 Q1 101.   140.  -27.7   -0.113 
 6 Snowy Mountains New South Wales Holiday ets    1998 Q2 112.   142.  -42.2    0.154 
 7 Snowy Mountains New South Wales Holiday ets    1998 Q3 310.   148.  131.     0.137 
 8 Snowy Mountains New South Wales Holiday ets    1998 Q4  89.8  148.  -61.0    0.0335
 9 Snowy Mountains New South Wales Holiday ets    1999 Q1 112.   147.  -27.7   -0.0687
10 Snowy Mountains New South Wales Holiday ets    1999 Q2 103.   147.  -42.2   -0.0199
# ℹ 74 more rows

Example: Australian holiday tourism (4/9)

fit |>
  filter(Region == "Snowy Mountains") |>
  components(fit) |>
  autoplot()

Example: Australian holiday tourism (5/9)

fit |> forecast()

# A fable: 608 x 7 [1Q]
# Key:     Region, State, Purpose, .model [76]
   Region         State           Purpose .model Quarter
   <chr>          <chr>           <chr>   <chr>    <qtr>
 1 Adelaide       South Australia Holiday ets    2018 Q1
 2 Adelaide       South Australia Holiday ets    2018 Q2
 3 Adelaide       South Australia Holiday ets    2018 Q3
 4 Adelaide       South Australia Holiday ets    2018 Q4
 5 Adelaide       South Australia Holiday ets    2019 Q1
 6 Adelaide       South Australia Holiday ets    2019 Q2
 7 Adelaide       South Australia Holiday ets    2019 Q3
 8 Adelaide       South Australia Holiday ets    2019 Q4
 9 Adelaide Hills South Australia Holiday ets    2018 Q1
10 Adelaide Hills South Australia Holiday ets    2018 Q2
# ℹ 598 more rows
# ℹ 2 more variables: Trips <dist>, .mean <dbl>

Example: Australian holiday tourism (6/9)

fit |> forecast() |>
  filter(Region == "Snowy Mountains") |>
  autoplot(holidays) +
  labs(y = "Thousands", title = "Overnight trips")

Residuals

Response residuals \[\hat{e}_t = y_t - \hat{y}_{t|t-1}\]

Innovation residuals Additive error model: \[\hat\varepsilon_t = y_t - \hat{y}_{t|t-1}\]

Multiplicative error model: \[\hat\varepsilon_t = \frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\]

Example: Australian holiday tourism (7/9)

aus_holidays <- tourism |>
  filter(Purpose == "Holiday") |>
  summarise(Trips = sum(Trips))
fit <- aus_holidays |>
  model(ets = ETS(Trips)) |>
  report()

Series: Trips 
Model: ETS(M,N,M) 
  Smoothing parameters:
    alpha = 0.3578226 
    gamma = 0.0009685565 

  Initial states:
     l[0]      s[0]     s[-1]    s[-2]    s[-3]
 9666.501 0.9430367 0.9268433 0.968352 1.161768

  sigma^2:  0.0022

     AIC     AICc      BIC 
1331.372 1332.928 1348.046

Example: Australian holiday tourism (8/9)

Code
Plot

residuals(fit)
residuals(fit, type = "response")

bind_rows(
  residuals(fit) |> mutate(Type = "Innovation residuals") |> as_tibble(),
  residuals(fit, type = "response") |> 
    mutate(Type = "Response residuals") |> 
    as_tibble()) |>
  ggplot(aes(x = Quarter, y = .resid)) +
  geom_line() +
  facet_grid(Type ~ ., scales = "free_y") +
  labs(y = "")

Example: Australian holiday tourism (9/9)

fit |>
  augment()

# A tsibble: 80 x 6 [1Q]
# Key:       .model [1]
   .model Quarter  Trips .fitted .resid   .innov
   <chr>    <qtr>  <dbl>   <dbl>  <dbl>    <dbl>
 1 ets    1998 Q1 11806.  11230.  576.   0.0513 
 2 ets    1998 Q2  9276.   9532. -257.  -0.0269 
 3 ets    1998 Q3  8642.   9036. -393.  -0.0435 
 4 ets    1998 Q4  9300.   9050.  249.   0.0275 
 5 ets    1999 Q1 11172.  11260.  -88.0 -0.00781
 6 ets    1999 Q2  9608.   9358.  249.   0.0266 
 7 ets    1999 Q3  8914.   9042. -129.  -0.0142 
 8 ets    1999 Q4  9026.   9154. -129.  -0.0140 
 9 ets    2000 Q1 11071.  11221. -150.  -0.0134 
10 ets    2000 Q2  9196.   9308. -111.  -0.0120 
# ℹ 70 more rows

Innovation residuals (.innov) are given by \(\hat{\varepsilon}_t\) while regular residuals (.resid) are \(y_t - \hat{y}_{t-1}\). They are different when the model has multiplicative errors.

Some unstable models

Some of the combinations of (Error, Trend, Seasonal) can lead to numerical difficulties; see equations with division by a state.
These are: ETS(A,N,M), ETS(A,A,M), ETS(A,\(A_d\),M).
Models with multiplicative errors are useful for strictly positive data, but are not numerically stable with data containing zeros or negative values. In that case only the six fully additive models will be applied.

Exponential smoothing models

Additive Error

Trend	N (None)	A (Additive)	M (Multiplicative)
N (None)	A,N,N	A,N,A	~~A,N,M~~
A (Additive)	A,A,N	A,A,A	~~A,A,M~~
\(A_d\) (Additive damped)	A,\(A_d\),N	A,\(A_d\),A	~~A,\(A_d\),M~~

Multiplicative Error

Trend	N (None)	A (Additive)	M (Multiplicative)
N (None)	M,N,N	M,N,A	M,N,M
A (Additive)	M,A,N	M,A,A	M,A,M
\(A_d\) (Additive damped)	M,\(A_d\),N	M,\(A_d\),A	M,\(A_d\),M

Forecasting with exponential smoothing

Forecasting with ETS models

Traditional point forecasts: iterate the equations for \(t=T+1,T+2,\dots,T+h\) and set all \(\varepsilon_t=0\) for \(t>T\).

Not the same as \(\text{E}(y_{t+h} | \textbf{x}_t)\) unless seasonality is additive.
fable uses \(\text{E}(y_{t+h} | \textbf{x}_t)\).
Point forecasts for ETS(A,*,*) are identical to ETS(M,*,*) if the parameters are the same.

Forecasting with ETS models

Prediction intervals: can only be generated using the models.

The prediction intervals will differ between models with additive and multiplicative errors.
Exact formulae for some models.
More general to simulate future sample paths, conditional on the last estimate of the states, and to obtain prediction intervals from the percentiles of these simulated future paths.

Prediction intervals

PI for most ETS models: \(\hat{y}_{T+h|T} \pm c \sigma_h\), where \(c\) depends on coverage probability and \(\sigma_h\) is forecast standard deviation.

We can see them in the book HERE.

Example: Corticosteroid drug sales (1/5)

h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost))
h02 |> autoplot(Cost)

Example: Corticosteroid drug sales (2/5)

h02 |>
  model(ETS(Cost)) |>
  report()

Series: Cost 
Model: ETS(M,Ad,M) 
  Smoothing parameters:
    alpha = 0.3071016 
    beta  = 0.0001006793 
    gamma = 0.0001007181 
    phi   = 0.977528 

  Initial states:
     l[0]    b[0]      s[0]     s[-1]     s[-2]     s[-3]     s[-4]    s[-5]    s[-6]
 417268.7 8205.82 0.8716807 0.8259747 0.7562808 0.7733338 0.6872373 1.283821 1.324616
    s[-7]    s[-8]    s[-9]   s[-10]    s[-11]
 1.180067 1.163601 1.104801 1.047963 0.9806235

  sigma^2:  0.0046

     AIC     AICc      BIC 
5515.212 5518.909 5574.938

Example: Corticosteroid drug sales (3/5)

h02 |>
  model(ETS(Cost ~ error("A") + trend("A") + season("A"))) |>
  report()

Series: Cost 
Model: ETS(A,A,A) 
  Smoothing parameters:
    alpha = 0.1702163 
    beta  = 0.006310854 
    gamma = 0.4545987 

  Initial states:
     l[0]     b[0]      s[0]     s[-1]     s[-2]     s[-3]     s[-4]    s[-5]    s[-6]
 409705.9 9097.111 -99075.37 -136602.3 -191496.1 -174530.8 -241436.7 210643.8 244644.2
    s[-7]    s[-8]    s[-9]  s[-10]    s[-11]
 145368.2 130569.6 84457.69 39131.7 -11673.71

  sigma^2:  3498869384

     AIC     AICc      BIC 
5585.278 5588.568 5641.686

Example: Corticosteroid drug sales (4/5)

h02 |>
  model(ETS(Cost)) |>
  forecast() |>
  autoplot(h02)

Example: Corticosteroid drug sales (5/5)

h02 |>
  model(
    auto = ETS(Cost),
    AAA = ETS(Cost ~ error("A") + trend("A") + season("A"))
  ) |>
  accuracy()

Model	MAE	RMSE	MAPE	MASE	RMSSE
auto	38649.04	51102.24	4.988983	0.6375806	0.6891173
AAA	43378.40	56784.23	6.047574	0.7155993	0.7657394

Your turn

Use ETS() on some of these series:

tourism, gafa_stock, pelt

Does it always give good forecasts?
Find an example where it does not work well. Can you figure out why?