library(patchwork) #only used for notes
library(gganimate) # only used for notes
library(purrr)
library(rlang)
library(magick)
library(fpp3)
Chapter 8 ETS
Setup
Exponential smoothing
Historical perspective
- Developed in the 1950s and 1960s as methods (algorithms) to produce point forecasts.
- Combine a “level”, “trend” (slope) and “seasonal” component to describe a time series.
- The rate of change of the components are controlled by “smoothing parameters”: \(\alpha\), \(\beta\) and \(\gamma\) respectively.
- Need to choose best values for the smoothing parameters (and initial states).
- Equivalent ETS state space models developed in the 1990s and 2000s.
Big idea: control the rate of change (1/2)
\(\alpha\) controls the flexibility of the level (Level - the average value for a specific time period)
- If \(\alpha = 0\), the level never updates (mean)
- If \(\alpha = 1\), the level updates completely (naive)
. . .
\(\beta\) controls the flexibility of the trend
- If \(\beta = 0\), the trend is linear
- If \(\beta = 1\), the trend changes suddenly every observation
Big idea: control the rate of change (2/2)
\(\gamma\) controls the flexibility of the seasonality
- If \(\gamma = 0\), the seasonality is fixed (seasonal means)
- If \(\gamma = 1\), the seasonality updates completely (seasonal naive)
A model for levels, trends, and seasonalities
We want a model that captures the level (\(\ell_t\)), trend (\(b_t\)) and seasonality (\(s_t\)).
. . .
How do we combine these elements?
. . .
Additively? \[ y_t = \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t \]
. . .
Multiplicatively? \[ y_t = \ell_{t-1}b_{t-1}s_{t-m}(1 + \varepsilon_t) \]
. . .
Perhaps a mix of both? \[ y_t = (\ell_{t-1} + b_{t-1}) s_{t-m} + \varepsilon_t \]
. . .
How do the level, trend and seasonal components evolve over time?
ETS models
General notation
General notation | E T S | ExponenTial Smoothing | |
\(\nearrow\) | \(\uparrow\) | \(\nwarrow\) | |
Error | Trend | Season |
. . .
Error: Additive (A) or multiplicative (M)
Trend: None (N), additive (A), multiplicative (M), or damped (\(A_d\) or \(M_d\)).
Seasonality: None (N), additive (A) or multiplicative (M)
Simple exponential smoothing
Simple methods
Time series \(y_1, y_2, \dots, y_T\).
Naive (Random walk forecasts)
\[\hat{y}_{T+h|T} = y_T\]
. . .
Average forecasts
\[ \hat{y}_{T+h|T} = \frac{1}{T} \sum_{t=1}^T y_t \]
- Want something in between these methods.
- Most recent data should have more weight.
. . .
Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially.
Simple Exponential Smoothing
Forecast equation
\[ \hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots \]
where \(0 \le \alpha \le 1\).
. . .
Weights assigned to observations for:
Observation | \(\alpha = 0.2\) | \(\alpha = 0.4\) | \(\alpha = 0.6\) | \(\alpha = 0.8\) |
---|---|---|---|---|
\(y_{T}\) | 0.2 | 0.4 | 0.6 | 0.8 |
\(y_{T-1}\) | 0.16 | 0.24 | 0.24 | 0.16 |
\(y_{T-2}\) | 0.128 | 0.144 | 0.096 | 0.032 |
\(y_{T-3}\) | 0.1024 | 0.0864 | 0.0384 | 0.0064 |
\(y_{T-4}\) | \((0.2)(0.8)^4\) | \((0.4)(0.6)^4\) | \((0.6)(0.4)^4\) | \((0.8)(0.2)^4\) |
\(y_{T-5}\) | \((0.2)(0.8)^5\) | \((0.4)(0.6)^5\) | \((0.6)(0.4)^5\) | \((0.8)(0.2)^5\) |
Simple Exponential Smoothing
<- global_economy |>
algeria_economy filter(Country == "Algeria")
Simple Exponential Smoothing
Component form
\[\begin{align*} \text{Forecast equation} && \hat{y}_{t+h|t} &= \ell_{t} \\ \text{Smoothing equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1} \end{align*}\]
- \(\ell_t\) is the level (or the smoothed value) of the series at time t.
- \(\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1}\)
- The process has to start somewhere, so we let the first fitted value at time 1 be denoted by \(\ell_0\) (which we will have to estimate) . . .
Simple Exponential Smoothing
Iterate to get exponentially weighted moving average form.
Weighted average form
\[ \hat{y}_{T+1|T} = \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T \ell_{0} \]
Optimizing smoothing parameters
- Need to choose best values for \(\alpha\) and \(\ell_0\).
- Similarly to regression, choose optimal parameters by minimizing SSE:
. . .
\[ \text{SSE}=\sum_{t=1}^T(y_t - \hat{y}{t}{t-1})^2. \]
- Unlike regression there is no closed form solution — use numerical optimization.
- For Algerian Exports example:
- \(\hat\alpha = 0.8400\)
- \(\hat\ell_0 = 39.54\)
Simple Exponential Smoothing
Models and methods
Methods
- Algorithms that return point forecasts.
. . .
Models
- Generate same point forecasts but can also generate forecast distributions.
- A stochastic (or random) data generating process that can generate an entire forecast distribution.
- Allow for “proper” model selection.
ETS(A,N,N): SES with additive errors
Component form
\[\begin{align*} \text{Forecast equation} && \hat{y}_{t+h|t} &= \ell_{t} \\ \text{Smoothing equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)\ell_{t-1} \end{align*}\]
Forecast error: \(e_t = y_t - \hat{y}_{t|t-1} = y_t - \ell_{t-1}\).
Error correction form
\[\begin{align*} y_t &= \ell_{t-1} + e_t \\ \ell_{t} &= \ell_{t-1} + \alpha (y_t - \ell_{t-1}) \\ &= \ell_{t-1} + \alpha e_t \end{align*}\]
Specify probability distribution for \(e_t\), we assume \(e_t = \varepsilon_t \sim \text{NID}(0, \sigma^2)\).
ETS(A,N,N): SES with additive errors
\[\begin{align*} \text{Measurement equation} && y_t &= \ell_{t-1} + \varepsilon_t \\ \text{State equation} && \ell_t &= \ell_{t-1} + \alpha \varepsilon_t \end{align*}\]
where \(\varepsilon_t \sim \text{NID}(0, \sigma^2)\).
- “Innovations” or “single source of error” because equations have the same error process, \(\varepsilon_t\).
- Measurement equation: relationship between observations and states.
- State equation(s): evolution of the state(s) through time.
ETS(M,N,N): SES with multiplicative errors
- Specify relative errors \(\varepsilon_t = \frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}} \sim \text{NID}(0, \sigma^2)\)
- Substituting \(\hat{y}_{t|t-1} = \ell_{t-1}\) gives:
- \(y_t = \ell_{t-1} + \ell_{t-1} \varepsilon_t\)
- \(e_t = y_t - \hat{y}_{t|t-1} = \ell_{t-1} \varepsilon_t\) . . .
ETS(M,N,N): SES with multiplicative errors
Measurement equation
\[\begin{align*} \text{Measurement equation} && y_t &= \ell_{t-1} (1 + \varepsilon_t) \\ \text{State equation} && \ell_t &= \ell_{t-1} (1 + \alpha \varepsilon_t) \end{align*}\]
- Models with additive and multiplicative errors with the same parameters generate the same point forecasts but different prediction intervals.
ETS(A,N,N): Specifying the model
ETS(y ~ error("A") + trend("N") + season("N"))
By default, an optimal value for \(\alpha\) and \(\ell_0\) is used.
\(\alpha\) can be chosen manually in trend()
.
trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))
Example: Algerian Exports (1/4)
<- global_economy |>
algeria_economy filter(Country == "Algeria")
<- algeria_economy |>
fit model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
report(fit)
Series: Exports
Model: ETS(A,N,N)
Smoothing parameters:
alpha = 0.8399875
Initial states:
l[0]
39.539
sigma^2: 35.6301
AIC AICc BIC
446.7154 447.1599 452.8968
Example: Algerian Exports (2/4)
components(fit) |> autoplot()
Example: Algerian Exports (3/4)
components(fit) |>
left_join(fitted(fit), by = c("Country", ".model", "Year"))
# A dable: 59 x 7 [1Y]
# Key: Country, .model [1]
# : Exports = lag(level, 1) + remainder
Country .model Year Exports level remainder .fitted
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Algeria ANN 1959 NA 39.5 NA NA
2 Algeria ANN 1960 39.0 39.1 -0.496 39.5
3 Algeria ANN 1961 46.2 45.1 7.12 39.1
4 Algeria ANN 1962 19.8 23.8 -25.3 45.1
5 Algeria ANN 1963 24.7 24.6 0.841 23.8
6 Algeria ANN 1964 25.1 25.0 0.534 24.6
7 Algeria ANN 1965 22.6 23.0 -2.39 25.0
8 Algeria ANN 1966 26.0 25.5 3.00 23.0
9 Algeria ANN 1967 23.4 23.8 -2.07 25.5
10 Algeria ANN 1968 23.1 23.2 -0.630 23.8
# ℹ 49 more rows
Example: Algerian Exports (4/4)
|>
fit forecast(h = 5) |>
autoplot(algeria_economy) +
labs(y = "% of GDP", title = "Exports: Algeria")
Models with trend
Holt’s linear trend
Component form
\[\begin{align*} \text{Forecast} && \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ \text{Trend} && b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1} \end{align*}\]
- Two smoothing parameters \(\alpha\) and \(\beta^*\) (\(0 \le \alpha, \beta^* \le 1\)).
- \(\ell_t\) level: weighted average between \(y_t\) and one-step ahead forecast for time \(t\), \((\ell_{t-1} + b_{t-1} = \hat{y}_{t|t-1})\)
- \(b_t\) slope: weighted average of \((\ell_{t} - \ell_{t-1})\) and \(b_{t-1}\), current and previous estimate of slope.
- Choose \(\alpha, \beta^*, \ell_0, b_0\) to minimize SSE.
ETS(A,A,N)
Holt’s linear method with additive errors.
- Assume \(\varepsilon_t = y_t - \ell_{t-1} - b_{t-1} \sim \text{NID}(0, \sigma^2)\).
- Substituting into the error correction equations for Holt’s linear method:
. . .
\[\begin{align*} y_t &= \ell_{t-1} + b_{t-1} + \varepsilon_t \\ \ell_t &= \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t \\ b_t &= b_{t-1} + \alpha \beta^* \varepsilon_t \end{align*}\]
- For simplicity, set \(\beta = \alpha \beta^*\).
Exponential smoothing: trend/slope
ETS(M,A,N)
Holt’s linear method with multiplicative errors.
- Assume \(\varepsilon_t=\frac{y_t-(\ell_{t-1}+b_{t-1})}{(\ell_{t-1}+b_{t-1})}\)
- Following a similar approach as above, the innovations state space model underlying Holt’s linear method with multiplicative errors is specified as
. . .
\[\begin{align*} y_t&=(\ell_{t-1}+b_{t-1})(1+\varepsilon_t)\\ \ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\ b_t&=b_{t-1}+\beta(\ell_{t-1}+b_{t-1}) \varepsilon_t \end{align*}\]
where again \(\beta=\alpha \beta^*\) and \(\varepsilon_t \sim \text{NID}(0,\sigma^2)\).
ETS(A,A,N): Specifying the model
ETS(y ~ error("A") + trend("A") + season("N"))
. . .
By default, optimal values for \(\beta\) and \(b_0\) are used.
\(\beta\) can be chosen manually in trend()
.
trend("A", beta = 0.004)
trend("A", beta_range = c(0, 0.1))
Example: Australian population (1/4)
<- global_economy |> filter(Code == "AUS") |>
aus_economy mutate(Pop = Population / 1e6)
<- aus_economy |>
fit model(AAN = ETS(Pop ~ error("A") + trend("A") + season("N")))
report(fit)
Series: Pop
Model: ETS(A,A,N)
Smoothing parameters:
alpha = 0.9999
beta = 0.3266366
Initial states:
l[0] b[0]
10.05414 0.2224818
sigma^2: 0.0041
AIC AICc BIC
-76.98569 -75.83184 -66.68347
Example: Australian population (2/4)
components(fit) |> autoplot()
Example: Australian population (3/4)
components(fit) |>
left_join(fitted(fit), by = c("Country", ".model", "Year"))
# A dable: 59 x 8 [1Y]
# Key: Country, .model [1]
# : Pop = lag(level, 1) + lag(slope, 1) + remainder
Country .model Year Pop level slope remainder .fitted
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Australia AAN 1959 NA 10.1 0.222 NA NA
2 Australia AAN 1960 10.3 10.3 0.222 -0.000145 10.3
3 Australia AAN 1961 10.5 10.5 0.217 -0.0159 10.5
4 Australia AAN 1962 10.7 10.7 0.231 0.0418 10.7
5 Australia AAN 1963 11.0 11.0 0.223 -0.0229 11.0
6 Australia AAN 1964 11.2 11.2 0.221 -0.00641 11.2
7 Australia AAN 1965 11.4 11.4 0.221 -0.000314 11.4
8 Australia AAN 1966 11.7 11.7 0.235 0.0418 11.6
9 Australia AAN 1967 11.8 11.8 0.206 -0.0869 11.9
10 Australia AAN 1968 12.0 12.0 0.208 0.00350 12.0
# ℹ 49 more rows
Example: Australian population (4/4)
|>
fit forecast(h = 10) |>
autoplot(aus_economy) +
labs(y = "Millions", title = "Population: Australia")
Damped trend method
Component form
\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi + \phi^2 + \dots + \phi^{h})b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \end{align*}\]
- Damping parameter \(0<\phi<1\).
- If \(\phi=1\), identical to Holt’s linear trend.
- As \(h\rightarrow\infty\), \(\hat{y}{T+h}{T}\rightarrow \ell_T+\phi b_T/(1-\phi)\).
- Short-run forecasts trended, long-run forecasts constant.
Your turn
Write down the model for ETS(A,\(A_d\),N)
Example: Australian population ct. (1/3)
|>
aus_economy model(holt = ETS(Pop ~ error("A") + trend("Ad") + season("N"))) |>
forecast(h = 20) |>
autoplot(aus_economy)
Example: Australian population ct. (2/3)
<- aus_economy |>
fit filter(Year <= 2010) |>
model(
ses = ETS(Pop ~ error("A") + trend("N") + season("N")),
holt = ETS(Pop ~ error("A") + trend("A") + season("N")),
damped = ETS(Pop ~ error("A") + trend("Ad") + season("N"))
)
tidy(fit)
accuracy(fit)
Example: Australian population ct. (3/3)
term | SES | Linear trend | Damped trend |
---|---|---|---|
\(\alpha\) | 1.00 | 1.00 | 1.00 |
\(\beta^*\) | 0.30 | 0.40 | |
\(\phi\) | 0.98 | ||
NA | 0.22 | 0.25 | |
NA | 10.28 | 10.05 | 10.04 |
Training RMSE | 0.24 | 0.06 | 0.07 |
Test RMSE | 1.63 | 0.15 | 0.21 |
Test MASE | 6.18 | 0.55 | 0.75 |
Test MAPE | 6.09 | 0.55 | 0.74 |
Test MAE | 1.45 | 0.13 | 0.18 |
Your turn
prices
contains the price of a dozen eggs in the United States from 1900–1993
- Use SES and Holt’s method (with and without damping) to forecast “future” data.
Use h=100 so you can clearly see the differences between the options when plotting the forecasts.
- Which method gives the best training RMSE?
- Are these RMSE values comparable?
- Do the residuals from the best fitting method look like white noise?
Models with seasonality
Holt-Winters additive method
Holt and Winters extended Holt’s method to capture seasonality.
Component form
\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma (y_{t} - \ell_{t-1} - b_{t-1}) + (1 - \gamma)s_{t-m} \end{align*}\]
- \(k=\) integer part of \((h-1)/m\). Ensures estimates from the final year are used for forecasting.
- Parameters: \(0\le \alpha\le 1\), \(0\le \beta^*\le 1\), \(0\le \gamma\le 1-\alpha\) and \(m=\) period of seasonality (e.g. \(m=4\) for quarterly data).
Holt-Winters additive method
- Seasonal component is usually expressed as \(s_{t} = \gamma^* (y_{t}-\ell_{t})+ (1-\gamma^*)s_{t-m}.\)
- Substitute in for \(\ell_t\): \(s_{t} = \gamma^*(1-\alpha) (y_{t}-\ell_{t-1}-b_{t-1})+ [1-\gamma^*(1-\alpha)]s_{t-m}\)
- We set \(\gamma=\gamma^*(1-\alpha)\).
- The usual parameter restriction is \(0\le\gamma^*\le1\), which translates to \(0\le\gamma\le(1-\alpha)\).
Exponential smoothing: seasonality
ETS(A,A,A)
Holt-Winters additive method with additive errors.
\[\begin{array}{ll} \text{Forecast equation} & \hat{y}_{t+h|t} = \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \text{Observation equation} & y_t = \ell_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t \\ \text{State equations} & \ell_t = \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t \\ & b_t = b_{t-1} + \beta \varepsilon_t \\ & s_t = s_{t-m} + \gamma \varepsilon_t \end{array}\]- Forecast errors: \(\varepsilon_{t} = y_t - \hat{y}_{t|t-1}\)
- \(k\) is integer part of \((h-1)/m\).
Your turn
Write down the model for ETS(A,N,A)
Holt-Winters multiplicative method
Seasonal variations change in proportion to the level of the series.
Component form
\[\begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + b_{t-1})} + (1 - \gamma)s_{t-m} \end{align*}\]
- \(k\) is integer part of \((h-1)/m\).
- Additive method: \(s_t\) in absolute terms — within each year \(\sum_i s_i \approx 0\).
- Multiplicative method: \(s_t\) in relative terms — within each year \(\sum_i s_i \approx m\).
ETS(M,A,M)
Holt-Winters multiplicative method with multiplicative errors.
\[\begin{align*} \text{Forecast equation} && \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t}) s_{t+h-m(k+1)} \\ \text{Observation equation} && y_t &= (\ell_{t-1} + b_{t-1})s_{t-m}(1 + \varepsilon_t) \\ \text{State equations} && \ell_t &= (\ell_{t-1} + b_{t-1})(1 + \alpha \varepsilon_t) \\ && b_t &= b_{t-1} + \beta(\ell_{t-1} + b_{t-1}) \varepsilon_t \\ && s_t &= s_{t-m}(1 + \gamma \varepsilon_t) \end{align*}\]
- Forecast errors: \(\varepsilon_{t} = (y_t - \hat{y}_{t|t-1})/\hat{y}_{t|t-1}\)
- \(k\) is integer part of \((h-1)/m\).
Example: Australian holiday tourism (1/2)
<- tourism |>
aus_holidays filter(Purpose == "Holiday") |>
summarise(Trips = sum(Trips))
<- aus_holidays |>
fit model(
additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
)<- fit |> forecast() fc
Example: Australian holiday tourism (2/2)
|>
fc autoplot(aus_holidays, level = NULL) +
labs(y = "Thousands", title = "Overnight trips")
Estimated components (1/2)
components(fit)
# A dable: 168 x 7 [1Q]
# Key: .model [2]
# : Trips = lag(level, 1) + lag(slope, 1) + lag(season, 4) + remainder
.model Quarter Trips level slope season remainder
<chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 additive 1997 Q1 NA NA NA 1512. NA
2 additive 1997 Q2 NA NA NA -290. NA
3 additive 1997 Q3 NA NA NA -684. NA
4 additive 1997 Q4 NA 9899. -37.4 -538. NA
5 additive 1998 Q1 11806. 9964. -24.5 1512. 433.
6 additive 1998 Q2 9276. 9851. -35.6 -290. -374.
7 additive 1998 Q3 8642. 9700. -50.2 -684. -489.
8 additive 1998 Q4 9300. 9694. -44.6 -538. 188.
9 additive 1999 Q1 11172. 9652. -44.3 1512. 10.7
10 additive 1999 Q2 9608. 9676. -35.6 -290. 290.
# ℹ 158 more rows
Estimated components (2/2)
Holt-Winters damped method
Often the single most accurate forecasting method for seasonal data:
\[\begin{align*} \hat{y}_{t+h|t} &= [\ell_{t} + (\phi + \phi^2 + \dots + \phi^{h})b_{t}]s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \left(\frac{y_{t}}{s_{t-m}}\right) + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1}) \\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \\ s_{t} &= \gamma \left(\frac{y_{t}}{(\ell_{t-1} + \phi b_{t-1})}\right) + (1 - \gamma)s_{t-m} \end{align*}\]
Your turn
Apply Holt-Winters’ multiplicative method to the Gas data from aus_production
.
- Why is multiplicative seasonality necessary here?
- Experiment with making the trend damped.
- Check that the residuals from the best method look like white noise.
Holt-Winters with daily data
<- pedestrian |>
sth_cross_ped filter(
>= "2016-07-01",
Date == "Southern Cross Station"
Sensor |>
) index_by(Date) |>
summarise(Count = sum(Count) / 1000)
|>
sth_cross_ped filter(Date <= "2016-07-31") |>
model(
hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
|>
) forecast(h = "2 weeks") |>
autoplot(sth_cross_ped |> filter(Date <= "2016-08-14")) +
labs(
title = "Daily traffic: Southern Cross",
y = "Pedestrians ('000)"
)
Holt-Winters with daily data
Innovations state space models
Exponential smoothing methods
Trend Component | Seasonal | Component | |
---|---|---|---|
N (None) | A (Additive) | M (Multiplicative) | |
N (None) | (N,N) | (N,A) | (N,M) |
A (Additive) | (A,N) | (A,A) | (A,M) |
\(A_d\) (Additive damped) | (\(A_d\),N) | (\(A_d\),A) | (\(A_d\),M) |
Exponential smoothing methods
Method | Description |
---|---|
(N,N) | Simple exponential smoothing |
(A,N) | Holt’s linear method |
(\(A_d\),N) | Additive damped trend method |
(A,A) | Additive Holt-Winters’ method |
(A,M) | Multiplicative Holt-Winters’ method |
(\(A_d\),M) | Damped multiplicative Holt-Winters’ method |
. . .
ETS models
Additive Error
Trend Component | Seasonal | Component | |
---|---|---|---|
N (None) | A (Additive) | M (Multiplicative) | |
N (None) | A,N,N | A,N,A | A,N,M |
A (Additive) | A,A,N | A,A,A | A,A,M |
\(A_d\) (Additive damped) | A,\(A_d\),N | A,\(A_d\),A | A,\(A_d\),M |
. . .
Multiplicative Error
Trend Component | Seasonal | Component | |
---|---|---|---|
N (None) | A (Additive) | M (Multiplicative) | |
N (None) | M,N,N | M,N,A | M,N,M |
A (Additive) | M,A,N | M,A,A | M,A,M |
\(A_d\) (Additive damped) | M,\(A_d\),N | M,\(A_d\),A | M,\(A_d\),M |
Additive error models
Multiplicative error models
Estimating ETS models
- Smoothing parameters \(\alpha\), \(\beta\), \(\gamma\) and \(\phi\), and the initial states \(\ell_0\), \(b_0\), \(s_0,s_{-1},\dots,s_{-m+1}\) are estimated by maximizing the “likelihood” = the probability of the data arising from the specified model.
- For models with additive errors equivalent to minimizing SSE.
- For models with multiplicative errors, not equivalent to minimizing SSE.
Innovations state space models
Estimation
\[L^*(\theta,\textbf{x}_0) = T\log\!\left(\sum_{t=1}^T \varepsilon^2_t\!\right) + 2\sum_{t=1}^T \log|k(\textbf{x}_{t-1})|\]
\[= -2\log(\text{Likelihood}) + \text{constant}\]
- Estimate parameters \(\theta = (\alpha,\beta,\gamma,\phi)\) and initial states \(\textbf{x}_0 = (\ell_0,b_0,s_0,s_{-1},\dots,s_{-m+1})\) by minimizing \(L^*\).
Parameter restrictions
Usual region
- Traditional restrictions in the methods \(0< \alpha,\beta^*,\gamma^*,\phi<1\)(equations interpreted as weighted averages).
- In models we set \(\beta=\alpha\beta^*\) and \(\gamma=(1-\alpha)\gamma^*\).
- Therefore \(0< \alpha <1\), \(0 < \beta < \alpha\) and \(0< \gamma < 1-\alpha\).
- \(0.8<\phi<0.98\) — to prevent numerical difficulties.
. . .
Admissible region
- To prevent observations in the distant past having a continuing effect on current forecasts.
- Usually (but not always) less restrictive than traditional region.
- For example for ETS(A,N,N): traditional* \(0< \alpha <1\) while admissible \(0< \alpha <2\).
Model selection
Akaike’s Information Criterion
\[AIC = -2\log(\text{L}) + 2k\]
where \(L\) is the likelihood and \(k\) is the number of parameters initial states estimated in the model.
. . .
Corrected AIC
\[\text{AIC}_{\text{c}} = \text{AIC} + \frac{2k(k+1)}{T-k-1}\]
which is the AIC corrected (for small sample bias).
. . .
Bayesian Information Criterion
\[\text{BIC} = \text{AIC} + k[\log(T)-2].\]
Automatic forecasting
From Hyndman et al. (IJF, 2002):
- Apply each model that is appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).
- Select best method using AICc:
- Produce forecasts using best method.
- Obtain forecast intervals using underlying state space model.
. . .
Method performed very well in M3 competition.
Example: National populations (1/2)
<- global_economy |>
fit mutate(Pop = Population / 1e6) |>
model(ets = ETS(Pop))
fit
# A mable: 263 x 2
# Key: Country [263]
Country ets
<fct> <model>
1 Afghanistan <ETS(A,A,N)>
2 Albania <ETS(M,A,N)>
3 Algeria <ETS(M,A,N)>
4 American Samoa <ETS(M,A,N)>
5 Andorra <ETS(M,A,N)>
6 Angola <ETS(M,A,N)>
7 Antigua and Barbuda <ETS(M,A,N)>
8 Arab World <ETS(M,A,N)>
9 Argentina <ETS(A,A,N)>
10 Armenia <ETS(M,A,N)>
# ℹ 253 more rows
Example: National populations (2/2)
|>
fit forecast(h = 5)
# A fable: 1,315 x 5 [1Y]
# Key: Country, .model [263]
Country .model Year Pop
<fct> <chr> <dbl> <dist>
1 Afghani… ets 2018 N(36, 0.012)
2 Afghani… ets 2019 N(37, 0.059)
3 Afghani… ets 2020 N(38, 0.16)
4 Afghani… ets 2021 N(39, 0.35)
5 Afghani… ets 2022 N(40, 0.64)
6 Albania ets 2018 N(2.9, 0.00012)
7 Albania ets 2019 N(2.9, 6e-04)
8 Albania ets 2020 N(2.9, 0.0017)
9 Albania ets 2021 N(2.9, 0.0036)
10 Albania ets 2022 N(2.9, 0.0066)
# ℹ 1,305 more rows
# ℹ 1 more variable: .mean <dbl>
Example: Australian holiday tourism (1/9)
<- tourism |>
holidays filter(Purpose == "Holiday")
<- holidays |> model(ets = ETS(Trips))
fit fit
# A mable: 76 x 4
# Key: Region, State, Purpose [76]
Region State Purpose ets
<chr> <chr> <chr> <model>
1 Adelaide South Australia Holiday <ETS(A,N,A)>
2 Adelaide Hills South Australia Holiday <ETS(A,A,N)>
3 Alice Springs Northern Territory Holiday <ETS(M,N,A)>
4 Australia's Coral Coast Western Australia Holiday <ETS(M,N,A)>
5 Australia's Golden Outback Western Australia Holiday <ETS(M,N,M)>
6 Australia's North West Western Australia Holiday <ETS(A,N,A)>
7 Australia's South West Western Australia Holiday <ETS(M,N,M)>
8 Ballarat Victoria Holiday <ETS(M,N,A)>
9 Barkly Northern Territory Holiday <ETS(A,N,A)>
10 Barossa South Australia Holiday <ETS(A,N,N)>
# ℹ 66 more rows
Example: Australian holiday tourism (2/9)
|>
fit filter(Region == "Snowy Mountains") |>
report()
Series: Trips
Model: ETS(M,N,A)
Smoothing parameters:
alpha = 0.1571013
gamma = 0.0001000991
Initial states:
l[0] s[0] s[-1] s[-2] s[-3]
141.6782 -60.95904 130.8567 -42.23776 -27.65986
sigma^2: 0.0388
AIC AICc BIC
852.0452 853.6008 868.7194
Example: Australian holiday tourism (3/9)
|>
fit filter(Region == "Snowy Mountains") |>
components(fit)
# A dable: 84 x 9 [1Q]
# Key: Region, State, Purpose, .model [1]
# : Trips = (lag(level, 1) + lag(season, 4)) * (1 + remainder)
Region State Purpose .model Quarter Trips level season remainder
<chr> <chr> <chr> <chr> <qtr> <dbl> <dbl> <dbl> <dbl>
1 Snowy Mountains New South Wales Holiday ets 1997 Q1 NA NA -27.7 NA
2 Snowy Mountains New South Wales Holiday ets 1997 Q2 NA NA -42.2 NA
3 Snowy Mountains New South Wales Holiday ets 1997 Q3 NA NA 131. NA
4 Snowy Mountains New South Wales Holiday ets 1997 Q4 NA 142. -61.0 NA
5 Snowy Mountains New South Wales Holiday ets 1998 Q1 101. 140. -27.7 -0.113
6 Snowy Mountains New South Wales Holiday ets 1998 Q2 112. 142. -42.2 0.154
7 Snowy Mountains New South Wales Holiday ets 1998 Q3 310. 148. 131. 0.137
8 Snowy Mountains New South Wales Holiday ets 1998 Q4 89.8 148. -61.0 0.0335
9 Snowy Mountains New South Wales Holiday ets 1999 Q1 112. 147. -27.7 -0.0687
10 Snowy Mountains New South Wales Holiday ets 1999 Q2 103. 147. -42.2 -0.0199
# ℹ 74 more rows
Example: Australian holiday tourism (4/9)
|>
fit filter(Region == "Snowy Mountains") |>
components(fit) |>
autoplot()
Example: Australian holiday tourism (5/9)
|> forecast() fit
# A fable: 608 x 7 [1Q]
# Key: Region, State, Purpose, .model [76]
Region State Purpose .model Quarter
<chr> <chr> <chr> <chr> <qtr>
1 Adelaide South Australia Holiday ets 2018 Q1
2 Adelaide South Australia Holiday ets 2018 Q2
3 Adelaide South Australia Holiday ets 2018 Q3
4 Adelaide South Australia Holiday ets 2018 Q4
5 Adelaide South Australia Holiday ets 2019 Q1
6 Adelaide South Australia Holiday ets 2019 Q2
7 Adelaide South Australia Holiday ets 2019 Q3
8 Adelaide South Australia Holiday ets 2019 Q4
9 Adelaide Hills South Australia Holiday ets 2018 Q1
10 Adelaide Hills South Australia Holiday ets 2018 Q2
# ℹ 598 more rows
# ℹ 2 more variables: Trips <dist>, .mean <dbl>
Example: Australian holiday tourism (6/9)
|> forecast() |>
fit filter(Region == "Snowy Mountains") |>
autoplot(holidays) +
labs(y = "Thousands", title = "Overnight trips")
Residuals
Response residuals \[\hat{e}_t = y_t - \hat{y}_{t|t-1}\]
. . .
Innovation residuals Additive error model: \[\hat\varepsilon_t = y_t - \hat{y}_{t|t-1}\]
Multiplicative error model: \[\hat\varepsilon_t = \frac{y_t - \hat{y}_{t|t-1}}{\hat{y}_{t|t-1}}\]
Example: Australian holiday tourism (7/9)
<- tourism |>
aus_holidays filter(Purpose == "Holiday") |>
summarise(Trips = sum(Trips))
<- aus_holidays |>
fit model(ets = ETS(Trips)) |>
report()
Series: Trips
Model: ETS(M,N,M)
Smoothing parameters:
alpha = 0.3578226
gamma = 0.0009685565
Initial states:
l[0] s[0] s[-1] s[-2] s[-3]
9666.501 0.9430367 0.9268433 0.968352 1.161768
sigma^2: 0.0022
AIC AICc BIC
1331.372 1332.928 1348.046
Example: Australian holiday tourism (8/9)
residuals(fit)
residuals(fit, type = "response")
bind_rows(
residuals(fit) |> mutate(Type = "Innovation residuals") |> as_tibble(),
residuals(fit, type = "response") |>
mutate(Type = "Response residuals") |>
as_tibble()) |>
ggplot(aes(x = Quarter, y = .resid)) +
geom_line() +
facet_grid(Type ~ ., scales = "free_y") +
labs(y = "")
Example: Australian holiday tourism (9/9)
|>
fit augment()
# A tsibble: 80 x 6 [1Q]
# Key: .model [1]
.model Quarter Trips .fitted .resid .innov
<chr> <qtr> <dbl> <dbl> <dbl> <dbl>
1 ets 1998 Q1 11806. 11230. 576. 0.0513
2 ets 1998 Q2 9276. 9532. -257. -0.0269
3 ets 1998 Q3 8642. 9036. -393. -0.0435
4 ets 1998 Q4 9300. 9050. 249. 0.0275
5 ets 1999 Q1 11172. 11260. -88.0 -0.00781
6 ets 1999 Q2 9608. 9358. 249. 0.0266
7 ets 1999 Q3 8914. 9042. -129. -0.0142
8 ets 1999 Q4 9026. 9154. -129. -0.0140
9 ets 2000 Q1 11071. 11221. -150. -0.0134
10 ets 2000 Q2 9196. 9308. -111. -0.0120
# ℹ 70 more rows
. . .
Innovation residuals (.innov
) are given by \(\hat{\varepsilon}_t\) while regular residuals (.resid
) are \(y_t - \hat{y}_{t-1}\). They are different when the model has multiplicative errors.
Some unstable models
- Some of the combinations of (Error, Trend, Seasonal) can lead to numerical difficulties; see equations with division by a state.
- These are: ETS(A,N,M), ETS(A,A,M), ETS(A,\(A_d\),M).
- Models with multiplicative errors are useful for strictly positive data, but are not numerically stable with data containing zeros or negative values. In that case only the six fully additive models will be applied.
Exponential smoothing models
Additive Error
Trend | N (None) | A (Additive) | M (Multiplicative) |
---|---|---|---|
N (None) | A,N,N | A,N,A | |
A (Additive) | A,A,N | A,A,A | |
\(A_d\) (Additive damped) | A,\(A_d\),N | A,\(A_d\),A |
Multiplicative Error
Trend | N (None) | A (Additive) | M (Multiplicative) |
---|---|---|---|
N (None) | M,N,N | M,N,A | M,N,M |
A (Additive) | M,A,N | M,A,A | M,A,M |
\(A_d\) (Additive damped) | M,\(A_d\),N | M,\(A_d\),A | M,\(A_d\),M |
Forecasting with exponential smoothing
Forecasting with ETS models
Traditional point forecasts: iterate the equations for \(t=T+1,T+2,\dots,T+h\) and set all \(\varepsilon_t=0\) for \(t>T\).
- Not the same as \(\text{E}(y_{t+h} | \textbf{x}_t)\) unless seasonality is additive.
fable
uses \(\text{E}(y_{t+h} | \textbf{x}_t)\).- Point forecasts for ETS(A,*,*) are identical to ETS(M,*,*) if the parameters are the same.
Forecasting with ETS models
Prediction intervals: can only be generated using the models.
- The prediction intervals will differ between models with additive and multiplicative errors.
- Exact formulae for some models.
- More general to simulate future sample paths, conditional on the last estimate of the states, and to obtain prediction intervals from the percentiles of these simulated future paths.
Prediction intervals
PI for most ETS models: \(\hat{y}_{T+h|T} \pm c \sigma_h\), where \(c\) depends on coverage probability and \(\sigma_h\) is forecast standard deviation.
We can see them in the book HERE.
Example: Corticosteroid drug sales (1/5)
<- PBS |>
h02 filter(ATC2 == "H02") |>
summarise(Cost = sum(Cost))
|> autoplot(Cost) h02
Example: Corticosteroid drug sales (2/5)
|>
h02 model(ETS(Cost)) |>
report()
Series: Cost
Model: ETS(M,Ad,M)
Smoothing parameters:
alpha = 0.3071016
beta = 0.0001006793
gamma = 0.0001007181
phi = 0.977528
Initial states:
l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
417268.7 8205.82 0.8716807 0.8259747 0.7562808 0.7733338 0.6872373 1.283821 1.324616
s[-7] s[-8] s[-9] s[-10] s[-11]
1.180067 1.163601 1.104801 1.047963 0.9806235
sigma^2: 0.0046
AIC AICc BIC
5515.212 5518.909 5574.938
Example: Corticosteroid drug sales (3/5)
|>
h02 model(ETS(Cost ~ error("A") + trend("A") + season("A"))) |>
report()
Series: Cost
Model: ETS(A,A,A)
Smoothing parameters:
alpha = 0.1702163
beta = 0.006310854
gamma = 0.4545987
Initial states:
l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
409705.9 9097.111 -99075.37 -136602.3 -191496.1 -174530.8 -241436.7 210643.8 244644.2
s[-7] s[-8] s[-9] s[-10] s[-11]
145368.2 130569.6 84457.69 39131.7 -11673.71
sigma^2: 3498869384
AIC AICc BIC
5585.278 5588.568 5641.686
Example: Corticosteroid drug sales (4/5)
|>
h02 model(ETS(Cost)) |>
forecast() |>
autoplot(h02)
Example: Corticosteroid drug sales (5/5)
|>
h02 model(
auto = ETS(Cost),
AAA = ETS(Cost ~ error("A") + trend("A") + season("A"))
|>
) accuracy()
Model | MAE | RMSE | MAPE | MASE | RMSSE |
---|---|---|---|---|---|
auto | 38649.04 | 51102.24 | 4.988983 | 0.6375806 | 0.6891173 |
AAA | 43378.40 | 56784.23 | 6.047574 | 0.7155993 | 0.7657394 |
Your turn
- Use
ETS()
on some of these series:
tourism
,gafa_stock
,pelt
Does it always give good forecasts?
Find an example where it does not work well. Can you figure out why?