Chapter 9 ARIMA
Setup
library(fpp3) # Primary
library(patchwork)
library(purrr)
ARIMA models
Component | Description |
---|---|
AR | autoregressive (lagged observations as inputs) |
I | integrated (differencing to make series stationary) |
MA | moving average (lagged errors as inputs) |
. . .
An ARIMA model is rarely interpretable in terms of visible data structures like trend and seasonality. But it can capture a huge range of time series patterns.
Stationarity and differencing
Stationarity
Definition: If \(\{y_t\}\) is a stationary time series, then for all \(s\), the distribution of \((y_t,\dots,y_{t+s})\) does not depend on \(t\).
. . .
A stationary series is:
- roughly horizontal
- constant variance
- no patterns predictable in the long-term
Stationary? (1/9)
gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018) |>
  autoplot(Close) +
  labs(y = "Google closing stock price", x = "Day")
Stationary? (2/9)
gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018) |>
  autoplot(difference(Close)) +
  labs(y = "Change in Google closing stock price", x = "Day")
Stationary? (3/9)
global_economy |>
  filter(Country == "Algeria") |>
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Algerian Exports")
Stationary? (4/9)
aus_production |>
  autoplot(Bricks) +
  labs(title = "Clay brick production in Australia")
Stationary? (5/9)
prices |>
  filter(year >= 1900) |>
  autoplot(eggs) +
  labs(y = "$US (1993)", title = "Price of a dozen eggs")
Stationary? (6/9)
aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
Stationary? (7/9)
aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria", year(Month) >= 2010) |>
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
Stationary? (8/9)
aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria", year(Month) >= 2015) |>
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
Stationary? (9/9)
pelt |>
  autoplot(Lynx) +
  labs(y = "Number trapped", title = "Annual Canadian Lynx Trappings")
Stationarity
Definition
If \(\{y_t\}\) is a stationary time series, then for all \(s\), the distribution of \((y_t,\dots,y_{t+s})\) does not depend on \(t\).
Transformations help to stabilize the variance.
For ARIMA modelling, we also need to stabilize the mean.
Non-stationarity in the mean
Identifying non-stationary series
- The time plot.
- The ACF of stationary data drops to zero relatively quickly
- The ACF of non-stationary data decreases slowly.
- For non-stationary data, the value of \(r_1\) is often large and positive.
Example: Google stock price (1/5)
google_2018 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018)
Example: Google stock price (2/5)
google_2018 |>
  autoplot(Close) +
  labs(y = "Closing stock price ($USD)")
Example: Google stock price (3/5)
google_2018 |> ACF(Close) |> autoplot()
Example: Google stock price (4/5)
google_2018 |>
  autoplot(difference(Close)) +
  labs(y = "Change in Google closing stock price ($USD)")
Example: Google stock price (5/5)
google_2018 |> ACF(difference(Close)) |> autoplot()
Differencing
- Differencing helps to stabilize the mean.
- The differenced series is the change between each observation in the original series: \(y'_t = y_t - y_{t-1}\).
- The differenced series will have only \(T-1\) values since it is not possible to calculate a difference \(y_1'\) for the first observation.
Random walk model
If differenced series is white noise with zero mean:
\(y_t-y_{t-1}=\epsilon_t\) or \(y_t=y_{t-1}+\epsilon_t\)
where \(\epsilon_t \sim NID(0,\sigma^2)\).
- Very widely used for non-stationary data.
- This is the model behind the naïve method.
- Random walks typically have:
  - long periods of apparent trends up or down
  - sudden/unpredictable changes in direction
- Forecasts are equal to the last observation, since future movements up or down are equally likely.
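These properties are easy to see by simulation; a minimal base-R sketch (the seed and series length are arbitrary choices, not from the text):
# Simulate three random walks y_t = y_{t-1} + e_t, with e_t ~ N(0, 1)
set.seed(1)
walks <- replicate(3, cumsum(rnorm(100)))
matplot(walks, type = "l", lty = 1, xlab = "t", ylab = "y",
        main = "Three simulated random walks")
Each path shows long apparent trends and sudden changes of direction, even though the generating process has no trend at all.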
Random walk with drift model
If differenced series is white noise with non-zero mean:
\(y_t-y_{t-1}=c+\epsilon_t\) or \(y_t=c+y_{t-1}+\epsilon_t\)
where \(\epsilon_t \sim NID(0,\sigma^2)\).
- \(c\) is the average change between consecutive observations.
- If \(c>0\), \(y_t\) will tend to drift upwards and vice versa.
- This is the model behind the drift method.
Second-order differencing
Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:
. . .
\[ \begin{align*} y''_{t} & = y'_{t} - y'_{t - 1} \\ & = (y_t - y_{t-1}) - (y_{t-1}-y_{t-2})\\ & = y_t - 2y_{t-1} +y_{t-2}. \end{align*} \]
- \(y_t''\) will have \(T-2\) values.
- In practice, it is almost never necessary to go beyond second-order differences.
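In fpp3-style code, a second-order difference can be computed by differencing twice or, equivalently, with the differences argument; a minimal sketch using the aus_production data:
aus_production |>
  transmute(
    d2a = difference(difference(Bricks)),
    d2b = difference(Bricks, differences = 2)  # identical to d2a
  )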
Seasonal differencing
A seasonal difference is the difference between an observation and the corresponding observation from the previous year.
\[ y'_t = y_t - y_{t-m} \]
where \(m=\) number of seasons.
- For monthly data \(m=12\).
- For quarterly data \(m=4\).
Antidiabetic drug sales (1/4)
a10 <- PBS |>
  filter(ATC2 == "A10") |>
  summarise(Cost = sum(Cost)/1e6)
Antidiabetic drug sales (2/4)
a10 |> autoplot(Cost)
Antidiabetic drug sales (3/4)
a10 |> autoplot(log(Cost))
Antidiabetic drug sales (4/4)
a10 |> autoplot(log(Cost) |> difference(12))
Corticosteroid drug sales (1/6)
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost)/1e6)
Corticosteroid drug sales (2/6)
h02 |> autoplot(Cost)
Corticosteroid drug sales (3/6)
h02 |> autoplot(log(Cost))
Corticosteroid drug sales (4/6)
h02 |> autoplot(log(Cost) |> difference(12))
Corticosteroid drug sales (5/6)
h02 |> autoplot(log(Cost) |> difference(12) |> difference(1))
Corticosteroid drug sales (6/6)
- Seasonally differenced series is closer to being stationary.
- Remaining non-stationarity can be removed with a further first difference.
. . .
If \(y'_t = y_t - y_{t-12}\) denotes seasonally differenced series, then twice-differenced series is
\[\begin{align*} y^*_t &= y'_t - y'_{t-1} \\ &= (y_t - y_{t-12}) - (y_{t-1} - y_{t-13}) \\ &= y_t - y_{t-1} - y_{t-12} + y_{t-13}\: . \end{align*}\]
Seasonal differencing
When both seasonal and first differences are applied
- it makes no difference which is done first—the result will be the same.
- If seasonality is strong, we recommend that seasonal differencing be done first because sometimes the resulting series will be stationary and there will be no need for further first difference.
. . .
It is important that if differencing is used, the differences are interpretable.
Interpretation of differencing
- first differences are the change between one observation and the next;
- seasonal differences are the change from one year to the next.
. . .
But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.
Unit root tests
Statistical tests to determine the required order of differencing.
- Augmented Dickey Fuller test: null hypothesis is that the data are non-stationary and non-seasonal.
- Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.
- Other tests available for seasonal data.
We will use the KPSS test.
KPSS test
google_2018 |>
  features(Close, unitroot_kpss)
# A tibble: 1 × 3
Symbol kpss_stat kpss_pvalue
<chr> <dbl> <dbl>
1 GOOG 0.573 0.0252
google_2018 |>
  features(Close, unitroot_ndiffs)
# A tibble: 1 × 2
Symbol ndiffs
<chr> <int>
1 GOOG 1
Automatically selecting differences (1/3)
STL decomposition: \(y_t = T_t+S_t+R_t\)
Seasonal strength \(F_s = \max\big(0, 1-\frac{\text{Var}(R_t)}{\text{Var}(S_t+R_t)}\big)\)
If \(F_s > 0.64\), do one seasonal difference.
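This rule can be checked directly with feat_stl, as the next slide also illustrates; a minimal sketch applying the 0.64 threshold by hand (unitroot_nsdiffs uses the same rule internally):
h02 |>
  mutate(log_sales = log(Cost)) |>
  features(log_sales, feat_stl) |>
  mutate(seasonal_diff_needed = seasonal_strength_year > 0.64)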
Automatically selecting differences (2/3)
h02 |> mutate(log_sales = log(Cost)) |>
  features(log_sales, list(unitroot_nsdiffs, feat_stl))
# A tibble: 1 × 10
nsdiffs trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0.957 0.955 6 8
# ℹ 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
# stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
Automatically selecting differences (3/3)
h02 |> mutate(log_sales = log(Cost)) |>
  features(log_sales, unitroot_nsdiffs)
# A tibble: 1 × 1
nsdiffs
<int>
1 1
h02 |> mutate(d_log_sales = difference(log(Cost), 12)) |>
  features(d_log_sales, unitroot_ndiffs)
# A tibble: 1 × 1
ndiffs
<int>
1 1
Your turn
For the tourism dataset, compute the total number of trips and find an appropriate differencing (after transformation if necessary) to obtain stationary data.
Backshift notation (1/4)
A very useful notational device is the backward shift operator, \(B\), which is used as follows:
\[ B y_{t} = y_{t - 1} \]
. . .
In other words, \(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.
. . .
Two applications of \(B\) to \(y_{t}\) shifts the data back two periods: \[ B(By_{t}) = B^{2}y_{t} = y_{t-2} \]
. . .
For monthly data, if we wish to shift attention to “the same month last year”, then \(B^{12}\) is used, and the notation is \(B^{12}y_{t} = y_{t-12}\).
Backshift notation (2/4)
The backward shift operator is convenient for describing the process of differencing.
A first-order difference can be written as \[ y'_{t} = y_{t} - y_{t-1} = y_t - By_{t} = (1 - B)y_{t} \]
. . .
Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:
\[y''_{t} = y_{t} - 2y_{t - 1} + y_{t - 2} = (1 - B)^{2} y_{t}\]
Backshift notation (3/4)
- Second-order difference is denoted \((1- B)^{2}\).
- Second-order difference is not the same as a second difference, which would be denoted \(1- B^{2}\);
- In general, a \(d\)th-order difference can be written as
. . .
\[ (1 - B)^{d} y_{t} \]
- A seasonal difference followed by a first difference can be written as
. . .
\[ (1-B)(1-B^m)y_t \]
Backshift notation (4/4)
The “backshift” notation is convenient because the terms can be multiplied together to see the combined effect.
\[\begin{align*} (1-B)(1-B^m)y_t & = (1 - B - B^m + B^{m+1})y_t \\ & = y_t-y_{t-1}-y_{t-m}+y_{t-m-1}. \end{align*}\]
For monthly data, \(m=12\) and we obtain the same result as earlier.
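A numerical sanity check of this identity, using the a10 series created earlier:
a10 |>
  mutate(
    lhs = difference(difference(Cost, 12)),                  # (1-B)(1-B^12) y_t
    rhs = Cost - lag(Cost) - lag(Cost, 12) + lag(Cost, 13)   # expanded form
  ) |>
  as_tibble() |>
  summarise(max_abs_diff = max(abs(lhs - rhs), na.rm = TRUE))  # effectively zero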
Non-seasonal ARIMA models
Autoregressive models
Autoregressive (AR) models:
\[ y_{t} = c + \phi_{1}y_{t - 1} + \phi_{2}y_{t - 2} + \cdots + \phi_{p}y_{t - p} + \epsilon_{t}, \] where \(\epsilon_t\) is white noise. This is a multiple regression with lagged values of \(y_t\) as predictors.
AR(1) model
\(y_{t} = 18 -0.8 y_{t - 1} + \epsilon_{t},\) \(\epsilon_t\sim N(0,1)\), \(T=100\).
AR(1) model
\(y_{t} = c + \phi_1 y_{t - 1} + \epsilon_{t}\)
- When \(\phi_1=0\), \(y_t\) is equivalent to white noise (WN)
- When \(\phi_1=1\) and \(c=0\), \(y_t\) is equivalent to a random walk (RW)
- When \(\phi_1=1\) and \(c\ne0\), \(y_t\) is equivalent to a random walk with drift
- When \(\phi_1<0\), \(y_t\) tends to oscillate between positive and negative values.
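These cases are easy to visualize by simulation; a base-R sketch (the coefficients are chosen only for illustration):
set.seed(42)
par(mfrow = c(1, 3))
plot.ts(arima.sim(list(ar = 0.8), n = 100), ylab = "y",
        main = expression(phi[1] == 0.8))    # stationary, positively correlated
plot.ts(arima.sim(list(ar = -0.8), n = 100), ylab = "y",
        main = expression(phi[1] == -0.8))   # oscillates about the mean
plot.ts(cumsum(rnorm(100)), ylab = "y",
        main = expression(phi[1] == 1))      # random walk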
AR(2) model
\(y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \epsilon_t\) where \(\epsilon_t\sim N(0,1)\) & \(T=100\).
Stationarity conditions
We normally restrict autoregressive models to stationary data, and then some constraints on the values of the parameters are required.
- For \(p=1\): \(-1<\phi_1<1\).
- For \(p=2\): \(-1<\phi_2<1\), \(\phi_2+\phi_1 < 1\), \(\phi_2 -\phi_1 < 1\).
- More complicated conditions hold for \(p\ge3\).
- Estimation software takes care of this.
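Equivalently, all roots of the AR polynomial \(1-\phi_1 z - \dots - \phi_p z^p\) must lie outside the unit circle. A quick check for the AR(2) example above (\(\phi_1 = 1.3\), \(\phi_2 = -0.7\)):
abs(polyroot(c(1, -1.3, 0.7)))  # moduli of the roots of 1 - 1.3z + 0.7z^2
Both moduli are about 1.20, so the \(p=2\) conditions hold and the process is stationary.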
Moving Average (MA) models
Moving Average (MA) models:
\[ y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t - 1} + \theta_{2}\varepsilon_{t - 2} + \cdots + \theta_{q}\varepsilon_{t - q}, \]
where \(\varepsilon_t\) is white noise. This is a multiple regression with past errors as predictors. Don’t confuse this with moving average smoothing!
MA(1) model
\(y_t = 20 + \varepsilon_t + 0.8 \varepsilon_{t-1}\) where \(\varepsilon_t\sim N(0,1)\) & \(T=100\).
MA(2) model
\(y_t = \varepsilon_t -\varepsilon_{t-1} + 0.8 \varepsilon_{t-2}\) where \(\varepsilon_t\sim N(0,1)\) & \(T=100\).
MA(\(\infty\)) models
It is possible to write any stationary AR(\(p\)) process as an MA(\(\infty\)) process.
Example: AR(1) \[\begin{align*} y_t &= \phi_1y_{t-1} + \varepsilon_t\\ &= \phi_1(\phi_1y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t\\ &= \phi_1^2y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\ &= \phi_1^3y_{t-3} + \phi_1^2\varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\ &\;\;\vdots \end{align*}\] Provided \(-1<\phi_1<1\): \[y_t = \varepsilon_t + \phi_1\varepsilon_{t-1}+ \phi_1^2\varepsilon_{t-2}+ \phi_1^3\varepsilon_{t-3} + \cdots\]
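The weights of this MA(\(\infty\)) form can be computed numerically with stats::ARMAtoMA; for an AR(1) they should equal \(\phi_1^k\). A quick check with \(\phi_1 = 0.8\) (an arbitrary value):
ARMAtoMA(ar = 0.8, lag.max = 5)  # psi weights of the MA(infinity) representation
0.8^(1:5)                        # phi_1^k -- the same values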
Invertibility
- Any MA(\(q\)) process can be written as an AR(\(\infty\)) process if we impose some constraints on the MA parameters.
- Then the MA model is called “invertible”.
- Invertible models have some mathematical properties that make them easier to use in practice.
Invertibility
General condition for invertibility: the complex roots of \(1+\theta_1 z + \theta_2 z^2 + \dots + \theta_qz^q\) must lie outside the unit circle on the complex plane.
- For \(q=1\): \(-1<\theta_1<1\).
- For \(q=2\): \(-1<\theta_2<1\), \(\theta_2+\theta_1 >-1\), \(\theta_1 -\theta_2 < 1\).
- More complicated conditions hold for \(q\ge3\).
- Estimation software takes care of this.
ARMA models
Autoregressive Moving Average models:
\[y_{t} = c + \phi_{1}y_{t - 1} + \cdots + \phi_{p}y_{t - p} + \theta_{1}\varepsilon_{t - 1} + \cdots + \theta_{q}\varepsilon_{t - q} + \varepsilon_{t}.\]
- Predictors include both lagged values of \(y_t\) and lagged errors.
- Conditions on AR coefficients ensure stationarity.
- Conditions on MA coefficients ensure invertibility.
Autoregressive Integrated Moving Average models
- Combine ARMA model with differencing.
- \((1-B)^d y_t\) follows an ARMA model.
ARIMA models
Autoregressive Integrated Moving Average models
ARIMA(\(p, d, q\)) model
Component | Description |
---|---|
AR | \(p =\) order of the autoregressive part |
I | \(d =\) degree of first differencing involved |
MA | \(q =\) order of the moving average part |
- White noise model: ARIMA(0,0,0)
- Random walk: ARIMA(0,1,0) with no constant
- Random walk with drift: ARIMA(0,1,0) with a constant (c)
- AR(\(p\)): ARIMA(\(p\),0,0)
- MA(\(q\)): ARIMA(0,0,\(q\))
Backshift notation for ARIMA
ARMA model
\[ \begin{aligned} y_{t} &= c + \phi_{1}By_{t} + \cdots + \phi_pB^py_{t} + \epsilon_{t} + \theta_{1}B\epsilon_{t} + \cdots + \theta_qB^q\epsilon_{t} \\ \text{or}\quad & (1-\phi_1B - \cdots - \phi_p B^p) y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t \end{aligned} \]
ARIMA(1,1,1) model
\[ \begin{array}{c c c c} (1 - \phi_{1} B) & (1 - B) y_{t} &= &c + (1 + \theta_{1} B) \epsilon_{t}\\ {\uparrow} & {\uparrow} & &{\uparrow}\\ {\text{AR(1)}} & {\text{First}} & &{\text{MA(1)}}\\ & \text{difference}\\ \end{array} \]
. . .
\[ y_t = c + y_{t-1} + \phi_1 y_{t-1} - \phi_1 y_{t-2} + \theta_1\varepsilon_{t-1} + \varepsilon_t \]
General ARIMA Equation
\[ \underbrace{(1 - \phi_1 B - \cdots - \phi_p B^p)}_{\text{AR}(p)} \quad \underbrace{(1 - B)^d}_{d \text{ differences}}y_t = c+ \quad \underbrace{(1 + \theta_1 B + \cdots + \theta_q B^q)}_{\text{MA}(q)}\epsilon_t \]
R model
Intercept Form
\((1-\phi_1B - \cdots - \phi_p B^p) y_t' = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t\)
. . .
Mean Form
\((1-\phi_1B - \cdots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t\)
- \(y_t' = (1-B)^d y_t\)
- \(\mu\) is the mean of \(y_t'\).
- \(c = \mu(1-\phi_1 - \cdots - \phi_p )\).
fable uses the intercept form.
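Since fable reports the intercept-form constant, the implied mean follows from the last identity. A sketch using the coefficients of the Egyptian exports fit shown below (\(c = 2.5623\), \(\phi_1 = 1.6764\), \(\phi_2 = -0.8034\)):
c0 <- 2.5623
phi <- c(1.6764, -0.8034)
c0 / (1 - sum(phi))  # mu = c / (1 - phi_1 - phi_2), roughly 20.2 (% of GDP)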
Egyptian exports
global_economy |>
  filter(Code == "EGY") |>
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Egyptian Exports")
Egyptian exports
fit <- global_economy |>
  filter(Code == "EGY") |>
  model(ARIMA(Exports))
report(fit)
Series: Exports
Model: ARIMA(2,0,1) w/ mean
Coefficients:
ar1 ar2 ma1 constant
1.6764 -0.8034 -0.6896 2.5623
s.e. 0.1111 0.0928 0.1492 0.1161
sigma^2 estimated as 8.046: log likelihood=-141.57
AIC=293.13 AICc=294.29 BIC=303.43
ARIMA(2,0,1) model:
\[ y_t = 2.56 + 1.68 y_{t-1} -0.80 y_{t-2} -0.69 \varepsilon_{t-1} + \varepsilon_{t}, \]
where \(\varepsilon_t\) is white noise with a standard deviation of \(2.837 = \sqrt{8.046}\).
Egyptian exports
gg_tsresiduals(fit)
Egyptian exports
augment(fit) |>
features(.innov, ljung_box, lag = 10, dof = 4)
# A tibble: 1 × 4
Country .model lb_stat lb_pvalue
<fct> <chr> <dbl> <dbl>
1 Egypt, Arab Rep. ARIMA(Exports) 5.78 0.448
Egyptian exports
fit |> forecast(h=10) |>
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Egyptian Exports")
Understanding ARIMA models
The constant c has an important effect on the long-term forecasts obtained from these models.
If \(c=0\) and \(d=0\), the long-term forecasts will go to zero.
If \(c=0\) and \(d=1\), the long-term forecasts will go to a non-zero constant.
If \(c=0\) and \(d=2\), the long-term forecasts will follow a straight line.
If \(c\ne0\) and \(d=0\), the long-term forecasts will go to the mean of the data.
If \(c\ne0\) and \(d=1\), the long-term forecasts will follow a straight line.
If \(c\ne0\) and \(d=2\), the long-term forecasts will follow a quadratic trend. (This is not recommended, and fable will not permit it.)
Understanding ARIMA models
Forecast variance and \(d\)
- The higher the value of \(d\), the more rapidly the prediction intervals increase in size.
- For \(d=0\), the long-term forecast standard deviation will go to the standard deviation of the historical data.
. . .
Cyclic behaviour
- For cyclic forecasts, \(p\ge2\) and some restrictions on coefficients are required.
- If \(p=2\), we need \(\phi_1^2+4\phi_2<0\). Then the average cycle length is \[2\pi\big/\left[\arccos(-\phi_1(1-\phi_2)/(4\phi_2))\right].\]
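Plugging in the AR(2) example from earlier (\(\phi_1 = 1.3\), \(\phi_2 = -0.7\)):
phi1 <- 1.3; phi2 <- -0.7
phi1^2 + 4 * phi2                               # -1.11 < 0, so cycles occur
2 * pi / acos(-phi1 * (1 - phi2) / (4 * phi2))  # average cycle length: about 9.5 periods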
Estimation and order selection
Maximum likelihood estimation
Having identified the model order, we need to estimate the parameters \(c\), \(\phi_1,\dots,\phi_p\), \(\theta_1,\dots,\theta_q\).
- MLE is very similar to least squares estimation obtained by minimizing
. . .
\[ \sum_{t=1}^T e_t^2 \]
- The ARIMA() function allows LS or MLE estimation.
- Non-linear optimization must be used in either case.
- Different software will give different estimates.
Partial autocorrelations
Partial autocorrelations measure relationship
between \(y_{t}\) and \(y_{t - k}\), when the effects of other time lags — \(1, 2, 3, \dots, k - 1\) — are removed.
\[ \begin{aligned} \alpha_k &= k\text{th partial autocorrelation coefficient}\\ &= \text{equal to the estimate of } \phi_k \text{ in regression:}\\ & y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k} + \epsilon_t. \end{aligned} \]
- Varying number of terms on RHS gives \(\alpha_k\) for different values of \(k\).
- \(\alpha_1=\rho_1\)
- same critical values of \(\pm 1.96/\sqrt{T}\) as for ACF.
- Last significant \(\alpha_k\) indicates the order of an AR model.
Egyptian exports
egypt <- global_economy |> filter(Code == "EGY")
egypt |> ACF(Exports) |> autoplot()
egypt |> PACF(Exports) |> autoplot()
Egyptian exports
global_economy |>
  filter(Code == "EGY") |>
  gg_tsdisplay(Exports, plot_type='partial')
ACF and PACF interpretation (1/4)
AR(1)
\[ \begin{aligned} \rho_k &= \phi_1^k \text{ for } k=1,2,\dots; \\ \alpha_1 &= \phi_1 \\ \alpha_k &= 0 \text{ for } k=2,3,\dots. \end{aligned} \]
So we have an AR(1) model when
- autocorrelations exponentially decay
- there is a single significant partial autocorrelation.
ACF and PACF interpretation (2/4)
AR(\(p\))
- ACF dies out in an exponential or damped sine-wave manner
- PACF has all zero spikes beyond the \(p\)th spike
. . .
So we have an AR(\(p\)) model when
- the ACF is exponentially decaying or sinusoidal
- there is a significant spike at lag \(p\) in PACF, but none beyond \(p\)
ACF and PACF interpretation (3/4)
MA(1)
\[ \begin{aligned} \rho_1 &= \theta_1/(1 + \theta_1^2) \qquad \rho_k = 0 \qquad \text{for } k=2,3,\dots; \\ \alpha_k &= -(-\theta_1)^k/(1 + \theta_1^2 + \dots + \theta_1^{2k}) \end{aligned} \]
So we have an MA(1) model when
- the PACF is exponentially decaying and
- there is a single significant spike in ACF
ACF and PACF interpretation (4/4)
MA(\(q\))
- PACF dies out in an exponential or damped sine-wave manner
- ACF has all zero spikes beyond the \(q\)th spike
. . .
So we have an MA(\(q\)) model when
- the PACF is exponentially decaying or sinusoidal
- there is a significant spike at lag \(q\) in ACF, but none beyond \(q\)
Information criteria
Akaike’s Information Criterion (AIC):
\[ \text{AIC} = -2 \log(L) + 2(p+q+k+1), \]
where \(L\) is the likelihood of the data, \(k=1\) if \(c \ne 0\) and \(k=0\) if \(c=0\).
Corrected AIC:
\[ \text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2}. \]
Bayesian Information Criterion:
\[ \text{BIC} = \text{AIC} + [\log(T)-2](p+q+k+1). \]
. . .
Good models are obtained by minimizing either the AIC, AICc or BIC. Book authors’ preference is to use the AICc.
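As a worked check, the AICc can be reproduced from the AIC. Using the ARIMA(2,0,1)-with-mean fit to Egyptian exports reported earlier (AIC = 293.13; \(p=2\), \(q=1\), \(k=1\); and assuming \(T=58\) annual observations):
aic <- 293.13; p <- 2; q <- 1; k <- 1; T <- 58
aic + 2 * (p + q + k + 1) * (p + q + k + 2) / (T - p - q - k - 2)
This gives 294.28, matching the reported AICc of 294.29 up to rounding.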
ARIMA modelling in R
How does ARIMA() work? (1/7)
A non-seasonal ARIMA process
\[ \phi(B)(1-B)^d y_{t} = c + \theta(B) \epsilon_t \]
. . .
Need to select appropriate orders: \(p, q, d\)
. . .
Hyndman and Khandakar (JSS, 2008) algorithm:
- Select no. differences \(d\) and \(D\) via KPSS test and seasonal strength measure.
- Select \(p, q\) by minimizing AICc.
- Use stepwise search to traverse model space.
How does ARIMA() work? (2/7)
AICc Calculation
\[ \text{AICc} = -2 \log(L) + 2(p+q+k+1)\left[1 + \frac{(p+q+k+2)}{T-p-q-k-2}\right]. \]
where \(L\) is the maximized likelihood fitted to the differenced data, \(k=1\) if \(c \ne 0\) and \(k=0\) otherwise.
Step 1:
Select current model (with smallest AICc) from:
- ARIMA\((2,d,2)\)
- ARIMA\((0,d,0)\)
- ARIMA\((1,d,0)\)
- ARIMA\((0,d,1)\)
How does ARIMA() work? (3/7)
Step 2:
Consider variations of the current model:
- vary one of \(p, q\) from the current model by \(\pm1\)
- vary both \(p, q\) from the current model by \(\pm1\)
- include/exclude \(c\) from the current model
Model with lowest AICc becomes current model.
Repeat Step 2 until no lower AICc can be found.
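Step 1 of this search can be reproduced manually by forcing each starting order. A sketch using the Egyptian exports series (for which \(d=0\) is selected; the model names are arbitrary labels):
global_economy |>
  filter(Code == "EGY") |>
  model(
    arima202 = ARIMA(Exports ~ pdq(2, 0, 2)),
    arima000 = ARIMA(Exports ~ pdq(0, 0, 0)),
    arima100 = ARIMA(Exports ~ pdq(1, 0, 0)),
    arima001 = ARIMA(Exports ~ pdq(0, 0, 1))
  ) |>
  glance() |>
  arrange(AICc)
The model with the smallest AICc becomes the starting point for the Step 2 variations.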
How does ARIMA() work? (4/7-7/7)
[Figures: step-by-step illustration of the stepwise search over the model space]
Egyptian exports
global_economy |>
  filter(Code == "EGY") |>
  gg_tsdisplay(Exports, plot_type='partial')
Egyptian exports
fit1 <- global_economy |>
  filter(Code == "EGY") |>
  model(ARIMA(Exports ~ pdq(4,0,0)))
report(fit1)
Series: Exports
Model: ARIMA(4,0,0) w/ mean
Coefficients:
ar1 ar2 ar3 ar4 constant
0.9861 -0.1715 0.1807 -0.3283 6.6922
s.e. 0.1247 0.1865 0.1865 0.1273 0.3562
sigma^2 estimated as 7.885: log likelihood=-140.53
AIC=293.05 AICc=294.7 BIC=305.41
Egyptian exports
fit2 <- global_economy |>
  filter(Code == "EGY") |>
  model(ARIMA(Exports))
report(fit2)
Series: Exports
Model: ARIMA(2,0,1) w/ mean
Coefficients:
ar1 ar2 ma1 constant
1.6764 -0.8034 -0.6896 2.5623
s.e. 0.1111 0.0928 0.1492 0.1161
sigma^2 estimated as 8.046: log likelihood=-141.57
AIC=293.13 AICc=294.29 BIC=303.43
Central African Republic exports (1/8)
global_economy |>
  filter(Code == "CAF") |>
  autoplot(Exports) +
  labs(title="Central African Republic exports", y="% of GDP")
Central African Republic exports (2/8)
global_economy |>
  filter(Code == "CAF") |>
  gg_tsdisplay(difference(Exports), plot_type='partial')
Central African Republic exports (3/8)
caf_fit <- global_economy |>
  filter(Code == "CAF") |>
  model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
        arima013 = ARIMA(Exports ~ pdq(0,1,3)),
        stepwise = ARIMA(Exports),
        search = ARIMA(Exports, stepwise=FALSE))
Central African Republic exports (4/8)
caf_fit |> pivot_longer(!Country, names_to = "Model name",
                        values_to = "Orders")
# A mable: 4 x 3
# Key: Country, Model name [4]
Country `Model name` Orders
<fct> <chr> <model>
1 Central African Republic arima210 <ARIMA(2,1,0)>
2 Central African Republic arima013 <ARIMA(0,1,3)>
3 Central African Republic stepwise <ARIMA(2,1,2)>
4 Central African Republic search <ARIMA(3,1,0)>
Central African Republic exports (5/8)
glance(caf_fit) |> arrange(AICc) |> select(.model:BIC)
# A tibble: 4 × 6
.model sigma2 log_lik AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 search 6.52 -133. 274. 275. 282.
2 arima210 6.71 -134. 275. 275. 281.
3 arima013 6.54 -133. 274. 275. 282.
4 stepwise 6.42 -132. 274. 275. 284.
Central African Republic exports (6/8)
caf_fit |> select(search) |> gg_tsresiduals()
Central African Republic exports (7/8)
augment(caf_fit) |>
filter(.model=='search') |>
features(.innov, ljung_box, lag = 10, dof = 3)
# A tibble: 1 × 4
Country .model lb_stat lb_pvalue
<fct> <chr> <dbl> <dbl>
1 Central African Republic search 5.75 0.569
Central African Republic exports (8/8)
caf_fit |>
  forecast(h=5) |>
  filter(.model=='search') |>
  autoplot(global_economy)
Modelling procedure with ARIMA()
- Plot the data. Identify any unusual observations.
- If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
- If the data are non-stationary: take first differences of the data until the data are stationary.
- Examine the ACF/PACF: Is an AR(\(p\)) or MA(\(q\)) model appropriate?
- Try your chosen model(s), and use the AICc to search for a better model.
- Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
- Once the residuals look like white noise, calculate forecasts.
Automatic modelling procedure with ARIMA()
- Plot the data. Identify any unusual observations.
- If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
. . .
- Use ARIMA() to automatically select a model.
. . .
- Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
- Once the residuals look like white noise, calculate forecasts.
Modelling procedure
Forecasting
Prediction intervals
95% prediction interval
\[ \hat{y}_{T+h|T} \pm 1.96\sqrt{v_{T+h|T}} \]
where \(v_{T+h|T}\) is estimated forecast variance.
Prediction intervals
Multi-step prediction intervals for ARIMA(0,0,\(q\)):
\[ y_t = \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}. \]
\[ v_{T+h|T} = \hat{\sigma}^2 \left[ 1 + \sum_{i=1}^{h-1} \theta_i^2\right], \qquad \text{for } h=2,3,\dots. \]
. . .
- AR(1): Can be rewritten as MA(\(\infty\)) and use above result.
- Other models beyond scope of this subject.
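A sketch of the arithmetic for an MA(1) with hypothetical estimates \(\theta_1 = 0.8\) and \(\hat\sigma^2 = 1.5\); beyond \(h=1\) the variance stops growing because all later weights are zero:
theta1 <- 0.8; sigma2 <- 1.5
v_h1 <- sigma2                   # h = 1
v_h2 <- sigma2 * (1 + theta1^2)  # h = 2, 3, ... (constant from here on)
1.96 * sqrt(c(v_h1, v_h2))       # half-widths of the 95% prediction intervals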
Prediction intervals
- Prediction intervals increase in size with the forecast horizon.
- Prediction intervals can be difficult to calculate by hand.
- Calculations assume residuals are uncorrelated and normally distributed.
- Prediction intervals tend to be too narrow:
  - the uncertainty in the parameter estimates has not been accounted for;
  - the ARIMA model assumes historical patterns will not change during the forecast period;
  - the ARIMA model assumes uncorrelated future errors.
Your turn
For the GDP data (from global_economy):
- fit a suitable ARIMA model to the logged data for all countries;
- check the residual diagnostics for Australia;
- produce forecasts from your fitted model for Australia.
Backshift notation revisited
Recall Backshift notation
Backward shift operator, \(B\), which is used as follows: \[ B y_{t} = y_{t - 1} \: . \]
. . .
\(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.
. . .
For monthly data, if we wish to shift attention to “the same month last year,” then \(B^{12}\) is used, and the notation is \(B^{12}y_{t} = y_{t-12}\).
Seasonal ARIMA models
Seasonal ARIMA models
\[ \text{ARIMA}\quad \underbrace{(p, d, q)}_{\substack{\text{Non-seasonal part}\\ \text{of the model}}} \quad \underbrace{(P, D, Q)_{m}}_{\substack{\text{Seasonal part}\\ \text{of the model}}} \]
where \(m =\) number of observations per year.
ARIMA\((1, 1, 1)(1, 1, 1)_{4}\) model (without constant)
\[ \begin{aligned} &\underbrace{(1 - \phi_1 B)}_{\text{Non-seasonal AR(1)} \phantom{X}} \underbrace{(1 - \Phi_1 B^4)}_{\text{Seasonal AR(1)} \phantom{X}} \underbrace{(1 - B)}_{\text{Non-seasonal difference} \phantom{X}} \underbrace{(1 - B^4)}_{\text{Seasonal difference} \phantom{X}} y_t = \\ &\underbrace{(1 + \theta_1 B)}_{\text{Non-seasonal MA(1)} \phantom{X}} \underbrace{(1 + \Theta_1 B^4)}_{\text{Seasonal MA(1)} \phantom{X}} \varepsilon_t \end{aligned} \]
Seasonal ARIMA models
E.g., ARIMA\((1, 1, 1)(1, 1, 1)_{4}\) model (without constant)
\[(1 - \phi_{1}B)(1 - \Phi_{1}B^{4}) (1 - B) (1 - B^{4})y_{t} =(1 + \theta_{1}B) (1 + \Theta_{1}B^{4})\epsilon_{t}.\]
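In fable, this particular model could be requested explicitly; a sketch using the quarterly Beer series from aus_production (the seasonal period \(m=4\) is inferred from the data's index, and ~ 0 suppresses the constant):
aus_production |>
  model(ARIMA(Beer ~ 0 + pdq(1, 1, 1) + PDQ(1, 1, 1))) |>
  report()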
Common ARIMA models
The US Census Bureau uses the following models most often:
Model | Transformation |
---|---|
ARIMA(0,1,1)(0,1,1)\(_m\) | with log transformation |
ARIMA(0,1,2)(0,1,1)\(_m\) | with log transformation |
ARIMA(2,1,0)(0,1,1)\(_m\) | with log transformation |
ARIMA(0,2,2)(0,1,1)\(_m\) | with log transformation |
ARIMA(2,1,2)(0,1,1)\(_m\) | with no transformation |
Seasonal ARIMA models
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.
ARIMA(0,0,0)(0,0,1)\(_{12}\) will show:
- a spike at lag 12 in the ACF but no other significant spikes.
- The PACF will show exponential decay in the seasonal lags; that is, at lags 12, 24, 36, …
. . .
ARIMA(0,0,0)(1,0,0)\(_{12}\) will show:
- exponential decay in the seasonal lags of the ACF
- a single significant spike at lag 12 in the PACF.
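Both signatures can be checked by simulation; a base-R sketch of an ARIMA(0,0,0)(0,0,1)\(_{12}\) process, written as an MA(12) whose only nonzero coefficient is \(\theta_{12} = 0.8\):
set.seed(1)
y <- arima.sim(list(ma = c(rep(0, 11), 0.8)), n = 240)
acf(y, lag.max = 36)   # expect a single spike at lag 12
pacf(y, lag.max = 36)  # expect decay across lags 12, 24, 36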
US leisure employment (1/8)
leisure <- us_employment |>
  filter(Title == "Leisure and Hospitality", year(Month) > 2000) |>
  mutate(Employed = Employed/1000) |>
  select(Month, Employed)
autoplot(leisure, Employed) +
  labs(title = "US employment: leisure & hospitality", y="People (millions)")
US leisure employment (2/8)
leisure |>
  gg_tsdisplay(difference(Employed, 12), plot_type='partial', lag=36) +
  labs(title="Seasonally differenced", y="")
US leisure employment (3/8)
leisure |>
  gg_tsdisplay(difference(Employed, 12) |> difference(),
               plot_type='partial', lag=36) +
  labs(title = "Double differenced", y="")
US leisure employment (4/8)
fit <- leisure |>
  model(
    arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),
    arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),
    auto = ARIMA(Employed, stepwise = FALSE, approx = FALSE)
  )
fit |> pivot_longer(everything(), names_to = "Model name",
                    values_to = "Orders")
# A mable: 3 x 2
# Key: Model name [3]
`Model name` Orders
<chr> <model>
1 arima012011 <ARIMA(0,1,2)(0,1,1)[12]>
2 arima210011 <ARIMA(2,1,0)(0,1,1)[12]>
3 auto <ARIMA(2,1,0)(1,1,1)[12]>
US leisure employment (5/8)
glance(fit) |> arrange(AICc) |> select(.model:BIC)
# A tibble: 3 × 6
.model sigma2 log_lik AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 auto 0.00142 395. -780. -780. -763.
2 arima210011 0.00145 392. -776. -776. -763.
3 arima012011 0.00146 391. -775. -775. -761.
US leisure employment (6/8)
fit |> select(auto) |> gg_tsresiduals(lag=36)
US leisure employment (7/8)
augment(fit) |> features(.innov, ljung_box, lag=24, dof=4)
# A tibble: 3 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 arima012011 22.4 0.320
2 arima210011 18.9 0.527
3 auto 16.6 0.680
US leisure employment (8/8)
forecast(fit, h=36) |>
filter(.model=='auto') |>
autoplot(leisure) +
labs(title = "US employment: leisure and hospitality", y="Number of people (millions)")
Corticosteroid drug sales (1/20)
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost)/1e6)
Corticosteroid drug sales (2/20)
h02 |> autoplot(Cost)
Corticosteroid drug sales (3/20)
h02 |> autoplot(log(Cost))
Corticosteroid drug sales (4/20)
h02 |> autoplot(log(Cost) |> difference(12))
Corticosteroid drug sales (5/20)
h02 |> gg_tsdisplay(difference(log(Cost), 12),
                    lag_max = 36, plot_type = 'partial')
Corticosteroid drug sales (6/20)
- Choose \(D=1\) and \(d=0\).
- Spikes in the PACF at lags 12 and 24 suggest a seasonal AR(2) term.
- Spikes in the PACF at the first few lags suggest a possible non-seasonal AR(3) term.
- Initial candidate model: ARIMA(3,0,0)(2,1,0)\(_{12}\).
Corticosteroid drug sales (7/20)
.model | AICc |
---|---|
ARIMA(3,0,1)(0,1,2)[12] | -485.48 |
ARIMA(3,0,1)(1,1,1)[12] | -484.25 |
ARIMA(3,0,1)(0,1,1)[12] | -483.67 |
ARIMA(3,0,1)(2,1,0)[12] | -476.31 |
ARIMA(3,0,0)(2,1,0)[12] | -475.12 |
ARIMA(3,0,2)(2,1,0)[12] | -474.88 |
ARIMA(3,0,1)(1,1,0)[12] | -463.40 |
Corticosteroid drug sales (8/20)
fit <- h02 |>
  model(best = ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
report(fit)
Series: Cost
Model: ARIMA(3,0,1)(0,1,2)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 ar3 ma1 sma1 sma2
-0.1603 0.5481 0.5678 0.3827 -0.5222 -0.1768
s.e. 0.1636 0.0878 0.0942 0.1895 0.0861 0.0872
sigma^2 estimated as 0.004278: log likelihood=250.04
AIC=-486.08 AICc=-485.48 BIC=-463.28
Corticosteroid drug sales (9/20)
gg_tsresiduals(fit)
Corticosteroid drug sales (10/20)
augment(fit) |>
features(.innov, ljung_box, lag = 36, dof = 6)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 best 50.7 0.0104
Corticosteroid drug sales (11/20)
fit <- h02 |> model(auto = ARIMA(log(Cost)))
report(fit)
Series: Cost
Model: ARIMA(2,1,0)(0,1,1)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 sma1
-0.8491 -0.4207 -0.6401
s.e. 0.0712 0.0714 0.0694
sigma^2 estimated as 0.004387: log likelihood=245.39
AIC=-482.78 AICc=-482.56 BIC=-469.77
Corticosteroid drug sales (12/20)
gg_tsresiduals(fit)
Corticosteroid drug sales (13/20)
augment(fit) |>
features(.innov, ljung_box, lag = 36, dof = 3)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 auto 59.3 0.00332
Corticosteroid drug sales (14/20)
fit <- h02 |>
  model(best = ARIMA(log(Cost), stepwise = FALSE,
                     approximation = FALSE,
                     order_constraint = p + q + P + Q <= 9))
report(fit)
Series: Cost
Model: ARIMA(4,1,1)(2,1,2)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 ar3 ar4 ma1 sar1 sar2 sma1 sma2
-0.0425 0.2098 0.2017 -0.2273 -0.7424 0.6213 -0.3832 -1.2019 0.4959
s.e. 0.2167 0.1813 0.1144 0.0810 0.2074 0.2421 0.1185 0.2491 0.2135
sigma^2 estimated as 0.004049: log likelihood=254.31
AIC=-488.63 AICc=-487.4 BIC=-456.1
Corticosteroid drug sales (15/20)
gg_tsresiduals(fit)
Corticosteroid drug sales (16/20)
augment(fit) |>
features(.innov, ljung_box, lag = 36, dof = 9)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 best 36.5 0.106
Corticosteroid drug sales (17/20)
Training data: July 1991 to June 2006
Test data: July 2006 to June 2008
fit <- h02 |>
  filter_index(~ "2006 Jun") |>
  model(
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 0) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 2) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(1, 1, 0))
    # ... #
  )
fit |>
  forecast(h = "2 years") |>
  accuracy(h02)
Corticosteroid drug sales (18/20)
.model | RMSE |
---|---|
ARIMA(3,0,1)(1,1,1)[12] | 0.0619 |
ARIMA(3,0,1)(0,1,2)[12] | 0.0621 |
ARIMA(3,0,1)(0,1,1)[12] | 0.0630 |
ARIMA(2,1,0)(0,1,1)[12] | 0.0630 |
ARIMA(4,1,1)(2,1,2)[12] | 0.0631 |
ARIMA(3,0,2)(2,1,0)[12] | 0.0651 |
ARIMA(3,0,1)(2,1,0)[12] | 0.0653 |
ARIMA(3,0,1)(1,1,0)[12] | 0.0666 |
ARIMA(3,0,0)(2,1,0)[12] | 0.0668 |
Corticosteroid drug sales (19/20)
- Models with the lowest AICc values tend to give slightly better results than the other models.
- AICc comparisons require the same orders of differencing, but RMSE comparisons on a test set can involve any models.
- Use the best model available, even if it does not pass all the residual tests.
Corticosteroid drug sales (20/20)
fit <- h02 |>
  model(ARIMA(Cost ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
fit |> forecast() |> autoplot(h02) +
  labs(y = "H02 Expenditure ($AUD)")
ARIMA vs ETS
ARIMA vs ETS
- It is a myth that ARIMA models are more general than exponential smoothing.
- Linear exponential smoothing models are all special cases of ARIMA models.
- Non-linear exponential smoothing models have no equivalent ARIMA counterparts.
- Many ARIMA models have no exponential smoothing counterparts.
- ETS models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models have one unit root.
ARIMA vs ETS
Equivalences
ETS model | ARIMA model | Parameters |
---|---|---|
ETS(A,N,N) | ARIMA(0,1,1) | \(\theta_1 = \alpha-1\) |
ETS(A,A,N) | ARIMA(0,2,2) | \(\theta_1 = \alpha+\beta-2\), \(\theta_2 = 1-\alpha\) |
ETS(A,A\(_d\),N) | ARIMA(1,1,2) | \(\phi_1=\phi\), \(\theta_1 = \alpha+\phi\beta-1-\phi\), \(\theta_2 = (1-\alpha)\phi\) |
ETS(A,N,A) | ARIMA(0,0,\(m\))(0,1,0)\(_m\) | |
ETS(A,A,A) | ARIMA(0,1,\(m+1\))(0,1,0)\(_m\) | |
ETS(A,A\(_d\),A) | ARIMA(1,0,\(m+1\))(0,1,0)\(_m\) | |
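The first equivalence is easy to verify empirically: fit ETS(A,N,N) and ARIMA(0,1,1) to the same series and compare \(\theta_1\) with \(\alpha - 1\). A sketch using Algerian exports (any non-seasonal series would do):
algeria <- global_economy |> filter(Country == "Algeria")
algeria |>
  model(
    ets = ETS(Exports ~ error("A") + trend("N") + season("N")),
    arima = ARIMA(Exports ~ pdq(0, 1, 1))
  ) |>
  tidy() |>
  select(.model, term, estimate)  # check: theta_1 is close to alpha - 1
The two estimates will not match exactly, since the models are estimated separately, but they should be close.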
Example: Australian population
aus_economy <- global_economy |>
  filter(Code == "AUS") |>
  mutate(Population = Population/1e6)
aus_economy |>
  slice(-n()) |>
  stretch_tsibble(.init = 10) |>
  model(ets = ETS(Population),
        arima = ARIMA(Population)) |>
  forecast(h = 1) |>
  accuracy(aus_economy) |>
  select(.model, ME:RMSSE)
# A tibble: 2 × 8
.model ME RMSE MAE MPE MAPE MASE RMSSE
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 arima 0.0420 0.194 0.0789 0.277 0.509 0.317 0.746
2 ets 0.0202 0.0774 0.0543 0.112 0.327 0.218 0.298
Example: Australian population
aus_economy |>
  model(ETS(Population)) |>
  forecast(h = "5 years") |>
  autoplot(aus_economy) +
  labs(title = "Australian population", y = "People (millions)")
Example: Cement production (1/9)
cement <- aus_production |>
  select(Cement) |>
  filter_index("1988 Q1" ~ .)
train <- cement |> filter_index(. ~ "2007 Q4")
fit <- train |>
  model(
    arima = ARIMA(Cement),
    ets = ETS(Cement)
  )
Example: Cement production (2/9)
fit |>
  select(arima) |>
  report()
Series: Cement
Model: ARIMA(1,0,1)(2,1,1)[4] w/ drift
Coefficients:
ar1 ma1 sar1 sar2 sma1 constant
0.8886 -0.2366 0.081 -0.2345 -0.8979 5.3884
s.e. 0.0842 0.1334 0.157 0.1392 0.1780 1.4844
sigma^2 estimated as 11456: log likelihood=-463.52
AIC=941.03 AICc=942.68 BIC=957.35
Example: Cement production (3/9)
fit |> select(ets) |> report()
Series: Cement
Model: ETS(M,N,M)
Smoothing parameters:
alpha = 0.7533714
gamma = 0.0001000093
Initial states:
l[0] s[0] s[-1] s[-2] s[-3]
1694.712 1.031179 1.045209 1.011424 0.9121874
sigma^2: 0.0034
AIC AICc BIC
1104.095 1105.650 1120.769
Example: Cement production (4/9)
gg_tsresiduals(fit |> select(arima), lag_max = 16)
Example: Cement production (5/9)
gg_tsresiduals(fit |> select(ets), lag_max = 16)
Example: Cement production (6/9)
fit |>
  select(arima) |>
  augment() |>
  features(.innov, ljung_box, lag = 16, dof = 6)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 arima 6.37 0.783
Example: Cement production (7/9)
fit |>
  select(ets) |>
  augment() |>
  features(.innov, ljung_box, lag = 16, dof = 6)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 ets 10.0 0.438
Example: Cement production (8/9)
fit |>
  forecast(h = "2 years 6 months") |>
  accuracy(cement) |>
  select(-ME, -MPE, -ACF1)
# A tibble: 2 × 7
.model .type RMSE MAE MAPE MASE RMSSE
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 arima Test 216. 186. 8.68 1.27 1.26
2 ets Test 222. 191. 8.85 1.30 1.29
Example: Cement production (9/9)
fit |>
  select(arima) |>
  forecast(h="3 years") |>
  autoplot(cement) +
  labs(title = "Cement production in Australia", y="Tonnes ('000)")
labs(title = "Cement production in Australia", y="Tonnes ('000)")