Chapter 9 ARIMA
Setup
library(fpp3) # Primary
library(patchwork)
library(purrr)
ARIMA models
Component | Description |
---|---|
AR | autoregressive (lagged observations as inputs) |
I | integrated (differencing to make series stationary) |
MA | moving average (lagged errors as inputs) |
. . .
An ARIMA model is rarely interpretable in terms of visible data structures like trend and seasonality. But it can capture a huge range of time series patterns.
Stationarity and differencing
Stationarity
Definition: If \(\{y_t\}\) is a stationary time series, then for all \(s\), the distribution of \((y_t,\dots,y_{t+s})\) does not depend on \(t\).
. . .
A stationary series is:
- roughly horizontal
- constant variance
- no patterns predictable in the long-term
Stationary? (1/9)
gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018) |>
  autoplot(Close) +
  labs(y = "Google closing stock price", x = "Day")
Stationary? (2/9)
gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018) |>
  autoplot(difference(Close)) +
  labs(y = "Change in Google closing stock price", x = "Day")
Stationary? (3/9)
global_economy |>
  filter(Country == "Algeria") |>
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Algerian Exports")
Stationary? (4/9)
aus_production |>
  autoplot(Bricks) +
  labs(title = "Clay brick production in Australia")
Stationary? (5/9)
prices |>
  filter(year >= 1900) |>
  autoplot(eggs) +
  labs(y = "$US (1993)", title = "Price of a dozen eggs")
Stationary? (6/9)
aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
Stationary? (7/9)
aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria", year(Month) >= 2010) |>
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
Stationary? (8/9)
aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria", year(Month) >= 2015) |>
  autoplot(Count/1e3) +
  labs(y = "thousands", title = "Total pigs slaughtered in Victoria")
Stationary? (9/9)
pelt |>
  autoplot(Lynx) +
  labs(y = "Number trapped", title = "Annual Canadian Lynx Trappings")
Stationarity
Definition
If \(\{y_t\}\) is a stationary time series, then for all \(s\), the distribution of \((y_t,\dots,y_{t+s})\) does not depend on \(t\).
Transformations help to stabilize the variance.
For ARIMA modelling, we also need to stabilize the mean.
Non-stationarity in the mean
Identifying non-stationary series
- The time plot.
- The ACF of stationary data drops to zero relatively quickly
- The ACF of non-stationary data decreases slowly.
- For non-stationary data, the value of \(r_1\) is often large and positive.
Example: Google stock price (1/5)
google_2018 <- gafa_stock |>
  filter(Symbol == "GOOG", year(Date) == 2018)
Example: Google stock price (2/5)
google_2018 |>
  autoplot(Close) +
  labs(y = "Closing stock price ($USD)")
Example: Google stock price (3/5)
google_2018 |> ACF(Close) |> autoplot()
Example: Google stock price (4/5)
google_2018 |>
  autoplot(difference(Close)) +
  labs(y = "Change in Google closing stock price ($USD)")
Example: Google stock price (5/5)
google_2018 |> ACF(difference(Close)) |> autoplot()
Differencing
- Differencing helps to stabilize the mean.
- The differenced series is the change between each observation in the original series: \(y'_t = y_t - y_{t-1}\).
- The differenced series will have only \(T-1\) values since it is not possible to calculate a difference \(y_1'\) for the first observation.
Random walk model
If differenced series is white noise with zero mean:
\(y_t-y_{t-1}=\epsilon_t\) or \(y_t=y_{t-1}+\epsilon_t\)
where \(\epsilon_t \sim NID(0,\sigma^2)\).
- Very widely used for non-stationary data.
- This is the model behind the naïve method.
- Random walks typically have:
  - long periods of apparent trends up or down
  - sudden/unpredictable changes in direction
- Forecasts are equal to the last observation, since future movements up or down are equally likely.
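These properties are easy to see by simulation; a minimal base-R sketch (the seed and series length are arbitrary choices, not from the text):
# Simulate three random walks y_t = y_{t-1} + e_t, with e_t ~ N(0, 1)
set.seed(1)
walks <- replicate(3, cumsum(rnorm(100)))
matplot(walks, type = "l", lty = 1, xlab = "t", ylab = "y",
        main = "Three simulated random walks")
Each path shows long apparent trends and sudden changes of direction, even though the generating process has no trend at all.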
Random walk with drift model
If differenced series is white noise with non-zero mean:
\(y_t-y_{t-1}=c+\epsilon_t\) or \(y_t=c+y_{t-1}+\epsilon_t\)
where \(\epsilon_t \sim NID(0,\sigma^2)\).
- \(c\) is the average change between consecutive observations.
- If \(c>0\), \(y_t\) will tend to drift upwards and vice versa.
- This is the model behind the drift method.
Second-order differencing
Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:
. . .
\[ \begin{align*} y''_{t} & = y'_{t} - y'_{t - 1} \\ & = (y_t - y_{t-1}) - (y_{t-1}-y_{t-2})\\ & = y_t - 2y_{t-1} +y_{t-2}. \end{align*} \]
- \(y_t''\) will have \(T-2\) values.
- In practice, it is almost never necessary to go beyond second-order differences.
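In fpp3-style code, a second-order difference can be computed by differencing twice or, equivalently, with the differences argument; a minimal sketch using the aus_production data:
aus_production |>
  transmute(
    d2a = difference(difference(Bricks)),
    d2b = difference(Bricks, differences = 2)  # identical to d2a
  )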
Seasonal differencing
A seasonal difference is the difference between an observation and the corresponding observation from the previous year.
\[ y'_t = y_t - y_{t-m} \]
where \(m=\) number of seasons.
- For monthly data \(m=12\).
- For quarterly data \(m=4\).
Antidiabetic drug sales (1/4)
a10 <- PBS |>
  filter(ATC2 == "A10") |>
  summarise(Cost = sum(Cost)/1e6)
Antidiabetic drug sales (2/4)
a10 |> autoplot(Cost)
Antidiabetic drug sales (3/4)
a10 |> autoplot(log(Cost))
Antidiabetic drug sales (4/4)
a10 |> autoplot(log(Cost) |> difference(12))
Corticosteroid drug sales (1/6)
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost)/1e6)
Corticosteroid drug sales (2/6)
h02 |> autoplot(Cost)
Corticosteroid drug sales (3/6)
h02 |> autoplot(log(Cost))
Corticosteroid drug sales (4/6)
h02 |> autoplot(log(Cost) |> difference(12))
Corticosteroid drug sales (5/6)
h02 |> autoplot(log(Cost) |> difference(12) |> difference(1))
Corticosteroid drug sales (6/6)
- Seasonally differenced series is closer to being stationary.
- Remaining non-stationarity can be removed with a further first difference.
. . .
If \(y'_t = y_t - y_{t-12}\) denotes seasonally differenced series, then twice-differenced series is
\[\begin{align*} y^*_t &= y'_t - y'_{t-1} \\ &= (y_t - y_{t-12}) - (y_{t-1} - y_{t-13}) \\ &= y_t - y_{t-1} - y_{t-12} + y_{t-13}\: . \end{align*}\]
Seasonal differencing
When both seasonal and first differences are applied
- it makes no difference which is done first—the result will be the same.
- If seasonality is strong, we recommend that seasonal differencing be done first because sometimes the resulting series will be stationary and there will be no need for further first difference.
. . .
It is important that if differencing is used, the differences are interpretable.
Interpretation of differencing
- first differences are the change between one observation and the next;
- seasonal differences are the change from one year to the next.
. . .
But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.
Unit root tests
Statistical tests to determine the required order of differencing.
- Augmented Dickey Fuller test: null hypothesis is that the data are non-stationary and non-seasonal.
- Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: null hypothesis is that the data are stationary and non-seasonal.
- Other tests available for seasonal data.
We will use the KPSS test.
KPSS test
google_2018 |>
  features(Close, unitroot_kpss)
# A tibble: 1 × 3
Symbol kpss_stat kpss_pvalue
<chr> <dbl> <dbl>
1 GOOG 0.573 0.0252
google_2018 |>
  features(Close, unitroot_ndiffs)
# A tibble: 1 × 2
Symbol ndiffs
<chr> <int>
1 GOOG 1
Automatically selecting differences (1/3)
STL decomposition: \(y_t = T_t+S_t+R_t\)
Seasonal strength \(F_s = \max\big(0, 1-\frac{\text{Var}(R_t)}{\text{Var}(S_t+R_t)}\big)\)
If \(F_s > 0.64\), do one seasonal difference.
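This rule can be checked directly with feat_stl, as the next slide also illustrates; a minimal sketch applying the 0.64 threshold by hand (unitroot_nsdiffs uses the same rule internally):
h02 |>
  mutate(log_sales = log(Cost)) |>
  features(log_sales, feat_stl) |>
  mutate(seasonal_diff_needed = seasonal_strength_year > 0.64)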
Automatically selecting differences (2/3)
h02 |> mutate(log_sales = log(Cost)) |>
  features(log_sales, list(unitroot_nsdiffs, feat_stl))
# A tibble: 1 × 10
nsdiffs trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0.957 0.955 6 8
# ℹ 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
# stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
Automatically selecting differences (3/3)
h02 |> mutate(log_sales = log(Cost)) |>
  features(log_sales, unitroot_nsdiffs)
# A tibble: 1 × 1
nsdiffs
<int>
1 1
h02 |> mutate(d_log_sales = difference(log(Cost), 12)) |>
  features(d_log_sales, unitroot_ndiffs)
# A tibble: 1 × 1
ndiffs
<int>
1 1
Your turn
For the tourism dataset, compute the total number of trips and find an appropriate differencing (after transformation if necessary) to obtain stationary data.
Backshift notation (1/4)
A very useful notational device is the backward shift operator, \(B\), which is used as follows:
\[ B y_{t} = y_{t - 1} \]
. . .
In other words, \(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.
. . .
Two applications of \(B\) to \(y_{t}\) shifts the data back two periods: \[ B(By_{t}) = B^{2}y_{t} = y_{t-2} \]
. . .
For monthly data, if we wish to shift attention to “the same month last year”, then \(B^{12}\) is used, and the notation is \(B^{12}y_{t} = y_{t-12}\).
Backshift notation (2/4)
The backward shift operator is convenient for describing the process of differencing.
A first-order difference can be written as \[ y'_{t} = y_{t} - y_{t-1} = y_t - By_{t} = (1 - B)y_{t} \]
. . .
Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then:
\[y''_{t} = y_{t} - 2y_{t - 1} + y_{t - 2} = (1 - B)^{2} y_{t}\]
Backshift notation (3/4)
- Second-order difference is denoted \((1- B)^{2}\).
- Second-order difference is not the same as a second difference, which would be denoted \(1- B^{2}\);
- In general, a \(d\)th-order difference can be written as
. . .
\[ (1 - B)^{d} y_{t} \]
- A seasonal difference followed by a first difference can be written as
. . .
\[ (1-B)(1-B^m)y_t \]
Backshift notation (4/4)
The “backshift” notation is convenient because the terms can be multiplied together to see the combined effect.
\[\begin{align*} (1-B)(1-B^m)y_t & = (1 - B - B^m + B^{m+1})y_t \\ & = y_t-y_{t-1}-y_{t-m}+y_{t-m-1}. \end{align*}\]
For monthly data, \(m=12\) and we obtain the same result as earlier.
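A numerical sanity check of this identity, using the a10 series created earlier:
a10 |>
  mutate(
    lhs = difference(difference(Cost, 12)),                  # (1-B)(1-B^12) y_t
    rhs = Cost - lag(Cost) - lag(Cost, 12) + lag(Cost, 13)   # expanded form
  ) |>
  as_tibble() |>
  summarise(max_abs_diff = max(abs(lhs - rhs), na.rm = TRUE))  # effectively zero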
Non-seasonal ARIMA models
Autoregressive models
Autoregressive (AR) models:
\[ y_{t} = c + \phi_{1}y_{t - 1} + \phi_{2}y_{t - 2} + \cdots + \phi_{p}y_{t - p} + \epsilon_{t}, \] where \(\epsilon_t\) is white noise. This is a multiple regression with lagged values of \(y_t\) as predictors.
AR(1) model
\(y_{t} = 18 -0.8 y_{t - 1} + \epsilon_{t},\) \(\epsilon_t\sim N(0,1)\), \(T=100\).
AR(1) model
\(y_{t} = c + \phi_1 y_{t - 1} + \epsilon_{t}\)
- When \(\phi_1=0\), \(y_t\) is equivalent to white noise (WN)
- When \(\phi_1=1\) and \(c=0\), \(y_t\) is equivalent to a random walk (RW)
- When \(\phi_1=1\) and \(c\ne0\), \(y_t\) is equivalent to a random walk with drift
- When \(\phi_1<0\), \(y_t\) tends to oscillate between positive and negative values.
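These cases are easy to visualize by simulation; a base-R sketch (the coefficients are chosen only for illustration):
set.seed(42)
par(mfrow = c(1, 3))
plot.ts(arima.sim(list(ar = 0.8), n = 100), ylab = "y",
        main = expression(phi[1] == 0.8))    # stationary, positively correlated
plot.ts(arima.sim(list(ar = -0.8), n = 100), ylab = "y",
        main = expression(phi[1] == -0.8))   # oscillates about the mean
plot.ts(cumsum(rnorm(100)), ylab = "y",
        main = expression(phi[1] == 1))      # random walk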
AR(2) model
\(y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \epsilon_t\) where \(\epsilon_t\sim N(0,1)\) & \(T=100\).
Stationarity conditions
We normally restrict autoregressive models to stationary data, and then some constraints on the values of the parameters are required.
- For \(p=1\): \(-1<\phi_1<1\).
- For \(p=2\): \(-1<\phi_2<1\), \(\phi_2+\phi_1 < 1\), \(\phi_2 -\phi_1 < 1\).
- More complicated conditions hold for \(p\ge3\).
- Estimation software takes care of this.
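Equivalently, all roots of the AR polynomial \(1-\phi_1 z - \dots - \phi_p z^p\) must lie outside the unit circle. A quick check for the AR(2) example above (\(\phi_1 = 1.3\), \(\phi_2 = -0.7\)):
abs(polyroot(c(1, -1.3, 0.7)))  # moduli of the roots of 1 - 1.3z + 0.7z^2
Both moduli are about 1.20, so the \(p=2\) conditions hold and the process is stationary.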
Moving Average (MA) models
Moving Average (MA) models:
\[ y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t - 1} + \theta_{2}\varepsilon_{t - 2} + \cdots + \theta_{q}\varepsilon_{t - q}, \]
where \(\varepsilon_t\) is white noise. This is a multiple regression with past errors as predictors. Don’t confuse this with moving average smoothing!
MA(1) model
\(y_t = 20 + \varepsilon_t + 0.8 \varepsilon_{t-1}\) where \(\varepsilon_t\sim N(0,1)\) & \(T=100\).
MA(2) model
\(y_t = \varepsilon_t -\varepsilon_{t-1} + 0.8 \varepsilon_{t-2}\) where \(\varepsilon_t\sim N(0,1)\) & \(T=100\).
MA(\(\infty\)) models
It is possible to write any stationary AR(\(p\)) process as an MA(\(\infty\)) process.
Example: AR(1) \[\begin{align*} y_t &= \phi_1y_{t-1} + \varepsilon_t\\ &= \phi_1(\phi_1y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t\\ &= \phi_1^2y_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\ &= \phi_1^3y_{t-3} + \phi_1^2\varepsilon_{t-2} + \phi_1 \varepsilon_{t-1} + \varepsilon_t\\ &\;\;\vdots \end{align*}\] Provided \(-1<\phi_1<1\): \[y_t = \varepsilon_t + \phi_1\varepsilon_{t-1}+ \phi_1^2\varepsilon_{t-2}+ \phi_1^3\varepsilon_{t-3} + \cdots\]
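The weights of this MA(\(\infty\)) form can be computed numerically with stats::ARMAtoMA; for an AR(1) they should equal \(\phi_1^k\). A quick check with \(\phi_1 = 0.8\) (an arbitrary value):
ARMAtoMA(ar = 0.8, lag.max = 5)  # psi weights of the MA(infinity) representation
0.8^(1:5)                        # phi_1^k -- the same values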
Invertibility
- Any MA(\(q\)) process can be written as an AR(\(\infty\)) process if we impose some constraints on the MA parameters.
- Then the MA model is called “invertible”.
- Invertible models have some mathematical properties that make them easier to use in practice.
Invertibility
General condition for invertibility: the complex roots of \(1+\theta_1 z + \theta_2 z^2 + \dots + \theta_qz^q\) must lie outside the unit circle on the complex plane.
- For \(q=1\): \(-1<\theta_1<1\).
- For \(q=2\): \(-1<\theta_2<1\), \(\theta_2+\theta_1 >-1\), \(\theta_1 -\theta_2 < 1\).
- More complicated conditions hold for \(q\ge3\).
- Estimation software takes care of this.
ARMA models
Autoregressive Moving Average models:
\[y_{t} = c + \phi_{1}y_{t - 1} + \cdots + \phi_{p}y_{t - p} + \theta_{1}\varepsilon_{t - 1} + \cdots + \theta_{q}\varepsilon_{t - q} + \varepsilon_{t}.\]
- Predictors include both lagged values of \(y_t\) and lagged errors.
- Conditions on AR coefficients ensure stationarity.
- Conditions on MA coefficients ensure invertibility.
Autoregressive Integrated Moving Average models
- Combine ARMA model with differencing.
- \((1-B)^d y_t\) follows an ARMA model.
ARIMA models
Autoregressive Integrated Moving Average models
ARIMA(\(p, d, q\)) model
Component | Description |
---|---|
AR | \(p =\) order of the autoregressive part |
I | \(d =\) degree of first differencing involved |
MA | \(q =\) order of the moving average part |
- White noise model: ARIMA(0,0,0)
- Random walk: ARIMA(0,1,0) with no constant
- Random walk with drift: ARIMA(0,1,0) with a constant (c)
- AR(\(p\)): ARIMA(\(p\),0,0)
- MA(\(q\)): ARIMA(0,0,\(q\))
Backshift notation for ARIMA
ARMA model
\[ \begin{aligned} y_{t} &= c + \phi_{1}By_{t} + \cdots + \phi_pB^py_{t} + \epsilon_{t} + \theta_{1}B\epsilon_{t} + \cdots + \theta_qB^q\epsilon_{t} \\ \text{or}\quad & (1-\phi_1B - \cdots - \phi_p B^p) y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t \end{aligned} \]
ARIMA(1,1,1) model
\[ \begin{array}{c c c c} (1 - \phi_{1} B) & (1 - B) y_{t} &= &c + (1 + \theta_{1} B) \epsilon_{t}\\ {\uparrow} & {\uparrow} & &{\uparrow}\\ {\text{AR(1)}} & {\text{First}} & &{\text{MA(1)}}\\ & \text{difference}\\ \end{array} \]
. . .
\[ y_t = c + y_{t-1} + \phi_1 y_{t-1} - \phi_1 y_{t-2} + \theta_1\varepsilon_{t-1} + \varepsilon_t \]
General ARIMA Equation
\[ \underbrace{(1 - \phi_1 B - \cdots - \phi_p B^p)}_{\text{AR}(p)} \quad \underbrace{(1 - B)^d}_{d \text{ differences}}y_t = c+ \quad \underbrace{(1 + \theta_1 B + \cdots + \theta_q B^q)}_{\text{MA}(q)}\epsilon_t \]
R model
Intercept Form
\((1-\phi_1B - \cdots - \phi_p B^p) y_t' = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t\)
. . .
Mean Form
\((1-\phi_1B - \cdots - \phi_p B^p)(y_t' - \mu) = (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t\)
- \(y_t' = (1-B)^d y_t\)
- \(\mu\) is the mean of \(y_t'\).
- \(c = \mu(1-\phi_1 - \cdots - \phi_p )\).
fable uses the intercept form.
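Since fable reports the intercept-form constant, the implied mean follows from the last identity. A sketch using the coefficients of the Egyptian exports fit shown below (\(c = 2.5623\), \(\phi_1 = 1.6764\), \(\phi_2 = -0.8034\)):
c0 <- 2.5623
phi <- c(1.6764, -0.8034)
c0 / (1 - sum(phi))  # mu = c / (1 - phi_1 - phi_2), roughly 20.2 (% of GDP)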
Egyptian exports
global_economy |>
  filter(Code == "EGY") |>
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Egyptian Exports")
Egyptian exports
fit <- global_economy |>
  filter(Code == "EGY") |>
  model(ARIMA(Exports))
report(fit)
Series: Exports
Model: ARIMA(2,0,1) w/ mean
Coefficients:
ar1 ar2 ma1 constant
1.6764 -0.8034 -0.6896 2.5623
s.e. 0.1111 0.0928 0.1492 0.1161
sigma^2 estimated as 8.046: log likelihood=-141.57
AIC=293.13 AICc=294.29 BIC=303.43
ARIMA(2,0,1) model:
\[ y_t = 2.56 + 1.68 y_{t-1} -0.80 y_{t-2} -0.69 \varepsilon_{t-1} + \varepsilon_{t}, \]
where \(\varepsilon_t\) is white noise with a standard deviation of \(2.837 = \sqrt{8.046}\).
Egyptian exports
gg_tsresiduals(fit)
Egyptian exports
augment(fit) |>
features(.innov, ljung_box, lag = 10, dof = 4)
# A tibble: 1 × 4
Country .model lb_stat lb_pvalue
<fct> <chr> <dbl> <dbl>
1 Egypt, Arab Rep. ARIMA(Exports) 5.78 0.448
Egyptian exports
fit |> forecast(h=10) |>
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Egyptian Exports")
Understanding ARIMA models
The constant c has an important effect on the long-term forecasts obtained from these models.
If \(c=0\) and \(d=0\), the long-term forecasts will go to zero.
If \(c=0\) and \(d=1\), the long-term forecasts will go to a non-zero constant.
If \(c=0\) and \(d=2\), the long-term forecasts will follow a straight line.
If \(c\ne0\) and \(d=0\), the long-term forecasts will go to the mean of the data.
If \(c\ne0\) and \(d=1\), the long-term forecasts will follow a straight line.
If \(c\ne0\) and \(d=2\), the long-term forecasts will follow a quadratic trend. (This is not recommended, and fable will not permit it.)
Understanding ARIMA models
Forecast variance and \(d\)
- The higher the value of \(d\), the more rapidly the prediction intervals increase in size.
- For \(d=0\), the long-term forecast standard deviation will go to the standard deviation of the historical data.
. . .
Cyclic behaviour
- For cyclic forecasts, \(p\ge2\) and some restrictions on coefficients are required.
- If \(p=2\), we need \(\phi_1^2+4\phi_2<0\). Then the average cycle length is \[2\pi\big/\left[\arccos(-\phi_1(1-\phi_2)/(4\phi_2))\right].\]
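Plugging in the AR(2) example from earlier (\(\phi_1 = 1.3\), \(\phi_2 = -0.7\)):
phi1 <- 1.3; phi2 <- -0.7
phi1^2 + 4 * phi2                               # -1.11 < 0, so cycles occur
2 * pi / acos(-phi1 * (1 - phi2) / (4 * phi2))  # average cycle length: about 9.5 periods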
Estimation and order selection
Maximum likelihood estimation
Having identified the model order, we need to estimate the parameters \(c\), \(\phi_1,\dots,\phi_p\), \(\theta_1,\dots,\theta_q\).
- MLE is very similar to least squares estimation obtained by minimizing
. . .
\[ \sum_{t=1}^T e_t^2 \]
- The ARIMA() function allows LS or MLE estimation.
- Non-linear optimization must be used in either case.
- Different software will give different estimates.
Partial autocorrelations
Partial autocorrelations measure relationship
between \(y_{t}\) and \(y_{t - k}\), when the effects of other time lags — \(1, 2, 3, \dots, k - 1\) — are removed.
\[ \begin{aligned} \alpha_k &= k\text{th partial autocorrelation coefficient}\\ &= \text{equal to the estimate of } \phi_k \text{ in regression:}\\ & y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k} + \epsilon_t. \end{aligned} \]
- Varying number of terms on RHS gives \(\alpha_k\) for different values of \(k\).
- \(\alpha_1=\rho_1\)
- same critical values of \(\pm 1.96/\sqrt{T}\) as for ACF.
- Last significant \(\alpha_k\) indicates the order of an AR model.
Egyptian exports
egypt <- global_economy |> filter(Code == "EGY")
egypt |> ACF(Exports) |> autoplot()
egypt |> PACF(Exports) |> autoplot()
Egyptian exports
global_economy |>
  filter(Code == "EGY") |>
  gg_tsdisplay(Exports, plot_type='partial')
ACF and PACF interpretation (1/4)
AR(1)
\[ \begin{aligned} \rho_k &= \phi_1^k \text{ for } k=1,2,\dots; \\ \alpha_1 &= \phi_1 \\ \alpha_k &= 0 \text{ for } k=2,3,\dots. \end{aligned} \]
So we have an AR(1) model when
- autocorrelations exponentially decay
- there is a single significant partial autocorrelation.
ACF and PACF interpretation (2/4)
AR(\(p\))
- ACF dies out in an exponential or damped sine-wave manner
- PACF has all zero spikes beyond the \(p\)th spike
. . .
So we have an AR(\(p\)) model when
- the ACF is exponentially decaying or sinusoidal
- there is a significant spike at lag \(p\) in PACF, but none beyond \(p\)
ACF and PACF interpretation (3/4)
MA(1)
\[ \begin{aligned} \rho_1 &= \theta_1/(1 + \theta_1^2) \qquad \rho_k = 0 \qquad \text{for } k=2,3,\dots; \\ \alpha_k &= -(-\theta_1)^k/(1 + \theta_1^2 + \dots + \theta_1^{2k}) \end{aligned} \]
So we have an MA(1) model when
- the PACF is exponentially decaying and
- there is a single significant spike in ACF
ACF and PACF interpretation (4/4)
MA(\(q\))
- PACF dies out in an exponential or damped sine-wave manner
- ACF has all zero spikes beyond the \(q\)th spike
. . .
So we have an MA(\(q\)) model when
- the PACF is exponentially decaying or sinusoidal
- there is a significant spike at lag \(q\) in ACF, but none beyond \(q\)
Information criteria
Akaike’s Information Criterion (AIC):
\[ \text{AIC} = -2 \log(L) + 2(p+q+k+1), \]
where \(L\) is the likelihood of the data, \(k=1\) if \(c \ne 0\) and \(k=0\) if \(c=0\).
Corrected AIC:
\[ \text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2}. \]
Bayesian Information Criterion:
\[ \text{BIC} = \text{AIC} + [\log(T)-2](p+q+k+1). \]
. . .
Good models are obtained by minimizing either the AIC, AICc or BIC. Book authors’ preference is to use the AICc.
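As a worked check, the AICc can be reproduced from the AIC. Using the ARIMA(2,0,1)-with-mean fit to Egyptian exports reported earlier (AIC = 293.13; \(p=2\), \(q=1\), \(k=1\); and assuming \(T=58\) annual observations):
aic <- 293.13; p <- 2; q <- 1; k <- 1; T <- 58
aic + 2 * (p + q + k + 1) * (p + q + k + 2) / (T - p - q - k - 2)
This gives 294.28, matching the reported AICc of 294.29 up to rounding.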
ARIMA modelling in R
How does ARIMA() work? (1/7)
A non-seasonal ARIMA process
\[ \phi(B)(1-B)^d y_{t} = c + \theta(B) \epsilon_t \]
. . .
Need to select appropriate orders: \(p, q, d\)
. . .
Hyndman and Khandakar (JSS, 2008) algorithm:
- Select no. differences \(d\) and \(D\) via KPSS test and seasonal strength measure.
- Select \(p, q\) by minimizing AICc.
- Use stepwise search to traverse model space.
How does ARIMA() work? (2/7)
AICc Calculation
\[ \text{AICc} = -2 \log(L) + 2(p+q+k+1)\left[1 + \frac{(p+q+k+2)}{T-p-q-k-2}\right]. \]
where \(L\) is the maximized likelihood fitted to the differenced data, \(k=1\) if \(c \ne 0\) and \(k=0\) otherwise.
Step 1:
Select current model (with smallest AICc) from:
- ARIMA\((2,d,2)\)
- ARIMA\((0,d,0)\)
- ARIMA\((1,d,0)\)
- ARIMA\((0,d,1)\)
How does ARIMA() work? (3/7)
Step 2:
Consider variations of the current model:
- vary one of \(p, q\) from the current model by \(\pm1\)
- vary both \(p, q\) from the current model by \(\pm1\)
- include/exclude \(c\) from the current model
Model with lowest AICc becomes current model.
Repeat Step 2 until no lower AICc can be found.
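Step 1 of this search can be reproduced manually by forcing each starting order. A sketch using the Egyptian exports series (for which \(d=0\) is selected; the model names are arbitrary labels):
global_economy |>
  filter(Code == "EGY") |>
  model(
    arima202 = ARIMA(Exports ~ pdq(2, 0, 2)),
    arima000 = ARIMA(Exports ~ pdq(0, 0, 0)),
    arima100 = ARIMA(Exports ~ pdq(1, 0, 0)),
    arima001 = ARIMA(Exports ~ pdq(0, 0, 1))
  ) |>
  glance() |>
  arrange(AICc)
The model with the smallest AICc becomes the starting point for the Step 2 variations.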
How does ARIMA() work? (4/7-7/7)
[Figures: step-by-step illustration of the stepwise search over the model space]
Egyptian exports
global_economy |>
  filter(Code == "EGY") |>
  gg_tsdisplay(Exports, plot_type='partial')
Egyptian exports
fit1 <- global_economy |>
  filter(Code == "EGY") |>
  model(ARIMA(Exports ~ pdq(4,0,0)))
report(fit1)
Series: Exports
Model: ARIMA(4,0,0) w/ mean
Coefficients:
ar1 ar2 ar3 ar4 constant
0.9861 -0.1715 0.1807 -0.3283 6.6922
s.e. 0.1247 0.1865 0.1865 0.1273 0.3562
sigma^2 estimated as 7.885: log likelihood=-140.53
AIC=293.05 AICc=294.7 BIC=305.41
Egyptian exports
fit2 <- global_economy |>
  filter(Code == "EGY") |>
  model(ARIMA(Exports))
report(fit2)
Series: Exports
Model: ARIMA(2,0,1) w/ mean
Coefficients:
ar1 ar2 ma1 constant
1.6764 -0.8034 -0.6896 2.5623
s.e. 0.1111 0.0928 0.1492 0.1161
sigma^2 estimated as 8.046: log likelihood=-141.57
AIC=293.13 AICc=294.29 BIC=303.43
Central African Republic exports (1/8)
global_economy |>
  filter(Code == "CAF") |>
  autoplot(Exports) +
  labs(title="Central African Republic exports", y="% of GDP")
Central African Republic exports (2/8)
global_economy |>
  filter(Code == "CAF") |>
  gg_tsdisplay(difference(Exports), plot_type='partial')
Central African Republic exports (3/8)
caf_fit <- global_economy |>
  filter(Code == "CAF") |>
  model(arima210 = ARIMA(Exports ~ pdq(2,1,0)),
        arima013 = ARIMA(Exports ~ pdq(0,1,3)),
        stepwise = ARIMA(Exports),
        search = ARIMA(Exports, stepwise=FALSE))
Central African Republic exports (4/8)
caf_fit |> pivot_longer(!Country, names_to = "Model name",
                        values_to = "Orders")
# A mable: 4 x 3
# Key: Country, Model name [4]
Country `Model name` Orders
<fct> <chr> <model>
1 Central African Republic arima210 <ARIMA(2,1,0)>
2 Central African Republic arima013 <ARIMA(0,1,3)>
3 Central African Republic stepwise <ARIMA(2,1,2)>
4 Central African Republic search <ARIMA(3,1,0)>
Central African Republic exports (5/8)
glance(caf_fit) |> arrange(AICc) |> select(.model:BIC)
# A tibble: 4 × 6
.model sigma2 log_lik AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 search 6.52 -133. 274. 275. 282.
2 arima210 6.71 -134. 275. 275. 281.
3 arima013 6.54 -133. 274. 275. 282.
4 stepwise 6.42 -132. 274. 275. 284.
Central African Republic exports (6/8)
caf_fit |> select(search) |> gg_tsresiduals()
Central African Republic exports (7/8)
augment(caf_fit) |>
filter(.model=='search') |>
features(.innov, ljung_box, lag = 10, dof = 3)
# A tibble: 1 × 4
Country .model lb_stat lb_pvalue
<fct> <chr> <dbl> <dbl>
1 Central African Republic search 5.75 0.569
Central African Republic exports (8/8)
caf_fit |>
  forecast(h=5) |>
  filter(.model=='search') |>
  autoplot(global_economy)
Modelling procedure with ARIMA()
- Plot the data. Identify any unusual observations.
- If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
- If the data are non-stationary: take first differences of the data until the data are stationary.
- Examine the ACF/PACF: Is an AR(\(p\)) or MA(\(q\)) model appropriate?
- Try your chosen model(s), and use the AICc to search for a better model.
- Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
- Once the residuals look like white noise, calculate forecasts.
Automatic modelling procedure with ARIMA()
- Plot the data. Identify any unusual observations.
- If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
. . .
- Use ARIMA() to automatically select a model.
. . .
- Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
- Once the residuals look like white noise, calculate forecasts.
Modelling procedure
Forecasting
Prediction intervals
95% prediction interval
\[ \hat{y}_{T+h|T} \pm 1.96\sqrt{v_{T+h|T}} \]
where \(v_{T+h|T}\) is estimated forecast variance.
Prediction intervals
Multi-step prediction intervals for ARIMA(0,0,\(q\)):
\[ y_t = \varepsilon_t + \sum_{i=1}^q \theta_i \varepsilon_{t-i}. \]
\[ v_{T+h|T} = \hat{\sigma}^2 \left[ 1 + \sum_{i=1}^{h-1} \theta_i^2\right], \qquad \text{for } h=2,3,\dots. \]
. . .
- AR(1): Can be rewritten as MA(\(\infty\)) and use above result.
- Other models beyond scope of this subject.
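A sketch of the arithmetic for an MA(1) with hypothetical estimates \(\theta_1 = 0.8\) and \(\hat\sigma^2 = 1.5\); beyond \(h=1\) the variance stops growing because all later weights are zero:
theta1 <- 0.8; sigma2 <- 1.5
v_h1 <- sigma2                   # h = 1
v_h2 <- sigma2 * (1 + theta1^2)  # h = 2, 3, ... (constant from here on)
1.96 * sqrt(c(v_h1, v_h2))       # half-widths of the 95% prediction intervals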
Prediction intervals
- Prediction intervals increase in size with the forecast horizon.
- Prediction intervals can be difficult to calculate by hand.
- Calculations assume residuals are uncorrelated and normally distributed.
- Prediction intervals tend to be too narrow:
  - the uncertainty in the parameter estimates has not been accounted for;
  - the ARIMA model assumes historical patterns will not change during the forecast period;
  - the ARIMA model assumes uncorrelated future errors.
Your turn
For the GDP data (from global_economy):
- fit a suitable ARIMA model to the logged data for all countries;
- check the residual diagnostics for Australia;
- produce forecasts from your fitted model for Australia.
Backshift notation revisited
Recall Backshift notation
Backward shift operator, \(B\), which is used as follows: \[ B y_{t} = y_{t - 1} \: . \]
. . .
\(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.
. . .
For monthly data, if we wish to shift attention to “the same month last year,” then \(B^{12}\) is used, and the notation is \(B^{12}y_{t} = y_{t-12}\).
Seasonal ARIMA models
Seasonal ARIMA models
\[ \text{ARIMA}\quad \underbrace{(p, d, q)}_{\substack{\text{Non-seasonal part}\\ \text{of the model}}} \quad \underbrace{(P, D, Q)_{m}}_{\substack{\text{Seasonal part}\\ \text{of the model}}} \]
where \(m =\) number of observations per year.
ARIMA\((1, 1, 1)(1, 1, 1)_{4}\) model (without constant)
\[ \begin{aligned} &\underbrace{(1 - \phi_1 B)}_{\text{Non-seasonal AR(1)} \phantom{X}} \underbrace{(1 - \Phi_1 B^4)}_{\text{Seasonal AR(1)} \phantom{X}} \underbrace{(1 - B)}_{\text{Non-seasonal difference} \phantom{X}} \underbrace{(1 - B^4)}_{\text{Seasonal difference} \phantom{X}} y_t = \\ &\underbrace{(1 + \theta_1 B)}_{\text{Non-seasonal MA(1)} \phantom{X}} \underbrace{(1 + \Theta_1 B^4)}_{\text{Seasonal MA(1)} \phantom{X}} \varepsilon_t \end{aligned} \]
Seasonal ARIMA models
E.g., ARIMA\((1, 1, 1)(1, 1, 1)_{4}\) model (without constant)
\[(1 - \phi_{1}B)(1 - \Phi_{1}B^{4}) (1 - B) (1 - B^{4})y_{t} =(1 + \theta_{1}B) (1 + \Theta_{1}B^{4})\epsilon_{t}.\]
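In fable, this particular model could be requested explicitly; a sketch using the quarterly Beer series from aus_production (the seasonal period \(m=4\) is inferred from the data's index, and ~ 0 suppresses the constant):
aus_production |>
  model(ARIMA(Beer ~ 0 + pdq(1, 1, 1) + PDQ(1, 1, 1))) |>
  report()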
Common ARIMA models
The US Census Bureau uses the following models most often:
Model | Transformation |
---|---|
ARIMA(0,1,1)(0,1,1)\(_m\) | with log transformation |
ARIMA(0,1,2)(0,1,1)\(_m\) | with log transformation |
ARIMA(2,1,0)(0,1,1)\(_m\) | with log transformation |
ARIMA(0,2,2)(0,1,1)\(_m\) | with log transformation |
ARIMA(2,1,2)(0,1,1)\(_m\) | with no transformation |
Seasonal ARIMA models
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.
ARIMA(0,0,0)(0,0,1)\(_{12}\) will show:
- a spike at lag 12 in the ACF but no other significant spikes.
- The PACF will show exponential decay in the seasonal lags; that is, at lags 12, 24, 36, …
. . .
ARIMA(0,0,0)(1,0,0)\(_{12}\) will show:
- exponential decay in the seasonal lags of the ACF
- a single significant spike at lag 12 in the PACF.
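Both signatures can be checked by simulation; a base-R sketch of an ARIMA(0,0,0)(0,0,1)\(_{12}\) process, written as an MA(12) whose only nonzero coefficient is \(\theta_{12} = 0.8\):
set.seed(1)
y <- arima.sim(list(ma = c(rep(0, 11), 0.8)), n = 240)
acf(y, lag.max = 36)   # expect a single spike at lag 12
pacf(y, lag.max = 36)  # expect decay across lags 12, 24, 36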
US leisure employment (1/8)
leisure <- us_employment |>
  filter(Title == "Leisure and Hospitality", year(Month) > 2000) |>
  mutate(Employed = Employed/1000) |>
  select(Month, Employed)
autoplot(leisure, Employed) +
  labs(title = "US employment: leisure & hospitality", y="People (millions)")
US leisure employment (2/8)
leisure |>
  gg_tsdisplay(difference(Employed, 12), plot_type='partial', lag=36) +
  labs(title="Seasonally differenced", y="")
US leisure employment (3/8)
leisure |>
  gg_tsdisplay(difference(Employed, 12) |> difference(),
               plot_type='partial', lag=36) +
  labs(title = "Double differenced", y="")
US leisure employment (4/8)
fit <- leisure |>
  model(
    arima012011 = ARIMA(Employed ~ pdq(0,1,2) + PDQ(0,1,1)),
    arima210011 = ARIMA(Employed ~ pdq(2,1,0) + PDQ(0,1,1)),
    auto = ARIMA(Employed, stepwise = FALSE, approx = FALSE)
  )
fit |> pivot_longer(everything(), names_to = "Model name",
                    values_to = "Orders")
# A mable: 3 x 2
# Key: Model name [3]
`Model name` Orders
<chr> <model>
1 arima012011 <ARIMA(0,1,2)(0,1,1)[12]>
2 arima210011 <ARIMA(2,1,0)(0,1,1)[12]>
3 auto <ARIMA(2,1,0)(1,1,1)[12]>
US leisure employment (5/8)
glance(fit) |> arrange(AICc) |> select(.model:BIC)
# A tibble: 3 × 6
.model sigma2 log_lik AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 auto 0.00142 395. -780. -780. -763.
2 arima210011 0.00145 392. -776. -776. -763.
3 arima012011 0.00146 391. -775. -775. -761.
US leisure employment (6/8)
fit |> select(auto) |> gg_tsresiduals(lag=36)
US leisure employment (7/8)
augment(fit) |> features(.innov, ljung_box, lag=24, dof=4)
# A tibble: 3 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 arima012011 22.4 0.320
2 arima210011 18.9 0.527
3 auto 16.6 0.680
US leisure employment (8/8)
forecast(fit, h=36) |>
filter(.model=='auto') |>
autoplot(leisure) +
labs(title = "US employment: leisure and hospitality", y="Number of people (millions)")
Corticosteroid drug sales (1/20)
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost)/1e6)
Corticosteroid drug sales (2/20)
h02 |> autoplot(Cost)
Corticosteroid drug sales (3/20)
h02 |> autoplot(log(Cost))
Corticosteroid drug sales (4/20)
h02 |> autoplot(log(Cost) |> difference(12))
Corticosteroid drug sales (5/20)
h02 |> gg_tsdisplay(difference(log(Cost), 12),
                    lag_max = 36, plot_type = 'partial')
Corticosteroid drug sales (6/20)
- Choose \(D=1\) and \(d=0\).
- Spikes in the PACF at lags 12 and 24 suggest a seasonal AR(2) term.
- Spikes in the PACF at the first few lags suggest a possible non-seasonal AR(3) term.
- Initial candidate model: ARIMA(3,0,0)(2,1,0)\(_{12}\).
Corticosteroid drug sales (7/20)
.model | AICc |
---|---|
ARIMA(3,0,1)(0,1,2)[12] | -485.48 |
ARIMA(3,0,1)(1,1,1)[12] | -484.25 |
ARIMA(3,0,1)(0,1,1)[12] | -483.67 |
ARIMA(3,0,1)(2,1,0)[12] | -476.31 |
ARIMA(3,0,0)(2,1,0)[12] | -475.12 |
ARIMA(3,0,2)(2,1,0)[12] | -474.88 |
ARIMA(3,0,1)(1,1,0)[12] | -463.40 |
Corticosteroid drug sales (8/20)
fit <- h02 |>
  model(best = ARIMA(log(Cost) ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
report(fit)
Series: Cost
Model: ARIMA(3,0,1)(0,1,2)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 ar3 ma1 sma1 sma2
-0.1603 0.5481 0.5678 0.3827 -0.5222 -0.1768
s.e. 0.1636 0.0878 0.0942 0.1895 0.0861 0.0872
sigma^2 estimated as 0.004278: log likelihood=250.04
AIC=-486.08 AICc=-485.48 BIC=-463.28
Corticosteroid drug sales (9/20)
gg_tsresiduals(fit)
Corticosteroid drug sales (10/20)
augment(fit) |>
features(.innov, ljung_box, lag = 36, dof = 6)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 best 50.7 0.0104
Corticosteroid drug sales (11/20)
fit <- h02 |> model(auto = ARIMA(log(Cost)))
report(fit)
Series: Cost
Model: ARIMA(2,1,0)(0,1,1)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 sma1
-0.8491 -0.4207 -0.6401
s.e. 0.0712 0.0714 0.0694
sigma^2 estimated as 0.004387: log likelihood=245.39
AIC=-482.78 AICc=-482.56 BIC=-469.77
Corticosteroid drug sales (12/20)
gg_tsresiduals(fit)
Corticosteroid drug sales (13/20)
augment(fit) |>
features(.innov, ljung_box, lag = 36, dof = 3)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 auto 59.3 0.00332
Corticosteroid drug sales (14/20)
fit <- h02 |>
  model(best = ARIMA(log(Cost), stepwise = FALSE,
                     approximation = FALSE,
                     order_constraint = p + q + P + Q <= 9))
report(fit)
Series: Cost
Model: ARIMA(4,1,1)(2,1,2)[12]
Transformation: log(Cost)
Coefficients:
ar1 ar2 ar3 ar4 ma1 sar1 sar2 sma1 sma2
-0.0425 0.2098 0.2017 -0.2273 -0.7424 0.6213 -0.3832 -1.2019 0.4959
s.e. 0.2167 0.1813 0.1144 0.0810 0.2074 0.2421 0.1185 0.2491 0.2135
sigma^2 estimated as 0.004049: log likelihood=254.31
AIC=-488.63 AICc=-487.4 BIC=-456.1
Corticosteroid drug sales (15/20)
gg_tsresiduals(fit)
Corticosteroid drug sales (16/20)
augment(fit) |>
features(.innov, ljung_box, lag = 36, dof = 9)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 best 36.5 0.106
Corticosteroid drug sales (17/20)
Training data: July 1991 to June 2006
Test data: July 2006 to June 2008
fit <- h02 |>
  filter_index(~ "2006 Jun") |>
  model(
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 0) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 2) + PDQ(2, 1, 0)),
    ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(1, 1, 0))
    # ... #
  )
fit |>
  forecast(h = "2 years") |>
  accuracy(h02)
Corticosteroid drug sales (18/20)
.model | RMSE |
---|---|
ARIMA(3,0,1)(1,1,1)[12] | 0.0619 |
ARIMA(3,0,1)(0,1,2)[12] | 0.0621 |
ARIMA(3,0,1)(0,1,1)[12] | 0.0630 |
ARIMA(2,1,0)(0,1,1)[12] | 0.0630 |
ARIMA(4,1,1)(2,1,2)[12] | 0.0631 |
ARIMA(3,0,2)(2,1,0)[12] | 0.0651 |
ARIMA(3,0,1)(2,1,0)[12] | 0.0653 |
ARIMA(3,0,1)(1,1,0)[12] | 0.0666 |
ARIMA(3,0,0)(2,1,0)[12] | 0.0668 |
Corticosteroid drug sales (19/20)
- Models with the lowest AICc values tend to give slightly better results than the other models.
- AICc comparisons require the same orders of differencing, but RMSE comparisons on a test set can involve any models.
- Use the best model available, even if it does not pass all the residual tests.
Corticosteroid drug sales (20/20)
fit <- h02 |>
  model(ARIMA(Cost ~ 0 + pdq(3,0,1) + PDQ(0,1,2)))
fit |> forecast() |> autoplot(h02) +
  labs(y = "H02 Expenditure ($AUD)")
ARIMA vs ETS
ARIMA vs ETS
- It is a myth that ARIMA models are more general than exponential smoothing.
- Linear exponential smoothing models are all special cases of ARIMA models.
- Non-linear exponential smoothing models have no equivalent ARIMA counterparts.
- Many ARIMA models have no exponential smoothing counterparts.
- ETS models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models have one unit root.
ARIMA vs ETS
Equivalences
ETS model | ARIMA model | Parameters |
---|---|---|
ETS(A,N,N) | ARIMA(0,1,1) | \(\theta_1 = \alpha-1\) |
ETS(A,A,N) | ARIMA(0,2,2) | \(\theta_1 = \alpha+\beta-2\), \(\theta_2 = 1-\alpha\) |
ETS(A,A\(_d\),N) | ARIMA(1,1,2) | \(\phi_1=\phi\), \(\theta_1 = \alpha+\phi\beta-1-\phi\), \(\theta_2 = (1-\alpha)\phi\) |
ETS(A,N,A) | ARIMA(0,0,\(m\))(0,1,0)\(_m\) | |
ETS(A,A,A) | ARIMA(0,1,\(m+1\))(0,1,0)\(_m\) | |
ETS(A,A\(_d\),A) | ARIMA(1,0,\(m+1\))(0,1,0)\(_m\) | |
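The first equivalence is easy to verify empirically: fit ETS(A,N,N) and ARIMA(0,1,1) to the same series and compare \(\theta_1\) with \(\alpha - 1\). A sketch using Algerian exports (any non-seasonal series would do):
algeria <- global_economy |> filter(Country == "Algeria")
algeria |>
  model(
    ets = ETS(Exports ~ error("A") + trend("N") + season("N")),
    arima = ARIMA(Exports ~ pdq(0, 1, 1))
  ) |>
  tidy() |>
  select(.model, term, estimate)  # check: theta_1 is close to alpha - 1
The two estimates will not match exactly, since the models are estimated separately, but they should be close.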
Example: Australian population
aus_economy <- global_economy |>
  filter(Code == "AUS") |>
  mutate(Population = Population/1e6)
aus_economy |>
  slice(-n()) |>
  stretch_tsibble(.init = 10) |>
  model(ets = ETS(Population),
        arima = ARIMA(Population)) |>
  forecast(h = 1) |>
  accuracy(aus_economy) |>
  select(.model, ME:RMSSE)
# A tibble: 2 × 8
.model ME RMSE MAE MPE MAPE MASE RMSSE
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 arima 0.0420 0.194 0.0789 0.277 0.509 0.317 0.746
2 ets 0.0202 0.0774 0.0543 0.112 0.327 0.218 0.298
Example: Australian population
aus_economy |>
  model(ETS(Population)) |>
  forecast(h = "5 years") |>
  autoplot(aus_economy) +
  labs(title = "Australian population", y = "People (millions)")
Example: Cement production (1/9)
cement <- aus_production |>
  select(Cement) |>
  filter_index("1988 Q1" ~ .)
train <- cement |> filter_index(. ~ "2007 Q4")
fit <- train |>
  model(
    arima = ARIMA(Cement),
    ets = ETS(Cement)
  )
Example: Cement production (2/9)
fit |>
  select(arima) |>
  report()
Series: Cement
Model: ARIMA(1,0,1)(2,1,1)[4] w/ drift
Coefficients:
ar1 ma1 sar1 sar2 sma1 constant
0.8886 -0.2366 0.081 -0.2345 -0.8979 5.3884
s.e. 0.0842 0.1334 0.157 0.1392 0.1780 1.4844
sigma^2 estimated as 11456: log likelihood=-463.52
AIC=941.03 AICc=942.68 BIC=957.35
Example: Cement production (3/9)
fit |> select(ets) |> report()
Series: Cement
Model: ETS(M,N,M)
Smoothing parameters:
alpha = 0.7533714
gamma = 0.0001000093
Initial states:
l[0] s[0] s[-1] s[-2] s[-3]
1694.712 1.031179 1.045209 1.011424 0.9121874
sigma^2: 0.0034
AIC AICc BIC
1104.095 1105.650 1120.769
Example: Cement production (4/9)
gg_tsresiduals(fit |> select(arima), lag_max = 16)
Example: Cement production (5/9)
gg_tsresiduals(fit |> select(ets), lag_max = 16)
Example: Cement production (6/9)
fit |>
  select(arima) |>
  augment() |>
  features(.innov, ljung_box, lag = 16, dof = 6)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 arima 6.37 0.783
Example: Cement production (7/9)
fit |>
  select(ets) |>
  augment() |>
  features(.innov, ljung_box, lag = 16, dof = 6)
# A tibble: 1 × 3
.model lb_stat lb_pvalue
<chr> <dbl> <dbl>
1 ets 10.0 0.438
Example: Cement production (8/9)
fit |>
  forecast(h = "2 years 6 months") |>
  accuracy(cement) |>
  select(-ME, -MPE, -ACF1)
# A tibble: 2 × 7
.model .type RMSE MAE MAPE MASE RMSSE
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 arima Test 216. 186. 8.68 1.27 1.26
2 ets Test 222. 191. 8.85 1.30 1.29
Example: Cement production (9/9)
fit |>
  select(arima) |>
  forecast(h="3 years") |>
  autoplot(cement) +
  labs(title = "Cement production in Australia", y="Tonnes ('000)")
labs(title = "Cement production in Australia", y="Tonnes ('000)")