I: integrated (differencing to make the series stationary)
MA: moving average (lagged errors as inputs)
An ARIMA model is rarely interpretable in terms of visible data structures like trend and seasonality. But it can capture a huge range of time series patterns.
Stationarity and differencing
Stationarity
Definition: If \(\{y_t\}\) is a stationary time series, then for all \(s\), the distribution of \((y_t,\dots,y_{t+s})\) does not depend on \(t\).
When both seasonal and first differences are applied, it makes no difference which is done first; the result will be the same.
If seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
It is important that if differencing is used, the differences are interpretable.
Interpretation of differencing
first differences are the change between one observation and the next;
seasonal differences are the change between one year to the next.
But taking lag 3 differences for yearly data, for example, results in a model which cannot be sensibly interpreted.
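As an illustration (a sketch assuming the fpp3 package and its `aus_production` data are available), both kinds of difference can be computed with `difference()`:

```r
library(fpp3)

aus_production |>
  mutate(
    diff_beer  = difference(Beer),          # change from one quarter to the next
    sdiff_beer = difference(Beer, lag = 4)  # change from the same quarter last year
  ) |>
  select(Quarter, Beer, diff_beer, sdiff_beer)
```

Note that `lag = 4` is the seasonal period for quarterly data; monthly data would use `lag = 12`.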
Unit root tests
Statistical tests to determine the required order of differencing.
Augmented Dickey-Fuller (ADF) test: the null hypothesis is that the data are non-stationary and non-seasonal.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test: the null hypothesis is that the data are stationary and non-seasonal.
Other tests available for seasonal data.
We will use the KPSS test.
KPSS test
google_2018 |> features(Close, unitroot_kpss)
# A tibble: 1 × 3
Symbol kpss_stat kpss_pvalue
<chr> <dbl> <dbl>
1 GOOG 0.573 0.0252
google_2018 |> features(Close, unitroot_ndiffs)
# A tibble: 1 × 2
Symbol ndiffs
<chr> <int>
1 GOOG 1
For the tourism dataset, compute the total number of trips and find an appropriate differencing (after transformation if necessary) to obtain stationary data.
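One possible approach (a sketch; whether a transformation is needed is left for you to judge from a plot of the data):

```r
library(fpp3)

# Aggregate total trips across all regions, states and purposes
total_trips <- tourism |>
  summarise(Trips = sum(Trips))

# How many seasonal differences are suggested?
total_trips |> features(Trips, unitroot_nsdiffs)

# After seasonal differencing, are further first differences needed?
total_trips |>
  mutate(Trips = difference(Trips, lag = 4)) |>
  features(Trips, unitroot_ndiffs)
```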
Backshift notation (1/4)
A very useful notational device is the backward shift operator, \(B\), which is used as follows:
\[
B y_{t} = y_{t - 1}
\]
In other words, \(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.
Two applications of \(B\) to \(y_{t}\) shift the data back two periods: \[
B(By_{t}) = B^{2}y_{t} = y_{t-2}
\]
For monthly data, if we wish to shift attention to “the same month last year”, then \(B^{12}\) is used, and the notation is \(B^{12}y_{t} = y_{t-12}\).
Backshift notation (2/4)
The backward shift operator is convenient for describing the process of differencing.
A first-order difference can be written as \[
y'_{t} = y_{t} - y_{t-1} = y_t - By_{t} = (1 - B)y_{t}
\]
Similarly, if second-order differences (i.e., first differences of first differences) have to be computed, then: \[
y''_{t} = y'_{t} - y'_{t-1} = (1 - B)^{2} y_{t}
\]
A seasonal difference is a difference at lag \(m\): \[
(1 - B^{m})y_{t} = y_{t} - y_{t-m}
\]
For monthly data, \(m=12\) and we obtain the same result as earlier.
Non-seasonal ARIMA models
Autoregressive models
Autoregressive (AR) models:
\[
y_{t} = c + \phi_{1}y_{t - 1} + \phi_{2}y_{t - 2} + \cdots + \phi_{p}y_{t - p} + \epsilon_{t},
\] where \(\epsilon_t\) is white noise. This is a multiple regression with lagged values of \(y_t\) as predictors.
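To see what such a process looks like, an AR(1) series can be simulated with base R's `arima.sim()` (an illustrative sketch; the coefficient 0.8 is an arbitrary choice):

```r
set.seed(123)  # for reproducibility

# Simulate 100 observations from y_t = 0.8 * y_{t-1} + e_t
ar1 <- arima.sim(model = list(ar = 0.8), n = 100)
plot(ar1, main = "Simulated AR(1) with phi[1] = 0.8")
```

Changing the coefficient changes the behaviour: values near 1 give slowly wandering series, negative values give series that oscillate around the mean.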
Any MA(\(q\)) process can be written as an AR(\(\infty\)) process if we impose some constraints on the MA parameters.
Then the MA model is called “invertible”.
Invertible models have some mathematical properties that make them easier to use in practice.
Invertibility
General condition for invertibility Complex roots of \(1+\theta_1 z + \theta_2 z^2 + \dots + \theta_qz^q\) lie outside the unit circle on the complex plane.
For \(q=1\): \(-1<\theta_1<1\).
For \(q=2\): \(-1<\theta_2<1 \qquad \theta_2+\theta_1 > -1 \qquad \theta_1 - \theta_2 < 1\).
where \(\epsilon_t\) is white noise with a standard deviation of \(2.837 = \sqrt{8.046}\).
Egyptian exports
gg_tsresiduals(fit)
Egyptian exports
augment(fit) |> features(.innov, ljung_box, lag = 10, dof = 4)
# A tibble: 1 × 4
Country .model lb_stat lb_pvalue
<fct> <chr> <dbl> <dbl>
1 Egypt, Arab Rep. ARIMA(Exports) 5.78 0.448
Egyptian exports
fit |>
  forecast(h = 10) |>
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Egyptian Exports")
Understanding ARIMA models
The constant c has an important effect on the long-term forecasts obtained from these models.
If \(c=0\) and \(d=0\), the long-term forecasts will go to zero.
If \(c=0\) and \(d=1\), the long-term forecasts will go to a non-zero constant.
If \(c=0\) and \(d=2\), the long-term forecasts will follow a straight line.
If \(c\ne0\) and \(d=0\), the long-term forecasts will go to the mean of the data.
If \(c\ne0\) and \(d=1\), the long-term forecasts will follow a straight line.
If \(c\ne0\) and \(d=2\), the long-term forecasts will follow a quadratic trend. (This is not recommended, and fable will not permit it.)
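In fable, the constant can be included or excluded explicitly in the model formula (a sketch; the orders shown are arbitrary choices for illustration):

```r
library(fpp3)

global_economy |>
  filter(Code == "EGY") |>
  model(
    with_c    = ARIMA(Exports ~ 1 + pdq(2, 0, 1)),  # force the constant in
    without_c = ARIMA(Exports ~ 0 + pdq(2, 0, 1))   # force the constant out
  )
```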
Understanding ARIMA models
Forecast variance and \(d\)
The higher the value of \(d\), the more rapidly the prediction intervals increase in size.
For \(d=0\), the long-term forecast standard deviation will go to the standard deviation of the historical data.
Cyclic behaviour
For cyclic forecasts, \(p\ge2\) and some restrictions on coefficients are required.
If \(p=2\), we need \(\phi_1^2+4\phi_2<0\). Then the average cycle length is \[(2\pi)\big/\left[\arccos\left(-\phi_1(1-\phi_2)/(4\phi_2)\right)\right].\]
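For example (a sketch with arbitrary coefficients chosen to satisfy the constraint):

```r
phi1 <- 1.3
phi2 <- -0.7
stopifnot(phi1^2 + 4 * phi2 < 0)  # condition for cyclic behaviour

# Average cycle length implied by these AR(2) coefficients
2 * pi / acos(-phi1 * (1 - phi2) / (4 * phi2))  # roughly 9.5 periods
```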
Estimation and order selection
Maximum likelihood estimation
Having identified the model order, we need to estimate the parameters \(c\), \(\phi_1,\dots,\phi_p\), \(\theta_1,\dots,\theta_q\).
MLE is very similar to least squares estimation, obtained by minimizing
\[
\sum_{t=1}^T e_t^2
\]
The ARIMA() function allows LS or MLE estimation.
Non-linear optimization must be used in either case.
Different software will give different estimates.
Partial autocorrelations
Partial autocorrelations measure the relationship between \(y_{t}\) and \(y_{t - k}\) after the effects of the intervening lags \(1, 2, 3, \dots, k - 1\) are removed.
\[
\begin{aligned}
\alpha_k &= k\text{th partial autocorrelation coefficient}\\
&= \text{the estimate of } \phi_k \text{ in the regression:}\\
& y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_k y_{t-k} + \epsilon_t.
\end{aligned}
\]
Varying number of terms on RHS gives \(\alpha_k\) for different values of \(k\).
\(\alpha_1=\rho_1\)
same critical values of \(\pm 1.96/\sqrt{T}\) as for ACF.
Last significant \(\alpha_k\) indicates the order of an AR model.
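The sample ACF and PACF can be plotted together with feasts (a sketch, reusing the `google_2018` data from earlier; the series is differenced first since the closing prices are non-stationary):

```r
library(fpp3)

# Time plot, ACF and PACF of the differenced closing prices
google_2018 |>
  gg_tsdisplay(difference(Close), plot_type = "partial")
```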
\[
\text{AIC} = -2\log(L) + 2(p + q + k + 1),
\]
where \(L\) is the maximized likelihood fitted to the differenced data, \(k=1\) if \(c \ne 0\) and \(k=0\) otherwise.
Step 1:
Select current model (with smallest AICc) from:
ARIMA\((2,d,2)\)
ARIMA\((0,d,0)\)
ARIMA\((1,d,0)\)
ARIMA\((0,d,1)\)
How does ARIMA() work? (3/7)
Step 2:
Consider variations of the current model:
- vary one of \(p, q\) from the current model by \(\pm1\);
- vary both \(p\) and \(q\) from the current model by \(\pm1\);
- include/exclude \(c\) from the current model.
# A mable: 4 x 3
# Key: Country, Model name [4]
Country `Model name` Orders
<fct> <chr> <model>
1 Central African Republic arima210 <ARIMA(2,1,0)>
2 Central African Republic arima013 <ARIMA(0,1,3)>
3 Central African Republic stepwise <ARIMA(2,1,2)>
4 Central African Republic search <ARIMA(3,1,0)>
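A mable like the one above can be produced along these lines (a sketch; the data and variable names follow the earlier `global_economy` examples):

```r
library(fpp3)

global_economy |>
  filter(Code == "CAF") |>
  model(
    arima210 = ARIMA(Exports ~ pdq(2, 1, 0)),    # order chosen from the PACF
    arima013 = ARIMA(Exports ~ pdq(0, 1, 3)),    # order chosen from the ACF
    stepwise = ARIMA(Exports),                   # default stepwise search
    search   = ARIMA(Exports, stepwise = FALSE)  # larger model search
  )
```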
If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
If the data are non-stationary: take first differences of the data until the data are stationary.
Examine the ACF/PACF: Is an AR(\(p\)) or MA(\(q\)) model appropriate?
Try your chosen model(s), and use the AICc to search for a better model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
Automatic modelling procedure with ARIMA()
Plot the data. Identify any unusual observations.
If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
Use ARIMA() to automatically select a model.
Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
Once the residuals look like white noise, calculate forecasts.
Modelling procedure
Forecasting
Prediction intervals
95% prediction interval
\[
\hat{y}_{T+h|T} \pm 1.96\sqrt{v_{T+h|T}}
\]
where \(v_{T+h|T}\) is estimated forecast variance.
Prediction intervals
Multi-step prediction intervals for ARIMA(0,0,\(q\)):
\[
\hat{y}_{T+h|T} \pm 1.96\sqrt{\hat{\sigma}^2\left[1 + \sum_{i=1}^{h-1} \hat{\theta}_i^2\right]}, \qquad h = 2, 3, \dots
\]
AR(1): can be rewritten as an MA(\(\infty\)) process, and the above result applied.
Other models: beyond the scope of this subject.
Prediction intervals
Prediction intervals increase in size with forecast horizon.
Prediction intervals can be difficult to calculate by hand.
Calculations assume residuals are uncorrelated and normally distributed.
Prediction intervals tend to be too narrow:
- the uncertainty in the parameter estimates has not been accounted for;
- the ARIMA model assumes historical patterns will not change during the forecast period;
- the ARIMA model assumes uncorrelated future errors.
Your turn
For the GDP data (from global_economy):
fit a suitable ARIMA model to the logged data for all countries;
check the residual diagnostics for Australia;
produce forecasts of your fitted model for Australia.
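One possible approach (a sketch):

```r
library(fpp3)

# Fit an automatically selected ARIMA model to log(GDP) for every country
fit <- global_economy |>
  model(arima = ARIMA(log(GDP)))

# Residual diagnostics for Australia
fit |>
  filter(Country == "Australia") |>
  gg_tsresiduals()

# Forecasts for Australia
fit |>
  filter(Country == "Australia") |>
  forecast(h = 5) |>
  autoplot(global_economy)
```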
Backshift notation revisited
Recall Backshift notation
Backward shift operator, \(B\), which is used as follows: \[
B y_{t} = y_{t - 1} \: .
\]
\(B\), operating on \(y_{t}\), has the effect of shifting the data back one period.
For monthly data, if we wish to shift attention to “the same month last year,” then \(B^{12}\) is used, and the notation is \(B^{12}y_{t} = y_{t-12}\).
Seasonal ARIMA models
\[
\text{ARIMA}~\underbrace{(p, d, q)}_{\substack{\text{Non-seasonal part}\\ \text{of the model}}}~\underbrace{(P, D, Q)_{m}}_{\substack{\text{Seasonal part}\\ \text{of the model}}}
\]
where \(m =\) number of observations per year.
ARIMA\((1, 1, 1)(1, 1, 1)_{4}\) model (without constant)
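In backshift notation, this model can be written as:
\[
(1 - \phi_{1}B)(1 - \Phi_{1}B^{4})(1 - B)(1 - B^{4})y_{t} = (1 + \theta_{1}B)(1 + \Theta_{1}B^{4})\epsilon_{t}
\]
The non-seasonal and seasonal AR factors appear on the left along with the first and seasonal differences, and the non-seasonal and seasonal MA factors appear on the right.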
# A mable: 3 x 2
# Key: Model name [3]
`Model name` Orders
<chr> <model>
1 arima012011 <ARIMA(0,1,2)(0,1,1)[12]>
2 arima210011 <ARIMA(2,1,0)(0,1,1)[12]>
3 auto <ARIMA(2,1,0)(1,1,1)[12]>
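The models in this mable can be specified along these lines (a sketch; the `leisure` data and its `Employed` variable are assumed from the surrounding example):

```r
fit <- leisure |>
  model(
    arima012011 = ARIMA(Employed ~ pdq(0, 1, 2) + PDQ(0, 1, 1)),
    arima210011 = ARIMA(Employed ~ pdq(2, 1, 0) + PDQ(0, 1, 1)),
    auto = ARIMA(Employed, stepwise = FALSE, approximation = FALSE)
  )
```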
forecast(fit, h = 36) |>
  filter(.model == "auto") |>
  autoplot(leisure) +
  labs(title = "US employment: leisure and hospitality",
       y = "Number of people (millions)")