Homework 5

The Forecaster’s Toolbox

Getting started

Start with the following steps:

Go to our RStudio Server at http://turing.cornellcollege.edu:8787/
Open the respective file from the shared STA364_inst_files folder. It will be named something like hw-0X_LAST_NAME.qmd.
Save your own copy: click File -> Save As, navigate to the shared STA364_Projects folder, change the “LAST_NAME” part of the file name to your last name, and click Save.
Update the top of the document, called the YAML, with your name.

Homework Instructions

Note

Be sure to include the relevant R code and to answer each question in complete sentences (e.g., if I ask for the average, output the answer in R and also write a complete sentence stating it). Save your files frequently!

From this point forward, you need to start commenting on your graphs. What do you observe? Are there trends/cycles/seasonal effects? Outliers? Other interesting features?

Important

You will now need to create your own R chunks for problems that require code. Give each chunk a descriptive label, for example, q1_a. That way, if rendering fails, the error message tells you which chunk caused it. You cannot use duplicate chunk labels.

```{r q1_a}

```

Data for the homework is in the STA364_inst_files > data folder.

Exercises

Reading

Read Sections 5.1, 5.2, and 5.8.

Problems

Question 1 (fpp5_1)

Plot each of the following time series and produce forecasts using whichever of NAIVE(y), SNAIVE(y), or RW(y ~ drift()) is most appropriate. Explain your choice for each series in one sentence.

Australian Population (global_economy)
Bricks (aus_production)
NSW Lambs (aus_livestock). NSW stands for New South Wales; filter on the State column (and on Animal to select lambs).
Household wealth (hh_budget). Choose a country with a filter.
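A minimal sketch of the workflow, assuming the fpp3 package (which attaches tsibble, fable, feasts, and the tsibbledata data sets) is installed. It uses Victorian lambs from aus_livestock, which is not one of the assigned series, and fits all three candidate methods so their forecasts can be compared on one plot; deciding which method is appropriate is still up to you.

```r
library(fpp3)

# One series: lambs in Victoria (filter on both key columns, State and Animal)
vic_lambs <- aus_livestock |>
  filter(State == "Victoria", Animal == "Lambs")

# Fit the three benchmark methods side by side
fit <- vic_lambs |>
  model(
    naive  = NAIVE(Count),
    snaive = SNAIVE(Count),
    drift  = RW(Count ~ drift())
  )

# Forecast two years ahead and overlay the point forecasts on the data
fc <- fit |> forecast(h = "2 years")
fc |> autoplot(vic_lambs, level = NULL)
```

Setting level = NULL hides the prediction intervals so the three point forecasts are easier to compare. For an annual series such as population, drop SNAIVE(), since there is no seasonal period to copy.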

Question 2 (fpp5_2)

Use the Facebook stock price (data set gafa_stock) to do the following:

Produce a time plot of the series.
Produce forecasts using the drift method and plot them.
Show that the forecasts are identical to extending the line drawn between the first and last observations.
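A sketch of one way to approach this, assuming fpp3 is installed and that the closing price (Close) is the series of interest. Because gafa_stock is indexed by trading date, with gaps on weekends and holidays, the series is re-indexed by trading day first. The drift forecast h steps ahead is y_T + h(y_T - y_1)/(T - 1), which is the straight line through the first and last observations extended forward; the dashed red segment draws that line. The 60-day horizon is an arbitrary choice.

```r
library(fpp3)

# Trading days are irregular, so re-index by trading-day number
fb <- gafa_stock |>
  filter(Symbol == "FB") |>
  mutate(day = row_number()) |>
  update_tsibble(index = day, regular = TRUE)

# Drift forecasts for the next 60 trading days (horizon chosen arbitrarily)
h <- 60
fc_drift <- fb |>
  model(RW(Close ~ drift())) |>
  forecast(h = h)

# Line through the first and last observations: slope = (y_T - y_1) / (T - 1)
n <- nrow(fb)
y1 <- fb$Close[1]
yT <- fb$Close[n]
slope <- (yT - y1) / (n - 1)

# Overlay the extended line on the drift forecasts
fc_drift |>
  autoplot(fb, level = NULL) +
  annotate("segment", x = 1, y = y1, xend = n + h, yend = y1 + slope * (n + h - 1),
           colour = "red", linetype = "dashed")
```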

Question 3 (fpp5_3)

Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.

```r
# Extract data of interest
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production |> model(SNAIVE(Beer))
# Look at the residuals
fit |> gg_tsresiduals()
# Look at some forecasts
fit |> forecast() |> autoplot(recent_production)
```

Discuss the quality of this model based on the residuals.
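Alongside the residual plots, a Ljung-Box test gives a numerical check for remaining autocorrelation. This sketch assumes fpp3 is installed and uses lag = 8, following the common rule of thumb of lag = 2m for seasonal data (m = 4 for quarterly data). A small p-value suggests the residuals are not white noise.

```r
library(fpp3)

recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)
fit <- recent_production |> model(SNAIVE(Beer))

# Ljung-Box test on the innovation residuals; lag = 2m = 8 for quarterly data
lb <- augment(fit) |>
  features(.innov, ljung_box, lag = 8)
lb
```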

Question 4 (fpp5_6)

Are the following statements true or false? Explain your answer.

Good forecast methods should have normally distributed residuals.
A model with small residuals will give good forecasts.
The best measure of forecast accuracy is MAPE.
If your model doesn’t forecast well, you should make it more complicated.
Always choose the model with the best forecast accuracy as measured on the test set.

Question 5 (fpp5_7)

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

```r
set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
```

Create a training dataset consisting of observations before 2011:

```r
myseries_train <- myseries |>
  filter(year(Month) < 2011)
```

Check that your data have been split appropriately by producing the following plot.

```r
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")
```

Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

```r
fit <- myseries_train |>
  model(SNAIVE(Turnover))
```

Check the residuals.

```r
fit |> gg_tsresiduals()
```

Do the residuals appear to be uncorrelated and normally distributed?

Produce forecasts for the test data.

```r
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
fc |> autoplot(myseries)
```

Compare the accuracy of your forecasts against the actual values.

```r
fit |> accuracy()
fc |> accuracy(myseries)
```

How sensitive are the accuracy measures to the amount of training data used? Copy some of your code above and vary both the amount of data held out and the amount used to evaluate the forecasts. Answer using a metric that does not depend on the amount of data in your test set.
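One way to organise this experiment is to wrap the split, fit, and scoring steps in a function and call it for several cutoff years. This is a minimal sketch, assuming fpp3 is installed; test_accuracy() is a hypothetical helper name, and the cutoff years are arbitrary choices that you may need to adjust if your series starts late or you want different splits.

```r
library(fpp3)

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Hypothetical helper: train SNAIVE on years before `cutoff`,
# then score the forecasts on every later observation
test_accuracy <- function(cutoff) {
  train <- myseries |> filter(year(Month) < cutoff)
  test <- myseries |> filter(year(Month) >= cutoff)
  train |>
    model(SNAIVE(Turnover)) |>
    forecast(new_data = test) |>
    accuracy(myseries) |>
    mutate(cutoff = cutoff)
}

# Compare several split points (assumed values; adjust for your series)
results <- lapply(c(2005, 2008, 2011, 2014), test_accuracy) |>
  bind_rows()
results |> select(cutoff, RMSE, MAE, MAPE, MASE)
```

Each row corresponds to one split, so you can see how each metric moves as the training set grows and the test set shrinks.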

Question 6 (fpp5_9)

Choose a country with a filter, then create a training set for household wealth (hh_budget) by withholding the last four years as a test set.
Fit all the appropriate benchmark methods to the training set and forecast the periods covered by the test set.
Compute the accuracy of your forecasts. Which method does best?
Do the residuals from the best method resemble white noise?
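A sketch of the train/test mechanics, assuming fpp3 is installed. The country ("Australia") is a placeholder choice. Because hh_budget is annual, SNAIVE() has no seasonal period to use, so this sketch fits MEAN(), NAIVE(), and the drift method. The cutoff is computed from the data, so the last four years are always held out.

```r
library(fpp3)

# Placeholder country; replace with your own choice
wealth <- hh_budget |>
  filter(Country == "Australia") |>
  select(Country, Year, Wealth)

# Hold out the last four years as the test set
cutoff <- max(wealth$Year) - 4
train <- wealth |> filter(Year <= cutoff)

# Benchmarks suitable for annual (non-seasonal) data
fit <- train |>
  model(
    mean  = MEAN(Wealth),
    naive = NAIVE(Wealth),
    drift = RW(Wealth ~ drift())
  )

# Forecast the four test years and score against the full series
fc <- fit |> forecast(h = 4)
fc |> accuracy(wealth)
```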

Question 7 (fpp5_11)

We will use the Bricks data from aus_production (Australian quarterly clay brick production 1956–2005) for this exercise.

Use an STL decomposition to calculate the trend-cycle and seasonal indices. (Experiment with fixed and changing seasonality by changing the window argument of season() inside STL().)
Compute and plot the seasonally adjusted data.
Use a naïve method to produce forecasts of the seasonally adjusted data.
Use decomposition_model() to reseasonalise the results, giving forecasts for the original data.
Do the residuals look uncorrelated?
Repeat with a robust STL decomposition. Does it make much difference?
Compare forecasts from decomposition_model() with those from SNAIVE(), using a test set comprising the last 2 years of data. Which is better?
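A minimal sketch of the decomposition steps, assuming fpp3 is installed. Bricks has missing values at the end of aus_production, so those rows are dropped first. The fixed (periodic) seasonal window and the 8-quarter horizon are assumed starting points. Inside decomposition_model(), components without an explicit model (here, the seasonal component) are forecast with SNAIVE() by default, which is what reseasonalises the NAIVE forecasts of the seasonally adjusted series.

```r
library(fpp3)

# Bricks ends in 2005 while aus_production runs longer, so drop the NA rows
bricks <- aus_production |>
  filter(!is.na(Bricks)) |>
  select(Quarter, Bricks)

# STL with fixed seasonality; also try a numeric window or robust = TRUE
dcmp <- bricks |>
  model(STL(Bricks ~ season(window = "periodic"), robust = FALSE)) |>
  components()

# Plot the seasonally adjusted series
dcmp |>
  as_tsibble() |>
  autoplot(season_adjust)

# NAIVE on the seasonally adjusted data; the seasonal component is
# forecast with SNAIVE by default inside decomposition_model()
fit <- bricks |>
  model(stl_naive = decomposition_model(
    STL(Bricks ~ season(window = "periodic")),
    NAIVE(season_adjust)
  ))
fc <- fit |> forecast(h = 8)
fc |> autoplot(bricks)
```

For the last part, the same model() call can fit SNAIVE(Bricks) on a training set that excludes the final 2 years, and accuracy() can then compare both models on the held-out quarters.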

Submission

When you have finished your homework, be sure to Render the final document. To download the rendered file:

Find the .html file in the Files pane of RStudio (bottom right of the screen).
Click the check box next to the file.
Click the blue gear icon above, then click “Export” to download.
Submit your final HTML document to the respective assignment on Moodle.