<- vic_elec |>
jan14_vic_elec filter(yearmonth(Time) == yearmonth("2014 Jan")) |>
index_by(Date = as_date(Time)) |>
summarise(
Demand = sum(Demand),
Temperature = max(Temperature)
)
Homework 6
Time Series Regression
Getting started
Start with the following steps:
Go to our RStudio Server at http://turing.cornellcollege.edu:8787/
Open the respective file from the shared STA364_inst_files folder mentioned above. It will be named something like hw-0X_LAST_NAME.qmd.
Then you need to save your copy. Click File -> Save as -> Navigate to the folder
STA364_Projects
(that we share) -> Change the “LAST_NAME part of the file name to your last name -> Save.Update the top of the document, called the
YAML,
with your name.
Homework Instructions
Be sure to include the relevant R code and complete sentences answering each question (e.g., if I ask for the average, you can output the answer in R but also write a complete sentence with the answer). Be sure to save your files frequently!
From this point forward, you need to start commenting on your graphs. What do you observe? Are there trends/cycles/seasonal effects? Outliers? Other interesting features?
You will now need to make your own R chunks for problems that require code. You should name each chunk appropriately, for example, q1_a
. This process assures that when you render and get an error, the error tells you which chunk is causing the error. You cannot use duplicate chunk labels.
```{r q1_a}
```
Data for the homework is in the STA364_inst_files > data folder.
Exercises
Problems
Question 1 (fpp7_1)
Half-hourly electricity demand for Victoria, Australia is contained in vic_elec. Extract the January 2014 electricity demand, and aggregate this data to daily with daily total demands and maximum temperatures.
Plot the data and find the regression model for Demand with temperature as a predictor variable. Why is there a positive relationship?
Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?
Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was \(15C^\circ\) and compare it with the forecast if the with maximum temperature was \(35C^\circ\). Do you believe these forecasts? The following R code will get you started:
|>
jan14_vic_elec model(TSLM(Demand ~ Temperature)) |>
forecast(
new_data(jan14_vic_elec, 1) |>
mutate(Temperature = 15)
|>
) autoplot(jan14_vic_elec)
Give prediction intervals for your forecasts.
Plot Demand vs Temperature for all of the available data in vic_elec aggregated to daily total demand and maximum temperature. What does this say about your model?
Question 2 (fpp7_4)
The data set souvenirs concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.
Explain why it is necessary to take logarithms of these data before fitting a model.
Fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable.
To create a new variable, you use the mutate
function. The following code creates a variable that is TRUE (1) if it is the surfing festival and FALSE (0) otherwise. Fill the
mutate(festival = month(Month) == CHANGE & year(Month) != CHANGE) |>
- Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
Sending your fitted model object into the components()
function give a nice dataframe with the .resid column and .fitted columns.
- Do boxplots of the residuals for each month. Does this reveal any problems with the model?
To create a boxplot, use the function data |> ggplot(aes(x=variable)) + geom_boxplot()
. Swap out the data and variable.
What do the values of the coefficients tell you about each variable? Interpret coefficients for the intercept, the trend, one of the seasons and the festival variables.
What does the Ljung-Box test tell you about your model?
Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.
Question 3 (fpp7_5)
The us_gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “million barrels per day”. Consider only the data to the end of 2004.
Fit a harmonic regression with trend to the data. Experiment with changing the number Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.
Select the appropriate number of Fourier terms to include by minimizing the AICc value, the BIC, and maximizing adjusted \(R^2\).
The table of metrics can be found by using the
glance
function on your model object.There is a vary large number of metrics in the table. You may need to use a
select()
to pick the metrics you want to see. Their names areadj_r_squared
,AICc
andBIC
.
- Plot the residuals of the final model using the
gg_tsresiduals()
function and comment on these. Use a Ljung-Box test to check for residual autocorrelation.
Question 4 (fpp7_6)
The annual population of Afghanistan is available in the global_economy data set.
Plot the data and comment on its features. Can you observe the effect of the Soviet-Afghan war?
Fit a linear trend model and compare this to a piecewise linear trend model with knots at 1980 and 1989.
Generate forecasts from these two models for the five years after the end of the data, and comment on the results.
Submission
When you are finished with your homework, be sure to Render the final document. Once rendered, you can download your file by:
- Finding the .html file in your File pane of RStudio (on the bottom right of the screen)
- Click the check box next to the file
- Click the blue gear above and then click “Export” to download
- Submit your final HTML document to the respective assignment on Moodle