Time Series Applications Ch1 Getting Started

Author
Affiliation

Tyler George

Cornell College
STA 364 Spring 2025 Block 5

Course Information

Meet the prof

Dr. Tyler George

West 311

Headshot of Dr. Tyler George

Meet each other!

Where is?…

Course Website

stats-tgeorge.github.io/STA364_TSApps/

  • All course materials (slides, exams, etc.)
  • Links (book, data, and more)

Moodle

  • Submissions
  • Gradebook

Syllabus

Let’s open it up!

Diversity + inclusion

It is my intent that students from all diverse backgrounds and perspectives be well-served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit.

  • Please let me know your preferred name and pronouns on the “Getting to Know You” survey.
  • If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. I want to be a resource for you.
  • I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it.

Collaboration policy

  • Only work that is clearly assigned as team work should be completed collaboratively.

  • Some labs/activities will be completed in groups. You should work with each other during class, and sometimes outside of class, to complete the labs.

  • Homework must be submitted individually. You may not directly share answers or code with others; however, you are welcome to discuss the problems in general and ask for advice.

  • Exams must be completed individually. You may not discuss any aspect of the exam with peers. If you have questions, email me, especially if you get stuck on an unusual problem (not a coding error).

Sharing / reusing code policy

  • I am aware that a huge volume of code is available on the web and that many tasks may have solutions posted.

  • Unless explicitly stated otherwise, this course’s policy is that you may make use of any online resources (e.g., RStudio Community, Stack Overflow), but you must explicitly cite where you obtained any code you use directly or use as inspiration in your solution(s). On exams, I will often require you to use code provided in class.

  • Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source.

  • You do not need to cite code I provide in class or in the notes.

Six tips for success

  1. Complete all the preparation work before class (readings).

  2. Ask questions.

  3. Do the readings.

  4. Do the labs/activities.

  5. Do the homework.

  6. Don’t procrastinate! There is no time for falling behind on the block!

Software

Excel - not…

An Excel window with data about countries

R

An R shell

RStudio

An RStudio window

R and RStudio

R logo

  • R is an open-source statistical programming language
  • R is also an environment for statistical computing and graphics
  • It’s easily extensible with packages

RStudio logo

  • RStudio is a convenient interface for R called an IDE (integrated development environment), e.g. “I write R code in the RStudio IDE”
  • RStudio is not a requirement for programming with R, but it’s very commonly used by R programmers and data scientists

R vs. RStudio

On the left: a car engine. On the right: a car dashboard. The engine is labelled R. The dashboard is labelled RStudio.

Source: Modern Dive.

R packages

  • Packages: fundamental units of reproducible R code, including reusable R functions, the documentation that describes how to use them, and sample data.[1]

  • As of March 14th, 2024, there are 20,582 R packages available on CRAN (the Comprehensive R Archive Network).[2]

  • We’re going to work with a small (but important) subset of these!

[1] Wickham and Bryan, R Packages.

[2] CRAN contributed packages.

Tour: R + RStudio

Option 1:

Sit back and enjoy the show!

Option 2:

Tour recap: R + RStudio

A short list (for now) of R essentials

  • Functions are (most often) verbs, followed by what they will be applied to in parentheses:

do_this(to_this)
do_that(to_this, to_that, with_those)

  • Packages are installed once with the install.packages() function and loaded with the library() function once per session:

install.packages("package_name")
library(package_name)
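The placeholders above can be made concrete with a few base R functions. A minimal sketch; the vector scores and its values are made up for illustration, and tidyverse is used only as an example package name.

```r
# Concrete versions of the do_this(to_this) pattern (base R only)
scores <- c(70, 85, 90)            # c() combines values into a vector

mean(scores)                       # one input: the thing mean() is applied to
round(mean(scores), digits = 1)    # a call nested inside another, plus a named argument
seq(from = 1, to = 10, by = 3)     # several inputs: "to this, to that, with those"

# Packages: install once, then load in every new R session.
# Commented out so this sketch runs without an internet connection.
# install.packages("tidyverse")
# library(tidyverse)
```

Named arguments such as digits = 1 are optional to write out, but they make it clear what each input does.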

R essentials (continued)

  • Columns (variables) in data frames are accessed with $:

dataframe$var_name

  • Object documentation can be accessed by typing ? before the object’s name:

?mean
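Here is the $ accessor applied to a real data frame. A short sketch using mtcars, a data set that ships with R; the choice of the mpg column is arbitrary.

```r
# mtcars is a small data frame that ships with R (datasets package)
mpg <- mtcars$mpg        # $ extracts one column as a vector
head(mpg)
mean(mpg)                # functions can then be applied to that column

# [[ ]] with the column name in quotes returns the same vector
identical(mtcars$mpg, mtcars[["mpg"]])

# Besides ?mean, help("mean") opens the same documentation page
```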

tidyverse

Hex logos for dplyr, ggplot2, forcats, tibble, readr, stringr, tidyr, and purrr

tidyverse.org

  • The tidyverse is an opinionated collection of R packages designed for data science
  • All packages share an underlying philosophy and a common grammar

Quarto

Quarto

  • Fully reproducible reports: each time you render, the analysis is run from the beginning
  • Code goes in chunks; narrative goes outside of chunks
  • A visual editor provides a familiar, Google Docs-like editing experience

Tour: Quarto

Option 1:

Sit back and enjoy the show!

Option 2:

Tour recap: Quarto

RStudio IDE with a Quarto document, source code on the left and output on the right. Annotated to show the YAML, a link, a header, and a code chunk.

Environments

Important

The environment of your Quarto document is separate from the Console!

Remember this, and expect it to bite you a few times as you’re learning to work with Quarto!

Environments

First, run the following in the console:

x <- 2
x * 3


All looks good, eh?

Then, add the following to an R chunk in your Quarto document and render it:

x * 3


What happens? Why the error?
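When you render, Quarto runs the document’s code in a separate R session that starts empty, so objects created only in the Console are not visible. The sketch below is a rough in-session analogy, not how Quarto is actually implemented: it evaluates the chunk’s code in a fresh environment (named fresh here, purely for illustration) that cannot see the global environment.

```r
# In the Console, x is created in the global environment
x <- 2
x * 3

# Rough analogy for rendering: evaluate the chunk's code in a fresh
# environment whose parent is baseenv(), so it cannot see the global x
fresh <- new.env(parent = baseenv())
result <- tryCatch(
  eval(quote(x * 3), envir = fresh),
  error = function(e) conditionMessage(e)
)
result   # the error message, e.g. "object 'x' not found"
```

The fix is to create x inside the document itself, in an earlier chunk.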

How will we use Quarto?

  • Every activity, project, etc. is a Quarto document
  • You’ll always have a template Quarto document to start with
  • The amount of scaffolding in the template will decrease over the block

What can we forecast?

Forecasts that aren’t forecasts

What can we forecast? (1/9 to 9/9)

Which is easiest to forecast?

  • daily electricity demand in 3 days’ time
  • time of sunrise this day next year
  • Google stock price tomorrow
  • Google stock price in 6 months’ time
  • maximum temperature tomorrow
  • exchange rate of $US/AUD next week
  • total sales of drugs in Australian pharmacies next month
  • timing of next Halley’s comet appearance

Which is easiest to forecast?

  1. time of sunrise this day next year
  2. timing of next Halley’s comet appearance
  3. maximum temperature tomorrow
  4. daily electricity demand in 3 days’ time
  5. total sales of drugs in Australian pharmacies next month
  6. Google stock price tomorrow
  7. exchange rate of $US/AUD next week
  8. Google stock price in 6 months’ time

Questions:

  • How do we measure “easiest”?
  • What makes something easy or difficult to forecast?

Forecastability factors

Something is easier to forecast if:

  1. we have a good understanding of the factors that contribute to it;
  2. there is lots of data available;
  3. the future is somewhat similar to the past; and
  4. the forecasts cannot affect the thing we are trying to forecast.

Time series data and random futures

Time series data

  • Four-yearly Olympic winning times
  • Annual Google profits
  • Quarterly Australian beer production
  • Monthly rainfall
  • Weekly retail sales
  • Daily IBM stock prices
  • Hourly electricity demand
  • 5-minute freeway traffic counts
  • Time-stamped stock transaction data
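In R, a regularly spaced series like the ones above can be stored as a ts object, which records how many observations occur per year. A minimal sketch; all values and start dates are made up, and the series names only echo the examples in the list.

```r
# Made-up values, only to show how ts() records the sampling frequency
profits <- ts(c(10.2, 12.5, 15.1), start = 2021)              # annual: frequency 1
beer    <- ts(c(420, 380, 410, 470, 430, 390, 415, 480),
              start = c(2020, 1), frequency = 4)               # quarterly
set.seed(364)
rain    <- ts(round(runif(24, min = 0, max = 120)),
              start = c(2023, 1), frequency = 12)              # monthly

frequency(beer)   # 4 observations per year
end(beer)         # year and quarter of the last observation
```

Irregular data such as time-stamped transactions do not fit this fixed-frequency model and need a different representation.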

Random futures

“He who sees the past as surprise-free is bound to have a future full of surprises.” (Amos Tversky)
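One way to see “random futures” is to simulate several possible continuations of the same history. A sketch under stated assumptions: the series is a random walk (each value is the previous one plus a standard normal shock), and the seed, history length, and horizon are arbitrary choices for illustration.

```r
set.seed(42)
n <- 50                                   # length of the hypothetical history
h <- 12                                   # forecast horizon
y <- cumsum(rnorm(n))                     # observed series: a random walk

# Each column is one possible future: start from the last observation and
# keep adding new random shocks
futures <- replicate(5, y[n] + cumsum(rnorm(h)))

# Plot the history and the five simulated futures
plot(seq_len(n), y, type = "l", xlim = c(1, n + h),
     ylim = range(y, futures), xlab = "Time", ylab = "y")
matplot(n + seq_len(h), futures, type = "l", lty = 1, add = TRUE)
```

The futures fan out as the horizon grows, which is why forecast uncertainty usually increases with h.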

Statistical forecasting

  • Thing to be forecast: a random variable, \(y_t\).
  • Forecast distribution: If \({\cal I}\) is all observations, then \(y_{t} |{\cal I}\) means “the random variable \(y_{t}\) given what we know in \({\cal I}\)”.
  • The “point forecast” is the mean (or median) of \(y_{t} |{\cal I}\).
  • The “forecast variance” is \(\text{var}[y_{t} |{\cal I}]\). Variance is the square of the standard deviation.
  • A prediction interval or “interval forecast” is a range of values of \(y_t\) with high probability.
  • With time series, \({y}_{t|t-1} = y_t | \{y_1,y_2,\dots,y_{t-1}\}\).
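The quantities above can all be read off a set of draws from the forecast distribution. A minimal sketch assuming, purely for illustration, that \(y_t | {\cal I}\) is Normal with mean 100 and standard deviation 5; in practice a model supplies this distribution.

```r
set.seed(1)
# Hypothetical: draws from the forecast distribution of y_t | I,
# assumed here to be Normal with mean 100 and standard deviation 5
draws <- rnorm(10000, mean = 100, sd = 5)

point_mean   <- mean(draws)                     # point forecast (mean)
point_median <- median(draws)                   # point forecast (median)
fc_var       <- var(draws)                      # forecast variance
fc_sd        <- sqrt(fc_var)                    # standard deviation
pi80 <- quantile(draws, probs = c(0.10, 0.90))  # 80% prediction interval

# With the Normal assumption, a 95% interval can also be computed exactly
pi95 <- qnorm(c(0.025, 0.975), mean = 100, sd = 5)
pi95   # roughly 100 - 1.96 * 5 and 100 + 1.96 * 5
```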

Some case studies

CASE STUDY 1: Paperware company

Problem: They want forecasts for each of hundreds of items. Series can be stationary, trended, or seasonal. They currently have a large forecasting program written in-house, but it doesn’t seem to produce sensible forecasts. They want me to fix it.

Additional information

  • The program is written in COBOL, which limits numerical calculations; it is not possible to do any optimization.
  • Their programmer has little experience in numerical computing.
  • They employ no statisticians and want the program to produce forecasts automatically.

CASE STUDY 1: Paperware company

Methods currently used

  • 12-month average
  • 6-month average
  • straight-line regression over last 12 months
  • straight-line regression over last 6 months
  • prediction obtained by a straight line through the last observation with slope equal to the average slope of the lines connecting last year’s and this year’s values
  • prediction obtained by a straight line through the last observation with slope equal to the average slope of the lines connecting last year’s and this year’s values, where the average is taken only over the last 6 months.
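To make the six rules concrete, here is one possible reading of them as R code. This is a sketch, not the company’s program: the helper name paperware_methods is hypothetical, and “average slope of the lines connecting last year’s and this year’s values” is interpreted as the mean of \((y_t - y_{t-12})/12\) over the last 12 (or 6) months.

```r
# Hypothetical helper: the six rules above for a monthly series y,
# forecasting h months ahead. Assumes at least 24 observations.
paperware_methods <- function(y, h = 1) {
  y <- as.numeric(y)
  n <- length(y)
  stopifnot(n >= 24)

  # Straight-line regression on the last k observations, extended h steps
  line_fc <- function(k) {
    d <- data.frame(value = tail(y, k), t = seq_len(k))
    fit <- lm(value ~ t, data = d)
    unname(predict(fit, newdata = data.frame(t = k + h)))
  }

  # Line through the last observation; slope = average over the last k months
  # of (this year's value - last year's value) / 12
  slope_fc <- function(k) {
    idx <- (n - k + 1):n
    slope <- mean((y[idx] - y[idx - 12]) / 12)
    y[n] + h * slope
  }

  c(avg12   = mean(tail(y, 12)),
    avg6    = mean(tail(y, 6)),
    reg12   = line_fc(12),
    reg6    = line_fc(6),
    slope12 = slope_fc(12),
    slope6  = slope_fc(6))
}

paperware_methods(1:36, h = 1)
```

On the perfectly trending series 1, 2, ..., 36, the four line-based rules all forecast 37, while the two averages give 30.5 and 33.5: averages lag behind any trend, and none of the six rules accounts for seasonality.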

CASE STUDY 2: PBS

The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.

  • Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs.
  • The cost to government is determined by the number and types of drugs purchased. Currently nearly 1% of GDP.
  • The total cost is budgeted based on forecasts of drug usage.

CASE STUDY 2: PBS

  • In 2001: $4.5 billion budget, under-forecasted by $800 million.
  • Thousands of products. Seasonal demand.
  • Subject to covert marketing, volatile products, uncontrollable expenditure.
  • Although monthly data are available for 10 years, the data are aggregated to annual values, and only the first three years are used in estimating the forecasts.
  • All forecasts being done with the FORECAST function in MS-Excel!
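Excel’s FORECAST function fits a least-squares straight line and extrapolates it. The sketch below mimics the procedure described above on hypothetical data (10 years of simulated monthly usage with trend and seasonality; the start year 1991 and all parameters are made up) to show how much information it throws away.

```r
set.seed(2001)
# Hypothetical monthly drug usage: 10 years with trend and seasonality
m <- 1:120
usage <- ts(100 + 0.5 * m + 15 * sin(2 * pi * m / 12) + rnorm(120, sd = 5),
            start = c(1991, 1), frequency = 12)

# Aggregate to annual totals: the seasonal pattern disappears
annual <- aggregate(usage, nfrequency = 1, FUN = sum)

# Keep only the first three years, as in the case study
first3 <- window(annual, end = 1993)
d <- data.frame(total = as.numeric(first3), year = as.numeric(time(first3)))

# A straight-line least-squares fit, which is what Excel's FORECAST computes
fit <- lm(total ~ year, data = d)
predict(fit, newdata = data.frame(year = 2001))
```

The final forecast rests on just three annual points, even though 120 monthly observations were available.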

CASE STUDY 3: Car fleet company

Client: One of Australia’s largest car fleet companies

Problem: how to forecast resale value of vehicles? How should this affect leasing and sales policies?

Additional information

  • They can provide a large amount of data on previous vehicles and their eventual resale values.
  • The resale values are currently estimated by a group of specialists. They see me as a threat and do not cooperate.

CASE STUDY 4: Airline

Not the real data! Or is it?

CASE STUDY 4: Airline

Problem: how to forecast passenger traffic on major routes?

Additional information

  • They can provide a large amount of data on previous routes.
  • Traffic is affected by school holidays, special events such as the Grand Prix, advertising campaigns, competition behavior, etc.
  • They have a highly capable team of people who are able to do most of the computing.

The basic steps in a forecasting task

  • Step 1: Problem definition.
  • Step 2: Gathering information.
  • Step 3: Preliminary (exploratory) analysis.
  • Step 4: Choosing and fitting models.
  • Step 5: Using and evaluating a forecasting model.
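Steps 3 to 5 can be sketched end to end in base R. This is an illustration, not the course’s prescribed workflow: it uses the built-in AirPassengers series (monthly totals of international airline passengers, 1949 to 1960), and Holt-Winters exponential smoothing is swapped in as an example model.

```r
# Steps 1 and 2 (defining the problem, gathering information) happen
# outside R. Steps 3 to 5 use the built-in AirPassengers series.
y <- AirPassengers

# Step 3: preliminary (exploratory) analysis
plot(y)
plot(decompose(y, type = "multiplicative"))

# Step 4: choose and fit a model on a training window
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))
fit   <- HoltWinters(train, seasonal = "multiplicative")

# Step 5: forecast the held-out period and evaluate the errors
fc  <- predict(fit, n.ahead = length(test))
mae <- mean(abs(as.numeric(test) - as.numeric(fc)))   # mean absolute error
mae
```

Holding out the last two years lets the model be judged on data it has not seen, which is the point of Step 5.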

References