Course Information

Meet the prof

Dr. Tyler George

West 311

Headshot of Dr. Tyler George

Meet each other!

Where is?…

Course Website

stats-tgeorge.github.io/STA364_TSApps/

All course materials (slides, exams, etc.)
Links (book, data, and more)

Moodle

Submissions
Gradebook

Syllabus

Let’s open it up!

Diversity + inclusion

It is my intent that students from all diverse backgrounds and perspectives be well-served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit.

Please let me know your preferred name and pronouns on the Getting to know you survey.
If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. I want to be a resource for you.
I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it.

Collaboration policy

Only work that is clearly assigned as team work should be completed collaboratively.
Some labs/activities will be completed in groups. You should work with each other during, and sometimes outside of, class to complete the labs.
Homework must be submitted individually. You may not directly share answers or code with others; however, you are welcome to discuss the problems in general and ask for advice.
Exams must be completed individually. You may not discuss any aspect of the exam with peers. If you have questions, email me, especially if you get stuck on an unusual problem (not a coding error).

Five tips for success

Complete all the preparation work before class (readings).
Ask questions.
Do the readings.
Do the labs/activities and the homework.
Don’t procrastinate! There is no time for falling behind on the block!

Software

Excel - not…

An Excel window with data about countries

R

RStudio

RStudio Server: http://turing.cornellcollege.edu:8787 (Only available on campus)

R and RStudio

R logo

R is an open-source statistical programming language
R is also an environment for statistical computing and graphics
It’s easily extensible with packages

RStudio logo

RStudio is a convenient interface for R called an IDE (integrated development environment), e.g. “I write R code in the RStudio IDE”
RStudio is not a requirement for programming with R, but it’s very commonly used by R programmers and data scientists

R vs. RStudio

On the left: a car engine. On the right: a car dashboard. The engine is labelled R. The dashboard is labelled RStudio.

Source: Modern Dive.

R packages

Packages: Fundamental units of reproducible R code, including reusable R functions, the documentation that describes how to use them, and sample data¹
As of March 14th, 2024, there are 20,582 R packages available on CRAN (the Comprehensive R Archive Network)²
We’re going to work with a small (but important) subset of these!

¹ Wickham and Bryan, R Packages.

² CRAN contributed packages.

Tour: R + RStudio

Option 1:

Sit back and enjoy the show!

Option 2:

Go to the server http://turing.cornellcollege.edu
Log in with your Cornell username (all lowercase)
Use the starting password given in class

Tour recap: R + RStudio

A short list (for now) of R essentials

Functions are (most often) verbs, followed by what they will be applied to in parentheses:


do_this(to_this)
do_that(to_this, to_that, with_those)
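A concrete instance of this pattern, as a small sketch with made-up numbers (the vector name temps is only for illustration):

```r
# A numeric vector to apply functions to (made-up values)
temps <- c(20.4, 22.1, 19.8, 25.3)

# do_this(to_this): one function applied to one object
mean(temps)

# do_that(to_this, with_those): the object plus a named argument
sort(temps, decreasing = TRUE)

# Calls can be nested; the inner call (mean) runs first
round(mean(temps), digits = 0)
```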

Packages are installed once with the install.packages() function and loaded with the library() function once per session:


install.packages("package_name")
library(package_name)

R essentials (continued)

Columns (variables) in data frames are accessed with $:


dataframe$var_name
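For example, using mtcars, a data frame that ships with R (chosen here only because it needs no download):

```r
# mtcars is a built-in data frame: one row per car, one column per variable
head(mtcars)

# dataframe$var_name pulls out a single column as a plain vector
mpg <- mtcars$mpg

# The extracted column behaves like any other vector
length(mpg)   # 32, one value per car
mean(mpg)
```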

Object documentation can be accessed with ?


?mean

tidyverse

tidyverse.org

The tidyverse is an opinionated collection of R packages designed for data science
All packages share an underlying philosophy and a common grammar

Quarto

Fully reproducible reports: each time you render, the analysis is run from the beginning
Code goes in chunks; narrative goes outside of chunks
A visual editor offers a familiar, Google Docs-like editing experience

Tour: Quarto

Option 1:

Sit back and enjoy the show!

Option 2:

Go to the server http://turing.cornellcollege.edu
Log in with your Cornell username (all lowercase)
Use the starting password given in class

Tour recap: Quarto

RStudio IDE with a Quarto document, source code on the left and output on the right. Annotated to show the YAML, a link, a header, and a code chunk.

Environments

Important

The environment of your Quarto document is separate from the Console!

Remember this, and expect it to bite you a few times as you’re learning to work with Quarto!

Environments

First, run the following in the console:

x <- 2
x * 3

All looks good, eh?

Then, add the following to an R chunk in your Quarto document and render it:

x * 3

What happens? Why the error?

How will we use Quarto?

Every activity, project, etc. is a Quarto document
You’ll always have a template Quarto document to start with
The amount of scaffolding in the template will decrease over the block

What can we forecast?

Forecasts that aren’t forecasts

What can we forecast? (1/9)

What can we forecast? (2/9)

What can we forecast? (3/9)

What can we forecast? (4/9)

What can we forecast? (5/9)

What can we forecast? (6/9)

What can we forecast? (7/9)

CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=59368531

What can we forecast? (8/9)

What can we forecast? (9/9)

Which is easiest to forecast?

daily electricity demand in 3 days time
time of sunrise this day next year
Google stock price tomorrow
Google stock price in 6 months time
maximum temperature tomorrow
exchange rate of $US/AUS next week
total sales of drugs in Australian pharmacies next month
timing of next Halley’s comet appearance

Which is easiest to forecast?

time of sunrise this day next year
timing of next Halley’s comet appearance
maximum temperature tomorrow
daily electricity demand in 3 days time
total sales of drugs in Australian pharmacies next month
Google stock price tomorrow
exchange rate of $US/AUS next week
Google stock price in 6 months time


Questions:

How do we measure “easiest”?
What makes something easy or difficult to forecast?

Forecastability factors

Something is easier to forecast if:

we have a good understanding of the factors that contribute to it
there is lots of data available
the future is somewhat similar to the past
the forecasts cannot affect the thing we are trying to forecast

Time series data and random futures

Time series data

Four-yearly Olympic winning times
Annual Google profits
Quarterly Australian beer production
Monthly rainfall
Weekly retail sales
Daily IBM stock prices
Hourly electricity demand
5-minute freeway traffic counts
Time-stamped stock transaction data
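In base R, one way to store regularly spaced series like these is a ts object. Its frequency argument records how many observations make up one seasonal cycle (4 for quarterly, 12 for monthly). This is a minimal sketch with made-up numbers; the object names beer and rain are hypothetical.

```r
# Hypothetical quarterly production figures (made-up), starting 2020 Q1
beer <- ts(c(443, 410, 420, 532, 433, 421, 410, 512),
           start = c(2020, 1), frequency = 4)

# Hypothetical monthly rainfall in mm (made-up), starting January 2023
rain <- ts(c(80, 65, 72, 50, 40, 30, 25, 28, 35, 55, 70, 90),
           start = c(2023, 1), frequency = 12)

frequency(beer)   # observations per year (4)
time(beer)        # time index attached to each value: 2020.00, 2020.25, ...
end(beer)         # last period: 2021, quarter 4
cycle(rain)       # position within the year: 1 = Jan, ..., 12 = Dec
```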

Random futures


Random futures

Random futures

Random futures

Random futures

Random futures

Random futures

Random futures

Random futures

Random futures


“He who sees the past as surprise-free is bound to have a future full of surprises.” (Amos Tversky)

Statistical forecasting

Thing to be forecast: a random variable, $y_t$.
Forecast distribution: If ${\cal I}$ is all observations, then $y_{t} |{\cal I}$ means “the random variable $y_{t}$ given what we know in ${\cal I}$”.
The “point forecast” is the mean (or median) of $y_{t} |{\cal I}$
The “forecast variance” is $\text{var}[y_{t} |{\cal I}]$. Variance is the square of the standard deviation.
A prediction interval or “interval forecast” is a range of values of $y_t$ with high probability.
With time series, ${y}_{t|t-1} = y_t | \{y_1,y_2,\dots,y_{t-1}\}$.
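These quantities can be made concrete with a small simulation. This is a minimal sketch that assumes, purely for illustration, that the forecast distribution $y_t | {\cal I}$ is Normal with mean 100 and standard deviation 10. It draws many possible futures and summarises them as a point forecast, a forecast variance and an approximate 95% prediction interval.

```r
set.seed(364)  # make the simulation reproducible

# Assumed (illustrative) forecast distribution: y_t | I ~ Normal(100, 10^2)
futures <- rnorm(10000, mean = 100, sd = 10)

point_mean   <- mean(futures)        # point forecast (mean)
point_median <- median(futures)      # point forecast (median)
fc_variance  <- var(futures)         # forecast variance, close to 10^2 = 100
fc_sd        <- sqrt(fc_variance)    # variance is the square of the sd

# Approximate 95% prediction interval from the simulated 2.5% and 97.5% quantiles
interval <- quantile(futures, probs = c(0.025, 0.975))

# Exact interval implied by the assumed Normal distribution, for comparison
exact <- qnorm(c(0.025, 0.975), mean = 100, sd = 10)

round(c(mean = point_mean, median = point_median, variance = fc_variance), 2)
interval
exact   # roughly 80.4 and 119.6
```

With 10,000 draws the simulated interval should sit close to the exact one; fewer draws give a noisier approximation.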

Some case studies

CASE STUDY 1: Paperware company

Problem: They want forecasts for each of hundreds of items. Series can be stationary, trended or seasonal. They currently have a large forecasting program written in-house, but it doesn’t seem to produce sensible forecasts. They want me to fix it.


Additional information

The program is written in COBOL, which limits numerical calculations. It is not possible to do any optimization.
Their programmer has little experience in numerical computing.
They employ no statisticians and want the program to produce forecasts automatically.

CASE STUDY 1: Paperware company

Methods currently used


12 month average
6 month average
straight line regression over last 12 months
straight line regression over last 6 months
prediction obtained by a straight line through the last observation with slope equal to the average slope of the lines connecting last year’s and this year’s values
prediction obtained by a straight line through the last observation with slope equal to the average slope of the lines connecting last year’s and this year’s values, where the average is taken only over the last 6 months.
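The six rules above can be sketched in R to see what they actually compute. This is a minimal sketch under stated assumptions: the function name legacy_forecasts and the series sales are hypothetical, the data are made up (a linear trend plus a 12-month seasonal wave), and "average slope" is read as the average over months of (this year’s value minus last year’s value) / 12.

```r
# Hypothetical monthly sales: 36 months, linear trend plus 12-month seasonality
n <- 36
t_idx <- 1:n
sales <- 200 + 2 * t_idx + 20 * sin(2 * pi * t_idx / 12)

# Sketch of the six rules (function name is hypothetical)
legacy_forecasts <- function(y, h = 1) {
  n <- length(y)
  stopifnot(n >= 24)  # the last two rules need two full years of data

  # Rules 1-2: flat forecast equal to the average of the last k values
  avg <- function(k) mean(tail(y, k))

  # Rules 3-4: straight-line regression over the last k values, extrapolated h steps
  trend <- function(k) {
    d <- data.frame(t = seq_len(k), y = tail(y, k))
    fit <- lm(y ~ t, data = d)
    unname(predict(fit, newdata = data.frame(t = k + h)))
  }

  # Rules 5-6: line through the last observation; its slope is the average
  # monthly slope from last year's to this year's value for the same month,
  # averaged over the last k months
  yoy <- function(k) {
    this_year <- y[(n - k + 1):n]
    last_year <- y[(n - k + 1 - 12):(n - 12)]
    slope <- mean((this_year - last_year) / 12)
    y[n] + slope * h
  }

  c(avg12 = avg(12), avg6 = avg(6),
    lm12 = trend(12), lm6 = trend(6),
    yoy12 = yoy(12), yoy6 = yoy(6))
}

legacy_forecasts(sales, h = 1)
```

None of the six rules models the seasonal pattern explicitly, which is one plausible reason such a program could produce odd forecasts for seasonal items.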

CASE STUDY 2: PBS

The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.

Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs.
The cost to government is determined by the number and types of drugs purchased. Currently nearly 1% of GDP.
The total cost is budgeted based on forecasts of drug usage.

CASE STUDY 2: PBS

In 2001: $4.5 billion budget, under-forecasted by $800 million.
Thousands of products. Seasonal demand.
Subject to covert marketing, volatile products, uncontrollable expenditure.
Although monthly data available for 10 years, data are aggregated to annual values, and only the first three years are used in estimating the forecasts.
All forecasts being done with the FORECAST function in MS-Excel!
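A small sketch can show why aggregating to annual values and fitting a straight line to a few points throws information away. Everything here is an assumption for illustration: the series cost is made up (10 years of monthly values with a trend, a seasonal wave and noise), and the final step simply fits a least-squares line to the first three annual totals, similar in spirit to a linear-trend spreadsheet forecast.

```r
set.seed(1)

# Hypothetical monthly cost series (made-up): 10 years, trend + seasonality + noise
months <- 120
cost <- ts(100 + 0.5 * (1:months) + 15 * cos(2 * pi * (1:months) / 12) +
             rnorm(months, sd = 3),
           start = 1, frequency = 12)

# Aggregate to annual totals: 120 monthly values become 10 annual values,
# and the seasonal pattern is no longer visible
annual <- aggregate(cost, nfrequency = 1, FUN = sum)
length(annual)

# Straight-line fit using only the first three annual totals
first3 <- data.frame(year = 1:3, total = as.numeric(annual)[1:3])
fit <- lm(total ~ year, data = first3)

# Forecast for year 11, one year past the end of the data
predict(fit, newdata = data.frame(year = 11))
```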

CASE STUDY 3: Car fleet company

Client: One of Australia’s largest car fleet companies

Problem: how to forecast resale value of vehicles? How should this affect leasing and sales policies?


Additional information

They can provide a large amount of data on previous vehicles and their eventual resale values.
The resale values are currently estimated by a group of specialists. They see me as a threat and do not cooperate.

CASE STUDY 4: Airline


Not the real data! Or is it?

CASE STUDY 4: Airline

Problem: how to forecast passenger traffic on major routes?


Additional information

They can provide a large amount of data on previous routes.
Traffic is affected by school holidays, special events such as the Grand Prix, advertising campaigns, competition behavior, etc.
They have a highly capable team of people who are able to do most of the computing.

The basic steps in a forecasting task

Step 1: Problem definition.
Step 2: Gathering information.
Step 3: Preliminary (exploratory) analysis.
Step 4: Choosing and fitting models.
Step 5: Using and evaluating a forecasting model.
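The five steps can be walked through in miniature with base R. This is a minimal sketch, not the course’s workflow. It assumes the built-in AirPassengers series (monthly airline passengers, 1949–1960) as stand-in data, holds out the final year, and compares two simple benchmark methods chosen for illustration.

```r
# Step 1: Problem definition (illustrative): forecast monthly airline
# passengers 12 months ahead
h <- 12

# Step 2: Gathering information: here, just the built-in historical series
y <- AirPassengers

# Step 3: Preliminary (exploratory) analysis
summary(y)
plot(y)   # trend and seasonality are both visible

# Step 4: Choosing and fitting models: hold out the last year and
# build two simple benchmarks from the rest
train <- window(y, end = c(1959, 12))
test  <- window(y, start = c(1960, 1))

mean_fc   <- rep(mean(train), h)          # overall historical mean
snaive_fc <- as.numeric(tail(train, 12))  # same month last year

# Step 5: Using and evaluating: mean absolute error on the held-out year
mae <- function(actual, forecast) mean(abs(as.numeric(actual) - forecast))
c(mean = mae(test, mean_fc), seasonal_naive = mae(test, snaive_fc))
```

Holding out the most recent data, rather than a random subset, mimics how the forecasts would actually be used.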
