Cornell College
STA 364 Spring 2025 Block 5
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
\[x_t=y_t/z_t \cdot z_{2000}\]
gives the adjusted house price at year 2000 dollar values, where \(y_t\) is the original house price in year \(t\) and \(z_t\) is the price index.
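As a quick numeric check of the adjustment formula, here is a minimal base R sketch. The prices and index values are made up for illustration (they are not from any data set used in this course), and the variable names are hypothetical.

```r
# Hypothetical nominal house prices and price index (illustrative values only)
year  <- c(2000, 2005, 2010)
price <- c(200, 250, 300)   # y_t: nominal price
index <- c(80, 100, 120)    # z_t: price index

# x_t = y_t / z_t * z_2000
z_2000   <- index[year == 2000]
adjusted <- price / index * z_2000
adjusted
# → 200 200 200
```

In this made-up example the nominal price rises only as fast as the index, so the adjusted series is flat at 200.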
library(fpp3) # Loads dplyr, tsibble, ggplot2 and the other packages used below
aus_newspaper_retail <- readr::read_csv('data/aus_newspaper_retail.csv')
# Turnover: Retail turnover in $Million AUD
aus_newspaper_retail <- aus_newspaper_retail |>
  select(Year, Turnover, name) |> # Picks out these 3 columns/variables
  group_by(Year, name) |> # Tells R to now calculate by groups of year and name
  summarise(sum_Turnover = sum(Turnover)) |> # Adds up Turnover for each year and name combination
  as_tsibble(index = Year, key = "name") # Converts to a time series (tsibble)
aus_newspaper_retail |> autoplot(sum_Turnover) +
  facet_grid(name ~ ., scales = "free_y") + # This will make a grid of plots,
  # with name used to break into multiple plots along the y direction
  labs(title = "Turnover: Australian print media industry", y = "$AU")
If the data show different variation at different levels of the series, then a transformation can be useful.
Denote original observations as \(y_1,\dots,y_T\) and transformed observations as \(w_1, \dots, w_T\).
Transformation | Formula | Strength |
---|---|---|
Square root | \(w_t = \sqrt{y_t}\) | Weakest |
Cube root | \(w_t = \sqrt[3]{y_t}\) | Stronger |
Logarithm | \(w_t = \log(y_t)\) | Strongest |
Logarithms, in particular, are useful because they are more interpretable: changes in a log value are relative (percent) changes on the original scale.
(log here is the natural log, base \(e\))
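To illustrate the claim that changes in a log value are relative changes on the original scale, here is a small base R sketch. The series is made up for illustration: it grows by exactly 5% per step.

```r
# Illustrative series growing by 5% each period
y <- c(100, 105, 110.25)

# Relative change on the original scale
pct_change <- diff(y) / head(y, -1)

# Change on the (natural) log scale
log_change <- diff(log(y))

pct_change  # both steps are 0.05
log_change  # both steps are log(1.05), roughly 0.0488
```

For small changes the two are close, because \(\log(1 + r) \approx r\) when \(r\) is small.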
Each of these transformations is close to a member of the family of Box-Cox transformations:
\[w_t = \left\{\begin{array}{ll} \log(y_t), & \quad \lambda = 0; \\ (\operatorname{sign}(y_t)|y_t|^\lambda-1)/\lambda , & \quad \lambda \ne 0. \end{array}\right.\]
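The formula above can be written directly in base R. This is only a sketch for building intuition: `box_cox_sketch` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper implementing the formula above
box_cox_sketch <- function(y, lambda) {
  if (lambda == 0) {
    log(y)                                    # lambda = 0: natural log
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda    # lambda != 0: modified power transform
  }
}

y <- c(1, 10, 100)
box_cox_sketch(y, 0)    # same as log(y)
box_cox_sketch(y, 1)    # y - 1: shifts the series but leaves its shape unchanged
box_cox_sketch(4, 0.5)  # (sqrt(4) - 1) / 0.5
# → 2
```

With \(\lambda = 1\) the series is only shifted down by 1, so no real transformation takes place. Values of \(\lambda\) near 1/2 and 1/3 behave like the square root and cube root.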
log1p() can also be useful for data with zeros.
For the following series, find an appropriate transformation in order to stabilise the variance.
global_economy
aus_livestock
vic_elec
aus_production
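One possible starting point for this exercise, assuming the fpp3 packages are installed, is the `guerrero` feature from feasts, which suggests a Box-Cox \(\lambda\). The sketch below uses the Gas column of aus_production. Treat the suggested value as a guide and check it against the plot.

```r
library(fpp3)

# Ask the Guerrero method for a suggested Box-Cox lambda
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)

# Plot the transformed series to judge whether the variance looks stable
aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "Box-Cox transformed gas production")
```

The same pattern can be tried on the other series, after filtering each data set down to a single series.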
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
Recall
\(y_t = f(S_t, T_t, R_t)\)
where
\(y_t\) = data at period \(t\)
\(T_t\) = trend-cycle component at period \(t\)
\(S_t\) = seasonal component at period \(t\)
\(R_t\) = remainder component at period \(t\)
Additive decomposition: \(y_t = S_t + T_t + R_t.\)
Multiplicative decomposition: \(y_t = S_t \times T_t \times R_t.\)
\[y_t = S_t \times T_t \times R_t \quad\Rightarrow\quad \log y_t = \log S_t + \log T_t + \log R_t.\]
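The identity above can be checked numerically. In this base R sketch the three components are made-up illustrative values, not estimates from any real series.

```r
# Illustrative multiplicative components (all positive, so logs are defined)
trend     <- c(100, 110, 120, 130)
season    <- c(0.9, 1.1, 0.9, 1.1)
remainder <- c(1.01, 0.99, 1.02, 0.98)

# Multiplicative series
y <- season * trend * remainder

# Additive form on the log scale
log_sum <- log(season) + log(trend) + log(remainder)

all.equal(log(y), log_sum)
# → TRUE
```

This is why a log transformation followed by an additive decomposition is equivalent to a multiplicative decomposition of the original series.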
us_retail_employment <- us_employment |>
  filter(year(Month) >= 1990, Title == "Retail Trade") |>
  select(-Series_ID)
us_retail_employment
# A tsibble: 357 x 3 [1M]
Month Title Employed
<mth> <chr> <dbl>
1 1990 Jan Retail Trade 13256.
2 1990 Feb Retail Trade 12966.
3 1990 Mar Retail Trade 12938.
4 1990 Apr Retail Trade 13012.
5 1990 May Retail Trade 13108.
6 1990 Jun Retail Trade 13183.
7 1990 Jul Retail Trade 13170.
8 1990 Aug Retail Trade 13160.
9 1990 Sep Retail Trade 13113.
10 1990 Oct Retail Trade 13185.
# ℹ 347 more rows
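The decomposition table that follows has the shape returned by `components()` on an STL fit. A minimal sketch of code that would produce such a table, assuming the fpp3 packages are loaded and using the model name `stl` to match the `.model` column:

```r
library(fpp3)

# Retail trade employment from 1990 onwards, as above
us_retail_employment <- us_employment |>
  filter(year(Month) >= 1990, Title == "Retail Trade") |>
  select(-Series_ID)

# Fit an STL decomposition with default settings and extract the components
dcmp <- us_retail_employment |>
  model(stl = STL(Employed))
components(dcmp)
```

The `season_adjust` column is the original series with the seasonal component removed.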
# A dable: 357 x 7 [1M]
# Key: .model [1]
# : Employed = trend + season_year + remainder
.model Month Employed trend season_year remainder season_adjust
<chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
1 stl 1990 Jan 13256. 13288. -33.0 0.836 13289.
2 stl 1990 Feb 12966. 13269. -258. -44.6 13224.
3 stl 1990 Mar 12938. 13250. -290. -22.1 13228.
4 stl 1990 Apr 13012. 13231. -220. 1.05 13232.
5 stl 1990 May 13108. 13211. -114. 11.3 13223.
6 stl 1990 Jun 13183. 13192. -24.3 15.5 13207.
7 stl 1990 Jul 13170. 13172. -23.2 21.6 13193.
8 stl 1990 Aug 13160. 13151. -9.52 17.8 13169.
9 stl 1990 Sep 13113. 13131. -39.5 22.0 13153.
10 stl 1990 Oct 13185. 13110. 61.6 13.2 13124.
# ℹ 347 more rows
Advantages
Disadvantages
trend(window = ?) controls wiggliness of trend component.
season(window = ?) controls variation on seasonal component.
season(window = 'periodic') is equivalent to an infinite window.
Note
STL() chooses the windows by default. This can include transformations.
Default season window: window = 13
Default trend window: window = nextodd(ceiling((1.5*period)/(1-(1.5/s.window))))
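The default trend window formula can be evaluated by hand. The base R sketch below uses two hypothetical helpers, `next_odd` and `default_trend_window`, written here for illustration only. It assumes that "next odd" means rounding up to the nearest odd integer.

```r
# Hypothetical helper: smallest odd integer >= x (x assumed to be a whole number)
next_odd <- function(x) {
  if (x %% 2 == 0) x + 1 else x
}

# Hypothetical helper: default trend window from the formula above
default_trend_window <- function(period, s_window = 13) {
  next_odd(ceiling((1.5 * period) / (1 - (1.5 / s_window))))
}

default_trend_window(12)  # monthly data, season window 13
# → 21
default_trend_window(4)   # quarterly data, season window 13
# → 7
```

So for monthly data with the default seasonal window of 13, the default trend window works out to 21.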