Chapter 7 Time Series Linear Models
Setup
library(fpp3)
Multiple regression and forecasting
\[y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_kx_{k,t} + \varepsilon_t.\]
- \(y_t\) is the variable we want to predict: the “response” variable
- Each \(x_{j,t}\) is numerical and is called a “predictor”. They are usually assumed to be known for all past and future times.
- The coefficients \(\beta_1,\dots,\beta_k\) measure the effect of each predictor after taking account of the effect of all other predictors in the model.
. . .
That is, the coefficients measure the marginal effects of the predictors.
- \(\varepsilon_t\) is a white noise error term
Example: US consumption expenditure
us_change |> autoplot(Consumption) +
  geom_line(aes(y = Income, color = "Income")) +
  geom_line(aes(y = Consumption, color = "Consumption")) +
  scale_color_manual(values = c("Income" = "blue", "Consumption" = "red")) +
  labs(y = "% change", color = "Legend")
Example: US consumption expenditure
us_change |> ggplot(aes(x = Income, y = Consumption)) +
  labs(y = "Consumption (quarterly % change)",
       x = "Income (quarterly % change)") +
  geom_point() + geom_smooth(method = "lm", se = FALSE)
Example: US consumption expenditure
fit_cons <- us_change |>
  model(lm = TSLM(Consumption ~ Income))
report(fit_cons)
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-2.58236 -0.27777 0.01862 0.32330 1.42229
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.54454 0.05403 10.079 < 2e-16 ***
Income 0.27183 0.04673 5.817 2.4e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5905 on 196 degrees of freedom
Multiple R-squared: 0.1472, Adjusted R-squared: 0.1429
F-statistic: 33.84 on 1 and 196 DF, p-value: 2.4022e-08
Example: US consumption expenditure
us_change |>
  gather("Measure", "Change", Consumption, Income, Production, Savings, Unemployment) |>
  ggplot(aes(x = Quarter, y = Change, colour = Measure)) +
  geom_line() +
  facet_grid(vars(Measure), scales = "free_y") +
  labs(y = "") +
  guides(colour = "none")
Example: US consumption expenditure
us_change |>
  as_tibble() |>
  select(-Quarter) |>
  GGally::ggpairs()
Assumptions for the linear model
For forecasting purposes, we require the following assumptions:
\(\varepsilon_t\) have mean zero and are uncorrelated.
\(\varepsilon_t\) are uncorrelated with each \(x_{j,t}\).
. . .
It is useful to also have \(\varepsilon_t \sim \text{N}(0,\sigma^2)\) when producing prediction intervals or doing statistical tests.
Least squares estimation
- In practice we need to estimate the coefficients: \(\beta_0,\beta_1, \dots, \beta_k\).
. . .
\[\sum_{t=1}^T \varepsilon_t^2 = \sum_{t=1}^T (y_t - \beta_{0} - \beta_{1} x_{1,t} - \beta_{2} x_{2,t} - \cdots - \beta_{k} x_{k,t})^2\]
. . .
model(TSLM(y ~ x_1 + x_2 + ... + x_k))
- Estimated coefficients: \(\hat\beta_0, \dots, \hat\beta_k\)
Example: US consumption expenditure
fit_consMR <- us_change |>
  model(lm = TSLM(Consumption ~ Income + Production + Unemployment + Savings))
report(fit_consMR)
Series: Consumption
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-0.90555 -0.15821 -0.03608 0.13618 1.15471
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.253105 0.034470 7.343 5.71e-12 ***
Income 0.740583 0.040115 18.461 < 2e-16 ***
Production 0.047173 0.023142 2.038 0.0429 *
Unemployment -0.174685 0.095511 -1.829 0.0689 .
Savings -0.052890 0.002924 -18.088 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3102 on 193 degrees of freedom
Multiple R-squared: 0.7683, Adjusted R-squared: 0.7635
F-statistic: 160 on 4 and 193 DF, p-value: < 2.22e-16
Fitted values
\[\hat{y}_t = \hat\beta_{0} + \hat\beta_{1} x_{1,t} + \hat\beta_{2} x_{2,t} + \cdots + \hat\beta_{k} x_{k,t}\]
augment(fit_consMR) |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Consumption, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
labs(y = NULL, title = "Percent change in US consumption expenditure") +
scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00")) +
guides(colour = guide_legend(title = NULL))
Example: US consumption expenditure
augment(fit_consMR) |>
ggplot(aes(y = .fitted, x = Consumption)) +
geom_point() +
  labs(
    y = "Fitted (predicted values)",
    x = "Data (actual values)",
    title = "Percentage change in US consumption expenditure"
  ) +
  geom_abline(intercept = 0, slope = 1)
Goodness of fit
Coefficient of determination
\[R^2 = \frac{\sum(\hat{y}_{t} - \bar{y})^2}{\sum(y_{t}-\bar{y})^2}\]
. . .
Standard error of the regression
\[\hat{\sigma}_e=\sqrt{\frac{1}{T-k-1}\sum_{t=1}^{T}{e_t^2}}\]
where \(k\) is the number of predictors in the model.
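Both quantities can be read off a fitted model; a minimal sketch, assuming the fit_consMR model fitted earlier (the sigma2 column reported by glance() is the residual variance, i.e. the squared standard error of the regression):
glance(fit_consMR) |>
  select(r_squared, adj_r_squared, sigma2)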
Multiple regression and forecasting
For forecasting purposes, we require the following assumptions:
\(\varepsilon_t\) have mean zero and are uncorrelated.
\(\varepsilon_t\) are uncorrelated with each \(x_{j,t}\).
. . .
It is useful to also have \(\varepsilon_t \sim \text{N}(0,\sigma^2)\) when producing prediction intervals or doing statistical tests.
Residual plots
Useful for spotting outliers and whether the linear model was appropriate.
Scatterplot of residuals \(\varepsilon_t\) against each predictor \(x_{j,t}\).
Scatterplot of residuals against the fitted values \(\hat y_t\).
Expect to see scatterplots resembling a horizontal band with no values too far from the band and no patterns such as curvature or increasing spread.
Residual patterns
If a plot of the residuals vs any predictor in the model shows a pattern, then the relationship is nonlinear.
If a plot of the residuals vs any predictor not in the model shows a pattern, then the predictor should be added to the model.
If a plot of the residuals vs fitted values shows a pattern, then there is heteroscedasticity in the errors. (Could try a transformation.)
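A minimal sketch of such plots, using the fit_consMR model and us_change data from earlier (the join attaches the residuals to the predictors, and as_tibble() drops the tsibble structure before reshaping):
us_change |>
  left_join(residuals(fit_consMR), by = "Quarter") |>
  as_tibble() |>
  pivot_longer(Income:Unemployment,
               names_to = "regressor", values_to = "x") |>
  ggplot(aes(x = x, y = .resid)) +
  geom_point() +
  facet_wrap(. ~ regressor, scales = "free_x") +
  labs(y = "Residuals", x = "")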
Trend
Linear trend
\[x_t = t\]
- \(t=1,2,\dots,T\)
- Strong assumption that trend will continue.
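In TSLM() this regressor is supplied by the trend() special; a minimal sketch, for a hypothetical tsibble y_data with a response y:
y_data |> model(TSLM(y ~ trend()))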
Dummy variables
If a categorical variable takes only two values (e.g., Yes or No), then an equivalent numerical variable can be constructed taking value 1 if yes and 0 if no. This is called a dummy variable.
Variable | dummy |
---|---|
Yes | 1 |
Yes | 1 |
No | 0 |
Yes | 1 |
No | 0 |
No | 0 |
Yes | 1 |
Yes | 1 |
No | 0 |
No | 0 |
Dummy variables
If there are more than two categories, then the variable can be coded using several dummy variables (one fewer than the total number of categories).
Day | d1 | d2 | d3 | d4 |
---|---|---|---|---|
Monday | 1 | 0 | 0 | 0 |
Tuesday | 0 | 1 | 0 | 0 |
Wednesday | 0 | 0 | 1 | 0 |
Thursday | 0 | 0 | 0 | 1 |
Friday | 0 | 0 | 0 | 0 |
Monday | 1 | 0 | 0 | 0 |
Tuesday | 0 | 1 | 0 | 0 |
Wednesday | 0 | 0 | 1 | 0 |
Thursday | 0 | 0 | 0 | 1 |
Friday | 0 | 0 | 0 | 0 |
Beware of the dummy variable trap!
Using one dummy for each category gives too many dummy variables!
The regression will then be singular and inestimable.
Either omit the constant, or omit the dummy for one category.
The coefficients of the dummies are relative to the omitted category.
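R's formula interface applies this rule automatically: a factor with \(k\) levels is expanded into \(k-1\) dummies when an intercept is included. A small base-R illustration with toy data:
day <- factor(c("Mon", "Tue", "Wed", "Thu", "Fri"))
model.matrix(~ day)  # intercept plus 4 dummies; the first factor level is the omitted baseline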
Uses of dummy variables
Seasonal dummies
- For quarterly data: use 3 dummies
- For monthly data: use 11 dummies
- For daily data: use 6 dummies
- What to do with weekly data?
. . .
Outliers
- If there is an outlier, you can use a dummy variable to remove its effect.
. . .
Public holidays
- For daily data: if it is a public holiday, dummy=1, otherwise dummy=0.
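A minimal sketch of constructing such a dummy, for a hypothetical daily tsibble daily_sales with index Date, response Sales, and a vector holiday_dates of public holiday dates:
daily_sales |>
  mutate(Holiday = as.integer(Date %in% holiday_dates)) |>  # 1 on holidays, 0 otherwise
  model(TSLM(Sales ~ trend() + Holiday))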
Beer production revisited (1/6)
recent_production <- aus_production |> filter(year(Quarter) >= 1992)
recent_production |> autoplot(Beer) +
  labs(y = "Megalitres", title = "Australian quarterly beer production")
. . .
Regression model
\[y_t = \beta_0 + \beta_1 t + \beta_2d_{2,t} + \beta_3 d_{3,t} + \beta_4 d_{4,t} + \varepsilon_t\]
- \(d_{i,t} = 1\) if \(t\) is quarter \(i\) and 0 otherwise.
Beer production revisited (2/6)
fit_beer <- recent_production |> model(TSLM(Beer ~ trend() + season()))
report(fit_beer)
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-42.9029 -7.5995 -0.4594 7.9908 21.7895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 441.80044 3.73353 118.333 < 2e-16 ***
trend() -0.34027 0.06657 -5.111 2.73e-06 ***
season()year2 -34.65973 3.96832 -8.734 9.10e-13 ***
season()year3 -17.82164 4.02249 -4.430 3.45e-05 ***
season()year4 72.79641 4.02305 18.095 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16
Beer production revisited (3/6)
augment(fit_beer) |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Beer, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
labs(y = "Megalitres", title = "Australian quarterly beer production") +
scale_colour_manual(values = c(Data = "black", Fitted = "#D55E00"))
Beer production revisited (4/6)
augment(fit_beer) |>
ggplot(aes(x = Beer, y = .fitted, colour = factor(quarter(Quarter)))) +
geom_point() +
labs(y = "Fitted", x = "Actual values", title = "Quarterly beer production") +
scale_colour_brewer(palette = "Dark2", name = "Quarter") +
geom_abline(intercept = 0, slope = 1)
Beer production revisited (5/6)
fit_beer |> gg_tsresiduals()
Beer production revisited (6/6)
fit_beer |>
  forecast() |>
  autoplot(recent_production)
Intervention variables
Spikes
- Equivalent to a dummy variable for handling an outlier.
. . .
Steps
- Variable takes value 0 before the intervention and 1 afterwards.
. . .
Change of slope
- Variable takes value 0 before the intervention and values \(\{1,2,3,\dots\}\) afterwards.
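A minimal sketch of building step and slope-change regressors by hand, for a hypothetical tsibble y_data with response y and an intervention at observation tau (a predefined integer on the same scale as t):
y_data |>
  mutate(
    t     = row_number(),
    step  = as.integer(t >= tau),  # 0 before the intervention, 1 from tau onwards
    slope = pmax(0, t - tau)       # 0 up to tau, then 1, 2, 3, ...
  ) |>
  model(TSLM(y ~ t + step + slope))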
Holidays
For monthly data
- Christmas: always in December so part of monthly seasonal effect
- Easter: use a dummy variable \(v_t=1\) if any part of Easter is in that month, \(v_t=0\) otherwise.
- Ramadan and Chinese new year similar.
Distributed lags
Lagged values of a predictor.
Example: \(x\) is advertising which has a delayed effect
\[\begin{align*} x_{1} &= \text{advertising for previous month;} \\ x_{2} &= \text{advertising for two months previously;} \\ & \vdots \\ x_{m} &= \text{advertising for $m$ months previously.} \end{align*}\]
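A minimal sketch, for a hypothetical monthly tsibble sales_data with columns Sales and AdSpend; the lagged advertising predictors are built with lag() and the incomplete initial rows dropped before fitting:
sales_data |>
  mutate(
    AdSpend_lag1 = lag(AdSpend, 1),
    AdSpend_lag2 = lag(AdSpend, 2)
  ) |>
  filter(!is.na(AdSpend_lag2)) |>
  model(TSLM(Sales ~ AdSpend_lag1 + AdSpend_lag2))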
Fourier series
Periodic seasonality can be handled using pairs of Fourier terms:
\[s_{k}(t) = \sin\left(\frac{2\pi k t}{m}\right)\qquad c_{k}(t) = \cos\left(\frac{2\pi k t}{m}\right)\]
\[y_t = a + bt + \sum_{k=1}^K \left[\alpha_k s_k(t) + \beta_k c_k(t)\right] + \varepsilon_t\] where \(m\) is the seasonal period.
- Every periodic function can be approximated by sums of sin and cos terms for large enough \(K\).
- Choose \(K\) by minimizing AICc.
- Called “harmonic regression”
. . .
TSLM(y ~ trend() + fourier(K))
Harmonic regression: beer production
fourier_beer <- recent_production |> model(TSLM(Beer ~ trend() + fourier(K = 2)))
report(fourier_beer)
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-42.9029 -7.5995 -0.4594 7.9908 21.7895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 446.87920 2.87321 155.533 < 2e-16 ***
trend() -0.34027 0.06657 -5.111 2.73e-06 ***
fourier(K = 2)C1_4 8.91082 2.01125 4.430 3.45e-05 ***
fourier(K = 2)S1_4 -53.72807 2.01125 -26.714 < 2e-16 ***
fourier(K = 2)C2_4 -13.98958 1.42256 -9.834 9.26e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.23 on 69 degrees of freedom
Multiple R-squared: 0.9243, Adjusted R-squared: 0.9199
F-statistic: 210.7 on 4 and 69 DF, p-value: < 2.22e-16
Harmonic regression: eating-out expenditure
aus_cafe <- aus_retail |>
  filter(Industry == "Cafes, restaurants and takeaway food services",
         year(Month) %in% 2004:2018) |>
  summarise(Turnover = sum(Turnover))
aus_cafe |> autoplot(Turnover)
Harmonic regression: eating-out expenditure (1/7)
fit <- aus_cafe |>
  model(
    K1 = TSLM(log(Turnover) ~ trend() + fourier(K = 1)),
    K2 = TSLM(log(Turnover) ~ trend() + fourier(K = 2)),
    K3 = TSLM(log(Turnover) ~ trend() + fourier(K = 3)),
    K4 = TSLM(log(Turnover) ~ trend() + fourier(K = 4)),
    K5 = TSLM(log(Turnover) ~ trend() + fourier(K = 5)),
    K6 = TSLM(log(Turnover) ~ trend() + fourier(K = 6))
  )
Harmonic regression: eating-out expenditure (2/7 to 7/7)
[Plots of the fitted harmonic regressions for K = 1, …, 6]
Fourier series
Periodic seasonality can be handled using pairs of Fourier terms: \[s_{k}(t) = \sin\left(\frac{2\pi k t}{m}\right)\qquad c_{k}(t) = \cos\left(\frac{2\pi k t}{m}\right)\] \[y_t = a + bt + \sum_{k=1}^K \left[\alpha_k s_k(t) + \beta_k c_k(t)\right] + \varepsilon_t\]
- Every periodic function can be approximated by sums of sin and cos terms for large enough \(K\).
- \(K \le m/2\)
- \(m\) can be non-integer
- Particularly useful for large \(m\).
Comparing regression models
Computer output for regression will always give the \(R^2\) value. This is a useful summary of the model.
- It is equal to the square of the correlation between \(y\) and \(\hat y\).
- It is often called the “coefficient of determination”.
- It can also be calculated as follows:
. . .
\[R^2 = \frac{\sum(\hat{y}_t - \bar{y})^2}{\sum(y_t-\bar{y})^2}\]
- It is the proportion of variance accounted for (explained) by the predictors.
Comparing regression models
However …
\(R^2\) does not allow for “degrees of freedom”.
Adding any variable tends to increase the value of \(R^2\), even if that variable is irrelevant.
. . .
To overcome this problem, we can use adjusted \(R^2\):
\[\bar{R}^2 = 1-(1-R^2)\frac{T-1}{T-k-1}\]
where \(k=\) no. predictors and \(T=\) no. observations.
. . .
Maximizing \(\bar{R}^2\) is equivalent to minimizing \(\hat\sigma^2\).
\[\hat{\sigma}^2 = \frac{1}{T-k-1}\sum_{t=1}^T e_t^2\]
Akaike’s Information Criterion
\[\text{AIC} = -2\log(L) + 2(k+2)\]
where \(L\) is the likelihood and \(k\) is the number of predictors in the model.
- AIC penalizes terms more heavily than \(\bar{R}^2\).
- Minimizing the AIC is asymptotically equivalent to minimizing MSE via leave-one-out cross-validation (for any linear regression).
Corrected AIC
For small values of \(T\), the AIC tends to select too many predictors, and so a bias-corrected version of the AIC has been developed.
\[\text{AIC}_{\text{C}} = \text{AIC} + \frac{2(k+2)(k+3)}{T-k-3}\]
. . .
As with the AIC, the AIC\(_{\text{C}}\) should be minimized.
Bayesian Information Criterion
\[\text{BIC} = -2\log(L) + (k+2)\log(T)\]
where \(L\) is the likelihood and \(k\) is the number of predictors in the model.
BIC penalizes terms more heavily than AIC
Also called SBIC and SC.
Minimizing BIC is asymptotically equivalent to leave-\(v\)-out cross-validation when \(v = T[1-1/(\log(T)-1)]\).
Harmonic regression: eating-out expenditure again (1/8)
aus_cafe <- aus_retail |>
  filter(Industry == "Cafes, restaurants and takeaway food services",
         year(Month) %in% 2004:2018) |>
  summarise(Turnover = sum(Turnover))
aus_cafe |> autoplot(Turnover)
Harmonic regression: eating-out expenditure again (2/8)
fit <- aus_cafe |>
  model(
    K1 = TSLM(log(Turnover) ~ trend() + fourier(K = 1)),
    K2 = TSLM(log(Turnover) ~ trend() + fourier(K = 2)),
    K3 = TSLM(log(Turnover) ~ trend() + fourier(K = 3)),
    K4 = TSLM(log(Turnover) ~ trend() + fourier(K = 4)),
    K5 = TSLM(log(Turnover) ~ trend() + fourier(K = 5)),
    K6 = TSLM(log(Turnover) ~ trend() + fourier(K = 6))
  )
glance(fit) |> select(.model, r_squared, adj_r_squared, CV, AICc)
# A tibble: 6 × 5
.model r_squared adj_r_squared CV AICc
<chr> <dbl> <dbl> <dbl> <dbl>
1 K1 0.962 0.962 0.00238 -1085.
2 K2 0.966 0.965 0.00220 -1099.
3 K3 0.976 0.975 0.00157 -1160.
4 K4 0.980 0.979 0.00138 -1183.
5 K5 0.985 0.984 0.00104 -1234.
6 K6 0.985 0.984 0.00105 -1232.
Harmonic regression: eating-out expenditure again (3/8 to 8/8)
[Plots of the fitted harmonic regressions for K = 1, …, 6]
Ex-ante versus ex-post forecasts
- Ex ante forecasts are made using only information available in advance.
- require forecasts of predictors
- Ex post forecasts are made using later information on the predictors.
- useful for studying behavior of forecasting models.
- trend, seasonal and calendar variables are all known in advance, so these don’t need to be forecast.
Beer production
recent_production <- aus_production |> filter(year(Quarter) >= 1992)
recent_production |> model(TSLM(Beer ~ trend() + season())) |>
  forecast() |> autoplot(recent_production)
Scenario based forecasting
- Assumes possible scenarios for the predictor variables
- Prediction intervals for scenario based forecasts do not include the uncertainty associated with the future values of the predictor variables.
US Consumption
fit_consBest <- us_change |>
  model(
    TSLM(Consumption ~ Income + Savings + Unemployment)
  )
future_scenarios <- scenarios(
  Increase = new_data(us_change, 4) |>
    mutate(Income = 1, Savings = 0.5, Unemployment = 0),
  Decrease = new_data(us_change, 4) |>
    mutate(Income = -1, Savings = -0.5, Unemployment = 0),
  names_to = "Scenario"
)
fc <- forecast(fit_consBest, new_data = future_scenarios)
US Consumption
us_change |> autoplot(Consumption) +
  labs(y = "% change in US consumption") +
  autolayer(fc) +
  labs(title = "US consumption", y = "% change")
Building a predictive regression model
- If getting forecasts of predictors is difficult, you can use lagged predictors instead.
. . .
\[y_{t+h}=\beta_0+\beta_1x_{1,t}+\dots+\beta_kx_{k,t}+\varepsilon_{t+h}\]
- A different model for each forecast horizon \(h\).
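A minimal sketch of such a direct model for \(h=4\) quarters ahead, using the us_change data from earlier; regressing on predictors lagged four quarters is the same model re-indexed, with the incomplete initial rows dropped before fitting:
us_change |>
  mutate(
    Income_lag4  = lag(Income, 4),
    Savings_lag4 = lag(Savings, 4)
  ) |>
  filter(!is.na(Income_lag4)) |>
  model(TSLM(Consumption ~ Income_lag4 + Savings_lag4))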
Nonlinear regression
A log-log functional form
\[\log y=\beta_0+\beta_1 \log x +\varepsilon\]
where \(\beta_1\) is interpreted as an elasticity (the average percentage change in \(y\) resulting from a \(1\%\) increase in \(x\)).
- alternative specifications: log-linear, linear-log.
- use \(\log(x+1)\) if required.
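A minimal sketch of the log-log form in TSLM(), for a hypothetical tsibble df with strictly positive y and x:
df |> model(TSLM(log(y) ~ log(x)))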
Piecewise linear and regression splines (1/2)
\[y=f(x) +\varepsilon\] where \(f\) is a non-linear function.
- For a piecewise linear function, let \(x_1=x\) and
. . .
\[\begin{align*} x_{2} = (x-c)_+ &= \left\{ \begin{array}{ll} 0 & \text{if } x < c\\ x-c & \text{if } x \ge c. \end{array}\right. \end{align*}\]
Piecewise linear and regression splines (2/2)
- In general, linear regression splines use
. . .
\[x_1=x~~~x_2=(x-c_1)_+~~~\ldots~~~x_k=(x-c_{k-1})_+\]
where \(c_1,\ldots,c_{k-1}\) are knots.
- Need to select knots: can be difficult and arbitrary.
- Automatic knot selection algorithms are very slow.
- Using piecewise cubics achieves a smoother result.
. . .
Better fit but forecasting outside the range of the historical data is even more unreliable.
Nonlinear trends
Piecewise linear trend with bend at \(\tau\)
. . .
\[\begin{align*} x_{1,t} &= t \\ x_{2,t} &= \left\{ \begin{array}{ll} 0 & t <\tau\\ (t-\tau) & t \ge \tau \end{array}\right. \end{align*}\]
. . .
Quadratic or higher order trend
\[x_{1,t} =t,\quad x_{2,t}=t^2,\quad \dots\]
. . .
NOT RECOMMENDED
Example: Boston marathon winning times (1/4)
marathon <- boston_marathon |>
  filter(Event == "Men's open division") |>
  select(-Event) |>
  mutate(Minutes = as.numeric(Time) / 60)
marathon |> autoplot(Minutes) + labs(y = "Winning times in minutes")
Example: Boston marathon winning times (2/4)
fit_trends <- marathon |>
  model(
    # Linear trend
    linear = TSLM(Minutes ~ trend()),
    # Exponential trend
    exponential = TSLM(log(Minutes) ~ trend()),
    # Piecewise linear trend
    piecewise = TSLM(Minutes ~ trend(knots = c(1940, 1980)))
  )
fit_trends
# A mable: 1 x 3
linear exponential piecewise
<model> <model> <model>
1 <TSLM> <TSLM> <TSLM>
Example: Boston marathon winning times (3/4)
fit_trends |>
  forecast(h = 10) |>
  autoplot(marathon)
fc_trends <- fit_trends |> forecast(h = 10)
marathon |>
  autoplot(Minutes) +
  geom_line(
    data = fitted(fit_trends),
    aes(y = .fitted, colour = .model)
  ) +
  autolayer(fc_trends, alpha = 0.5, level = 95) +
  labs(
    y = "Minutes",
    title = "Boston marathon winning times"
  )
Example: Boston marathon winning times (4/4)
fit_trends |>
  select(piecewise) |>
  gg_tsresiduals()
Correlation is not causation
When \(x\) is useful for predicting \(y\), it is not necessarily causing \(y\).
e.g., predict number of swimmers \(y\) using number of ice-creams sold \(x\).
Correlations are useful for forecasting, even when there is no causality.
Better models usually involve causal relationships (e.g., temperature \(x\) and people \(z\) to predict swimmers \(y\)).
Multicollinearity
In regression analysis, multicollinearity occurs when:
- Two predictors are highly correlated (i.e., the correlation between them is close to \(\pm1\)).
- A linear combination of some of the predictors is highly correlated with another predictor.
- A linear combination of one subset of predictors is highly correlated with a linear combination of another subset of predictors.
Multicollinearity
If multicollinearity exists
- the numerical estimates of coefficients may be wrong (worse in Excel than in a statistics package)
- don’t rely on the \(p\)-values to determine significance.
- there is no problem with model predictions provided the predictors used for forecasting are within the range used for fitting.
- omitting variables can help.
- combining variables can help.
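A quick screen for the first kind of problem is the pairwise correlation matrix of the candidate predictors; a minimal sketch using the us_change data (collinearity involving linear combinations needs other diagnostics, such as variance inflation factors):
us_change |>
  as_tibble() |>
  select(Income, Production, Savings, Unemployment) |>
  cor()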