A bug in auto_arima_f causes the wrong model to be returned in many cases. Frequently, what is returned is the best model but without its constant terms.
The issue is that the constant value is not passed through to p_myarima. So even if you set constant = False in the try_params call, the ARIMA model that gets fit still includes the constant.
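The mechanism can be illustrated with a minimal sketch. It assumes that p_myarima is built with functools.partial around a fitting function whose constant parameter defaults to True, which is consistent with the behavior described above. myarima_stub is a hypothetical stand-in that only reports which flag it received:

```python
from functools import partial


def myarima_stub(x, order, seasonal, constant=True):
    """Hypothetical stand-in for myarima; only reports the constant flag it received."""
    return {"order": order, "seasonal": seasonal, "constant": constant}


# Bind the data once, the way a partial-based p_myarima would be built;
# `constant` is not bound here, so it falls back to its default (True).
p_myarima = partial(myarima_stub, x=[1.0, 2.0, 3.0])

seasonal = {"order": (0, 0, 0), "period": 1}
dropped = p_myarima(order=(1, 0, 1), seasonal=seasonal)  # flag never forwarded
forwarded = p_myarima(order=(1, 0, 1), seasonal=seasonal, constant=False)

print(dropped["constant"], forwarded["constant"])  # True False
```

Whatever value the caller intended, leaving the keyword off means the default wins.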
This is the chain reaction that leads to the wrong model being selected:
1. Toggling the constant value is the last step of the stepwise while loop. If we get there, all other options have been exhausted.
2. Because the constant toggle is ignored in try_params, it returns the same fit["ic"] value as the bestfit model. Since the values are identical, improved is False and the stepwise search exits.
3. The code immediately after the stepwise search uses np.argsort to sort the models by lowest ic value. There are now two models with the same ic: the actual best model, and the best model with constant = False. In my experiments, for whatever reason, the latter is often selected.
The result is the actual best model with its constant removed, which is often significantly worse.
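Steps 2 and 3 can be illustrated with a small NumPy sketch. The IC values below are made up, and the sketch does not claim to reproduce which row statsforecast ends up picking. It only shows that a tie is never an improvement, and that tie order from np.argsort depends on the sort kind. The default kind ("quicksort") is not stable, so the relative order of tied entries is not guaranteed, whereas kind="stable" keeps them in insertion order:

```python
import numpy as np

# Hypothetical IC column after the stepwise search. Row 1 is the true best model
# (constant=True); row 3 is the "constant=False" candidate, which was really fit
# with the constant, so its IC is identical to row 1.
ic = np.array([105.2, 98.7, 101.3, 98.7])
constant = np.array([True, True, True, False])

best_ic = ic[1]
improved = bool(ic[3] < best_ic)  # a tie is not an improvement, so the loop exits
print(improved)  # False

# A stable sort keeps tied rows in the order they were added to the table.
order = np.argsort(ic, kind="stable")
print(order, constant[order[0]])  # [1 3 2 0] True
```

Deterministic tie-breaking would only mask the symptom; forwarding constant is what removes the spurious tie.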
The current code:
def try_params(p, d, q, P, D, Q, constant, k, bestfit):
    improved = False
    if k >= results.shape[0]:
        return k, bestfit, improved
    fit = p_myarima(
        order=(p, d, q),
        seasonal={"order": (P, D, Q), "period": m},
    )
    results[k] = (p, d, q, P, D, Q, constant, fit["ic"])
    k += 1
    if fit["ic"] < bestfit["ic"]:
        bestfit = fit
        improved = True
    return k, bestfit, improved
What the code should be:
def try_params(p, d, q, P, D, Q, constant, k, bestfit):
    improved = False
    if k >= results.shape[0]:
        return k, bestfit, improved
    fit = p_myarima(
        order=(p, d, q),
        seasonal={"order": (P, D, Q), "period": m},
        constant=constant,  # This is the line to be added
    )
    results[k] = (p, d, q, P, D, Q, constant, fit["ic"])
    k += 1
    if fit["ic"] < bestfit["ic"]:
        bestfit = fit
        improved = True
    return k, bestfit, improved
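To see the effect on the stepwise loop, here is a self-contained sketch of try_params with mocked closure variables. Everything except the body of try_params is hypothetical: results, m, and p_myarima are stand-ins for the closure variables in auto_arima_f, the made-up IC simply penalizes the constant, and forward_constant is an extra illustrative parameter that switches between the current and the corrected behavior:

```python
import numpy as np

m = 12  # hypothetical seasonal period
results = np.full((10, 8), np.nan)  # hypothetical table: room for 10 models


def p_myarima(order, seasonal, constant=True):
    """Hypothetical stand-in: a made-up IC in which the constant hurts the fit."""
    p, d, q = order
    return {"ic": 100.0 + p + q + (5.0 if constant else 0.0), "constant": constant}


def try_params(p, d, q, P, D, Q, constant, k, bestfit, forward_constant=True):
    improved = False
    if k >= results.shape[0]:
        return k, bestfit, improved
    # forward_constant=False mimics the current code, True mimics the fix
    extra = {"constant": constant} if forward_constant else {}
    fit = p_myarima(order=(p, d, q), seasonal={"order": (P, D, Q), "period": m}, **extra)
    results[k] = (p, d, q, P, D, Q, constant, fit["ic"])
    k += 1
    if fit["ic"] < bestfit["ic"]:
        bestfit = fit
        improved = True
    return k, bestfit, improved


# Current best: ARIMA(1,0,1) with a constant (made-up IC of 107.0).
bestfit = p_myarima(order=(1, 0, 1), seasonal={"order": (0, 0, 0), "period": m})

# Toggling the constant off without forwarding it: same IC, so no improvement.
k, _, improved_buggy = try_params(1, 0, 1, 0, 0, 0, False, 0, bestfit, forward_constant=False)
# With the flag forwarded, the model without a constant is actually fit and wins.
k, fixed_best, improved_fixed = try_params(1, 0, 1, 0, 0, 0, False, k, bestfit)

print(improved_buggy, improved_fixed, fixed_best["ic"])  # False True 102.0
```

In the first call the row recorded as constant=False carries the IC of the model with a constant, which is exactly the duplicate row that later competes in the argsort.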
Versions / Dependencies
Dependencies:
numpy==1.22.4
pandas==2.2.1
statsforecast==1.7.1
Reproducible example
For a reproducible example, I generated 5 time series, fit AutoARIMA to each, and compared the result to an ARIMA with the same order and seasonal order but with the constant turned on. In one of the five the results are the same, but the other four show a significant disparity in AICc caused by this issue.
import numpy as np
from statsforecast.arima import AutoARIMA
from statsforecast.models import ARIMA

# Generate 5 random time series
np.random.seed(42)

n_series = 5
n_observations = 100
time_series_list = []

for _ in range(n_series):
    # Generate components with varying strength
    trend_strength = np.random.uniform(0, 0.2)  # Uniform distribution for trend strength
    seasonality_strength = np.random.uniform(0, 10)  # Uniform distribution for seasonality strength
    noise_level = np.random.uniform(0.1, 1)

    # Create time series
    time = np.arange(n_observations)
    series = np.zeros(n_observations)
    series += trend_strength * time
    series += seasonality_strength * np.sin(2 * np.pi * time / 12)
    series += np.random.normal(0, noise_level, n_observations)
    time_series_list.append(series)

# Fit models and compare ICs
auto_arima_ic = []
arima_ic = []

def pretty_dict(d):
    return {k: f"{v:.2f}" for k, v in d.items()}

for series in time_series_list:
    # Fit AutoARIMA
    auto_model = AutoARIMA(period=12)
    auto_model.fit(series)

    # Get parameters from AutoARIMA
    p, q, P, Q, m, d, D = auto_model.model_.model["arma"]

    # Fit ARIMA with constant
    arima_model = ARIMA(order=(p, d, q),
                        seasonal_order=(P, D, Q),
                        season_length=12,
                        include_constant=True)
    arima_model.fit(series)

    print(f"AutoARIMA Coefficients: {pretty_dict(auto_model.model_.model['coef'])}")
    print(f"ARIMA Coefficients: {pretty_dict(arima_model.model_['coef'])}")
    print(f"AutoARIMA AICc: {auto_model.model_.model['aicc']:.2f}")
    print(f"ARIMA AICc: {arima_model.model_['aicc']:.2f}\n")
For reference, the relevant section is in the try_params function within auto_arima_f: https://github.com/Nixtla/statsforecast/blob/7f60571b1242413b372028504e60b56d0d566214/statsforecast/arima.py#L2149C1-L2152C10
Issue Severity
None