
[AutoARIMA] Major hidden bug in auto_arima_f #897

Closed
robert-robison opened this issue Sep 5, 2024 · 3 comments · Fixed by #899

@robert-robison (Contributor)
What happened + What you expected to happen

A bug in auto_arima_f causes the wrong model to be returned in many cases. What is frequently returned is the best model, but with its constant terms removed.

The relevant section is in the try_params function within auto_arima_f: https://github.com/Nixtla/statsforecast/blob/7f60571b1242413b372028504e60b56d0d566214/statsforecast/arima.py#L2149C1-L2152C10

The issue is that the constant value is not being passed through to p_myarima, so even if you set constant = False in the try_params call, the fitted ARIMA model will still include the constant.

This is the chain reaction that leads to the wrong model being selected:

  1. Toggling the constant value is the last part of the stepwise while loop. If we get there, all other options have been exhausted.
  2. Since the constant toggle is ignored in try_params, it returns the same fit["ic"] value as the bestfit model. Because the values are identical, improved will be False, and we will exit the stepwise process.
  3. The code immediately after the stepwise process uses np.argsort to sort by lowest ic value. There will be two models with the same ic: the actual best model, and the best model with constant = False. In my experiments, for whatever reason, it often selects the latter (see the sketch after this list).
  4. The result is the actual best model with the constant removed, which is often significantly worse.
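
To make step 3 concrete, here is a minimal sketch (not part of the original report; the IC values are made up) of why the tie matters: np.argsort gives no ordering guarantee between equal values, so the row recorded with constant = False can sort ahead of the genuine best fit.

    import numpy as np

    # Illustration only: with the bug, `results` holds two rows with the identical IC --
    # the true best fit and the same fit re-recorded with constant = False.
    # np.argsort's default sort is not stable, so it makes no guarantee about
    # which of the tied rows comes first.
    ics = np.array([512.3, 498.7, 498.7, 530.1])  # indices 1 and 2 are the tied rows
    order = np.argsort(ics)
    print(order)  # indices 1 and 2 may appear in either order, ahead of 3 and 0

Forwarding constant (as in the fix below) removes the tie entirely, because the constant-free trial then gets its own, typically worse, IC.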

The current code:

    def try_params(p, d, q, P, D, Q, constant, k, bestfit):
        improved = False
        if k >= results.shape[0]:
            return k, bestfit, improved
        fit = p_myarima(
            order=(p, d, q),
            seasonal={"order": (P, D, Q), "period": m},
        )  # note: `constant` is accepted by try_params but never forwarded here
        results[k] = (p, d, q, P, D, Q, constant, fit["ic"])
        k += 1
        if fit["ic"] < bestfit["ic"]:
            bestfit = fit
            improved = True
        return k, bestfit, improved

What the code should be:

    def try_params(p, d, q, P, D, Q, constant, k, bestfit):
        improved = False
        if k >= results.shape[0]:
            return k, bestfit, improved
        fit = p_myarima(
            order=(p, d, q),
            seasonal={"order": (P, D, Q), "period": m},
            constant=constant,  # This is the line to be added
        )
        results[k] = (p, d, q, P, D, Q, constant, fit["ic"])
        k += 1
        if fit["ic"] < bestfit["ic"]:
            bestfit = fit
            improved = True
        return k, bestfit, improved

Versions / Dependencies

Dependencies:

numpy==1.22.4
pandas==2.2.1
statsforecast==1.7.1

Reproducible example

For a reproducible example, I generated 5 time series, fit AutoARIMA on each, and compared the result to an ARIMA with the same order and seasonal order but with the constant turned on. In one of the five the results are the same; the others show the significant disparity in AICc caused by this issue.

import numpy as np
from statsforecast.arima import AutoARIMA
from statsforecast.models import ARIMA


# Generate 5 random time series
np.random.seed(42)
n_series = 5
n_observations = 100

time_series_list = []
for _ in range(n_series):
    # Generate components with varying strengths
    trend_strength = np.random.uniform(0, 0.2)  # Uniform distribution for trend strength
    seasonality_strength = np.random.uniform(0, 10)  # Uniform distribution for seasonality strength
    noise_level = np.random.uniform(0.1, 1)

    # Create time series
    time = np.arange(n_observations)
    series = np.zeros(n_observations)
    
    series += trend_strength * time
    series += seasonality_strength * np.sin(2 * np.pi * time / 12)
    
    series += np.random.normal(0, noise_level, n_observations)
    
    time_series_list.append(series)

# Fit models and compare ICs
auto_arima_ic = []
arima_ic = []

def pretty_dict(d):
    return {k: f"{v:.2f}" for k, v in d.items()}

for series in time_series_list:
    # Fit AutoARIMA
    auto_model = AutoARIMA(period=12)
    auto_model.fit(series)

    
    # Get parameters from AutoARIMA; the "arma" tuple follows R's ordering: (p, q, P, Q, period, d, D)
    p, q, P, Q, m, d, D = auto_model.model_.model["arma"]
    
    # Fit ARIMA with constant
    arima_model = ARIMA(order=(p, d, q),
                        seasonal_order=(P, D, Q),
                        season_length=12,
                        include_constant=True)
    arima_model.fit(series)


    print(f"AutoARIMA Coefficients: {pretty_dict(auto_model.model_.model['coef'])}")
    print(f"ARIMA Coefficients: {pretty_dict(arima_model.model_['coef'])}")
    print(f"AutoARIMA AICc: {auto_model.model_.model['aicc']:.2f}")
    print(f"ARIMA AICc: {arima_model.model_['aicc']:.2f}\n")

Output:

AutoARIMA Coefficients: {'sar1': '0.03', 'sma1': '-0.00'}
ARIMA Coefficients: {'sar1': '-0.38', 'sma1': '-0.77', 'drift': '0.08'}
AutoARIMA AICc: 324.35
ARIMA AICc: 233.16

AutoARIMA Coefficients: {'sma1': '0.26'}
ARIMA Coefficients: {'sma1': '-0.74', 'drift': '0.14'}
AutoARIMA AICc: 382.74
ARIMA AICc: 271.36

AutoARIMA Coefficients: {'sma1': '0.35'}
ARIMA Coefficients: {'sma1': '-0.79', 'drift': '0.16'}
AutoARIMA AICc: 383.81
ARIMA AICc: 234.84

AutoARIMA Coefficients: {'sar1': '0.76', 'sma1': '-0.83'}
ARIMA Coefficients: {'sar1': '-0.07', 'sma1': '-0.83', 'drift': '0.19'}
AutoARIMA AICc: 435.30
ARIMA AICc: 191.53

AutoARIMA Coefficients: {'sar1': '-0.22', 'sma1': '-0.40', 'sma2': '-0.41', 'drift': '0.17'}
ARIMA Coefficients: {'sar1': '-0.22', 'sma1': '-0.40', 'sma2': '-0.41', 'drift': '0.17'}
AutoARIMA AICc: 23.89
ARIMA AICc: 23.89

Issue Severity

None

@jmoralez (Member) commented Sep 5, 2024

Hey @robert-robison, thanks for the thorough report! Are you interested in contributing the fix?

@robert-robison (Contributor, Author)

Sure, would be happy to! Could I take care of #896 at the same time?

@jmoralez (Member) commented Sep 5, 2024

Please do that in a separate PR, since that's a breaking change and it's better to categorize each contribution for the release notes.
