
[AutoARIMA] Major hidden bug in auto_arima_f #897

Closed
robert-robison opened this issue Sep 5, 2024 · 3 comments · Fixed by #899

@robert-robison (Contributor)
What happened + What you expected to happen

A bug in auto_arima_f causes the wrong model to be returned in many cases. What is frequently returned is the best model, but with its constant terms removed.

The relevant section is in the try_params function within auto_arima_f: https://github.com/Nixtla/statsforecast/blob/7f60571b1242413b372028504e60b56d0d566214/statsforecast/arima.py#L2149C1-L2152C10

The issue is that the constant value is not being passed through to p_myarima, so even if you set constant = False in the try_params call, the fitted ARIMA model will still include the constant.

This is the chain reaction that leads to the wrong model being selected:

  1. Toggling the constant value is the last part of the stepwise while loop. If we get there, all other options have been exhausted.
  2. Since the constant toggle is ignored in try_params, it returns the same fit["ic"] value as the bestfit model. Because the values are identical, improved will be False, and we will exit the stepwise process.
  3. The code immediately after the stepwise process uses np.argsort to sort by lowest ic value. There will be two models with the same ic: the actual best model, and the best model with constant = False. In my experiments, for whatever reason, it often selects the latter (see the sketch after this list).
  4. The result is the actual best model with the constant removed, which is often significantly worse.
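
To make step 3 concrete, here is a minimal sketch (not part of the original report; the IC values are made up) of why the tie matters: np.argsort gives no ordering guarantee between equal values, so the row recorded with constant = False can sort ahead of the genuine best fit.

    import numpy as np

    # Illustration only: with the bug, `results` holds two rows with the identical IC --
    # the true best fit and the same fit re-recorded with constant = False.
    # np.argsort's default sort is not stable, so it makes no guarantee about
    # which of the tied rows comes first.
    ics = np.array([512.3, 498.7, 498.7, 530.1])  # indices 1 and 2 are the tied rows
    order = np.argsort(ics)
    print(order)  # indices 1 and 2 may appear in either order, ahead of 3 and 0

Forwarding constant (as in the fix below) removes the tie entirely, because the constant-free trial then gets its own, typically worse, IC.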

The current code:

    def try_params(p, d, q, P, D, Q, constant, k, bestfit):
        improved = False
        if k >= results.shape[0]:
            return k, bestfit, improved
        fit = p_myarima(
            order=(p, d, q),
            seasonal={"order": (P, D, Q), "period": m},
        )  # note: `constant` is accepted by try_params but never forwarded here
        results[k] = (p, d, q, P, D, Q, constant, fit["ic"])
        k += 1
        if fit["ic"] < bestfit["ic"]:
            bestfit = fit
            improved = True
        return k, bestfit, improved

What the code should be:

    def try_params(p, d, q, P, D, Q, constant, k, bestfit):
        improved = False
        if k >= results.shape[0]:
            return k, bestfit, improved
        fit = p_myarima(
            order=(p, d, q),
            seasonal={"order": (P, D, Q), "period": m},
            constant=constant,  # This is the line to be added
        )
        results[k] = (p, d, q, P, D, Q, constant, fit["ic"])
        k += 1
        if fit["ic"] < bestfit["ic"]:
            bestfit = fit
            improved = True
        return k, bestfit, improved

Versions / Dependencies

Dependencies:

numpy==1.22.4
pandas==2.2.1
statsforecast==1.7.1

Reproducible example

For a reproducible example, I generated 5 time series, fit AutoARIMA on each, and compared the result to an ARIMA with the same order and seasonal order but with the constant turned on. In one of the five the results are the same; the others show the significant disparity in AICc caused by this issue.

import numpy as np
from statsforecast.arima import AutoARIMA
from statsforecast.models import ARIMA


# Generate 5 random time series
np.random.seed(42)
n_series = 5
n_observations = 100

time_series_list = []
for _ in range(n_series):
    # Generate components with varying strengths
    trend_strength = np.random.uniform(0, 0.2)  # Uniform distribution for trend strength
    seasonality_strength = np.random.uniform(0, 10)  # Uniform distribution for seasonality strength
    noise_level = np.random.uniform(0.1, 1)

    # Create time series
    time = np.arange(n_observations)
    series = np.zeros(n_observations)
    
    series += trend_strength * time
    series += seasonality_strength * np.sin(2 * np.pi * time / 12)
    
    series += np.random.normal(0, noise_level, n_observations)
    
    time_series_list.append(series)

# Fit models and compare ICs
auto_arima_ic = []
arima_ic = []

def pretty_dict(d):
    return {k: f"{v:.2f}" for k, v in d.items()}

for series in time_series_list:
    # Fit AutoARIMA
    auto_model = AutoARIMA(period=12)
    auto_model.fit(series)

    
    # Get parameters from AutoARIMA; the "arma" tuple follows R's ordering: (p, q, P, Q, period, d, D)
    p, q, P, Q, m, d, D = auto_model.model_.model["arma"]
    
    # Fit ARIMA with constant
    arima_model = ARIMA(order=(p, d, q),
                        seasonal_order=(P, D, Q),
                        season_length=12,
                        include_constant=True)
    arima_model.fit(series)


    print(f"AutoARIMA Coefficients: {pretty_dict(auto_model.model_.model['coef'])}")
    print(f"ARIMA Coefficients: {pretty_dict(arima_model.model_['coef'])}")
    print(f"AutoARIMA AICc: {auto_model.model_.model['aicc']:.2f}")
    print(f"ARIMA AICc: {arima_model.model_['aicc']:.2f}\n")

Output:

AutoARIMA Coefficients: {'sar1': '0.03', 'sma1': '-0.00'}
ARIMA Coefficients: {'sar1': '-0.38', 'sma1': '-0.77', 'drift': '0.08'}
AutoARIMA AICc: 324.35
ARIMA AICc: 233.16

AutoARIMA Coefficients: {'sma1': '0.26'}
ARIMA Coefficients: {'sma1': '-0.74', 'drift': '0.14'}
AutoARIMA AICc: 382.74
ARIMA AICc: 271.36

AutoARIMA Coefficients: {'sma1': '0.35'}
ARIMA Coefficients: {'sma1': '-0.79', 'drift': '0.16'}
AutoARIMA AICc: 383.81
ARIMA AICc: 234.84

AutoARIMA Coefficients: {'sar1': '0.76', 'sma1': '-0.83'}
ARIMA Coefficients: {'sar1': '-0.07', 'sma1': '-0.83', 'drift': '0.19'}
AutoARIMA AICc: 435.30
ARIMA AICc: 191.53

AutoARIMA Coefficients: {'sar1': '-0.22', 'sma1': '-0.40', 'sma2': '-0.41', 'drift': '0.17'}
ARIMA Coefficients: {'sar1': '-0.22', 'sma1': '-0.40', 'sma2': '-0.41', 'drift': '0.17'}
AutoARIMA AICc: 23.89
ARIMA AICc: 23.89

Issue Severity

None

@jmoralez (Member) commented Sep 5, 2024

Hey @robert-robison, thanks for the thorough report! Are you interested in contributing the fix?

@robert-robison (Contributor, Author)

Sure, would be happy to! Could I take care of #896 at the same time?

@jmoralez (Member) commented Sep 5, 2024

Please do that in a separate PR, since that's a breaking change and it's better to categorize each contribution for the release notes.
