SeasonalNaive (and other models) spend a lot of time acquiring thread locks #898
Comments
This is affecting me, too! Exact same problem.
Hey @caseytomlin, thanks for raising this. This is expected for very simple models, since the overhead of running multiprocessing is greater than the actual model training, so if you're using … My guess is that you're seeing the locks because of statsforecast/statsforecast/core.py, lines 190 to 192 in 7f60571,
and when n_jobs>1 time is spent waiting for the results here: statsforecast/statsforecast/core.py, line 1320 in 7f60571.
Here's a colab notebook with the comparison. I don't get a nice speedup there with multiprocessing (4.5s …).
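For illustration, a minimal sketch of the kind of comparison described above (synthetic data, `SeasonalNaive`, `n_jobs=1` vs. multiprocessing) might look like this. It is not the linked notebook; the sizes are made up and the calls assume a recent statsforecast release:

```python
# Minimal sketch: compare n_jobs=1 vs. all cores on many short series.
# All sizes are illustrative, not taken from the notebook above.
import time

import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive


def make_panel(n_series, n_obs=60):
    # Many short daily series, so per-task overhead dominates the model work.
    return pd.DataFrame({
        "unique_id": np.repeat(np.arange(n_series), n_obs),
        "ds": np.tile(pd.date_range("2023-01-01", periods=n_obs, freq="D"), n_series),
        "y": np.random.rand(n_series * n_obs),
    })


if __name__ == "__main__":
    df = make_panel(10_000)
    for n_jobs in (1, -1):
        sf = StatsForecast(models=[SeasonalNaive(season_length=7)], freq="D", n_jobs=n_jobs)
        start = time.perf_counter()
        sf.forecast(df=df, h=7)
        print(f"n_jobs={n_jobs}: {time.perf_counter() - start:.1f}s")
```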
@jmoralez thanks for the quick reply and the hints! I will try line_profiler. Just to clarify, we see basically the same behavior with …
Do your series differ a lot in length? The scheduling now works per series, to support progress bars and also (hopefully) better scheduling, since previously we just partitioned all series in … I think the locks you see are actually how …
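To make the distinction concrete, here is a toy sketch (not statsforecast's actual code) of per-series scheduling vs. chunked partitioning with a process pool, and of why the parent's waiting shows up in a profile as time spent acquiring thread locks. `fit_one`, `fit_chunk`, and all sizes are hypothetical:

```python
# Toy illustration of the two scheduling strategies discussed above.
import time
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def fit_one(series):
    # A trivially cheap "model": per-task overhead dominates the real work.
    return series.mean()


def fit_chunk(chunk):
    # Coarse task: one worker handles a whole chunk of series locally.
    return [fit_one(s) for s in chunk]


def per_series(series_list, n_jobs):
    # One task per series: fine-grained (enables progress bars), but every task
    # pays pickling/IPC overhead, and the parent spends most of its time blocked
    # on the result queue -- which cProfile reports as lock acquisition.
    with ProcessPoolExecutor(n_jobs) as pool:
        return list(pool.map(fit_one, series_list))


def chunked(series_list, n_jobs):
    # n_jobs coarse tasks: far fewer submissions, far less waiting in the parent.
    chunks = [series_list[i::n_jobs] for i in range(n_jobs)]
    with ProcessPoolExecutor(n_jobs) as pool:
        results = pool.map(fit_chunk, chunks)
    return [r for chunk in results for r in chunk]


if __name__ == "__main__":
    series = [np.random.rand(60) for _ in range(20_000)]
    for fn in (per_series, chunked):
        start = time.perf_counter()
        fn(series, n_jobs=4)
        print(f"{fn.__name__}: {time.perf_counter() - start:.1f}s")
```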
The series in our original use case do differ somewhat in length, but I can see the behavior when using dummy data like above. I didn't get to the line profiler yet, but there seems to be a threshold in the number of series (…). Using the same code as above on an 8-CPU machine (replacing …):
Let me know if I can provide more info/tests. Very curious!
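One way to locate that threshold is a sweep over the number of series, timing serial vs. parallel runs. The series counts, `season_length`, and `n_jobs` values below are assumptions for illustration, not the original benchmark:

```python
# Illustrative sweep to locate the serial/parallel crossover point.
import time

import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive


def make_panel(n_series, n_obs=60):
    return pd.DataFrame({
        "unique_id": np.repeat(np.arange(n_series), n_obs),
        "ds": np.tile(pd.date_range("2023-01-01", periods=n_obs, freq="D"), n_series),
        "y": np.random.rand(n_series * n_obs),
    })


def time_forecast(df, n_jobs):
    sf = StatsForecast(models=[SeasonalNaive(season_length=7)], freq="D", n_jobs=n_jobs)
    start = time.perf_counter()
    sf.forecast(df=df, h=7)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n_series in (1_000, 10_000, 50_000, 100_000):
        df = make_panel(n_series)
        serial, parallel = time_forecast(df, 1), time_forecast(df, 8)
        print(f"{n_series:>7} series: n_jobs=1 {serial:.1f}s | n_jobs=8 {parallel:.1f}s")
```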
@jmoralez could you point to the package version that does the partitioning the "old" way?
1.7.5
Rerunning the above example with …
For the …
Apologies, I meant with … With …
Thanks a lot for working through this issue with us. I've narrowed down the problems and I'll push a fix soon.
Thanks @jmoralez for the quick fix, but unfortunately I am still running into similar problems with 1.7.7.1. However, the threshold number of series where the "crossover" occurs has gone up (and for my actual use cases this is the relevant scale). Here are some timings from 1.7.7.1: …
With 1.7.5, on the other hand, e.g.: …
Hey. Yeah, sorry, I only tested with 100k. It seems that the process pool executor chokes on that many tasks, so I'm working on keeping the number of in-flight tasks constant at any given time. It already works, but I'm trying to improve it a bit; I'll push another PR soon.
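As background for what "keeping the number of in-flight tasks constant" can look like, here is a generic bounded-submission sketch with `concurrent.futures`. It illustrates the pattern only and is not statsforecast's actual fix; `square` and `bounded_map` are made-up names:

```python
# Bound the number of tasks queued in a ProcessPoolExecutor at any one time.
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait


def square(x):
    return x * x


def bounded_map(fn, items, n_jobs=4, max_in_flight=32):
    """Yield results while never keeping more than max_in_flight tasks queued."""
    items = iter(items)
    with ProcessPoolExecutor(n_jobs) as pool:
        in_flight = set()
        # Prime the pool with an initial batch of tasks.
        for _ in range(max_in_flight):
            try:
                in_flight.add(pool.submit(fn, next(items)))
            except StopIteration:
                break
        # Each time a task finishes, collect it and submit a replacement.
        while in_flight:
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                yield fut.result()
            for _ in range(len(done)):
                try:
                    in_flight.add(pool.submit(fn, next(items)))
                except StopIteration:
                    break


if __name__ == "__main__":
    results = list(bounded_map(square, range(1_000)))
    print(len(results), sorted(results)[:5])
```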
What happened + What you expected to happen
Caveats: I'm not sure if this is expected behavior, and I'm not sure if it's new, but I think it is.
The majority of wall clock time is spent (in the case of `SeasonalNaive`) acquiring thread locks. See below for an example. It seems to take even longer per time series when the number of series increases (e.g. 10000 series takes 1 minute, but 100000 series takes 20 minutes).

The issue is reproducible in multiple environments, but it is particularly problematic in kubeflow pipeline executions (the resulting long running times cause the pods to stagnate and never finish). We have a fair number of time series (~O(10^6)), but `SeasonalNaive` with `n_jobs=12` can take 20+ hours to finish (if it finishes at all).

Thanks for an otherwise great product!
Versions / Dependencies
Dependencies: …

Reproducible example
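The original reproduction code did not survive the export of this page. As a stand-in, here is a sketch of the kind of setup discussed in this thread (synthetic panel, `SeasonalNaive`, `n_jobs > 1`) and of how a `prun`-style profile of it can be captured with `cProfile`; every size and parameter below is illustrative:

```python
# Illustrative reproduction and profiling setup (not the original snippet).
import cProfile
import pstats

import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive

if __name__ == "__main__":
    n_series, n_obs = 100_000, 60
    df = pd.DataFrame({
        "unique_id": np.repeat(np.arange(n_series), n_obs),
        "ds": np.tile(pd.date_range("2023-01-01", periods=n_obs, freq="D"), n_series),
        "y": np.random.rand(n_series * n_obs),
    })
    sf = StatsForecast(models=[SeasonalNaive(season_length=7)], freq="D", n_jobs=12)

    # Roughly equivalent to `%prun sf.forecast(df=df, h=7)` in a notebook.
    # Only the parent process is profiled, which is exactly where the waiting
    # (reported as thread-lock acquisition) happens when n_jobs > 1.
    profiler = cProfile.Profile()
    profiler.runcall(sf.forecast, df=df, h=7)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```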
Top of `prun`: …

Issue Severity
High: It blocks me from completing my task.