feat: multithread linting #129
I don't see any mention of the previous RFCs around parallelisation: Both of these have a lot of context about the difficulties of parallelisation outside of the core rules, e.g. in cases where the parser or the rules store state in any form.
Naively parallelising by just "randomly" distributing files across threads may lead to a SLOWER lint run in cases where people use such stateful plugins, because the cached work may need to be redone once for each thread. I would like to see such use cases addressed as part of this RFC, given that they are very prevalent: both mentioned plugins are in use by the majority of the ecosystem. These problems have been discussed before in the context of language plugins and parser contexts (I can try to find the threads a bit later).
Thanks for the input @bradzacher. How would you go about incorporating context from #42 and #87 into this RFC? I see that #42 suggests introducing a plugin setting As for #87, it seems to be about an unrelated feature that doesn't even require multithreading. But I get why it would be beneficial to limit the number of instances of the same parser across threads, especially if the parser takes a long time to load its initial state, like typescript-eslint with type-aware parsing. If you have any concrete suggestions on how to do that, I'd love to know.
I imagine the way one would address such use cases is by making no changes, i.e. not enabling multithread linting if the results are not satisfactory. But if something can be done to improve performance for popular plugins, that would be awesome.
To be clear - I'm all for such a system existing. Like caching it can vastly improve the experience for those that fit within the bounds. The thing I want to make sure of is that we ensure the bounds are either intentionally designed to be tight to avoid complexity explosion, or that we are at least planning a path forward for the mentioned cases. #87 has some discussions around parallel parsing which are relevant to the sorts of ideas we'd need to consider here. Some other relevant discussions can be found in I'm pretty swamped atm cos holiday season and kids and probably won't be able to get back to this properly until the new year.
Co-authored-by: 唯然 <weiran.zsd@outlook.com>
Thanks for putting this together. I'm going to need more time to dig into the details, and I really appreciate the amount of thought and explanation you've included in this RFC. I have a few high-level thoughts from reviewing this today:
Yes, it would be interesting to look into other tools to understand how they handle concurrency. This could actually bring in some interesting ideas even if the general approach is different. I was thinking of checking Prettier but haven't managed to do that yet. Jest and Ava are also good candidates.
Thanks for the list. I missed most of those links while skimming through the discussion in eslint#3565. I'll be sure to go through the items and add a prior art mention.
Workers don't need to create a new instance of
One thing I'd like to point out before it's too late and just in case it's relevant: multithreading makes multifile analysis harder. If there ever comes a system where a single rule can look at the contents of multiple files, as implemented in I imagine this kind of analysis is not really in scope for ESLint at the moment (I haven't seen anything in the RFCs at least), as it would have a high complexity impact on the project (I've written about some tradeoffs in this post). But if this proposal were to be implemented without consideration for multifile analysis (which seems to be the case currently), then the cost of implementing it later would skyrocket, and I imagine that would lead to it never being implemented. I'm looking forward to seeing how this evolves, as I have unfortunately not figured out multi-threading well enough for this task to even try implementing it for
The only way to parallelise and efficiently maintain cross-file analysis is with shared memory. Unfortunately in JS as a whole this is nigh-impossible with the current state of the world. Sharing memory via The shared structs proposal would go a long way in enabling shared memory models and is currently at stage 2 -- so there is some hope for this in the relatively near future! I know the TS team is eagerly looking forward to this proposal landing in node so they can explore parallelising TS's type analysis. For now at least the best way to efficiently do parallelised multi-file analysis is to do some "grouping aware" task splitting. I.e. instead of assigning files to threads randomly you instead try to keep "related" files in the same thread to minimise duplication of data across threads. But this is what I was alluding to in my above comments [1] [2] -- there needs to be an explicit decision encoded in this RFC:
The former is "the easy route" for obvious reasons -- there's a lot to think about and work through for the latter. As a quick-and-dirty example that we have discussed before (see eslint/eslint#16819): Just to reiterate my earlier comments -- I'm 100% on board with going with the former decision and ignoring the cross-file problem. I just want to ensure that this case has been fully considered and intentionally discarded, or that the design has a consideration to eventually grow the parallelism system to support such use cases.
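The "grouping aware" task splitting described above can be illustrated with a minimal sketch. Nothing here is part of ESLint's API: `splitRoundRobin`, `splitByGroup`, `countGroupLoads`, and `dirnameOf` are hypothetical names, and using a file's directory as the group key is an assumption made purely for illustration (a real implementation might group by nearest `tsconfig.json` or package root instead).

```typescript
// Hypothetical sketch only: none of these functions exist in ESLint.

/** Returns the directory part of a "/"-separated path (assumed group key). */
function dirnameOf(file: string): string {
  const slash = file.lastIndexOf("/");
  return slash === -1 ? "." : file.slice(0, slash);
}

/** Naive split: deal files out to workers in round-robin order. */
function splitRoundRobin(files: string[], workerCount: number): string[][] {
  const buckets: string[][] = Array.from({ length: workerCount }, () => []);
  files.forEach((file, i) => buckets[i % workerCount].push(file));
  return buckets;
}

/** Grouping-aware split: files that share a group key stay on one worker. */
function splitByGroup(
  files: string[],
  workerCount: number,
  groupKey: (file: string) => string = dirnameOf,
): string[][] {
  const groups = new Map<string, string[]>();
  for (const file of files) {
    const key = groupKey(file);
    const group = groups.get(key);
    if (group) {
      group.push(file);
    } else {
      groups.set(key, [file]);
    }
  }
  // Place the largest groups first, each on the currently least-loaded worker.
  const buckets: string[][] = Array.from({ length: workerCount }, () => []);
  const sorted = [...groups.values()].sort((a, b) => b.length - a.length);
  for (const group of sorted) {
    const target = buckets.reduce((min, b) => (b.length < min.length ? b : min));
    target.push(...group);
  }
  return buckets;
}

/**
 * Counts (worker, group) pairs, i.e. how many times per-group state
 * (such as a cached program) would have to be built across all workers.
 */
function countGroupLoads(
  buckets: string[][],
  groupKey: (file: string) => string = dirnameOf,
): number {
  return buckets.reduce(
    (sum, bucket) => sum + new Set(bucket.map(groupKey)).size,
    0,
  );
}
```

With two packages of two files each and two workers, the round-robin split makes both workers touch both packages, while the grouped split keeps each package on a single worker. The trade-off is that grouping can leave workers unevenly loaded when group sizes differ a lot.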
I think what we're going for here is effectively a "stop the bleeding" situation where we can get ESLint's current behavior to go faster, as this becomes an even bigger problem as people start to use ESLint to lint their JSON, CSS, and Markdown files, significantly increasing the number of files an average user will lint. I'm on board with discounting the cross-file problem at this point, as I think it's clear that many companies have created their own concurrent linting solutions built on top of ESLint that also discount this issue. I would like to revisit cross-file linting at some point in the future, but before we can do that, we really need to get the core rewrite in progress.
**[Backstage](https://backstage.io/)**

The Backstage CLI has an option to run ESLint (currently ESLint v8) in multithread mode along with other tools.
Each file is linted in the next available thread.
Does this mean each thread is passed a single file to lint, passes back the results, and then gets another file to lint?
**[Trunk Code Quality](https://trunk.io/code-quality)**

Trunk manages to parallelize ESLint and other linters by splitting the workload over multiple processes.
Does that mean it's splitting up the file list and spreading across multiple processes? Or just one process for ESLint and other processes for other tools?
**[eslint-p](https://www.npmjs.com/package/eslint-p)**

A CLI-only wrapper around ESLint v9 that adds multithread linting support, authored by myself.
After starting a worker thread pool, each file is linted in the next available thread.
Similar question: threads are just passed one file at a time?
This is of particular interest to me as an approach vs. passing multiple files to each thread and limiting the back-and-forth communication.
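To make the trade-off in this question concrete, here is a rough sketch that compares the number of main-thread/worker messages for one-file-at-a-time dispatch against batched dispatch. It is only an illustration under simple assumptions: `toBatches` and `countMessages` are hypothetical helpers, not anything from ESLint or eslint-p, and the model assumes exactly one request and one response per batch.

```typescript
// Hypothetical sketch only: a simplified model of dispatch overhead.

/** Splits the file list into batches of at most `batchSize` files. */
function toBatches(files: string[], batchSize: number): string[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: string[][] = [];
  for (let i = 0; i < files.length; i += batchSize) {
    batches.push(files.slice(i, i + batchSize));
  }
  return batches;
}

/** Assumes each batch costs one request to a worker and one response back. */
function countMessages(files: string[], batchSize: number): number {
  return toBatches(files, batchSize).length * 2;
}
```

Under this model, a batch size of 1 corresponds to "each file is linted in the next available thread". Larger batches cut the message count, but they also make load balancing coarser, since a worker that draws a batch of slow files cannot hand any of them back.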
Summary
This document proposes adding a new multithread mode for `ESLint#lintFiles()`. The aim of multithread linting is to speed up ESLint by allowing workload distribution across multiple CPU cores.

Related Issues
eslint/eslint#3565