task evaluation concurrency #2820

lefou · 2023-10-04T15:30:14Z · Maintainer

I wonder whether there is no special handling for task concurrency in Mill just because it was never an issue, or because it was considered too slow. I have almost never run into such issues in my daily work with Mill, but some users already have.

These are the potential scenarios:

1. A single Mill process: no special concurrency management is needed. Even when Mill runs with parallel task processing there is no risk, as the dependency graph ensures each target is evaluated only once.
2. A single Mill process handling more than one request at a time, e.g. as a BSP server. If two requests cover the same transitive dependency targets, a target may run more than once at the same time. This can result in all kinds of typical concurrency issues, e.g. a missing or already existing T.dest folder. We could easily add an in-memory lock table so that a target is never evaluated more than once at a time. The latest documented case is BSP: tasks seem to be executed duplicately and concurrently #2818.
3. Two Mill processes evaluating the same target, e.g. when we run Mill in two different terminals. The potential issues and effects are the same as in point 2, but we can't apply cheap in-process locks to mitigate them. We probably need a filesystem-based lock mechanism, which may slow down evaluation.
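For point 2, an in-memory lock table could look roughly like the following Scala sketch. It assumes a target can be identified by a string label such as foo.compile; TargetLockTable and withLock are hypothetical names, not part of Mill's API.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.locks.ReentrantLock

// Hypothetical in-process lock table (not Mill's API): one lock per target,
// keyed by a target label such as "foo.compile".
final class TargetLockTable {
  private val locks = new ConcurrentHashMap[String, ReentrantLock]()

  // Runs `body` while holding the lock for `target`. A second request for the
  // same target in the same JVM (e.g. another BSP request) blocks until the
  // first evaluation has finished; different targets do not contend.
  def withLock[T](target: String)(body: => T): T = {
    // computeIfAbsent is atomic, so all callers get the same lock instance.
    val lock = locks.computeIfAbsent(target, (_: String) => new ReentrantLock())
    lock.lock()
    try body
    finally lock.unlock()
  }
}
```

Because these locks are plain JVM objects, they only coordinate evaluations inside one Mill process and do nothing for point 3. As written, the table also never removes entries, which is cheap for a bounded set of targets.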

cc @lihaoyi @lolgab

lihaoyi · 2023-10-04T23:09:51Z · Maintainer

There was never any serious thought put into this issue, because "typically" it didn't cause problems: e.g. for IDE support, you would run GenIdea first, and then separate commands afterwards. I've hit race conditions in the past when I forgot I had another Mill process running with --watch in another terminal, though that was very rare. But with IDE integration and BSP, the chance of concurrent evaluation has gone up substantially, so things may be different now.

Bazel does a coarse-grained "lock the entire repository" thing. It's a bit annoying, and sometimes means the CLI is blocked on IDE import and vice versa, but in practice it works out well enough even for large repositories. So I think that's an option for us in Mill. We may want to do something better, but it's an option. It does have a big downside, though: in the "common case" where multiple Mill commands are running without causing problems, it will block them and make things slower. That's why it was not an obvious win and was not done initially.
Another option would be something more fine-grained using the dependency graph. Tasks depend on the output of tasks upstream of them in the graph, so in theory each time we lock something we would need to lock the upstream transitive subset of the graph. That can be done by taking file locks on each individual target in evaluation order as the evaluation progresses. The consistent ordering would also help ensure that we don't get deadlocks, though we would need to be careful of edge cases where the task graph topology changes between concurrent runs, which could result in locks being taken in different orders and deadlocking.
There may be even finer-grained locking possible, e.g. only file-locking upstream tasks for which you have a PathRef referencing their .dest folder. I haven't fully thought this one through w.r.t. all the edge cases, but it seems like it should be possible somehow. Most tasks do not return PathRefs, so limiting locking to the tasks that do greatly reduces contention while in theory still being robust to any disk-based mutation.
There's probably something we can do here with reader-writer locks, e.g. letting Mill evaluations share a target they only read from, but only allowing writes when there are no other readers (and vice versa). This could mitigate a lot of the downsides of locking things: in any particular evaluation, we're typically only writing to one (-j 1) or a small number (-j n) of targets under active evaluation, while potentially reading from a large number of upstream targets that were evaluated earlier. This way the read locks would not get in the way of each other. I'm not familiar with how RW locks work at the filesystem/inter-process level, but it seems like it should be possible. java.nio.channels.FileLock allows you to take shared or exclusive locks, which should fit right into this use case.
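A rough Scala sketch of the file-lock ideas above, using java.nio.channels.FileLock: one lock file per target (the path layout is an assumption), a shared lock for upstream targets that are only read, an exclusive lock for a target being written, and shared locks acquired in a fixed sorted order to avoid the lock-ordering deadlocks mentioned above. TargetFileLocks and its methods are hypothetical names.

```scala
import java.nio.channels.{FileChannel, FileLock}
import java.nio.file.{Files, Path, StandardOpenOption}

// Hypothetical inter-process locking helper; the lock-file layout
// (e.g. out/foo/compile.lock) is an assumption, not Mill's.
object TargetFileLocks {
  private def open(lockFile: Path): FileChannel = {
    Files.createDirectories(lockFile.getParent)
    // READ is required for shared locks, WRITE for exclusive ones.
    FileChannel.open(lockFile, StandardOpenOption.CREATE,
      StandardOpenOption.READ, StandardOpenOption.WRITE)
  }

  private def withLock[T](lockFile: Path, shared: Boolean)(body: => T): T = {
    val channel = open(lockFile)
    try {
      // Blocks until no other process holds a conflicting lock on the file.
      val lock: FileLock = channel.lock(0L, Long.MaxValue, shared)
      try body
      finally lock.release()
    } finally channel.close()
  }

  // Shared lock: many processes may read an upstream target's results at once.
  def withReadLock[T](lockFile: Path)(body: => T): T =
    withLock(lockFile, shared = true)(body)

  // Exclusive lock: only one process may (re-)evaluate and write the target.
  def withWriteLock[T](lockFile: Path)(body: => T): T =
    withLock(lockFile, shared = false)(body)

  // Takes shared locks on several upstream targets in a fixed (sorted) order,
  // so two processes never acquire the same pair of locks in opposite orders.
  def withReadLocks[T](lockFiles: Seq[Path])(body: => T): T = {
    val ordered = lockFiles.map(_.toAbsolutePath().normalize()).distinct.sortBy(_.toString)
    def loop(rest: List[Path]): T = rest match {
      case Nil       => body
      case f :: tail => withReadLock(f)(loop(tail))
    }
    loop(ordered.toList)
  }
}
```

Two caveats from the FileLock Javadoc: these locks are held on behalf of the whole JVM, so two threads in the same process requesting overlapping locks get an OverlappingFileLockException instead of blocking (in practice this would be paired with an in-process lock), and on operating systems without shared-lock support a shared request is silently turned into an exclusive one.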


lefou · 2023-10-05T16:03:28Z · Maintainer Author

Related issue: FileAlreadyExistsException when compiling bridges #2826

1 reply

lefou · Jan 24, 2024 · Maintainer Author

Related issue: BuildInfo fails to write the same file twice #1711

lefou · 2023-10-05T20:25:48Z · Maintainer Author

I think we may want to differentiate between two concurrency scenarios:

1. Concurrent cache faults resulting in parallel evaluation of the same tasks.
2. Concurrent invalidation of a task due to upstream changes while some downstream dependencies still (want to) access the previous results.

All the instances we have witnessed and linked so far are examples of scenario 1. This is great, as it is much easier to solve than scenario 2. A simple lock should be sufficient to avoid a second parallel evaluation of the same task. We just need to wait until the first evaluation has finished and then reuse its result (after some checks) for all waiting parallel evaluations as well. Essentially, we solve this by synchronizing the task evaluation. I think we can completely ignore the downstream consumers, as those are waiting for the result anyway, and we most likely won't change the result once it's available.
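The synchronization described above could be sketched in Scala as a map of in-flight evaluations: the first caller for a task evaluates it, and concurrent callers for the same task wait on a shared Future and reuse its result. The string task key and the InFlightEvaluations name are assumptions; the entry is removed once the evaluation finishes, so later, non-concurrent requests go back through the regular cache checks rather than this map.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration.Duration
import scala.util.Try

// Hypothetical de-duplicating evaluator (not Mill's API).
final class InFlightEvaluations[T] {
  private val inFlight = new ConcurrentHashMap[String, Future[T]]()

  def evaluate(key: String)(compute: => T): T = {
    val promise = Promise[T]()
    // Atomically register our future unless another evaluation is running.
    val existing = inFlight.putIfAbsent(key, promise.future)
    if (existing != null) {
      // Someone else is evaluating this task: wait and reuse the result.
      Await.result(existing, Duration.Inf)
    } else {
      try {
        // We won the race: evaluate once and publish the result to waiters.
        promise.complete(Try(compute))
        Await.result(promise.future, Duration.Inf)
      } finally inFlight.remove(key, promise.future)
    }
  }
}
```

Failures are shared as well: if the evaluation throws, every waiting caller gets the same exception from Await.result.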

A solution for scenario 2 is harder to achieve, but it's also less important. We may always run into situations where changes to task sources have the potential to break things (e.g. removing a source file while we try to compile it), so it's probably OK to live with these potential issues a while longer and tackle the apparent and disruptive concurrency issues first.


lihaoyi · 2023-10-06T03:44:28Z · Maintainer

Yeah, a simpler per-target lock sounds like a good first step.


lefou · 2024-01-18T12:32:28Z · Maintainer Author

A first implementation doing the in-process and in-memory task synchronization is ready for review.

Synchronize evaluateGroupCached to avoid concurrent access to cache #2980
