-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Codegen: on demand distribution to forked processes #14273
Codegen: on demand distribution to forked processes #14273
Conversation
Instead of pre-slicing the list of compilation units then having each forked process handle its personal list, with some finishing before the others, this patch adds bidirectional pipes to push the compilation units as the forked processes make progress, so that each forked process will continue to compile for as long as there is something to compile. The forker processes are overqueued with one compilation unit to avoid latencies (they compile the next unit while the main process pushes the next). This only improves performance when there are multiple compilation units. It won't have any effect when `--single-module` (implied by `--release`) is set, and will have little impact when a module is extra large (e.g. crystal specs) unless maybe when there are multiple of them and they were set to the same forked process.
The method exists but raises NotImplementedError which causes Process.new(pid) to fail because it doesn't exist with a `PidT`). This patch fixes the issue by checking LibC.fork directly. Weirdly, both were used before these changes and compilation did succeed.
3288c59
to
2f1af88
Compare
Let's note that we default to 8 processes, which used to be fine when we had 4 cores with hyperthreading (8 threads), but might now be much in phase with the current CPU trends where 16, 20 or 32 threads are becoming the norm, though we might not want to fork 160 processes on an Ampere Altra 😁 Maybe we could improve the default value to be Maybe we could accept a |
And... now I'm wondering about what happens when we have fewer compilation units than configured threads 🤔 Maybe I should limit the number of forked processes, just in case 😅 |
I'd like to merge this as is and fine tune the number of threads in a follow-up. |
I just pushed one last commit to fix edge cases with too little (must be >= 1) or too many processes (pointless). The comment was to leave some indication about how to go further. |
Yeah, that clamp makes sense for sanity. |
This patch optimizes the
Codegen (bc+obj)
pass of the compiler by taking better advantage of multiple CPU cores.Instead of pre-slicing the list of compilation units then having each forked process handle its personal list, with some finishing before the others, this patch adds bidirectional pipes to push the compilation units as the forked processes make progress, so that each forked process will continue to compile for as long as there is something to compile.
The forker processes are overqueued with one compilation unit to avoid latencies (they compile the next unit while the main process pushes the next).
This only improves performance when there are multiple compilation units. It won't have any effect when
--single-module
(implied by--release
) is set, and will have little impact when a module is extra large (e.g. crystal specs) unless maybe when there are multiple of them and they were set to the same forked process.Example: while compiling Crystal the
Codegen (bc+obj)
pass goes from ~18s to ~11s on an empty cache, and ~5s to ~3s on a full cache recompilation.Related to #14227 but using forks+pipes instead of threads+channel.