# Support `module.sc` files in subfolders (#3213)
This PR implements support for per-subfolder `build.sc` files, named `module.sc`. This allows large builds to be split up into multiple `build.sc` files, each relevant to the sub-folder it is placed in, rather than having a multi-thousand-line `build.sc` in the root configuring Mill for the entire codebase.

## Semantics

1. The `build.sc` in the project root and all nested `module.sc` files recursively within the Mill directory tree are discovered (TODO: ignoring files listed in `.millignore`) and compiled together.
2. We ignore any subfolders that have their own `build.sc` file, since that indicates they are the root of their own project and not part of the enclosing folder.
3. Each `foo/bar/qux/module.sc` file is compiled into a `millbuild.foo.bar.qux` `package object`, with the root `build.sc` being compiled into a `millbuild` `package object` (rather than a plain `object` as in the status quo).
4. An `object blah extends Module` within a `foo/bar/qux/module.sc` file can be referenced both in code and from the command line via `foo.bar.qux.blah`.
5. The base modules of `module.sc` files do not have the `MainModule` tasks (`init`, `version`, `clean`, etc.). Only the base module of the root `build.sc` file has those.

## Design

### Uniform vs Non-uniform hierarchy

One design decision here is whether a `module.sc` file in a subfolder `foo/bar/` containing `object qux { def baz }` would have its targets referenced via `foo.bar.qux.baz` syntax, or via some alternative, e.g. `foo/bar/qux.baz`.

A non-uniform hierarchy `foo/bar/qux.baz` would be similar to how Bazel treats folders vs. targets non-uniformly (`foo/bar:qux-baz`), and also similar to how external modules in Mill are handled (e.g. `mill.idea.GenIdea/idea`), as well as existing foreign modules. However, it introduces significant user-facing complexity:

1. What's the difference between `foo/bar/qux.baz` vs `foo/bar.qux.baz` or `foo/bar/qux/baz`?
2. What query syntax would we use to query targets in all nested `module.sc` files rather than just the top-level one, e.g. `__.compile`?
3. Would there be a correspondingly different way of referencing nested `module.sc` modules and targets in Scala code as well?

Bazel has significant complexity to handle these cases, e.g. querying via `...` vs `:all` vs `*`. It works, but it does complicate the user-facing semantics.

The alternative of a uniform hierarchy also has downsides:

1. How do you go from a module name, e.g. `foo.bar.qux.baz`, to the `build.sc` or `module.sc` file in which it is defined?
2. If a module is defined in both the root `build.sc` and in a nested `module.sc`, what happens?

I decided to go with a uniform hierarchy, where everything in both the top-level `build.sc` and the nested `module.sc` files ends up squashed together in a single uniform `foo.bar.qux.baz` hierarchy.

### Package Objects

The goal of this is to make modules defined in nested `module.sc` files "look the same" as modules defined in the root `build.sc`. There are two possible approaches:

1. Splice the source code of the various nested `module.sc` files into the top-level `object build`. This is possible, but very complex and error-prone, especially when it comes to reporting proper error locations (filename/line number) in stack traces. It would likely require a custom compiler plugin similar to the `LineNumberPlugin` we have today.
2. Convert the `object`s into `package object`s, such that the module tree defined in the root `build.sc` becomes synonymous with the JVM package tree. While the `package object`s will cause the compiler to synthesize `object package { ... }` wrappers, that is mostly hidden from the end user.

I decided to go with (2) because it seemed much simpler, making use of existing language features rather than trying to force the behavior we want using compiler hackery.
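A minimal sketch of the package-object idea, assuming Scala 2.13 and a single source file. Plain `object`s stand in for Mill `Module`s, and the names `app`, `compile`, `nestedFromRoot`, and `Demo` are hypothetical; this is not the code Mill actually generates. It shows the property the design relies on: once `foo/bar/qux/module.sc` becomes the `millbuild.foo.bar.qux` package object, its `blah` object is reachable as `foo.bar.qux.blah` from the root package object, exactly as if it were a nested `object`.

```scala
// Stand-in for the root build.sc, compiled into the `millbuild` package object.
package object millbuild {
  // Hypothetical module defined directly in the root file
  object app {
    def compile: String = "compiling app"
  }

  // Code in the root file reaches the nested module through the package tree,
  // using the same `foo.bar.qux.blah` path as a nested object would have
  def nestedFromRoot: String = foo.bar.qux.blah.compile
}

// Stand-in for foo/bar/qux/module.sc, compiled into the
// `millbuild.foo.bar.qux` package object.
package millbuild.foo.bar {
  package object qux {
    object blah {
      def compile: String = "compiling foo.bar.qux.blah"
    }
  }
}

// Hypothetical entry point showing that both modules live in one uniform tree
object Demo {
  def main(args: Array[String]): Unit = {
    println(millbuild.app.compile)
    println(millbuild.foo.bar.qux.blah.compile)
    println(millbuild.nestedFromRoot)
  }
}
```

Because the nesting comes from ordinary Scala packages rather than source splicing, each file's contents stay in their own package object while still sharing one module namespace.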
Although `package object`s may go away at some point in Scala 3, they should be straightforward to replace with explicit `export foo.*` statements when that time comes.

### Existing Foreign Modules

Mill already supports `foo.sc` files with targets and modules defined within them, but does not support referencing them from the command line.

I have removed the ability to define targets and modules in arbitrary `foo.sc` files. We should encourage people to put things in `module.sc`, since that allows the user to co-locate the build logic within the folder containing the files it relates to, rather than in a bunch of loose `foo.sc` scripts. Removing support for modules/targets in `foo.sc` files greatly simplifies the desugaring of these scripts, and since we are already making a breaking change by overhauling how per-folder `module.sc` files work, we might as well bundle this additional breakage together (rather than making multiple breaking changes in series).

### `build.sc`/`module.sc` file discovery

For this implementation, I chose to make `module.sc` files discovered automatically by traversing the filesystem: we recursively walk the subfolders of the root `build.sc` project, looking for any files named `module.sc`. We only traverse folders with `module.sc` files, to avoid having to traverse the entire filesystem structure every time. Empty `module.sc` files can be used as necessary to allow `module.sc` files to be placed deeper in the folder tree.

This matches the behavior of Bazel and SBT in discovering their `BUILD`/`build.sbt` files, and notably goes against Maven/Gradle, which require submodules/subprojects to be declared in the top-level build config. This design has the following characteristics:

1. In future, if we wish to allow `mill` invocations from within a subfolder, the distinction between `build.sc` and `module.sc` allows us to easily find the "enclosing" project root.
2. It ensures that any folders containing `build.sc`/`module.sc` files that accidentally get included within a Mill build do not end up getting picked up and confusing the top-level build, because we automatically skip any subfolders containing `build.sc`.
3. Similarly, it ensures that adding a `build.sc` file "enclosing" an existing project would not affect Mill invocations in the inner project, because we only walk up to the nearest enclosing `build.sc` file to find the project root.
4. We do not automatically traverse deeply into sub-folders to discover `module.sc` files, which means it should be almost impossible to accidentally pick up `module.sc` files that happen to be on the filesystem but that you did not intend to include in the build.

This mechanism should do the right thing 99.9% of the time. For the remaining 0.1% where it doesn't, we can add a `.millignore`/`.config/millignore` file to support ignoring things we don't want picked up, but I expect that to be a very rare edge case.

## Task Resolution

I have aimed to keep the logic in `resolve/` mostly intact. The core change is replacing `rootModule: BaseModule` with `baseModules: BaseModuleTree`, which provides enough metadata to allow `resolveDirectChildren` and `resolveTransitiveChildren` to find `BaseModule`s in sub-folders in addition to `Module` `object`s nested within the parent `Module`. Other than that, the logic should be basically unchanged, which hopefully mitigates the risk of introducing new bugs.

## Compatibility

This change is not binary compatible, and the change in the `.sc` file desugaring is invasive enough that we should consider it a major breaking change. This will need to go into Mill 0.12.x.

## Out-Of-Scope/TODO

1. Running `mill` within a subfolder of the enclosing project. This shouldn't be hard to add given this design, but the PR is complicated enough as is that I'd like to leave it for a follow-up.
2. Error reporting when a module is duplicated in an enclosing `object` and in a nested `module.sc` file. Again, probably not hard to add, but it can happen in a follow-up.

Pull request: #3213
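For reference, the rules from the `build.sc`/`module.sc` file discovery section can be sketched as a small filesystem walk. `ModuleDiscovery.discover` is a hypothetical helper, not Mill's actual implementation; it assumes Scala 2.13 and `java.nio.file`, and it does not implement the `.millignore` TODO. It starts at a folder containing `build.sc`, skips any subfolder with its own `build.sc`, and only descends into subfolders that contain a `module.sc`.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper sketching the discovery rules; not Mill's actual code.
object ModuleDiscovery {

  /** Returns the root `build.sc` followed by every discovered `module.sc`,
    * as paths relative to `projectRoot`, in depth-first order. */
  def discover(projectRoot: Path): Seq[Path] = {
    require(
      Files.isRegularFile(projectRoot.resolve("build.sc")),
      s"$projectRoot does not contain a build.sc"
    )

    // Immediate subdirectories of `dir`, sorted for deterministic output
    def subfolders(dir: Path): List[Path] = {
      val stream = Files.list(dir)
      try stream.iterator().asScala.filter(p => Files.isDirectory(p)).toList.sortBy(_.toString)
      finally stream.close()
    }

    def walk(dir: Path): Seq[Path] =
      subfolders(dir).flatMap { child =>
        if (Files.exists(child.resolve("build.sc")))
          Nil // root of a separate project: skip it entirely
        else if (Files.exists(child.resolve("module.sc")))
          child.resolve("module.sc") +: walk(child) // part of this build: keep descending
        else
          Nil // no module.sc here: do not traverse any deeper
      }

    (projectRoot.resolve("build.sc") +: walk(projectRoot)).map(p => projectRoot.relativize(p))
  }
}
```

For a tree containing `foo/module.sc`, `foo/bar/module.sc`, and `foo/bar/qux/module.sc`, this returns those three files after `build.sc`. A file such as `other/deep/module.sc` is never visited when `other/` has no `module.sc` of its own, which is why empty `module.sc` files are needed to reach deeper folders.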