CLEANUP: Revisit pointing to github.com/cran/{pkg} rather than pull, tweak, push here #12

Open
HenrikBengtsson opened this issue Jan 10, 2025 · 5 comments

Comments

@HenrikBengtsson
Contributor

HenrikBengtsson commented Jan 10, 2025

@jeroen wrote in r-universe-org/help#558 (comment):

By the way, are you aware that you could also simply pull in the git sources from github.com/cran/{pkg} instead of pushing them all in branches in your registry repo?

We should revisit this.

The reason we're doing what we're doing right now is that we inject https://cranhaven.r-universe.dev into Additional_repositories in every package's DESCRIPTION file. If I recall correctly, the reason for doing that was so that packages that got archived as a side effect of another package being archived could still be installed from https://cranhaven.r-universe.dev. Without it, I think install.packages(...) failed to pick up the archived parent from https://cranhaven.r-universe.dev.

Now, this was all added during the early days, and it could have been that I was impatient. So, it's worth revisiting this, because pointing to github.com/cran/{pkg} would make this repo really lightweight.

@jeroen

jeroen commented Jan 10, 2025

If you install packages from cranhaven using the normal install.packages, then any dependencies in this same repo should automatically be available from here as well? E.g.:

install.packages("rticulate", repos = c("https://cranhaven.r-universe.dev", "https://cloud.r-project.org"))

Afaik this has always been the case, as it is a standard feature of install.packages.

@HenrikBengtsson
Contributor Author

HenrikBengtsson commented Jan 10, 2025

You're definitely right about install.packages().

It just struck me that it was something about R universe failing to build some of the packages that depend on others, depending on what order R universe attempted to build them.

Now, it could be that I wasn't patient enough, i.e. I didn't wait long enough to give the build system a chance to come back to the repo for another build cycle or two. On that note, how does the build system work? Does it build a dependency DAG and build packages in the correct order, or is it just done in lexicographic order, or something else?

@jeroen

jeroen commented Jan 11, 2025

It just builds everything concurrently. Failed builds are tried again the next day, and so on.

@HenrikBengtsson
Contributor Author

HenrikBengtsson commented Jan 11, 2025

It just builds everything concurrently. Failed builds are tried again the next day, and so on.

Thanks. So, that's probably why I decided on the current strategy. We definitely have packages being archived when a "parent" package is archived. For example, if packages 'ChildA' and 'ChildB' are archived when 'ParentA' is archived, CRANhaven will provide 'ParentA' almost instantly, but it will take until the next day for 'ChildA' and 'ChildB' to be hosted on CRANhaven. If additional packages are archived because of 'ChildA' and 'ChildB', it will take another day. Since we want CRANhaven to immediately pick up and host packages as soon as they are archived, we probably have to keep the current strategy until we find a better approach.

Knowing how the R universe build system works right now, I guess we could orchestrate it from our end by adding packages 'ChildA' and 'ChildB' to packages.json only when 'ParentA' has been processed.
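To make the staging idea concrete, here is a minimal sketch in Python (chosen only for illustration). The function name `ready_to_add` and the dependency map are hypothetical; nothing here reflects actual CRANhaven or R-universe code. It assumes we know each package's dependencies within the universe and which packages are already published:

```python
def ready_to_add(deps, published, pending):
    """Return the pending packages whose in-universe dependencies
    are all published already (hypothetical helper)."""
    return sorted(
        pkg for pkg in pending
        if all(dep in published for dep in deps.get(pkg, ()))
    )

# Hypothetical dependency map: package -> in-universe dependencies
deps = {"ParentA": [], "ChildA": ["ParentA"], "ChildB": ["ParentA"]}

# Nothing is published yet, so only the parent qualifies.
print(ready_to_add(deps, set(), {"ParentA", "ChildA", "ChildB"}))
# ['ParentA']

# Once 'ParentA' is published, both children can be added.
print(ready_to_add(deps, {"ParentA"}, {"ChildA", "ChildB"}))
# ['ChildA', 'ChildB']
```

Only the packages returned in a given round would be written to packages.json; the rest wait for a later round.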

Ultimately, it sounds like R Universe could benefit from figuring out the optimal build order, or alternatively, just install everything locally first, then build in parallel as now. That way, a new R universe repo with internal dependencies could be fully up and running when first created. I think that could especially benefit first-timers, who otherwise might be confused and spend time troubleshooting (I guess, like me). It would also avoid wasting some initial build cycles that are guaranteed to fail.
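As a rough sketch of what "optimal build order" could mean here, the snippet below groups packages into waves with Python's standard `graphlib.TopologicalSorter` (Python 3.9+), so that every package in a wave depends only on packages from earlier waves. The function name `build_waves` and the package names are hypothetical, and this is not how R-universe works today:

```python
from graphlib import TopologicalSorter

def build_waves(deps):
    """Group packages into waves that could each be built in parallel.

    `deps` maps a package to the in-universe packages it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # packages with all deps built
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as built
    return waves

# Hypothetical universe with a three-level dependency chain
deps = {
    "ParentA": [],
    "ChildA": ["ParentA"],
    "ChildB": ["ParentA"],
    "GrandChild": ["ChildA", "ChildB"],
}
print(build_waves(deps))
# [['ParentA'], ['ChildA', 'ChildB'], ['GrandChild']]
```

Packages within a wave are independent of each other, so they could still be built concurrently.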

@jeroen

jeroen commented Jan 12, 2025

I don't think the git origin of the package affects the build order or result in any way? Afaik these issues are unrelated.

In R-universe, just like CRAN, each build job only builds the package itself, not other packages. So just like CRAN, the assumption is that the dependencies are also on CRAN at the time you submit the new package.

However the problem resolves itself quite quickly. If you add 2 interdependent packages exactly at the same time, it might happen that one build fails because the other is not published. But the situation will automatically fix itself quickly as it automatically retries the failed build once the other package has been published.

We could somehow try to pause builds if some dependency is not yet published, but that is actually quite complex. It is much easier to let it fail and trigger a retry later.
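The fail-and-retry behavior discussed in this thread can be illustrated with a toy simulation in Python. It is only a model under stated assumptions, not R-universe's implementation: every pending package is built concurrently in each cycle, a build succeeds only if all of its in-universe dependencies were published in an earlier cycle, and failed builds are retried in the next cycle. The function name `simulate_retries` and the package names are hypothetical:

```python
def simulate_retries(deps):
    """Simulate concurrent builds with retries.

    Returns (cycles, failed_builds): the packages published in each
    cycle, and the total number of build attempts that failed.
    """
    published, pending = set(), set(deps)
    cycles, failed_builds = [], 0
    while pending:
        # All pending builds run at once and see only packages
        # published in earlier cycles.
        succeeded = {p for p in pending
                     if all(d in published for d in deps[p])}
        if not succeeded:
            raise ValueError("unresolvable: " + ", ".join(sorted(pending)))
        failed_builds += len(pending) - len(succeeded)
        cycles.append(sorted(succeeded))
        published |= succeeded
        pending -= succeeded
    return cycles, failed_builds

# Hypothetical universe with a three-level dependency chain
deps = {
    "ParentA": [],
    "ChildA": ["ParentA"],
    "ChildB": ["ParentA"],
    "GrandChild": ["ChildA", "ChildB"],
}
cycles, failed_builds = simulate_retries(deps)
print(cycles)         # [['ParentA'], ['ChildA', 'ChildB'], ['GrandChild']]
print(failed_builds)  # 4
```

In this model the number of cycles equals the depth of the dependency chain, and the universe converges without any ordering logic, at the cost of some failed attempts along the way. How long a cycle takes in practice depends on when the retry is triggered.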
