Vendor resolvelib 1.1.0 #13001
So I checked how resolution performed using https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks, and I have good news and bad news.
Firstly, as expected, due to correctness fixes, #12768 and #12317 resolve instead of showing
Some unexpected good news is that this resolves #12305.
Some slightly bad news is that this turns #12990 from a Build Failure into a ResolutionTooDeep, but this can be fixed if #13017 lands.
Some worse news is that the requirements in astral-sh/uv#1398 go from resolving (after some time), to
Ready for review / merge once 24.3 is closed.
It turns out I accidentally broke depth by changing some resolvelib behavior: pdm-project/pdm#3235 (comment). I am going to investigate further to see whether it is causing the performance regression I see in astral-sh/uv#1398. If so, I'm going to make a PR on resolvelib; if removing depth makes no difference to performance, it might be worth dropping depth as a preference, as PDM has.
Okay, I've tried fixing depths and removing depths, and I ran the scenarios in https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks to see what the difference is (the left-hand side is fixing, the right-hand side is removing):
There are some interesting takeaways:
My conclusions from this are:
This is ready for review now. I have removed depth as a preference, which unfortunately means removing the only unit tests that tested the PipProvider directly (though it is tested extensively as part of the functional tests), but I will add additional unit tests to
I'm marking this back to draft while we discuss performance issues with resolvelib 1.1; see the discussion here: sarugaku/resolvelib#180
At some point we might have to decide whether to vendor this version of resolvelib, which includes an important correctness fix and some performance improvements but also some big performance regressions, or to wait for a resolvelib 1.2, which might be able to regain some of that performance in very deep backtracking situations (e.g. boto3 / urllib3).
I would very much like input from other maintainers on that, both here and over on the resolvelib performance issue.
@notatallshaw I'm generally in favour of correctness improvements as AFAIU the user usually has workarounds to deal with excessive backtracking / ResolutionImpossible, i.e. restrict the version ranges pip considers. In contrast, issues like #12317 cause pip to have straight-up broken behaviour. A valid solution exists (and within the backtrack limit), but pip skips over it. Last time I had to debug this with a colleague, it was quite annoying and frustrating.
Also, there are additional optimizations in the works, right? I see that you have a draft optimistic backjumping PR on resolvelib's side, and then there's #12317 (prefer direct conflicts on backtrack) on pip's end.
How common are the known scenarios where this PR is a net negative? As long as they're not common, I'd say the small amount of fallout doesn't outweigh the correctness improvements.
I agree with @ichard26 - we should prioritise correctness over performance. Claiming there is no solution when there is one is a bug, and we should prioritise fixing that even if it requires a slower approach.
Pip's focus is on standards compliance and correctness, with performance being secondary (although clearly important as long as the first two criteria don't suffer). If users need performance and are willing to compromise on the other factors, then they are likely to be choosing
Thanks both for your feedback. The problem here is that a performance regression means the user will see a "ResolutionTooDeep". There are definitely real cases involving boto3 and urllib3 as dependencies where this will happen with resolvelib 1.1.0 where it did not happen in the previous version, so if this is merged we are likely to see additional users complain.
If other maintainers are comfortable with that, knowing we are achieving correctness as best we can at the cost of some users hitting resolution problems for now, then I am going to mark this as ready for review. If we can merge soon, I can create another PR to add a few simple performance improvements with the new API before pip 25.0.
I took a look at implementing a fallback inside resolvelib, but also found it would cause at least one known "ResolutionTooDeep" that resolvelib 1.1.0 doesn't 🙁. So I'm still on the fence about that. I'm going to take a deeper look in a few weeks at whether we can have our cake and eat it, i.e. introduce this optimization while remaining formally correct, but my conclusion may end up being that the resolvelib API needs to change in non-trivial ways, or that we need to consider writing a new resolver, one that learns the lessons from resolvelib, uv, and Poetry.
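To make the trade-off concrete, here is a minimal toy sketch of how a round budget can turn a solvable set of requirements into a "ResolutionTooDeep". Everything in it is an assumption for illustration: the `resolve` function, the made-up `index`, and the local `ResolutionTooDeep` class (named after resolvelib's exception, but not the real class). It is a naive backtracking search, not resolvelib's algorithm.

```python
from typing import Dict, List, Optional, Set


class ResolutionTooDeep(Exception):
    """Local stand-in named after resolvelib's exception; not the real class."""


# Toy index (made up): package -> {version: {dependency: allowed versions}}
Index = Dict[str, Dict[int, Dict[str, Set[int]]]]


def resolve(index: Index, roots: List[str], max_rounds: int) -> Dict[str, int]:
    """Naive backtracking search that gives up after `max_rounds` candidate tries."""
    rounds = 0

    def allowed(name: str, version: int, pins: Dict[str, int]) -> bool:
        # The candidate and every already-pinned package must accept each other.
        deps = index[name][version]
        for other, other_version in pins.items():
            other_deps = index[other][other_version]
            if name in other_deps and version not in other_deps[name]:
                return False
            if other in deps and other_version not in deps[other]:
                return False
        return True

    def search(todo: List[str], pins: Dict[str, int]) -> Optional[Dict[str, int]]:
        nonlocal rounds
        if not todo:
            return pins
        name, rest = todo[0], todo[1:]
        if name in pins:
            return search(rest, pins)
        for version in sorted(index[name], reverse=True):  # newest first
            rounds += 1
            if rounds > max_rounds:
                raise ResolutionTooDeep(max_rounds)
            if not allowed(name, version, pins):
                continue
            new_pins = {**pins, name: version}
            new_todo = rest + [d for d in index[name][version] if d not in new_pins]
            result = search(new_todo, new_pins)
            if result is not None:
                return result
        return None  # every version failed: backtrack

    result = search(list(roots), {})
    if result is None:
        raise RuntimeError("no solution exists")
    return result


# "sdk" has many releases; only the oldest one is compatible with the "http"
# version that "app" needs, so the search backtracks through all of them.
index: Index = {
    "sdk": {1: {"http": {1, 2}}, **{v: {"http": {1}} for v in range(2, 51)}},
    "app": {1: {"http": {2}}},
    "http": {1: {}, 2: {}},
}
```

With a generous budget, `resolve(index, ["sdk", "app"], max_rounds=1000)` finds the solution that pins `sdk` to its oldest release; with `max_rounds=100` the same requirements raise the stand-in `ResolutionTooDeep` even though a valid solution exists.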
Here's to another day where I am annoyed that botocore has thousands of releases... even if it keeps us honest.
news/resolvelib.vendor.rst
Outdated
@@ -0,0 +1 @@
Upgrade resolvelib to 1.1.0
We should probably call out that this improves correctness, but may introduce worse performance in certain situations. I'm being hand-wavy with my wording as I assume you can come up with something better.
Updated the news item, and separately I think I will make ResolutionTooDeep a diagnostic exception and point users to a specific pip issue where I can collect scenarios where this happens.
I'll be blunt, I think that "thousands of releases" should stress a backtracking resolver (or for that matter any form of resolver). If users are writing requirements that state that any of many thousands of botocore releases are equally valid, then I'd question those requirements. Obviously, no-one is actually going to bother trying to add lower bounds to every botocore requirement out there - and it only takes one unbounded requirement, selected at the wrong point in the process, to blow up. But if we're optimising for the use case of "please pick any one of these 3,000 candidate versions" then something is seriously skewed [1].
Maybe what we should have is some sort of limit in the finder - by default, only consider the latest 100 [2] versions of a project. Then, people using a pathological project like botocore would have to explicitly opt-in with
Footnotes
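As a rough sketch of the "only consider the latest N versions" idea: the helper names `parse_version` and `latest_candidates` are hypothetical, and the parsing assumes plain dotted numeric versions. pip's real finder is not structured this way.

```python
from typing import Iterable, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version such as '1.34.2' (assumes no pre/post tags)."""
    return tuple(int(part) for part in version.split("."))


def latest_candidates(versions: Iterable[str], limit: int = 100) -> List[str]:
    """Hypothetical helper: keep only the `limit` newest versions, newest first."""
    ordered = sorted(versions, key=parse_version, reverse=True)
    return ordered[:limit]


# A made-up project with 3,000 releases: 1.0.0 through 1.0.2999.
all_versions = [f"1.0.{patch}" for patch in range(3000)]
considered = latest_candidates(all_versions, limit=100)
```

Under this sketch the resolver would only ever see `1.0.2999` down to `1.0.2900`; any requirement that can only be met by an older release would need the explicit opt-in.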
Setting aside the problem of thousands of releases, these complex dependency graphs arise naturally from how Python packaging handles dependencies. For example, resolving Python package dependencies is an NP-hard problem; other languages (e.g. Rust and Node.js) sidestep this by allowing multiple versions of the same library as a solution. Even without boto3 there are many other examples where resolution is very hard; boto3 just sticks out more obviously because it can cause a huge number of downloads.
I'm dubious, but would be happy to discuss elsewhere; I think this would break resolutions in a non-obvious way. Let's say you have some package
And the problem is that the number of possible solutions can grow exponentially along dimensions other than the number of candidates, e.g. the number of requirements. So this doesn't work when the number of requirements is causing the issue, in which case you may need a very small number of max candidates (e.g. 10) to avoid a ResolutionTooDeep, and you may just get a ResolutionImpossible instead, and you would have to do significant analysis to understand why.
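A back-of-the-envelope illustration of that growth, assuming a naive exhaustive search where every requirement independently picks one of its candidates (the function name is hypothetical, and real resolvers prune far more aggressively than this upper bound suggests):

```python
def search_space_size(candidates_per_requirement: int, requirement_count: int) -> int:
    """Upper bound on the version combinations a naive exhaustive search could visit."""
    return candidates_per_requirement ** requirement_count


# Even a cap of 10 candidates per project leaves a huge space once there are many requirements.
for requirement_count in (5, 20, 50):
    combos = search_space_size(10, requirement_count)
    print(f"{requirement_count} requirements x 10 candidates -> {combos:.1e} combinations")
```

The cap only shrinks the base of the exponent; the number of requirements is the exponent itself, which is why a candidate limit alone does not bound the work.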
Agreed 100%. I'm not trying to minimise the complexity of the problem here, but I think that packages like botocore with thousands of releases are (and should be treated as) very much outliers. I don't think we should compromise correctness, or the performance of simpler cases, just to accommodate such projects.
It's quite possible it would. I haven't done any sort of analysis at this point. I'm happy to drop the idea.
I'm moving this to 25.1. I'm not comfortable landing this without being able to follow it up with related performance improvements, and I have had an unusually busy January and have not had time to make that PR with those improvements. I don't want this landing right before a milestone deadline, so I will push to merge this early in the 25.1 cycle, as soon as 25.0 is considered done.
Fixes #12768
Fixes #12317
Fixes #12305
Fixes #12497
Fixes #12754 (comment)
Draft PR of resolvelib beta to see if it passes tests. Will mark as ready to review and ask for review/merge when final is ready.