fix: avoid selecting subnets with insufficient available IP addresses + test #7623
base: main
Conversation
Signed-off-by: Vacant2333 <vacant2333@gmail.com>
…r-aws into merge-with-vacant pull main
…ip-count-subnet Feat select the most available ip count subnet
Hey @Vacant2333 @saurav-agarwalla
I added the missing import @saurav-agarwalla
Thanks for bringing all the changes together. I discussed this at length with the team yesterday, and it seems that with this change we risk breaking customers who might be using secondary networking with Karpenter. This isn't something Karpenter is explicitly aware of today, but it still supports it in the sense that we don't actively block anyone from using it. Making Karpenter aware of secondary networking is more of a design question, so I don't think we will be able to resolve it in the scope of this PR. That said, even with this fix you might make the issue less likely to happen, but the underlying root cause still won't be fixed (i.e. the subnet running out of IPs). As the next step, I'd recommend the following:
- Figure out a way to ensure that IP exhaustion doesn't happen on the subnets (or maybe use secondary networking to mitigate the impact when it does happen). https://aws.github.io/aws-eks-best-practices/networking/ip-optimization-strategies/ has some references on how to prevent this.
- Attend the [working group](https://karpenter.sh/docs/contributing/working-group/) meetings if you want to discuss other solutions to this. This seems like a great problem to discuss with the community.

Please don't hesitate to engage me if you have any other questions.
In fact, I do not agree with the reasoning for not accepting this solution. The issue of IP exhaustion needs to be addressed from both the user's and Karpenter's perspectives. From Karpenter's perspective, if it is already known that a particular subnet has no available IP addresses, Karpenter should not continue to use that subnet or create new nodes in that availability zone (AZ).
I think of it like spot instances: if there are no spot instances available, you can either fail (if on-demand is not allowed) or fall back to on-demand. In this case, with 3 available subnets, we just need to fall back to one that has available IPs.
Couldn't agree more 👍🏼
I definitely agree that we need to improve the handling of this on Karpenter's side. My callout was about the fact that taking this PR in its current form will break customers who use secondary/custom networking, since in those cases pods can still launch even if the node's subnet has no free IPs. The other reason this is a slightly more complex problem, and different from the spot one, is that Karpenter can only estimate how many IPs are going to be used. Ultimately it is the kube-scheduler that schedules the pods, so there's no guarantee that pods will land on the node Karpenter thinks they will.
What if we introduce this feature as a configurable variable, disabled by default? This way, it won't impact users relying on their custom networking, but others can enable it if needed. Thoughts?
I discussed this in the community meeting today. The general consensus is that we need a design for this problem in order to decide what's the right way forward. Adding a configuration flag can be one of the solutions as part of that proposal. |
(contributing to the future design discussion) Rather than "restrict" or "avoid", how about an opt-in weighting approach? For example, make Karpenter …
I would expect this to be an opt-in/opt-out semantic (a flag or API input) that lets the fix be non-disruptive for other networking scenarios.
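To make the opt-in idea above concrete, here is a minimal sketch of a "weight, don't exclude" subnet ordering behind a flag. All names here (`subnet`, `orderSubnets`, `preferMostFreeIPs`) are hypothetical illustrations, not Karpenter's actual provider types or configuration surface: with the flag off, the existing order is untouched; with it on, subnets with more free IPs are preferred but none are hard-excluded, so secondary/custom networking setups keep working.

```go
package main

import "fmt"

// subnet is a hypothetical stand-in for the subnet data Karpenter
// already has at launch time.
type subnet struct {
	ID      string
	FreeIPs int
}

// orderSubnets returns the candidate subnets in selection order.
// When preferMostFreeIPs is false (the default), the input order is
// preserved, matching today's behavior; when true, subnets are sorted
// by descending free-IP count, but none are removed.
func orderSubnets(subnets []subnet, preferMostFreeIPs bool) []subnet {
	if !preferMostFreeIPs {
		return subnets // opt-out: leave existing selection untouched
	}
	out := append([]subnet(nil), subnets...)
	// Stable insertion sort by descending free IPs.
	for i := 1; i < len(out); i++ {
		for j := i; j > 0 && out[j].FreeIPs > out[j-1].FreeIPs; j-- {
			out[j], out[j-1] = out[j-1], out[j]
		}
	}
	return out
}

func main() {
	s := []subnet{{"subnet-a", 3}, {"subnet-b", 40}, {"subnet-c", 11}}
	fmt.Println(orderSubnets(s, true)[0].ID) // subnet-b
}
```

Because exhausted subnets are only deprioritized, not filtered out, a cluster where every subnet is low on IPs still gets a launch attempt, which keeps the change non-disruptive for networking modes Karpenter cannot see.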
Fixes #5234, #2921
Description
This is a feature (by @Vacant2333) that makes launchInstance select the subnet with the most available IPs. I added a test around it so we can merge it.
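The selection described above can be sketched as follows. This is a minimal illustration under assumed names (`subnetInfo`, `pickSubnet`, `requiredIPs`), not the actual Karpenter provider code: filter out subnets that cannot cover the instance's expected IP demand, then prefer the one with the most free addresses.

```go
package main

import (
	"fmt"
	"sort"
)

// subnetInfo is a hypothetical stand-in for the EC2 subnet data the
// provider already fetches (e.g. AvailableIpAddressCount from
// DescribeSubnets).
type subnetInfo struct {
	ID           string
	Zone         string
	AvailableIPs int
}

// pickSubnet returns the subnet with the most available IPs among those
// that can satisfy requiredIPs; ok is false if no subnet qualifies.
func pickSubnet(subnets []subnetInfo, requiredIPs int) (best subnetInfo, ok bool) {
	candidates := make([]subnetInfo, 0, len(subnets))
	for _, s := range subnets {
		if s.AvailableIPs >= requiredIPs {
			candidates = append(candidates, s)
		}
	}
	if len(candidates) == 0 {
		return subnetInfo{}, false
	}
	// Prefer the subnet with the most free IPs.
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].AvailableIPs > candidates[j].AvailableIPs
	})
	return candidates[0], true
}

func main() {
	subnets := []subnetInfo{
		{ID: "subnet-a", Zone: "us-west-2a", AvailableIPs: 0},
		{ID: "subnet-b", Zone: "us-west-2b", AvailableIPs: 120},
		{ID: "subnet-c", Zone: "us-west-2c", AvailableIPs: 12},
	}
	if best, ok := pickSubnet(subnets, 8); ok {
		fmt.Println(best.ID) // subnet-b
	}
}
```

Note the caveat raised in the review discussion: the available-IP count is only an estimate of future demand, since kube-scheduler ultimately decides pod placement, so this ordering reduces, but does not eliminate, IP-exhaustion failures.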
How was this change tested?
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Merged with #7310.