
Issues adding nodes to the cluster. #7583

Open
MadhvendraDixit opened this issue Jan 13, 2025 · 5 comments
Labels
bug Something isn't working needs-triage Issues that need to be triaged

Comments

@MadhvendraDixit

Description

The issue I am facing right now is that, even though the Karpenter node instance role and the Karpenter controller role have all the permissions required for the necessary actions, when the number of pods increases, Karpenter launches nodes sized for the required compute, but those nodes do not attach to the cluster, and I do not see any errors in the Karpenter pod logs.
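
For reference, one common reason nodes launch but never join an EKS cluster is the node instance role missing from the aws-auth ConfigMap. A minimal check (the role ARN below is illustrative, not taken from this setup):

# Inspect the aws-auth ConfigMap; the Karpenter node instance role must be
# mapped so that kubelets on new nodes can register with the API server.
kubectl get configmap aws-auth -n kube-system -o yaml

# The mapRoles section should contain an entry like this (ARN is illustrative):
#   - rolearn: arn:aws:iam::<account-id>:role/KarpenterNodeRole-<cluster-name>
#     username: system:node:{{EC2PrivateDNSName}}
#     groups:
#       - system:bootstrappers
#       - system:nodes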

@MadhvendraDixit MadhvendraDixit added bug Something isn't working needs-triage Issues that need to be triaged labels Jan 13, 2025
@jigisha620
Contributor

Can you please share all the logs from when this happened as well as your nodepool and nodeclass configuration?
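
For example, something like the following would capture all of that (a sketch, assuming Karpenter runs in the karpenter namespace and uses the alpha Provisioner/AWSNodeTemplate APIs that appear in the logs below):

# Controller logs covering the window when the nodes were launched
kubectl logs -n karpenter deployment/karpenter --since=1h

# The provisioner and node template configuration
kubectl get provisioners -o yaml
kubectl get awsnodetemplates -o yaml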

@MadhvendraDixit
Author

Can you please share all the logs from when this happened as well as your nodepool and nodeclass configuration?

Yes, sure.
2025-01-13T08:45:14.800Z INFO controller.provisioner found provisionable pod(s) {"commit": "f013f7b", "pods": 8}
2025-01-13T08:45:14.800Z INFO controller.provisioner computed new machine(s) to fit pod(s) {"commit": "f013f7b", "machines": 3, "pods": 8}
2025-01-13T08:45:14.815Z INFO controller.provisioner created machine {"commit": "f013f7b", "provisioner": "default", "machine": "default-bpqcn", "requests": {"cpu":"1150m","memory":"1Gi","pods":"6"}, "instance-types": "t4g.large, t4g.medium"}
2025-01-13T08:45:14.818Z INFO controller.provisioner created machine {"commit": "f013f7b", "provisioner": "default", "machine": "default-sqssb", "requests": {"cpu":"1650m","memory":"1536Mi","pods":"7"}, "instance-types": "t4g.large, t4g.medium"}
2025-01-13T08:45:14.818Z INFO controller.provisioner created machine {"commit": "f013f7b", "provisioner": "default", "machine": "default-g9b4l", "requests": {"cpu":"1650m","memory":"1536Mi","pods":"7"}, "instance-types": "t4g.large, t4g.medium"}
2025-01-13T08:45:16.055Z DEBUG controller.machine.lifecycle created launch template {"commit": "f013f7b", "machine": "default-bpqcn", "provisioner": "default", "launch-template-name": "karpenter.k8s.aws/16414183443499091970", "id": "lt-0a02dbc53d1b58d5b"}
2025-01-13T08:45:16.200Z DEBUG controller.machine.lifecycle created launch template {"commit": "f013f7b", "machine": "default-bpqcn", "provisioner": "default", "launch-template-name": "karpenter.k8s.aws/7710324587535318196", "id": "lt-0421ee8c2fc2a1094"}
2025-01-13T08:45:17.683Z INFO controller.machine.lifecycle launched machine {"commit": "f013f7b", "machine": "default-g9b4l", "provisioner": "default", "provider-id": "aws:///ap-south-1c/i-09ab3e712bbeb6fa6", "instance-type": "t4g.medium", "zone": "ap-south-1c", "capacity-type": "on-demand", "allocatable": {"cpu":"1930m","ephemeral-storage":"17Gi","memory":"3187Mi","pods":"17"}}
2025-01-13T08:45:17.684Z INFO controller.machine.lifecycle launched machine {"commit": "f013f7b", "machine": "default-bpqcn", "provisioner": "default", "provider-id": "aws:///ap-south-1c/i-0c245ac4ce2c9baa1", "instance-type": "t4g.medium", "zone": "ap-south-1c", "capacity-type": "on-demand", "allocatable": {"cpu":"1930m","ephemeral-storage":"17Gi","memory":"3187Mi","pods":"17"}}
2025-01-13T08:45:17.684Z INFO controller.machine.lifecycle launched machine {"commit": "f013f7b", "machine": "default-sqssb", "provisioner": "default", "provider-id": "aws:///ap-south-1c/i-0e614a7082214f979", "instance-type": "t4g.medium", "zone": "ap-south-1c", "capacity-type": "on-demand", "allocatable": {"cpu":"1930m","ephemeral-storage":"17Gi","memory":"3187Mi","pods":"17"}}
2025-01-13T08:46:41.548Z DEBUG controller.awsnodetemplate discovered subnets {"commit": "f013f7b", "awsnodetemplate": "default", "subnets": ["subnet-07c3433cda79d6457 (ap-south-1a)", "subnet-01da29930d682869e (ap-south-1b)", "subnet-0f24c228d64a92e35 (ap-south-1c)"]}

These are the logs from when there are provisionable pods. The machines are launched and healthy, but they are not attached to the EKS cluster.
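
As an additional data point, the status conditions on the Machine resources usually say whether registration timed out (a sketch, assuming the v0.30-era Machine API these logs come from; default-bpqcn is one of the machines from the logs above):

# READY stays False for machines whose kubelet never registers; Karpenter
# terminates such machines after the registration timeout (roughly 15 minutes).
kubectl get machines

# Conditions show MachineLaunched / MachineRegistered / MachineInitialized
kubectl describe machine default-bpqcn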

@jigisha620
Contributor

What version of Karpenter are you running in your cluster?

@MadhvendraDixit
Author

What version of Karpenter are you running in your cluster?

Actually, I am using the same architecture in the stage environment and it works there; when I use it in prod it does not work.
The version of Karpenter I am using is v0.30.0-rc.0.

@saurav-agarwalla
Contributor

You may need to look at the node logs to see what's happening. Or maybe share the policy attached to the role.

Either way, I also see that you are running an RC version of Karpenter. Is there a reason not to use the latest stable version?
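
For instance, a minimal way to pull those node logs (a sketch, assuming SSM access to the instances; the instance ID is taken from the launch logs above):

# Serial console output usually surfaces bootstrap/kubelet failures
aws ec2 get-console-output --instance-id i-09ab3e712bbeb6fa6 --output text

# Or, if the SSM agent is running, inspect the kubelet directly
aws ssm start-session --target i-09ab3e712bbeb6fa6
journalctl -u kubelet --no-pager | tail -n 50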
