Internal Load Balancer Controller do NOT remove nodes with empty string in the zone field #757
Welcome @08volt! |
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: 08volt
/ok-to-test |
/retest |
/retest |
/retest |
@cezarygerard @aojea |
/lgtm |
Can we add more information to the Git commit message (not just the GitHub PR)? Use
New changes are detected. LGTM label has been removed. |
Thanks for the input, @bowei. I updated the commit message with more information. Please let me know if you think it is enough.
Previously, new nodes were created with all topology labels present, including the zone label. Recently, however, labels are patched onto the node resource after it is created. As a result, a node can have an empty zone field when it is first added to an instance group.

The Internal Load Balancer controller was incorrectly removing these nodes with empty zone labels from their assigned instance groups, because it interpreted a node with an empty zone as not belonging to any zone.

This commit fixes the issue by:
- Storing nodes with empty zones during node processing.
- Preventing the controller from removing these nodes during zone processing.

This ensures that new nodes whose zone label is initially empty, due to the asynchronous label patching, are not prematurely deleted from their instance groups.

The fix has been verified with a new test that adds a node with an empty zone label to an instance group and confirms that it is not deleted after a controller sync.
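As a rough illustration of the first step ("storing nodes with empty zones during node processing"), here is a minimal sketch in Go. It is not the PR's code: the `node` type and `groupByZone` helper are hypothetical simplifications of `*v1.Node` and the controller's node processing, and the sketch assumes the zone is read from the well-known `topology.kubernetes.io/zone` label.

```go
package main

import "fmt"

// zoneLabel is the well-known Kubernetes topology label for a node's zone.
const zoneLabel = "topology.kubernetes.io/zone"

// node is a hypothetical, simplified stand-in for *v1.Node.
type node struct {
	Name   string
	Labels map[string]string
}

// groupByZone buckets nodes by their zone label. Nodes whose label is
// missing or empty are kept under the "" key instead of being dropped.
func groupByZone(nodes []node) map[string][]node {
	zoned := make(map[string][]node)
	for _, n := range nodes {
		zone := n.Labels[zoneLabel] // "" when the label is absent
		zoned[zone] = append(zoned[zone], n)
	}
	return zoned
}

func main() {
	nodes := []node{
		{Name: "n1", Labels: map[string]string{zoneLabel: "z1"}},
		{Name: "n2", Labels: map[string]string{zoneLabel: "z2"}},
		{Name: "n3"}, // zone label has not been patched in yet
	}
	zoned := groupByZone(nodes)
	fmt.Println(len(zoned["z1"]), len(zoned["z2"]), len(zoned[""])) // 1 1 1
}
```

The `zoned[""]` bucket plays the role of the `zonedNodes[""]` argument that appears in the diff.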
@@ -700,7 +719,7 @@ func (g *Cloud) ensureInternalInstanceGroups(name string, nodes []*v1.Node) ([]s
 				igLinks = append(igLinks, ig.SelfLink)
 			}
 		} else {
-			igLink, err := g.ensureInternalInstanceGroup(name, zone, nodes)
+			igLink, err := g.ensureInternalInstanceGroup(name, zone, nodes, zonedNodes[""])
How do we know that all Nodes with an empty zone belong to the zone being ensured here?
Specifically:
igLink, err := g.ensureInternalInstanceGroup(name, zone, nodes, zonedNodes[""])
                                                   ^^^^         ^^^^^^^^^^^^^^
                                                   A            B
That is, how do we know that the nodes in B all belong in zone A?
For example, can B contain nodes from both zone z1 and zone z2 while A == z1? Will this work correctly?
So the logic here is that zonedNodes[""] contains nodes that 'should not be removed from instance groups'.
This means that if a node is attached to an IG but, for some reason, its zone assignment disappears, we don't remove it from the group.
Nodes in emptyZoneNodes will not be added; the parameter only keeps them from being removed.
Inside ensureInternalInstanceGroup there is this part: removeNodes := gceNodes.Difference(kubeNodes).Difference(emptyZoneNodesNames).List(). When determining which nodes to remove, it skips the empty-zone ones.
In general, we have many issues that start with nodes becoming 'temporarily' unready or missing some data, such as zone info. Then LB controllers start removing them and we get outages.
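The removal computation quoted in this reply can be sketched in isolation as follows. This is a hedged illustration only: the real code uses a string-set type with `Difference` and `List` methods, whereas the `stringSet` type, `newSet`, and `nodesToRemove` below are hypothetical stand-ins written against the Go standard library so the example runs on its own.

```go
package main

import (
	"fmt"
	"sort"
)

// stringSet is a hypothetical minimal set type standing in for the
// string-set helper used by the controller.
type stringSet map[string]struct{}

func newSet(items ...string) stringSet {
	s := make(stringSet, len(items))
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// difference returns the elements of s that are not in other.
func (s stringSet) difference(other stringSet) stringSet {
	out := make(stringSet)
	for k := range s {
		if _, found := other[k]; !found {
			out[k] = struct{}{}
		}
	}
	return out
}

// sorted returns the elements in lexical order.
func (s stringSet) sorted() []string {
	out := make([]string, 0, len(s))
	for k := range s {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// nodesToRemove mirrors the shape of the quoted expression: instances that
// are in the group but not wanted are removed, unless their zone is unknown.
func nodesToRemove(gceNodes, kubeNodes, emptyZoneNodes stringSet) []string {
	return gceNodes.difference(kubeNodes).difference(emptyZoneNodes).sorted()
}

func main() {
	gce := newSet("n1", "n2", "n3") // current members of the instance group
	kube := newSet("n1")            // nodes known to belong to this zone
	emptyZone := newSet("n3")       // nodes whose zone label is empty
	fmt.Println(nodesToRemove(gce, kube, emptyZone)) // [n2]
}
```

Because the empty-zone names are only ever subtracted from the removal set, they protect existing members and never cause a node to be added.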
Fix(ILB): Prevent deletion of nodes with empty zone label
Previously, new nodes were created with all topology labels present, including the zone label. However, recently, labels are being patched/updated on the node resource after the resource is created. This can result in nodes having an empty zone field when initially added to an instance group.
The Internal Load Balancer controller was incorrectly removing these nodes with empty zone labels from their assigned instance groups. This occurred because the controller interpreted nodes with empty zones as not belonging to any zone.
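This commit fixes the issue by:
- Storing nodes with empty zones during node processing.
- Preventing the controller from removing these nodes during zone processing.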
This ensures that new nodes with initially empty zone labels, due to the asynchronous label patching, are not prematurely deleted from their instance groups.
The fix has been verified with a new test that adds a node with an empty zone label to an instance group and confirms that it is not deleted after a controller sync.
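The scenario that the test exercises can be approximated with a small in-memory model. The sketch below is not the PR's test: `fakeGroup` and its `sync` method are hypothetical, and they only mimic the behavior described in the commit message (members with an empty zone survive a sync, members that are truly gone do not).

```go
package main

import "fmt"

// fakeGroup is a hypothetical in-memory stand-in for a zonal instance group.
type fakeGroup struct {
	members map[string]bool
}

// sync reconciles membership once. Desired nodes are added; members that are
// neither desired nor protected are removed. Protected (empty-zone) nodes are
// never added by this call, only shielded from removal.
func (g *fakeGroup) sync(desired, emptyZone []string) {
	want := make(map[string]bool, len(desired))
	for _, n := range desired {
		want[n] = true
	}
	protected := make(map[string]bool, len(emptyZone))
	for _, n := range emptyZone {
		protected[n] = true
	}
	for m := range g.members {
		if !want[m] && !protected[m] {
			delete(g.members, m) // deleting during range is safe in Go
		}
	}
	for n := range want {
		g.members[n] = true
	}
}

func main() {
	g := &fakeGroup{members: map[string]bool{
		"n1":      true,
		"n-empty": true, // zone label still empty
		"n-gone":  true, // no longer a cluster node
	}}
	g.sync([]string{"n1"}, []string{"n-empty"})
	fmt.Println(g.members["n-empty"], g.members["n-gone"]) // true false
}
```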