[WIP] Partition placement based on nodeset selector #2545
Conversation
- Tests were not refactored, so the build on GHA is expected to fail
- This will cause a few errors on start of partitions due to the log not being found until it's provisioned by the log-controller
Test Results: 7 files ±0, 7 suites ±0, 3m 22s ⏱️ −51s. Results for commit 5b6aea1. ± Comparison against base commit 0ca8e3f. This pull request removes 2 tests.
♻️ This comment has been updated with latest results.
Thanks for creating this PR @AhmedSoliman. The changes make sense to me. I had one question about how exactly the PP nodeset maps to the log nodesets generated by the `DomainAwareNodeSetSelector`. It seems that they don't map 1:1.

Regarding our offline discussion on Friday about making the `Scheduler` not depend on the provisioned logs, I think this should be possible. However, letting the `LogsController` strictly follow the `Scheduler` seems to break once we are handling a disjoint set of log-server and worker nodes. In such a deployment, it seems to me that the `LogsController` and the `Scheduler` need to be able to react independently of each other.
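To illustrate the disjoint-roles point, here is a minimal, self-contained sketch (the `Role` and `Node` types below are simplified stand-ins for illustration, not the actual restate-types definitions): filtering the same cluster configuration by `Role::Worker` versus `Role::LogServer` can yield non-overlapping candidate sets, so a nodeset chosen for the log and a nodeset chosen for the partition processors cannot, in general, track each other.

```rust
use std::collections::BTreeSet;

// Simplified stand-ins for illustration only; not the real restate types.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Role {
    Worker,
    LogServer,
}

struct Node {
    id: u32,
    roles: BTreeSet<Role>,
}

fn candidates(nodes: &[Node], role: Role) -> BTreeSet<u32> {
    nodes
        .iter()
        .filter(|n| n.roles.contains(&role))
        .map(|n| n.id)
        .collect()
}

fn main() {
    // A deployment with disjoint log-server and worker nodes.
    let nodes = vec![
        Node { id: 1, roles: BTreeSet::from([Role::LogServer]) },
        Node { id: 2, roles: BTreeSet::from([Role::LogServer]) },
        Node { id: 3, roles: BTreeSet::from([Role::Worker]) },
        Node { id: 4, roles: BTreeSet::from([Role::Worker]) },
    ];

    let log_candidates = candidates(&nodes, Role::LogServer); // {1, 2}
    let pp_candidates = candidates(&nodes, Role::Worker);     // {3, 4}

    // The candidate sets are disjoint, so the LogsController and the
    // Scheduler cannot simply mirror each other's nodeset decisions.
    assert!(log_candidates.is_disjoint(&pp_candidates));
    println!("log nodeset candidates: {log_candidates:?}");
    println!("PP nodeset candidates:  {pp_candidates:?}");
}
```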
```rust
// todo: consider removing after experimentation and discussion
.with_max_target_size()
```
Wouldn't this be equivalent to running a partition processor on every node without another filter to select the nodes to run PPs on?
```rust
let selection = NodeSetSelector::select(
    nodes_config,
    partition_replication,
    // the node runs the worker role
    |_node_id, config| config.has_role(Role::Worker),
    // the node is among the currently-alive workers
    |node_id, _config| alive_workers.contains(node_id),
    opts,
);
```
What is the exact relation between a nodeset and where PPs run? Right now, the `Scheduler` will start a PP on every node in the nodeset, I believe.

Given this, I am wondering whether the `NodeSetSelector` wouldn't often generate larger nodesets than what is actually required for the PPs (independent of the current max target size configuration).

Assuming I want to run a single PP on any node (replication property `node: 1`), the `NodeSetSelector` will return a nodeset of size 2.

Assuming I want to run PPs across two regions and on 5 nodes to tolerate 4 random node failures (`region: 2, node: 5`), and assuming there are enough nodes available, I think the `DomainAwareNodeSetSelector` will give us a nodeset of at least size 8 (two regions with 4 nodes each). Wouldn't a nodeset of the form `{region1.node1, region2.node2, region2.node3, region2.node4, region2.node5}` be sufficient in this case?

So is the current `DomainAwareNodeSetSelector` the best fit for deciding where to place the PPs, or do we want to have a specialized implementation eventually?
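To make the "specialized implementation" idea concrete, here is a hypothetical, simplified sketch of a PP-specific placement selector. It is not the `DomainAwareNodeSetSelector` and does not use the real restate types (`NodeId` and the region map are plain strings here); it only illustrates picking exactly `node_count` nodes while spanning at least `region_count` regions, so that `region: 2, node: 5` yields a placement of size 5 rather than the ≥8 nodes a write-availability-oriented log nodeset needs.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch only; simplified types, not the restate implementation.
type NodeId = String;
type Region = String;

/// Pick exactly `node_count` nodes while touching at least `region_count`
/// distinct regions (when enough nodes and regions exist).
fn place_partition_processors(
    nodes_by_region: &BTreeMap<Region, Vec<NodeId>>,
    region_count: usize,
    node_count: usize,
) -> Option<Vec<NodeId>> {
    let total_nodes: usize = nodes_by_region.values().map(Vec::len).sum();
    if nodes_by_region.len() < region_count || total_nodes < node_count {
        return None;
    }

    let mut placement = Vec::with_capacity(node_count);

    // First, take one node from each of `region_count` regions to satisfy
    // the region constraint.
    for nodes in nodes_by_region.values().filter(|v| !v.is_empty()).take(region_count) {
        placement.push(nodes[0].clone());
    }

    // Then fill the remaining slots from any region, skipping nodes that
    // were already picked.
    'outer: for nodes in nodes_by_region.values() {
        for node in nodes {
            if placement.len() == node_count {
                break 'outer;
            }
            if !placement.contains(node) {
                placement.push(node.clone());
            }
        }
    }

    (placement.len() == node_count).then_some(placement)
}

fn main() {
    let mut cluster = BTreeMap::new();
    cluster.insert("region1".to_string(), vec!["region1.node1".to_string()]);
    cluster.insert(
        "region2".to_string(),
        (1..=5).map(|i| format!("region2.node{i}")).collect(),
    );

    // region: 2, node: 5 -> exactly 5 nodes spanning both regions.
    let placement = place_partition_processors(&cluster, 2, 5).unwrap();
    assert_eq!(placement.len(), 5);
    println!("{placement:?}");
}
```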
As discussed offline, this will be parked. I'll open another PR with some of the minor improvements that this PR had.
Stack created with Sapling. Best reviewed with ReviewStack.