
[WIP] Partition placement based on nodeset selector #2545

Closed · wants to merge 2 commits

Conversation

@AhmedSoliman (Contributor) commented Jan 24, 2025

- Tests were not refactored, so the build on GHA is expected to fail.
- This will cause a few errors on partition start because the log is not found until it is provisioned by the log-controller.

Stack created with Sapling. Best reviewed with ReviewStack.


github-actions bot commented Jan 24, 2025

Test Results

  7 files ±0    7 suites ±0    3m 22s ⏱️ -51s
 45 tests -2    44 ✅ -2    1 💤 ±0    0 ❌ ±0
174 runs  -8    171 ✅ -8    3 💤 ±0    0 ❌ ±0

Results for commit 5b6aea1. Comparison against base commit 0ca8e3f.

This pull request removes 2 tests.
dev.restate.sdktesting.tests.AwaitTimeout ‑ timeout(Client)
dev.restate.sdktesting.tests.RawHandler ‑ rawHandler(Client)

♻️ This comment has been updated with latest results.

@tillrohrmann (Contributor) left a comment

Thanks for creating this PR @AhmedSoliman. The changes make sense to me. I had one question about how exactly the PP nodeset maps to the log nodesets generated by the DomainAwareNodeSetSelector; it seems they don't map 1 to 1.

Regarding our offline discussion on Friday about making the Scheduler not depend on the provisioned logs, I think this should be possible. However, letting the LogsController strictly follow the Scheduler seems to break once we are handling a set of log-server nodes that is disjoint from the worker nodes. In such a deployment, it seems to me that the LogsController and Scheduler need to be able to react independently of each other.

Comment on lines +281 to +282
// todo: consider removing after experimentation and discussion
.with_max_target_size()

Wouldn't this be equivalent to running a partition processor on every node, without another filter to select the nodes to run PPs on?
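
To make the question concrete, here is a minimal, self-contained sketch of the concern. It uses hypothetical types (`NodeId`, `select_nodeset`), not restate's actual `NodeSetSelector`: if the selector is asked for the maximum target size, the nodeset degenerates to every node that passes the candidacy filter, so placing a PP on each nodeset member means one PP per eligible node.

```rust
// Hypothetical sketch only; `NodeId` and `select_nodeset` are illustrative and
// not part of restate. With a "max target size" policy the selected nodeset is
// simply every node that passes the candidacy filter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u32);

fn select_nodeset(
    all_nodes: &[NodeId],
    is_candidate: impl Fn(NodeId) -> bool,
    target_size: Option<usize>, // None stands in for "max target size"
) -> Vec<NodeId> {
    let candidates: Vec<NodeId> = all_nodes
        .iter()
        .copied()
        .filter(|&n| is_candidate(n))
        .collect();
    match target_size {
        Some(k) => candidates.into_iter().take(k).collect(),
        // No size cap: every eligible node ends up in the nodeset.
        None => candidates,
    }
}

fn main() {
    let nodes: Vec<NodeId> = (1..=6).map(NodeId).collect();
    // Without an additional filter, every worker is a candidate...
    let nodeset = select_nodeset(&nodes, |_| true, None);
    // ...so a scheduler that starts a PP on each nodeset member would run a
    // PP on every node.
    assert_eq!(nodeset.len(), nodes.len());
}
```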

Comment on lines +286 to +292
let selection = NodeSetSelector::select(
    nodes_config,
    partition_replication,
    |_node_id, config| config.has_role(Role::Worker),
    |node_id, _config| alive_workers.contains(node_id),
    opts,
);
@tillrohrmann (Contributor) commented Jan 27, 2025

What is the exact relation between a nodeset and where to run PPs? Right now, the Scheduler will start a PP on every node in the nodeset, I believe.

Given this, I am wondering whether the NodeSetSelector wouldn't often generate larger nodesets than what is actually required for the PPs (independent of the max target size configuration right now).

Assuming I want to run a single PP on any node (replication property `node: 1`), the NodeSetSelector will return a nodeset of size 2.

Assuming I want to run PPs across two regions and on 5 nodes to tolerate 4 random node failures (`region: 2, node: 5`), and assuming there are enough nodes available, I think the DomainAwareNodeSetSelector will give us a nodeset of at least size 8 (two regions with 4 nodes each). Wouldn't a nodeset of the form {region1.node1, region2.node2, region2.node3, region2.node4, region2.node5} be sufficient in this case?

So is the current DomainAwareNodeSetSelector the best fit for deciding where to place the PPs or do we want to have a specialized implementation eventually?
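
To make the sizing argument concrete, here is a small sketch with a hypothetical helper (`minimal_pp_nodeset_size` is not part of restate) that computes the smallest nodeset satisfying a replication property of the form {region: R, node: N}, assuming enough nodes and regions exist:

```rust
// Illustrative sketch only; `minimal_pp_nodeset_size` is a hypothetical helper,
// not part of restate. A replication property {region: R, node: N} needs at
// least N nodes overall, spread over at least R regions, and one node per
// additional region is enough to cover the region dimension.
fn minimal_pp_nodeset_size(node_replication: usize, region_replication: usize) -> usize {
    node_replication.max(region_replication)
}

fn main() {
    // The example from the review, {region: 2, node: 5}: a placement such as
    // {region1.node1, region2.node2, region2.node3, region2.node4,
    // region2.node5} already satisfies it with 5 nodes, whereas the review
    // expects the current DomainAwareNodeSetSelector to produce at least 8.
    assert_eq!(minimal_pp_nodeset_size(5, 2), 5);

    // The single-PP example, node: 1 (region unconstrained): 1 node suffices,
    // while the NodeSetSelector reportedly returns a nodeset of size 2.
    assert_eq!(minimal_pp_nodeset_size(1, 1), 1);
}
```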

@AhmedSoliman (Contributor, Author)

As discussed offline, this will be parked. I'll open another PR with some of the minor improvements that this PR had.
