
[WIP] Partition placement based on nodeset selector #2545

Closed · wants to merge 2 commits

Conversation

@AhmedSoliman (Contributor) commented Jan 24, 2025

- Tests were not refactored, so the build on GHA is expected to fail.
- This will cause a few errors on partition start because the log is not found until it is provisioned by the log-controller.

Stack created with Sapling. Best reviewed with ReviewStack.


github-actions bot commented Jan 24, 2025

Test Results

  7 files ±0    7 suites ±0    3m 22s ⏱️ -51s
 45 tests -2    44 ✅ -2    1 💤 ±0    0 ❌ ±0
174 runs  -8    171 ✅ -8    3 💤 ±0    0 ❌ ±0

Results for commit 5b6aea1. Comparison against base commit 0ca8e3f.

This pull request removes 2 tests.
dev.restate.sdktesting.tests.AwaitTimeout ‑ timeout(Client)
dev.restate.sdktesting.tests.RawHandler ‑ rawHandler(Client)

♻️ This comment has been updated with latest results.

@tillrohrmann (Contributor) left a comment

Thanks for creating this PR @AhmedSoliman. The changes make sense to me. I had one question about how exactly the PP nodeset maps to the log nodesets generated by the DomainAwareNodeSetSelector; it seems they don't map 1 to 1.

Regarding our offline discussion on Friday about making the Scheduler not depend on the provisioned logs, I think this should be possible. However, letting the LogsController strictly follow the Scheduler seems to break once we are handling a set of log-server nodes that is disjoint from the worker nodes. In such a deployment, it seems to me that the LogsController and Scheduler need to be able to react independently of each other.

Comment on lines +281 to +282
// todo: consider removing after experimentation and discussion
.with_max_target_size()

Wouldn't this be equivalent to running a partition processor on every node, without another filter to select the nodes to run PPs on?
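
To make the question concrete, here is a minimal, self-contained sketch of the concern. It uses hypothetical types (`NodeId`, `select_nodeset`), not restate's actual `NodeSetSelector`: if the selector is asked for the maximum target size, the nodeset degenerates to every node that passes the candidacy filter, so placing a PP on each nodeset member means one PP per eligible node.

```rust
// Hypothetical sketch only; `NodeId` and `select_nodeset` are illustrative and
// not part of restate. With a "max target size" policy the selected nodeset is
// simply every node that passes the candidacy filter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u32);

fn select_nodeset(
    all_nodes: &[NodeId],
    is_candidate: impl Fn(NodeId) -> bool,
    target_size: Option<usize>, // None stands in for "max target size"
) -> Vec<NodeId> {
    let candidates: Vec<NodeId> = all_nodes
        .iter()
        .copied()
        .filter(|&n| is_candidate(n))
        .collect();
    match target_size {
        Some(k) => candidates.into_iter().take(k).collect(),
        // No size cap: every eligible node ends up in the nodeset.
        None => candidates,
    }
}

fn main() {
    let nodes: Vec<NodeId> = (1..=6).map(NodeId).collect();
    // Without an additional filter, every worker is a candidate...
    let nodeset = select_nodeset(&nodes, |_| true, None);
    // ...so a scheduler that starts a PP on each nodeset member would run a
    // PP on every node.
    assert_eq!(nodeset.len(), nodes.len());
}
```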

Comment on lines +286 to +292
let selection = NodeSetSelector::select(
    nodes_config,
    partition_replication,
    |_node_id, config| config.has_role(Role::Worker),
    |node_id, _config| alive_workers.contains(node_id),
    opts,
);
@tillrohrmann (Contributor) commented Jan 27, 2025

What is the exact relation between a nodeset and where to run PPs? Right now, the Scheduler will start a PP on every node in the nodeset, I believe.

Given this, I am wondering whether the NodeSetSelector wouldn't often generate larger nodesets than what is actually required for the PPs (independent of the max target size configuration right now).

Assuming I want to run a single PP on any node (replication property `node: 1`), the NodeSetSelector will return a nodeset of size 2.

Assuming I want to run PPs across two regions and on 5 nodes to tolerate 4 random node failures (`region: 2, node: 5`), and assuming there are enough nodes available, I think the DomainAwareNodeSetSelector will give us a nodeset of at least size 8 (two regions with 4 nodes each). Wouldn't a nodeset of the form {region1.node1, region2.node2, region2.node3, region2.node4, region2.node5} be sufficient in this case?

So is the current DomainAwareNodeSetSelector the best fit for deciding where to place the PPs or do we want to have a specialized implementation eventually?
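
To make the sizing argument concrete, here is a small sketch with a hypothetical helper (`minimal_pp_nodeset_size` is not part of restate) that computes the smallest nodeset satisfying a replication property of the form {region: R, node: N}, assuming enough nodes and regions exist:

```rust
// Illustrative sketch only; `minimal_pp_nodeset_size` is a hypothetical helper,
// not part of restate. A replication property {region: R, node: N} needs at
// least N nodes overall, spread over at least R regions, and one node per
// additional region is enough to cover the region dimension.
fn minimal_pp_nodeset_size(node_replication: usize, region_replication: usize) -> usize {
    node_replication.max(region_replication)
}

fn main() {
    // The example from the review, {region: 2, node: 5}: a placement such as
    // {region1.node1, region2.node2, region2.node3, region2.node4,
    // region2.node5} already satisfies it with 5 nodes, whereas the review
    // expects the current DomainAwareNodeSetSelector to produce at least 8.
    assert_eq!(minimal_pp_nodeset_size(5, 2), 5);

    // The single-PP example, node: 1 (region unconstrained): 1 node suffices,
    // while the NodeSetSelector reportedly returns a nodeset of size 2.
    assert_eq!(minimal_pp_nodeset_size(1, 1), 1);
}
```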

@AhmedSoliman (Contributor, Author)

As discussed offline, this will be parked. I'll open another PR with some of the minor improvements that this PR had.
