[POC] OmniPaxos and Raft based embedded metadata store #2376

Draft
wants to merge 21 commits into
base: main

@tillrohrmann tillrohrmann commented Nov 28, 2024

This WIP PR adds support for an embedded, highly available metadata store based on OmniPaxos. We are using Harald's OmniPaxos library. This PR builds on an older WIP PR that added a Raft-based metadata store, because both variants need a networking component and a state machine, which I could reuse. As a result, the two implementations are quite similar.

If you want to try it out, you can use these configuration files. To try the Raft-based metadata store instead, change the metadata-store type to "raft".

Both implementations have a persistent log storage implementation based on RocksDB, which should allow them to be killed and restarted without losing data.

A few notable things which are missing:

  • Support for reconfiguration
  • Support for snapshots
  • Support for trimming the logs

What is quite ugly is that the participating peers (for Raft as well as OmniPaxos) currently need to be configured explicitly via

[metadata-store]
bind-address = "0.0.0.0:5133"
id = 2
peers = [
    [
        1,
        "http://localhost:5123",
    ],
    [
        2,
        "http://localhost:5133",
    ],
    [
        3,
        "http://localhost:5143",
    ],
]
type = "omnipaxos"
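The PR does not say which checks are applied to this peer list, so the following is only a minimal sketch of the sanity checks such a static configuration suggests: peer ids are unique, the node's own `id` appears among the peers, and a majority quorum can be computed from the peer count. `Peer`, `ConfigError`, `validate_peers` and `quorum_size` are hypothetical names for illustration, not types from this PR.

```rust
use std::collections::HashSet;

/// Hypothetical mirror of one `[id, address]` entry in the `peers` list.
#[derive(Debug, Clone, PartialEq)]
pub struct Peer {
    pub id: u64,
    pub address: String,
}

/// Hypothetical validation errors; not part of the PR.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    NoPeers,
    DuplicatePeerId(u64),
    OwnIdMissing(u64),
}

/// Checks that peer ids are unique and that `own_id` is one of the peers.
pub fn validate_peers(own_id: u64, peers: &[Peer]) -> Result<(), ConfigError> {
    if peers.is_empty() {
        return Err(ConfigError::NoPeers);
    }
    let mut seen = HashSet::new();
    for peer in peers {
        // `insert` returns false if the id was already present.
        if !seen.insert(peer.id) {
            return Err(ConfigError::DuplicatePeerId(peer.id));
        }
    }
    if !seen.contains(&own_id) {
        return Err(ConfigError::OwnIdMissing(own_id));
    }
    Ok(())
}

/// Size of a majority quorum among `n` peers.
pub fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}
```

With the three peers above, `quorum_size(3)` is 2, so the store can tolerate one peer being down.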

In the future, we can probably add tooling to start a cluster with a single metadata peer and then extend the set of peers via restatectl until the required number of metadata peers is reached.

Ideally, I would have loved to reuse Restate's networking component. However, this wasn't possible because of the node validation with respect to the NodesConfiguration. That's why I added a very simple one.

After starting the metadata store service and the gRPC server, the node will
try to initialize itself by joining an existing cluster. Additionally, each node
exposes a provision-cluster gRPC call with which it is possible to provision
a cluster (writing the initial NodesConfiguration, PartitionTable and Logs).
Nodes can only join after the cluster is provisioned.

This fixes restatedev#2409.
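The ordering described in this commit (joining fails until the cluster is provisioned) can be sketched as a small in-memory model. This is not the code from the commit: `Cluster`, `ClusterMetadata` and `ClusterError` are hypothetical names, the metadata fields are plain strings standing in for the real NodesConfiguration, PartitionTable and Logs, and rejecting a second provisioning call is an assumption the commit message does not confirm.

```rust
/// Hypothetical stand-in for the metadata written at provisioning time.
#[derive(Debug, Clone, PartialEq)]
pub struct ClusterMetadata {
    pub nodes_configuration: String,
    pub partition_table: String,
    pub logs: String,
}

#[derive(Debug, PartialEq)]
pub enum ClusterError {
    NotProvisioned,
    AlreadyProvisioned,
}

/// In-memory model of a cluster that must be provisioned before nodes can join.
#[derive(Default)]
pub struct Cluster {
    metadata: Option<ClusterMetadata>,
    members: Vec<u64>,
}

impl Cluster {
    /// Writes the initial metadata exactly once.
    /// Assumption: a second provisioning attempt is rejected.
    pub fn provision(&mut self, initial: ClusterMetadata) -> Result<(), ClusterError> {
        if self.metadata.is_some() {
            return Err(ClusterError::AlreadyProvisioned);
        }
        self.metadata = Some(initial);
        Ok(())
    }

    /// Joining only succeeds once the cluster has been provisioned.
    pub fn join(&mut self, node_id: u64) -> Result<&ClusterMetadata, ClusterError> {
        let metadata = self.metadata.as_ref().ok_or(ClusterError::NotProvisioned)?;
        if !self.members.contains(&node_id) {
            self.members.push(node_id);
        }
        Ok(metadata)
    }

    /// Ids of the nodes that have joined so far.
    pub fn members(&self) -> &[u64] {
        &self.members
    }
}
```

A node that calls `join` before anyone has called `provision` gets `NotProvisioned` and is expected to try again later.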
This commit makes it configurable which metadata store will be run
by the Node when starting the Restate server.
This commit adds the skeleton of the Raft metadata store. At the moment
only a single node with memory storage is supported.

This fixes restatedev#1785.
The raft metadata store does not accept new proposals if there is
no known leader. In this situation, requests failed with an internal
ProposalDropped error. This commit changes the behavior so that a
ProposalDropped error is translated into an unavailable Tonic
status. That way, the request is retried automatically.
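The translation described above can be sketched without pulling in Tonic. In this sketch, `StoreError` and `StatusCode` are local, hypothetical stand-ins (the variant names follow `tonic::Code::Unavailable` and `tonic::Code::Internal`), and `call_with_retry` is an assumed client-side loop that treats only the unavailable status as retryable. The actual retry policy of the client is not described in the commit, and a real client would normally add a backoff between attempts.

```rust
/// Hypothetical stand-in for the metadata store's internal errors.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    ProposalDropped,
    Other(String),
}

/// Local stand-in for the two gRPC status codes relevant here.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StatusCode {
    Unavailable,
    Internal,
}

/// Maps a dropped proposal to `Unavailable`; everything else stays `Internal`.
pub fn to_status(err: &StoreError) -> StatusCode {
    match err {
        StoreError::ProposalDropped => StatusCode::Unavailable,
        StoreError::Other(_) => StatusCode::Internal,
    }
}

/// Assumption: only `Unavailable` is considered retryable.
pub fn is_retryable(code: StatusCode) -> bool {
    code == StatusCode::Unavailable
}

/// Runs `op` up to `max_attempts` times, stopping at the first success
/// or the first non-retryable status. No backoff, to keep the sketch short.
pub fn call_with_retry<T>(
    max_attempts: usize,
    mut op: impl FnMut() -> Result<T, StoreError>,
) -> Result<T, StatusCode> {
    let mut last = StatusCode::Internal;
    for _ in 0..max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                last = to_status(&err);
                if !is_retryable(last) {
                    return Err(last);
                }
            }
        }
    }
    Err(last)
}
```

The point of the mapping is that a missing leader is a transient condition, so reporting it as unavailable lets callers retry instead of surfacing an internal error.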
This commit adds RocksDbStorage which implements raft::Storage.
The RocksDbStorage is a durable storage implementation which is
used by the RaftMetadataStore to store the raft state durably.

This fixes restatedev#1791.
The OmniPaxos metadata store stores its state in memory.
This commit introduces the ability to specify multiple addresses for the
metadata store endpoint. On error, the GrpcMetadataStoreClient randomly
switches to another endpoint. Moreover, this commit makes the OmniPaxosMetadataStore
only accept requests if it is the leader. Additionally, it fails all pending
callbacks if it loses leadership to avoid hanging requests if the request
was not decided.
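The leader gating and the handling of pending callbacks can be illustrated with a small model built on standard-library channels. This is a sketch under assumptions, not the OmniPaxosMetadataStore code: `LeaderGate`, `RequestError` and the use of a log index as the response value are hypothetical, and the real store presumably uses async callbacks rather than `std::sync::mpsc`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

#[derive(Debug, PartialEq)]
pub enum RequestError {
    NotLeader,
    LostLeadership,
}

/// Hypothetical response: the log index at which the request was decided.
pub type Response = Result<u64, RequestError>;

/// Accepts requests only while leader and tracks callbacks of undecided requests.
#[derive(Default)]
pub struct LeaderGate {
    is_leader: bool,
    next_index: u64,
    pending: Vec<(u64, Sender<Response>)>,
}

impl LeaderGate {
    /// Rejects the request immediately when not leader; otherwise registers a callback.
    pub fn propose(&mut self) -> Receiver<Response> {
        let (tx, rx) = channel();
        if !self.is_leader {
            let _ = tx.send(Err(RequestError::NotLeader));
            return rx;
        }
        self.next_index += 1;
        self.pending.push((self.next_index, tx));
        rx
    }

    /// Completes every pending callback whose index is now decided.
    pub fn on_decided(&mut self, decided_index: u64) {
        let all = std::mem::take(&mut self.pending);
        let (done, rest): (Vec<_>, Vec<_>) =
            all.into_iter().partition(|(index, _)| *index <= decided_index);
        self.pending = rest;
        for (index, tx) in done {
            let _ = tx.send(Ok(index));
        }
    }

    /// On losing leadership, fails all pending callbacks so no caller hangs.
    pub fn set_leader(&mut self, is_leader: bool) {
        if self.is_leader && !is_leader {
            for (_, tx) in self.pending.drain(..) {
                let _ = tx.send(Err(RequestError::LostLeadership));
            }
        }
        self.is_leader = is_leader;
    }
}
```

Failing the callbacks does not say whether the request was eventually decided by another leader, so callers should treat `LostLeadership` as an unknown outcome and retry against another endpoint.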
The Restate version enables OmniPaxos to run with a single peer.