
bug: Databend Meta Log Data Overwrites Config Values #17296

Open
kkapper opened this issue Jan 15, 2025 · 17 comments
Labels
C-bug Category: something isn't working

Comments

@kkapper

kkapper commented Jan 15, 2025

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Using Kubernetes with Docker image: datafuselabs/databend-meta:v1.2.680-p3

What's Wrong?

When databend-meta starts, it ignores the config values and loads them from the raft logs instead.

Here is the startup log for a node. Most importantly, you can see that we want these nodes to join peers in the clone3 namespace.

Databend Metasrv

Version: v1.2.680-p3-4c4896dc57-simd(1.85.0-nightly-2025-01-14T09:48:25.167808264Z)
Working DataVersion: V004(2024-11-11: WAL based raft-log)

Raft Feature set:
Server Provide: { append:v0, install_snapshot:v1, install_snapshot:v3, vote:v0 }
Client Require: { append:v0, install_snapshot:v3, vote:v0 }

Disk Data: V004(2024-11-11: WAL based raft-log); Upgrading: None
Dir: /data/databend-meta/raft

Log File: enabled=false, level=INFO, dir=/data/databend-meta/log, format=json, limit=48, prefix_filter=
Stderr: enabled=true, level=WARN, format=text
Raft Id: 2; Cluster: databend
Dir: /data/databend-meta/raft
Status: join ["plaid-databend-meta-0.plaid-databend-meta.pw-clone3.svc.cluster.local:28004", "plaid-databend-meta-1.plaid-databend-meta.pw-clone3.svc.cluster.local:28004", "plaid-databend-meta-2.plaid-databend-meta.pw-clone3.svc.cluster.local:28004"]

HTTP API listen at: 0.0.0.0:28002
gRPC API listen at: 0.0.0.0:9191 advertise: plaid-databend-meta-2.plaid-databend-meta.pw-clone3.svc.cluster.local:9191
Raft API listen at: 0.0.0.0:28004 advertise: plaid-databend-meta-2.plaid-databend-meta.pw-clone3.svc.cluster.local:28004
Upgrade ondisk data if out of date: V004
Find and clean previous unfinished upgrading
Upgrade ondisk data finished: V004

But here is what happens if you check databend-metactl status:

root@plaid-databend-meta-0:/# ./databend-metactl status
BinaryVersion: v1.2.680-p3-4c4896dc57-simd(1.85.0-nightly-2025-01-14T09:48:25.167808264Z)
DataVersion: V004
RaftLogSize: 5261946
RaftLog:
  - CacheItems: 4604
  - CacheUsedSize: 5760356
  - WALTotalSize: 5261946
  - WALOpenChunkSize: 17553
  - WALOffset: 5261946
  - WALClosedChunkCount: 3
  - WALClosedChunkTotalSize: 5244393
  - WALClosedChunkSizes:
    - ChunkId(00_000_000_000_000_000_000): 5243001
    - ChunkId(00_000_000_000_005_243_001): 196
    - ChunkId(00_000_000_000_005_243_197): 1196
SnapshotKeyCount: 60742
Node: id=0 raft=plaid-databend-meta-0.plaid-databend-meta.pw-clone.svc.cluster.local:28004
State: Leader
Leader: id=0 raft=plaid-databend-meta-0.plaid-databend-meta.pw-clone.svc.cluster.local:28004 grpc=plaid-databend-meta-0.plaid-databend-meta.pw-clone.svc.cluster.local:9191
CurrentTerm: 83091
LastSeq: 563264
LastLogIndex: 438193
LastApplied: T83091-N0.438173
SnapshotLastLogID: T83089-N0.437685
Purged: T83086-N1.433589
Replication:
  - [0] T83091-N0.438193 *
Voters:
  - id=0 raft=plaid-databend-meta-0.plaid-databend-meta.pw-clone.svc.cluster.local:28004 grpc=plaid-databend-meta-0.plaid-databend-meta.pw-clone.svc.cluster.local:9191
  - id=1 raft=plaid-databend-meta-1.plaid-databend-meta.pw-clone.svc.cluster.local:28004 grpc=plaid-databend-meta-1.plaid-databend-meta.pw-clone.svc.cluster.local:9191
  - id=2 raft=plaid-databend-meta-2.plaid-databend-meta.pw-clone.svc.cluster.local:28004 grpc=plaid-databend-meta-2.plaid-databend-meta.pw-clone.svc.cluster.local:9191

You can see it took the connection strings from the logs and replaced all of the connection info, which breaks all nodes.

How to Reproduce?

Take a databend-meta backup from one cluster/namespace, or one set of servers.

Restore that backup to another cluster/namespace or another set of servers.
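
For illustration, a minimal sketch of this reproduction, using the metactl export/import commands that appear later in this thread (the hostnames, ids, and paths here are placeholders, not our real values):

# On a node of the source cluster: export the meta data to a file
databend-metactl export --grpc-api-address "127.0.0.1:9191" --db meta.db

# On each node of the destination cluster: import with the new peer addresses
databend-metactl import --raft-dir /data/databend-meta/raft --db meta.db \
    --id=0 \
    --initial-cluster 0=new-meta-0:28004 \
    --initial-cluster 1=new-meta-1:28004 \
    --initial-cluster 2=new-meta-2:28004

# Start databend-meta on each node, then compare the addresses reported by:
databend-metactl status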

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@kkapper kkapper added the C-bug Category: something isn't working label Jan 15, 2025
@inviscid

This appears to be a regression of some kind since this used to work correctly in the past.

@drmingdrmer
Member

Yes, this is by design. The raft advertise address is stored in the raft log and cannot be changed unless you execute a membership change command. Therefore, regardless of what the raft address is in the config file, it is ignored after the cluster is initialized.

If you want to change the raft address, you will need to remove the node and add a new one with the updated address. However, this behavior is different for the gRPC address, which is updated every time the corresponding node is restarted.

If there is an issue with your scenario, could you clarify which field in the config file is being ignored or replaced?
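
For readers following along, here is a rough sketch of that remove-then-re-add procedure. The --leave-id/--leave-via and --join flags are my recollection of the databend-meta CLI from the Databend docs, not something confirmed in this thread, so verify them against your version:

# Ask the cluster, via any reachable member, to remove node 2 from membership
databend-meta --leave-id 2 --leave-via old-meta-0.example:28004

# Then wipe node 2's raft dir and start it again with the new raft advertise
# address in its config, joining through an existing member
databend-meta -c databend-meta-node-2.toml --join old-meta-0.example:28004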

@drmingdrmer
Member

The advertise address has been strictly prohibited from changing since the very first day. The address must not be changed under any circumstances; any alteration to the address can trigger a split-brain issue.

In contrast, the gRPC address can be modified during a restart. This ability to change the gRPC address during restart was added as a feature about a year ago.

@kkapper
Author

kkapper commented Jan 17, 2025

@drmingdrmer What would happen if you needed to migrate your databend-meta cluster to another datacenter, for disaster recovery or to change hosting providers?

Would you need to set all of the IP addresses in the new architecture to be the same as they were in the original?

@kkapper
Author

kkapper commented Jan 17, 2025

This is especially problematic on Kubernetes, because we have the ability to move things across namespaces quite easily, which changes addresses, and in general practice we can never guarantee we will hold the same IP address for long.

@kkapper
Author

kkapper commented Jan 17, 2025

The V003 convention seems to behave in the way I would expect, where the raft log only retained the entry for the node by index, and mapped the value when the log was replayed:

["header",{"DataHeader":{"key":"header","value":{"version":"V003","upgrading":null}}}] ["raft_state",{"RaftStateKV":{"key":"Id","value":{"NodeId":0}}}] ["raft_state",{"RaftStateKV":{"key":"HardState","value":{"HardState":{"leader_id":{"term":155936,"node_id":2},"committed":true}}}}] ["raft_state",{"RaftStateKV":{"key":"Committed","value":{"Committed":{"leader_id":{"term":155936,"node_id":2},"index":163988}}}}]

["raft_log",{"Logs":{"key":61440,"value":{"log_id":{"leader_id":{"term":712,"node_id":1},"index":61440},"payload":{"Normal":{"txid":null,"time_ms":1720357064373,"cmd":{"UpsertKV":{"key":"__fd_clusters_v2/plaid/default/databend_query/72BYsTnAMhgr4njReCgTX4","seq":{"GE":1},"value":"AsIs","value_meta":{"expire_at":null,"ttl":{"millis":60000}}}}}}}}}]

@kkapper
Author

kkapper commented Jan 17, 2025

Storing any connection data via IP is going to break Kubernetes implementations for sure.

Is it possible that we could get a log-by-index or log-by-IP toggle?

@drmingdrmer
Member

@drmingdrmer What would happen if you needed to migrate your databend-meta cluster to another datacenter, for disaster recovery or to change hosting providers?

Would you need to set all of the IP addresses in the new architecture to be the same as they were in the original?

The standard process is to backup data from the running databend-meta cluster with:

databend-metactl export --grpc-api-address "127.0.0.1:9191" --db meta.db

and then restore to a new cluster in another DC with the updated peer addresses specified:

databend-metactl import --raft-dir ./.databend/new_meta1 --db meta.db \
    --id=1 \
    --initial-cluster 1=localhost:29103 \
    --initial-cluster 2=localhost:29203 \
    --initial-cluster 3=localhost:29303
databend-metactl import --raft-dir ./.databend/new_meta2 --db meta.db \
    --id=2 \
    --initial-cluster 1=localhost:29103 \
    --initial-cluster 2=localhost:29203 \
    --initial-cluster 3=localhost:29303
databend-metactl import --raft-dir ./.databend/new_meta3 --db meta.db \
    --id=3 \
    --initial-cluster 1=localhost:29103 \
    --initial-cluster 2=localhost:29203 \
    --initial-cluster 3=localhost:29303

See: https://docs.databend.com/guides/deploy/deploy/production/metasrv-backup-restore#import-data-as-a-new-databend-meta-cluster

@drmingdrmer
Member

This is especially problematic on Kubernetes, because we have the ability to move things across namespaces quite easily, which changes addresses, and in general practice we can never guarantee we will hold the same IP address for long.

You do not need to hold the same IP, but the hostname for raft-advertise-address should be kept consistent during migration.
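
One possible way to keep that hostname consistent on Kubernetes (my suggestion, not something stated in this thread) is to advertise a name that does not embed the namespace, e.g. the pod.service short form, which resolves the same way after a namespace move as long as the pod and headless-service names stay the same:

# hypothetical advertise address with no namespace baked in:
#   plaid-databend-meta-0.plaid-databend-meta:28004
# quick check from inside any pod in the namespace:
getent hosts plaid-databend-meta-0.plaid-databend-meta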

@drmingdrmer
Member

The V003 convention seems to behave in the way I would expect, where the raft log only retained the entry for the node by index, and mapped the value when the log was replayed:

["header",{"DataHeader":{"key":"header","value":{"version":"V003","upgrading":null}}}] ["raft_state",{"RaftStateKV":{"key":"Id","value":{"NodeId":0}}}] ["raft_state",{"RaftStateKV":{"key":"HardState","value":{"HardState":{"leader_id":{"term":155936,"node_id":2},"committed":true}}}}] ["raft_state",{"RaftStateKV":{"key":"Committed","value":{"Committed":{"leader_id":{"term":155936,"node_id":2},"index":163988}}}}]

["raft_log",{"Logs":{"key":61440,"value":{"log_id":{"leader_id":{"term":712,"node_id":1},"index":61440},"payload":{"Normal":{"txid":null,"time_ms":1720357064373,"cmd":{"UpsertKV":{"key":"__fd_clusters_v2/plaid/default/databend_query/72BYsTnAMhgr4njReCgTX4","seq":{"GE":1},"value":"AsIs","value_meta":{"expire_at":null,"ttl":{"millis":60000}}}}}}}}}]

There is no behavior change in any version. What is the unexpected behavior you have encountered?

@drmingdrmer
Member

Storing any connection data via IP is going to break Kubernetes implementations for sure.

Is it possible that we could get a log-by-index or log-by-IP toggle?

What do you mean by “log by IP”? What kind of information are you trying to obtain using the IP address?

@kkapper
Author

kkapper commented Jan 20, 2025

@drmingdrmer What would happen if you needed to migrate your databend-meta cluster to another datacenter, for disaster recovery or to change hosting providers?
Would you need to set all of the IP addresses in the new architecture to be the same as they were in the original?

The standard process is to backup data from the running databend-meta cluster with:

databend-metactl export --grpc-api-address "127.0.0.1:9191" --db meta.db

and then restore to a new cluster in another DC with the updated peer addresses specified:

databend-metactl import --raft-dir ./.databend/new_meta1 --db meta.db \
    --id=1 \
    --initial-cluster 1=localhost:29103 \
    --initial-cluster 2=localhost:29203 \
    --initial-cluster 3=localhost:29303
databend-metactl import --raft-dir ./.databend/new_meta2 --db meta.db \
    --id=2 \
    --initial-cluster 1=localhost:29103 \
    --initial-cluster 2=localhost:29203 \
    --initial-cluster 3=localhost:29303
databend-metactl import --raft-dir ./.databend/new_meta3 --db meta.db \
    --id=3 \
    --initial-cluster 1=localhost:29103 \
    --initial-cluster 2=localhost:29203 \
    --initial-cluster 3=localhost:29303

See: https://docs.databend.com/guides/deploy/deploy/production/metasrv-backup-restore#import-data-as-a-new-databend-meta-cluster

This does not work when importing into a new environment.

The address specified for --initial-cluster will be replaced with what came from the backup being imported.

@kkapper
Author

kkapper commented Jan 20, 2025

Here is an example of a backup and restore with different addresses:

initialize leader node

Import:
Into Meta Dir: '/data/databend-meta/raft'
Initialize Cluster with Id: 0, cluster: {
Peer: 0=plaid-databend-meta-0:28004
Peer: 1=plaid-databend-meta-1:28004
Peer: 2=plaid-databend-meta-2:28004
}
Initialize Cluster: id=0, ["0=plaid-databend-meta-0:28004", "1=plaid-databend-meta-1:28004", "2=plaid-databend-meta-2:28004"]
peer:0=plaid-databend-meta-0:28004
new cluster node:id=0 raft=plaid-databend-meta-0:28004 grpc=
peer:1=plaid-databend-meta-1:28004
new cluster node:id=1 raft=plaid-databend-meta-1:28004 grpc=
peer:2=plaid-databend-meta-2:28004
new cluster node:id=2 raft=plaid-databend-meta-2:28004 grpc=

^ Here is the import into a brand new databend cluster.

When running ./databend-metactl status right after the import finishes:

root@plaid-databend-meta-0:/# ./databend-metactl status
BinaryVersion: v1.2.680-p3-4c4896dc57-simd(1.85.0-nightly-2025-01-14T09:48:25.167808264Z)
DataVersion: V004
RaftLogSize: 78544686
RaftLog:
  - CacheItems: 102578
  - CacheUsedSize: 74142898
  - WALTotalSize: 78544686
  - WALOpenChunkSize: 741
  - WALOffset: 78544686
  - WALClosedChunkCount: 4
  - WALClosedChunkTotalSize: 78543945
  - WALClosedChunkSizes:
    - ChunkId(00_000_000_000_000_000_000): 77957284
    - ChunkId(00_000_000_000_077_957_284): 585442
    - ChunkId(00_000_000_000_078_542_726): 188
    - ChunkId(00_000_000_000_078_542_914): 1031
SnapshotKeyCount: 262486
Node: id=0 raft=plaid-databend-meta-0.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004
State: Candidate
CurrentTerm: 43495
LastSeq: 2239773
LastLogIndex: 1895021
LastApplied: T43488-N0.1895016
SnapshotLastLogID: T43484-N2.1894843
Purged: T41914-N1.1792443
Voters:
  - id=0 raft=plaid-databend-meta-0.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004 grpc=plaid-databend-meta-0.plaid-databend-meta.pw-superkellendev.svc.cluster.local:9191
  - id=1 raft=plaid-databend-meta-1.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004 grpc=plaid-databend-meta-1.plaid-databend-meta.pw-superkellendev.svc.cluster.local:9191
  - id=2 raft=plaid-databend-meta-2.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004 grpc=plaid-databend-meta-2.plaid-databend-meta.pw-superkellendev.svc.cluster.local:9191

@kkapper
Author

kkapper commented Jan 20, 2025

You can see the voting addresses are different from the member addresses of the current cluster.

@kkapper
Author

kkapper commented Jan 20, 2025

The V003 convention seems to behave in the way I would expect, where the raft log only retained the entry for the node by index, and mapped the value when the log was replayed:
["header",{"DataHeader":{"key":"header","value":{"version":"V003","upgrading":null}}}] ["raft_state",{"RaftStateKV":{"key":"Id","value":{"NodeId":0}}}] ["raft_state",{"RaftStateKV":{"key":"HardState","value":{"HardState":{"leader_id":{"term":155936,"node_id":2},"committed":true}}}}] ["raft_state",{"RaftStateKV":{"key":"Committed","value":{"Committed":{"leader_id":{"term":155936,"node_id":2},"index":163988}}}}]
["raft_log",{"Logs":{"key":61440,"value":{"log_id":{"leader_id":{"term":712,"node_id":1},"index":61440},"payload":{"Normal":{"txid":null,"time_ms":1720357064373,"cmd":{"UpsertKV":{"key":"__fd_clusters_v2/plaid/default/databend_query/72BYsTnAMhgr4njReCgTX4","seq":{"GE":1},"value":"AsIs","value_meta":{"expire_at":null,"ttl":{"millis":60000}}}}}}}}}]

There is no behavior change in any version. What is the unexpected behavior you have encountered?

The entirety of the

Here is an example of a backup and restore with different addresses:

initialize leader node

Import:
Into Meta Dir: '/data/databend-meta/raft'
Initialize Cluster with Id: 0, cluster: {
Peer: 0=plaid-databend-meta-0:28004
Peer: 1=plaid-databend-meta-1:28004
Peer: 2=plaid-databend-meta-2:28004
}
Initialize Cluster: id=0, ["0=plaid-databend-meta-0:28004", "1=plaid-databend-meta-1:28004", "2=plaid-databend-meta-2:28004"]
peer:0=plaid-databend-meta-0:28004
new cluster node:id=0 raft=plaid-databend-meta-0:28004 grpc=
peer:1=plaid-databend-meta-1:28004
new cluster node:id=1 raft=plaid-databend-meta-1:28004 grpc=
peer:2=plaid-databend-meta-2:28004
new cluster node:id=2 raft=plaid-databend-meta-2:28004 grpc=

^ Here is the import into a brand new databend cluster.

When running ./databend-metactl status right after the import finishes:

root@plaid-databend-meta-0:/# ./databend-metactl status
BinaryVersion: v1.2.680-p3-4c4896dc57-simd(1.85.0-nightly-2025-01-14T09:48:25.167808264Z)
DataVersion: V004
RaftLogSize: 78544686
RaftLog:
  - CacheItems: 102578
  - CacheUsedSize: 74142898
  - WALTotalSize: 78544686
  - WALOpenChunkSize: 741
  - WALOffset: 78544686
  - WALClosedChunkCount: 4
  - WALClosedChunkTotalSize: 78543945
  - WALClosedChunkSizes:
    - ChunkId(00_000_000_000_000_000_000): 77957284
    - ChunkId(00_000_000_000_077_957_284): 585442
    - ChunkId(00_000_000_000_078_542_726): 188
    - ChunkId(00_000_000_000_078_542_914): 1031
SnapshotKeyCount: 262486
Node: id=0 raft=plaid-databend-meta-0.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004
State: Candidate
CurrentTerm: 43495
LastSeq: 2239773
LastLogIndex: 1895021
LastApplied: T43488-N0.1895016
SnapshotLastLogID: T43484-N2.1894843
Purged: T41914-N1.1792443
Voters:
  - id=0 raft=plaid-databend-meta-0.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004 grpc=plaid-databend-meta-0.plaid-databend-meta.pw-superkellendev.svc.cluster.local:9191
  - id=1 raft=plaid-databend-meta-1.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004 grpc=plaid-databend-meta-1.plaid-databend-meta.pw-superkellendev.svc.cluster.local:9191
  - id=2 raft=plaid-databend-meta-2.plaid-databend-meta.pw-superkellendev.svc.cluster.local:28004 grpc=plaid-databend-meta-2.plaid-databend-meta.pw-superkellendev.svc.cluster.local:9191

Specifically, in the backup being restored, you can see the endpoints are included in the backup set:

["raft_log",{"LogEntry":{"log_id":{"leader_id":{"term":42013,"node_id":2},"index":1844602},"payload":{"Normal":{"txid":null,"cmd":{"AddNode":{"node_id":0,"node":{"name":"0","endpoint":{"addr":"plaid-databend-meta-0.plaid-databend-meta.pw-superkellendev.svc.cluster.local","port":28004},"grpc_api_advertise_address":null},"overriding":true}}}}}}]
["raft_log",{"LogEntry":{"log_id":{"leader_id":{"term":42013,"node_id":2},"index":1844603},"payload":{"Normal":{"txid":null,"cmd":{"AddNode":{"node_id":1,"node":{"name":"1","endpoint":{"addr":"plaid-databend-meta-1.plaid-databend-meta.pw-superkellendev.svc.cluster.local","port":28004},"grpc_api_advertise_address":null},"overriding":true}}}}}}]
["raft_log",{"LogEntry":{"log_id":{"leader_id":{"term":42013,"node_id":2},"index":1844604},"payload":{"Normal":{"txid":null,"cmd":{"AddNode":{"node_id":2,"node":{"name":"2","endpoint":{"addr":"plaid-databend-meta-2.plaid-databend-meta.pw-superkellendev.svc.cluster.local","port":28004},"grpc_api_advertise_address":null},"overriding":true}}}}}}]

My proposal is that a backup should not have any concern for the hostnames of the source cluster.

For example, a Redis AOF can be replayed on any new cluster.

@kkapper
Author

kkapper commented Jan 20, 2025

@drmingdrmer I should probably rephrase:

This issue should read: Databend Meta Backups Use The Source Cluster Hostnames Instead Of The Destination.

What I'd like to see exactly:

Databend backups should only include the name of the node in each line, which would enable a backup to be transferred from any source cluster to any destination.

@drmingdrmer
Member

My proposal is that a backup should not have any concern for the hostnames of the source cluster.

That is impossible. The hostname is part of the raft log; removing a portion of the raft log would just result in data inconsistency.

The raft-advertise-address should be updated when the --initial-cluster argument is specified. To find out what's going wrong, can you re-export the data from a restored databend-meta service (for example, databend-metactl --export --raft-dir .databend/new_meta1, after shutting down the restored databend-meta service) and grep for AddNode?

There should be several raft-log lines that add the new cluster configuration to override the existing one, such as:

["raft_log",{"LogEntry":{"log_id":{"leader_id":{"term":1,"node_id":1},"index":14},"payload":{"Normal":{"txid":null,"cmd":{"AddNode":{"node_id":4,"node":{"name":"4","endpoint":{"addr":"localhost","port":29103},"grpc_api_advertise_address":null},"overriding":true}}}}}}]
["raft_log",{"LogEntry":{"log_id":{"leader_id":{"term":1,"node_id":1},"index":15},"payload":{"Normal":{"txid":null,"cmd":{"AddNode":{"node_id":5,"node":{"name":"5","endpoint":{"addr":"localhost","port":29203},"grpc_api_advertise_address":null},"overriding":true}}}}}}]
["raft_log",{"LogEntry":{"log_id":{"leader_id":{"term":1,"node_id":1},"index":16},"payload":{"Normal":{"txid":null,"cmd":{"AddNode":{"node_id":6,"node":{"name":"6","endpoint":{"addr":"localhost","port":29303},"grpc_api_advertise_address":null},"overriding":true}}}}}}]

Also, make sure the config file you were using to start databend-meta contains the correct new cluster addresses.
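
For reference, a minimal sketch of that verification step (paths are placeholders; this assumes the export form quoted above writes to stdout when no --db is given, as in the backup/restore docs):

# stop the restored databend-meta service first, then:
databend-metactl --export --raft-dir /data/databend-meta/raft > exported.jsonl

# the AddNode entries show which raft addresses the restored cluster will use
grep AddNode exported.jsonl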
