
Cluster automatic recovery after graceful shutdown of all nodes #283

Open · joffrey92 opened this issue Jun 4, 2015 · 22 comments

@joffrey92

Hi,

I really like the recovery feature (pc.recovery): http://galeracluster.com/documentation-webpages/pcrecovery.html

Unfortunately, this doesn't cover the case where the cluster is gracefully shut down, as the gvwstate.dat file is then deleted on all nodes.

One example situation would be a power failure in a data center:

  1. Power fails.
  2. The UPS kicks in.
  3. The UPS battery runs out and it gracefully shuts down the servers.
  4. Power comes back.
  5. The cluster doesn't start.
  6. An operator has to bootstrap the cluster by starting one node first, or by creating a fake gvwstate.dat file (see the sketch after this list).
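
For illustration, step 6's manual bootstrap is typically done by starting a single node with the bootstrap flag (a minimal sketch; the exact service commands vary by distribution):

# on the node chosen to bootstrap (assumed to hold the most advanced state):
$ mysqld --wsrep-new-cluster
# then start the remaining nodes normally so they join it:
$ service mysql start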

What do you think?
Joffrey

@temeo (Contributor) commented Jun 5, 2015

If the cluster is shut down gracefully, there will usually (maybe always) be only one node left in the last primary component. If gvwstate.dat were not removed, the node that was shut down last would always re-bootstrap the primary component automatically. That seemed too error-prone for the normal use case where the primary component is bootstrapped manually, so it was decided that gvwstate.dat must go on graceful shutdown.

However, it should not take too much effort to implement an alternative behavior for pc.recovery which would save gvwstate.dat on graceful shutdown as well, and which could be enabled if specifically needed. But this solution might not be safe enough...

There could be a better way: making gvwstate.dat "sticky" in some cluster configurations, or configuring a minimum number of nodes required for saving gvwstate.dat and always preserving the node UUID across restarts.
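
For context, gvwstate.dat is a small plain-text snapshot of the last saved cluster view, roughly of this shape (the UUIDs, view number, and member weights below are placeholders):

my_uuid: d3124bc8-1605-11e4-aa3d-ab44303c044a
#vwbeg
view_id: 3 0dae1307-1606-11e4-aa94-5255b1455aa0 12
bootstrap: 0
member: 0dae1307-1606-11e4-aa94-5255b1455aa0 1
member: 47bbe2e2-1606-11e4-8593-2a6d8335bc79 1
member: d3124bc8-1605-11e4-aa3d-ab44303c044a 1
#vwend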

@GeoffMontee

What if a few new wsrep provider options were added to allow this? (A configuration sketch follows the list below.)

  • pc.recover_on_shutdown - Determines whether gvwstate.dat is written on a graceful shutdown.
  • pc.recover_minimum_nodes - The minimum value of wsrep_cluster_size in which gvwstate.dat is written. Maybe 3 by default?
  • pc.recover_minimum_nodes_weight - The minimum total value of the quorum weight of all nodes required for gvwstate.dat to be written.
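
A sketch of how these proposed options might look in my.cnf (the pc.recover_* names are hypothetical, taken from this comment; they do not exist in any release):

# all three pc.recover_* settings below are proposals, not real options
wsrep_provider_options="pc.recover_on_shutdown=yes;pc.recover_minimum_nodes=3;pc.recover_minimum_nodes_weight=3"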

@ayurchen (Member) commented Jun 6, 2015

I think this can all be generalized into just pc.recover_minimum_weight:

  • if it is 0, gvwstate.dat is not used on startup;
  • if it is non-0, the primary component stops updating gvwstate.dat as soon as the total component weight goes below the specified value. The PC is then bootstrapped when all members of at least the last update see each other.
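
A worked example of those semantics: in a three-node cluster with the default weight of 1 per node and a hypothetical pc.recover_minimum_weight=2, gvwstate.dat keeps being updated while at least two nodes remain. Once the second-to-last node leaves, updates stop, so the last saved view still lists two members, and the cluster auto-bootstraps only when both of them see each other again.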

But frankly, I don't really see what is so dangerous in just letting the last node to shut down (so the last node in the PC) bootstrap the PC. Isn't it the right thing to do?

@temeo (Contributor) commented Jun 6, 2015

> But frankly, I don't really see what is so dangerous in just letting the last node to shut down (so the last node in the PC) bootstrap the PC. Isn't it the right thing to do?

It is too easy to end up with two disjoint clusters.

Think about restarting the cluster with scripts which always assume that the first node should be bootstrapped with --wsrep-new-cluster. Take a three-node cluster restart: shut down in the sequence 1, 2, 3 and start in the same order. You will always end up with two disjoint clusters.
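
A sketch of that failure mode, assuming gvwstate.dat survived the graceful shutdown and hypothetical node names node1-node3:

# shutdown order was node1, node2, node3, so node3 was the last node in the PC
node1$ mysqld --wsrep-new-cluster   # restart script bootstraps node1: new PC A
node2$ mysqld                       # node2 joins node1's PC A
node3$ mysqld                       # node3 auto-re-bootstraps its saved PC: PC B
# result: node1+node2 form one cluster, node3 forms a second, disjoint one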

@ayurchen (Member) commented Jun 7, 2015

> Think about restarting the cluster with scripts which always assume that the first node should be bootstrapped with --wsrep-new-cluster. Take a three-node cluster restart: shut down in the sequence 1, 2, 3 and start in the same order. You will always end up with two disjoint clusters.

True that. But in this case we would not need --wsrep-new-cluster any more. And one should never put --wsrep-new-cluster in a script. Ideally.

@chandlermelton

Joffrey also links to http://galeracluster.com/documentation-webpages/pcrecovery.html, which states:

> This feature allows for: Graceful full cluster restarts, without the need for explicitly bootstrapping a new Primary Component.

How does this work? If I gracefully shut down to one node and then do 'service mysql restart' on that node, it will not come back up without bootstrapping it.

@ayurchen (Member)

It is a mistake in the manual.

@chandlermelton

I see.

pc.recover_minimum_weight seems like a good idea. What would be its default setting?

@ayurchen (Member)

Perhaps 0 for 3.x and 1 for 4.x. And if it is non-0, refuse to bootstrap the cluster via the usual --wsrep-new-cluster until gvwstate.dat is removed.
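
A sketch of the operator flow that behavior would imply (hypothetical, assuming a non-0 pc.recover_minimum_weight and the default datadir):

$ mysqld --wsrep-new-cluster       # would be refused while gvwstate.dat exists
$ rm /var/lib/mysql/gvwstate.dat   # operator confirms intent by removing the file
$ mysqld --wsrep-new-cluster       # now permitted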

@chandlermelton

I am revisiting this after a long while. For a typical failed three-node cluster, why is it not possible for a failed node to start and then wait for the other nodes to check in? Is that what we're solving here, or is the suggestion that the first node to bootstrap will also start the PC and begin accepting writes?

If a three-node cluster simply loses network connectivity and it is restored, you're good to go. I'd like to achieve something like that even if the mysql service stops, but I agree it wouldn't be safe to start the PC without a quorum based on the nodes that were running in the last state of the cluster.

IIRC, when I used the mysqldump replication method during testing, a node was able to bootstrap on its own. Is there something special about xtrabackup* that prevents this?

@chandlermelton

To add to my previous comment: my confusion came from the behavior of wait_prim with non-mysqldump SST methods.

If you set wait_prim to yes, the nodes will start and listen/notify until quorum can be reached based on their gvwstate.dat files.
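
Assuming wait_prim here refers to the Galera provider options pc.wait_prim and pc.wait_prim_timeout, a my.cnf sketch (the timeout value is illustrative):

# wait for a primary component at startup instead of bailing out immediately
wsrep_provider_options="pc.recovery=true;pc.wait_prim=true;pc.wait_prim_timeout=PT60S"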

@seanjnkns

I realize this feature request is over a year old, but has there been any traction on this whatsoever? I'm asking because I need a similar feature without having to resort to some hacky method: potentially generating a fake gvwstate.dat when a node is determined to be the last one in the cluster, chattr'ing the file on shutdown and un-chattr'ing it on startup (see the sketch below), and so on. Having a pc.* option to save this would be much more ideal.
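
A sketch of that chattr workaround, for the record (paths assumed, untested):

# before a graceful shutdown: make the file immutable so Galera's deletion fails
chattr +i /var/lib/mysql/gvwstate.dat
systemctl stop mysql
# after the node has started and rejoined: make the file deletable again
chattr -i /var/lib/mysql/gvwstate.dat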

@andvgal commented Sep 11, 2016

I would bump the issue too, as a graceful shutdown plus automatic restore of the primary can avoid SST.

If we bootstrap after a full shutdown, it leads to unavoidable SST, which can put serious load on disks and the network, as hundreds of GBs or even TBs would need to be re-copied in production (e.g. with the xtrabackup method).

One typical production case of a full shutdown is a major software upgrade without a rolling-update option.

@rldleblanc commented Dec 1, 2016

We have been using (70559dc) #332 for a while with good success. What needs to happen to get this adopted?

@ayurchen (Member) commented Dec 3, 2016

We need the author to sign the Contributor Agreement; unfortunately, we can't accept significant code contributions without that.
In any case, it looks like this will be solved another way.

@seanjnkns

Out of curiosity, how will this issue be solved the other way if the author of this patch doesn't respond? And even if he does respond, I'm still curious what the other approach is. Can you provide a link or details on how the issue this PR addresses would otherwise be handled?

@ayurchen (Member) commented Dec 5, 2016

I can't provide a link, since no work has been done on this yet. However, recent updates to Galera make a solution to this issue more straightforward than initially envisioned.

@Oloremo commented Oct 16, 2017

This feature is truly needed. Any updates?

@jeffrey4l

In my testing, Galera already implements automatic recovery from a power failure: the nodes wait until all nodes are up.

But I cannot find where this is implemented. Any ideas?

Here is what I am using now:

$ rpm -qa | egrep -i 'mariadb|galera'
MariaDB-shared-10.0.31-1.el7.centos.x86_64
MariaDB-common-10.0.31-1.el7.centos.x86_64
MariaDB-client-10.0.31-1.el7.centos.x86_64
galera-25.3.20-1.rhel7.el7.centos.x86_64
MariaDB-Galera-server-10.0.31-1.el7.centos.x86_64

@mluczak commented Jul 12, 2018

Hi, any news about this feature request?

@Oloremo commented Feb 13, 2019

Anyone?

@Oloremo commented Nov 13, 2019

@ayurchen Any plans for this? It looks like there is already a patch implemented to support this.
