
Cluster automatic recovery after graceful shutdown of all nodes #283

Open · joffrey92 opened this issue Jun 4, 2015 · 22 comments

@joffrey92

Hi,

I really like the recovery feature (pc.recovery): http://galeracluster.com/documentation-webpages/pcrecovery.html

Unfortunately, this doesn't cover the case where the cluster is gracefully shut down, as the gvwstate.dat file is then deleted on all nodes.

One example situation would be a power failure in a data center:

  1. Power fails.
  2. The UPS kicks in.
  3. The UPS battery runs out and it gracefully shuts down the servers.
  4. Power comes back.
  5. The cluster doesn't start.
  6. An operator has to bootstrap the cluster by starting one node first, or by creating a fake gvwstate.dat file (see the sketch after this list).
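
For illustration, step 6's manual bootstrap is typically done by starting a single node with the bootstrap flag (a minimal sketch; the exact service commands vary by distribution):

# on the node chosen to bootstrap (assumed to hold the most advanced state):
$ mysqld --wsrep-new-cluster
# then start the remaining nodes normally so they join it:
$ service mysql start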

What do you think?
Joffrey

@temeo (Contributor) commented Jun 5, 2015

If the cluster is shut down gracefully, there will usually (maybe always) be only one node left in the last primary component. If gvwstate.dat were not removed, the node that was shut down last would always re-bootstrap the primary component automatically. That seemed too error-prone for the normal use case where the primary component is bootstrapped manually, so it was decided that gvwstate.dat must go on graceful shutdown.

However, it should not take too much effort to implement an alternative behavior for pc.recovery which would save gvwstate.dat on graceful shutdown as well, and which could be enabled if specifically needed. But this solution might not be safe enough...

There could be a better way: making gvwstate.dat "sticky" in some cluster configurations, or configuring a minimum number of nodes required for saving gvwstate.dat and always preserving the node UUID across restarts.
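
For context, gvwstate.dat is a small plain-text snapshot of the last saved cluster view, roughly of this shape (the UUIDs, view number, and member weights below are placeholders):

my_uuid: d3124bc8-1605-11e4-aa3d-ab44303c044a
#vwbeg
view_id: 3 0dae1307-1606-11e4-aa94-5255b1455aa0 12
bootstrap: 0
member: 0dae1307-1606-11e4-aa94-5255b1455aa0 1
member: 47bbe2e2-1606-11e4-8593-2a6d8335bc79 1
member: d3124bc8-1605-11e4-aa3d-ab44303c044a 1
#vwend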

@GeoffMontee

What if a few new wsrep provider options were added to allow this? (A configuration sketch follows the list below.)

  • pc.recover_on_shutdown - Determines whether gvwstate.dat is written on a graceful shutdown.
  • pc.recover_minimum_nodes - The minimum value of wsrep_cluster_size in which gvwstate.dat is written. Maybe 3 by default?
  • pc.recover_minimum_nodes_weight - The minimum total value of the quorum weight of all nodes required for gvwstate.dat to be written.
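
A sketch of how these proposed options might look in my.cnf (the pc.recover_* names are hypothetical, taken from this comment; they do not exist in any release):

# all three pc.recover_* settings below are proposals, not real options
wsrep_provider_options="pc.recover_on_shutdown=yes;pc.recover_minimum_nodes=3;pc.recover_minimum_nodes_weight=3"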

@ayurchen (Member) commented Jun 6, 2015

I think this can all be generalized into just pc.recover_minimum_weight:

  • if it is 0, gvwstate.dat is not used on startup;
  • if it is non-0, the primary component stops updating gvwstate.dat as soon as the total component weight goes below the specified value. The PC is then bootstrapped when all members of at least the last update see each other.
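
A worked example of those semantics: in a three-node cluster with the default weight of 1 per node and a hypothetical pc.recover_minimum_weight=2, gvwstate.dat keeps being updated while at least two nodes remain. Once the second-to-last node leaves, updates stop, so the last saved view still lists two members, and the cluster auto-bootstraps only when both of them see each other again.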

But frankly, I don't really see what is so dangerous in just letting the last node to shut down (so the last node in the PC) bootstrap the PC. Isn't it the right thing to do?

@temeo (Contributor) commented Jun 6, 2015

> But frankly, I don't really see what is so dangerous in just letting the last node to shut down (so the last node in the PC) bootstrap the PC. Isn't it the right thing to do?

It is too easy to end up with two disjoint clusters.

Think about restarting the cluster with scripts which always assume that the first node should be bootstrapped with --wsrep-new-cluster. Take a three-node cluster restart: shut down in the sequence 1, 2, 3 and start in the same order. You will always end up with two disjoint clusters.
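
A sketch of that failure mode, assuming gvwstate.dat survived the graceful shutdown and hypothetical node names node1-node3:

# shutdown order was node1, node2, node3, so node3 was the last node in the PC
node1$ mysqld --wsrep-new-cluster   # restart script bootstraps node1: new PC A
node2$ mysqld                       # node2 joins node1's PC A
node3$ mysqld                       # node3 auto-re-bootstraps its saved PC: PC B
# result: node1+node2 form one cluster, node3 forms a second, disjoint one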

@ayurchen (Member) commented Jun 7, 2015

> Think about restarting the cluster with scripts which always assume that the first node should be bootstrapped with --wsrep-new-cluster. Take a three-node cluster restart: shut down in the sequence 1, 2, 3 and start in the same order. You will always end up with two disjoint clusters.

True that. But in this case we would not need --wsrep-new-cluster any more. And one should never put --wsrep-new-cluster in a script. Ideally.

@chandlermelton

Joffrey also links to http://galeracluster.com/documentation-webpages/pcrecovery.html, which states:

> This feature allows for: Graceful full cluster restarts, without the need for explicitly bootstrapping a new Primary Component.

How does this work? If I gracefully shut down to one node and then do 'service mysql restart' on that node, it will not come back up without bootstrapping it.

@ayurchen (Member)

It is a mistake in the manual.

@chandlermelton

I see.

pc.recover_minimum_weight seems like a good idea. What would be its default setting?

@ayurchen (Member)

Perhaps 0 for 3.x and 1 for 4.x. And if it is non-0, refuse to bootstrap the cluster via the usual --wsrep-new-cluster until gvwstate.dat is removed.
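
A sketch of the operator flow that behavior would imply (hypothetical, assuming a non-0 pc.recover_minimum_weight and the default datadir):

$ mysqld --wsrep-new-cluster       # would be refused while gvwstate.dat exists
$ rm /var/lib/mysql/gvwstate.dat   # operator confirms intent by removing the file
$ mysqld --wsrep-new-cluster       # now permitted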

@chandlermelton

I am revisiting this after a long while. For a typical failed three-node cluster, why is it not possible for a failed node to start and then wait for the other nodes to check in? Is that what we're solving here, or is the suggestion that the first node to bootstrap will also start the PC and begin accepting writes?

If a three-node cluster simply loses network connectivity and it is restored, you're good to go. I'd like to achieve something like that even if the mysql service stops, but I agree it wouldn't be safe to start the PC without a quorum based on the nodes that were running in the last state of the cluster.

IIRC, when I used the mysqldump replication method during testing, a node was able to bootstrap on its own. Is there something special about xtrabackup* that prevents this?

@chandlermelton

To add to my previous comment: my confusion came from the behavior of wait_prim with non-mysqldump SST methods.

If you set wait_prim to yes, the nodes will start and listen/notify until quorum can be reached based on their gvwstate.dat files.
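
Assuming wait_prim here refers to the Galera provider options pc.wait_prim and pc.wait_prim_timeout, a my.cnf sketch (the timeout value is illustrative):

# wait for a primary component at startup instead of bailing out immediately
wsrep_provider_options="pc.recovery=true;pc.wait_prim=true;pc.wait_prim_timeout=PT60S"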

@seanjnkns

I realize this feature request is over a year old, but has there been any traction on this whatsoever? I'm asking because I need a similar feature without having to resort to some hacky method: potentially generating a fake gvwstate.dat when a node is determined to be the last one in the cluster, chattr'ing the file on shutdown and un-chattr'ing it on startup (see the sketch below), and so on. Having a pc.* option to save this would be much more ideal.
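
A sketch of that chattr workaround, for the record (paths assumed, untested):

# before a graceful shutdown: make the file immutable so Galera's deletion fails
chattr +i /var/lib/mysql/gvwstate.dat
systemctl stop mysql
# after the node has started and rejoined: make the file deletable again
chattr -i /var/lib/mysql/gvwstate.dat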

@andvgal commented Sep 11, 2016

I would bump the issue too, as a graceful shutdown plus automatic restore of the primary can avoid SST.

If we bootstrap after a full shutdown, it leads to unavoidable SST, which can put serious load on disks and the network, as hundreds of GBs or even TBs would need to be re-copied in production (e.g. with the xtrabackup method).

One typical production case of a full shutdown is a major software upgrade without a rolling-update option.

@rldleblanc commented Dec 1, 2016

We have been using (70559dc) #332 for a while with good success. What needs to happen to get this adopted?

@ayurchen (Member) commented Dec 3, 2016

We need the author to sign the Contributor Agreement; unfortunately, we can't accept significant code contributions without that.
In any case, it looks like this will be solved another way.

@seanjnkns

Out of curiosity, how will this issue be solved the other way if the author of this patch doesn't respond? And even if he does respond, I'm still curious what the other approach is. Can you provide a link or details on how the issue this PR addresses would otherwise be handled?

@ayurchen (Member) commented Dec 5, 2016

I can't provide a link, since no work has been done on this yet. However, recent updates to Galera make a solution to this issue more straightforward than initially envisioned.

@Oloremo commented Oct 16, 2017

This feature is truly needed. Any updates?

@jeffrey4l

In my testing, Galera already implements automatic recovery from a power failure: the nodes wait until all nodes are up.

But I cannot find where this is implemented. Any ideas?

Here is what I am using now:

$ rpm -qa | egrep -i 'mariadb|galera'
MariaDB-shared-10.0.31-1.el7.centos.x86_64
MariaDB-common-10.0.31-1.el7.centos.x86_64
MariaDB-client-10.0.31-1.el7.centos.x86_64
galera-25.3.20-1.rhel7.el7.centos.x86_64
MariaDB-Galera-server-10.0.31-1.el7.centos.x86_64

@mluczak commented Jul 12, 2018

Hi, any news about this feature request?

@Oloremo commented Feb 13, 2019

Anyone?

@Oloremo commented Nov 13, 2019

@ayurchen Any plans for this? It looks like there is already a patch implemented to support this.
