Galera attempts to recover primary view if both pc.recovery and --wsrep-new-cluster are set #355

Open
GeoffMontee opened this issue Sep 29, 2015 · 1 comment

@GeoffMontee

When bootstrapping a new cluster with --wsrep-new-cluster, it doesn't make sense to recover an old primary view from gvwstate.dat, since the user is explicitly asking for a new cluster. However, Galera still attempts to recover the old primary view in that situation if pc.recovery is set.

We can show this by reproducing issue #354 and then bootstrapping. For example:

[ec2-user@ip-172-31-31-73 ~]$ sudo cp /dev/null /var/lib/mysql/gvwstate.dat
[ec2-user@ip-172-31-31-73 ~]$ sudo service mysql bootstrap
Bootstrapping the cluster.. Starting MySQL... ERROR!

And then the error log shows:

150929 12:02:36 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
150929 12:02:36 [Note] WSREP: Setting initial position to 3b43c7c9-45c3-11e5-b1d7-be55ee2e2933:10
150929 12:02:36 [Note] WSREP: protonet asio version 0
150929 12:02:36 [Note] WSREP: Using CRC-32C for message checksums.
150929 12:02:36 [Note] WSREP: backend: asio
150929 12:02:36 [Note] WSREP: restore pc from disk successfully
150929 12:02:36 [Note] WSREP: GMCast version 0
150929 12:02:36 [Note] WSREP: (00000000, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
150929 12:02:36 [Note] WSREP: (00000000, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
150929 12:02:36 [ERROR] WSREP: failed to open gcomm backend connection: 131: invalid UUID: 00000000 (FATAL)
         at gcomm/src/pc.cpp:PC():309
150929 12:02:36 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():206: Failed to open backend connection: -131 (State not recoverable)
150929 12:02:36 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1379: Failed to open channel 'my_wsrep_cluster' at 'gcomm://172.31.31.72,172.31.31.73,172.31.20.230': -131 (State not recoverable)
150929 12:02:36 [ERROR] WSREP: gcs connect failed: State not recoverable
150929 12:02:36 [ERROR] WSREP: wsrep::connect() failed: 7
150929 12:02:36 [ERROR] Aborting

I believe this is fixed in pull request #332. See here: https://github.com/nirbhayc/galera/commit/70559dc1d8d3f88dc08fbe38a387b3507bba3410#diff-f74f5be8fb18cad3dd5bd7049b19dc8eR98
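
Until a fix lands, one possible workaround is to move an empty or unusable gvwstate.dat out of the way before bootstrapping, so there is no saved view for pc.recovery to restore. The following is only a rough sketch and has not been confirmed by the maintainers. It assumes the file normally starts with a `my_uuid: <uuid>` line (the layout shown in the Galera documentation), and the helper names `has_usable_view_state` and `set_aside_if_unusable` are hypothetical. Whether the bootstrap then succeeds has not been verified here.

```python
import re
from pathlib import Path

# Assumed layout of a healthy gvwstate.dat: a line such as
# "my_uuid: d3124bc8-1605-11e4-aa3d-ab44303c044a" followed by a #vwbeg/#vwend block.
MY_UUID_RE = re.compile(r"^my_uuid:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE)
NIL_UUID = "00000000-0000-0000-0000-000000000000"


def has_usable_view_state(path: Path) -> bool:
    """Return True only if the file exists and names a non-nil my_uuid."""
    if not path.is_file():
        return False
    match = MY_UUID_RE.search(path.read_text(errors="replace"))
    return match is not None and match.group(1) != NIL_UUID


def set_aside_if_unusable(path: Path) -> bool:
    """Rename an unusable view-state file to <name>.bak; return True if renamed."""
    if not path.exists() or has_usable_view_state(path):
        return False
    path.replace(path.with_name(path.name + ".bak"))
    return True


if __name__ == "__main__":
    # Path taken from the reproduction above; run with mysqld stopped
    # and with enough privileges to rename files in the data directory.
    state_file = Path("/var/lib/mysql/gvwstate.dat")
    if set_aside_if_unusable(state_file):
        print(f"moved {state_file} aside; bootstrap can be retried")
    else:
        print(f"{state_file} left untouched")
```

Renaming rather than deleting keeps the original file around in case it is needed for later diagnosis.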

@anthonyoteri

I'm running into the same issue at one of our customer's sites. Is there a workaround or fix that would let our customer recover from this?
