Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-34891 : SST failure occurs when gtid_strict_mode is enabled #468

Open
wants to merge 2 commits into
base: 10.5
Choose a base branch
from

Conversation

janlindstrom
Copy link

Problem was that initial GTID was set on wsrep_before_prepare out-of-order. In practice GTID was set to same as previous executed transaction GTID. In recovery valid GTID was found from prepared transaction and this transaction is committed leadin to fac that same GTID was executed twice.

This is fixed by setting invalid GTID at wsrep_before_prepare and later in wsrep_before_commit actual correct GTID is set and this setting is done while we are in commit monitor i.e. assigment is done in order of replication.

In recovery if prepared transaction is found we check its GTID, if it is invalid transaction will be rolled back and if it is valid it will be committed.

Added two test cases for both mariabackup and rsync SST methods to show that GTIDs remain consistent on cluster and that all expected rows are in the table.

@janlindstrom janlindstrom self-assigned this Jan 2, 2025
Problem was that initial GTID was set on wsrep_before_prepare
out-of-order. In practice GTID was set to same as previous
executed transaction GTID. In recovery valid GTID was found
from prepared transaction and this transaction is committed
leadin to fac that same GTID was executed twice.

This is fixed by setting invalid GTID at wsrep_before_prepare
and later in wsrep_before_commit actual correct GTID is set
and this setting is done while we are in commit monitor i.e.
assigment is done in order of replication.

In recovery if prepared transaction is found we check its
GTID, if it is invalid transaction will be rolled back
and if it is valid it will be committed.

Added two test cases for both mariabackup and rsync SST methods
to show that GTIDs remain consistent on cluster and that
all expected rows are in the table.
@@ -324,4 +324,8 @@ ERROR 42000: You have an error in your SQL syntax; check the manual that corresp
ALTER SEQUENCE IF EXISTS t MINVALUE=1;
ERROR 42000: This version of MariaDB doesn't yet support 'CACHE without INCREMENT BY 0 in Galera cluster'
DROP TABLE t;
connection node_2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How these suppressions are related to this PR?

else
{
WSREP_DEBUG("before_prepare failed for %s ret=%s (%d)",
wsrep_xid_print(&thd->wsrep_xid), strerror(ret), ret);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come does this work? wsrep_xid_print() returns std::string, but logging macros use C-style format specifiers?

Another issue with this debug logging is that the return value of thd->wsrep_cs().before_prepare() is not system error code. If there is some need for debug logging here, better to log

wsrep::to_c_string(thd->wsrep_cs().current_error())

wsrep::to_c_string(thd->wsrep_cs().state()),
wsrep::to_c_string(thd->wsrep_cs().mode()),
wsrep::to_c_string(thd->wsrep_cs().transaction().state()),
strerror(ret), ret);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return code from thd->wsrep_cs().after_statement() is not a system error code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants