Semi-synchronous Replication in MySQL

If we talk about MySQL replication, we always default to asynchronous replication. By asynchronous, we mean master will keep doing transactions while they are getting replicated to slave via IO thread asynchronously. In this replication, its possible that slave will fall behind but that does not impact master in anyway.
Problem with this replication is data-loss. Example: In case slave is falling behind and master host crashes, some of the transactions may not get transfered/replicated to slave. Following 2 options exists in this scenario

  1. Wait for master host to come up. Which may take around 20-25 mins
  2. Promote slave to master with some data loss

Nether options sounds good on production database. That’s why we have semi-synchronous replication.

Purpose of semi-synchronous replication is twofold

  1. Reduce replication lag and keep slave in sync with master
  2. Ensure no data-loss in case master crashes

Semi-synchronous replication has been in place from MySQL 5.5 onwards in the form of plugin. We just have to install the plugin and then we can enable the same. But again this semi-synchronous replication available in MySQL 5.5 and 5.6 does not guarantee “no data loss” completely for the reasons we will see later but it keeps slave in sync with master.

MySQL 5.7 Oracle has made some enhanced semi-synchronous replication which they call “loss-less semi-synchronous replication”. This guarantees that there is no data loss in case master crashes.

We will take a look at how semi-synchronous replication works, how to install the plugin and whats the enhancement in MySQL 5.7 to prevent data loss.

How Semi-Synchronous replication works?

As discussed above, there are 2 different algorithm for semi-synchronous replication.

  1. Normal semi-synchronous replication available in mysql 5.5 and 5.6 – does not guarantee “no data loss”
  2. Loss-less semi-synchronous replication available in mysql 5.7 – guarantee “no data loss”

lets take a look at each and how it works

1) Normal semi-synchronous replication in MySQL 5.5 and 5.6

If we talk above true synchronous replication – In truly synchronous replication, when you commit a transaction, the commit does not complete on master until all replicas have also committed successfully and acknowledged to master.

In MySQL’s semi-synchronous replication when transaction is committed on master, its written to binary log and innodb transaction log and commit is complete. At this point the data has not even reached to any of the slave but on master commit is complete. The client issuing the commit is still on hold and is not told that commit has complete.
At this point slave IO thread pulls transaction data from binary logs and updated relay logs, flush relay logs to disk and acknowledges master that slave has received and written the data. Only after receiving ack from at least 1 slave, MySQL confirms the client that transaction is committed.

There is a problem with above functionality. MySQL is only delaying acknowledge of commit to client. In reality its committing the transaction on master way before that (before even sending data to slave). Since committing of transaction is done before waiting for ack from semi-sync slave, so the committed rows are visible from applications, even though semi-sync slaves may not have received the data. If master is crashed and none of the slaves received the data, the data is lost but applications may have seen them. This is called phantom reads, and in many cases it’s problematic.

2) Loss-less semi-synchronous replication in MySQL 5.7

loss-less semi-synchronous replication in MySQL 5.7 answers above problem. Only change that’s done in this semi-synchronous replication is that commit on primary is delayed until master receives acknowledgement from only of the semi-synchronous slave that data is received and written by the slave. So when committed data is visible from applications, one of the semi-sync slaves have received that. Phantom read risk is much smaller: if both master and the latest semi-sync slave are down at the same time, data is lost. But it’s much less likely to happen compared to normal semi-sync.

To avoid data loss and phantom reads, Normal Semi-sync can’t meet your expectations. Using Loss-Less Semi-sync is needed.
With Loss-Less Semi-Synchronous replication, committed data should be on one of the slaves, so you can recover from the latest slave. You can always do fast failover here.

The acknowledgement from the slave means, it has received the transaction and applied it to the relay log and flushed it to disk. It doesn’t mean the data has actually gone into the database.

Just to make this statement clearer, it means the transaction was written to the relay log, and the relay log was flushed to disk.
There is still the need for the SQL thread to pick it up and apply it to the database. It’s very unlikely not to complete this task, but that way we’re getting a bit more performance.

Implementing semi-synchronous replication

1) check if semi-synchronous plugin is installed using “show plugins;”


mysql> show plugins;
+----------------------------+----------+--------------------+---------+---------+
| Name                       | Status   | Type               | Library | License |
+----------------------------+----------+--------------------+---------+---------+
| binlog                     | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| mysql_native_password      | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| mysql_old_password         | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| sha256_password            | ACTIVE   | AUTHENTICATION     | NULL    | GPL     |
| MRG_MYISAM                 | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| MyISAM                     | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| MEMORY                     | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| CSV                        | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| InnoDB                     | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| INNODB_TRX                 | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
| INNODB_LOCKS               | ACTIVE   | INFORMATION SCHEMA | NULL    | GPL     |
...
...
| PERFORMANCE_SCHEMA         | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| BLACKHOLE                  | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| FEDERATED                  | DISABLED | STORAGE ENGINE     | NULL    | GPL     |
| ARCHIVE                    | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
| partition                  | ACTIVE   | STORAGE ENGINE     | NULL    | GPL     |
+----------------------------+----------+--------------------+---------+---------+
42 rows in set (0.01 sec)

You should be looking for “rpl_semi_sync_master” name.
In my case plugin was not installed.

Optionally you can also check system variables


mysql> show variables like 'rpl_semi_sync%';
Empty set (0.00 sec)

Nothing. That means semi-synchronous functionality is not installed.

2) Install semi-synchronous plugin

You need to install “semisync_master.so” on master and “semisync_slave.so” on slave

on Master:


mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.12 sec)

on Slave:


mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.00 sec)

These .so plugin files are available in <mysql_install_dir>/lib/plugin directory.

“show plugins;” should show the plugin now


mysql> show plugins;
+----------------------------+----------+--------------------+--------------------+---------+
| Name                       | Status   | Type               | Library            | License |
+----------------------------+----------+--------------------+--------------------+---------+
| binlog                     | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
| mysql_native_password      | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| mysql_old_password         | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| sha256_password            | ACTIVE   | AUTHENTICATION     | NULL               | GPL     |
| MRG_MYISAM                 | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
...
...
| rpl_semi_sync_master       | ACTIVE   | REPLICATION        | semisync_master.so | GPL     |
+----------------------------+----------+--------------------+--------------------+---------+
43 rows in set (0.00 sec)

You should also see the variables


mysql> show variables like 'rpl_semi_syn%';
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled       | OFF   |
| rpl_semi_sync_master_timeout       | 10000 |
| rpl_semi_sync_master_trace_level   | 32    |
| rpl_semi_sync_master_wait_no_slave | ON    |
+------------------------------------+-------+
4 rows in set (0.00 sec)

3) Enable semi-synchronous replication

On master:


mysql> SET GLOBAL rpl_semi_sync_master_enabled = 1;
Query OK, 0 rows affected (0.03 sec)

mysql> SET GLOBAL rpl_semi_sync_master_timeout = 10000;
Query OK, 0 rows affected (0.00 sec)

rpl_semi_sync_master_timeout is the timeout in millisec. We are setting a timeout of 10 sec.

On slave:


mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.00 sec)

mysql> start slave io_thread;
Query OK, 0 rows affected (0.00 sec)

we have to bounce IO_THREAD on slave so that it will connect back to master as semi-synchronous slave.

Monitoring Semi-synchronous replication

We have few status variables that we can monitor to check if semi-synchronous replication is working as expected or not.

If you check status variables on master, it shows following status variables.


mysql> SHOW STATUS LIKE 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 0     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 0     |
| Rpl_semi_sync_master_tx_wait_time          | 0     |
| Rpl_semi_sync_master_tx_waits              | 0     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 0     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

On replication slave we have only 1 variable Rpl_semi_sync_slave_status.

Among the variables mentioned above, 3 varaibles are important for us to watch

Rpl_semi_sync_master_status -> Whether semi-synchronous replication currently is operational on the master
Rpl_semi_sync_master_yes_tx -> The number of commits that were acknowledged successfully by a slave
Rpl_semi_sync_master_no_tx  -> The number of commits that were not acknowledged successfully by a slave

Currently we see that Rpl_semi_sync_master_status is ON, that means the semi-synchronous replication is working fine.

Lets create a table in “deo” database and insert few records.

On master:


mysql> use deo;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table test (col1 smallint not null);
Query OK, 0 rows affected (0.14 sec)

mysql> set autocommit=1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test values (1),(2),(3);
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from test;
+------+
| col1 |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)
mysql> show status like 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 1356  |
| Rpl_semi_sync_master_net_wait_time         | 4070  |
| Rpl_semi_sync_master_net_waits             | 3     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 1379  |
| Rpl_semi_sync_master_tx_wait_time          | 4139  |
| Rpl_semi_sync_master_tx_waits              | 3     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 3     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

If you see above, we have 3 transactions done (corrosponding to 3 inserts since autocommit was on) and all are replicated synchronously on slave.

Rpl_semi_sync_master_yes_tx has become 3 and Rpl_semi_sync_master_no_tx is still 0.

Lets stop replication IO thread so that changes in master will not get replicated on slave

on slave:


mysql> stop slave io_thread;
Query OK, 0 rows affected (0.00 sec)

Now, lets insert a record in test table in master

On master:


mysql> insert into test values (4);
Query OK, 1 row affected (10.01 sec)

mysql> show status like 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 1022  |
| Rpl_semi_sync_master_net_wait_time         | 4090  |
| Rpl_semi_sync_master_net_waits             | 4     |
| Rpl_semi_sync_master_no_times              | 1     |
| Rpl_semi_sync_master_no_tx                 | 1     |
| Rpl_semi_sync_master_status                | OFF   |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 1379  |
| Rpl_semi_sync_master_tx_wait_time          | 4139  |
| Rpl_semi_sync_master_tx_waits              | 3     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 3     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

If you see, Rpl_semi_sync_master_no_tx is updated to 1 and Rpl_semi_sync_master_status is updated to off.
It means that replication is changed back to asynchronous.

Also note that insert statement took 10 secs to complete, because semi-sync replication timeout was set to 10 sec (rpl_semi_sync_master_timeout = 10000)

We can now start the slave IO thread and it will be back to semi-synchronous replication

on slave:


mysql> start slave io_thread;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from test;
+------+
| col1 |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+
4 rows in set (0.00 sec)

On master:


mysql> show status like 'Rpl_semi_sync%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 1     |
| Rpl_semi_sync_master_net_avg_wait_time     | 1626  |
| Rpl_semi_sync_master_net_wait_time         | 8131  |
| Rpl_semi_sync_master_net_waits             | 5     |
| Rpl_semi_sync_master_no_times              | 1     |
| Rpl_semi_sync_master_no_tx                 | 1     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 1379  |
| Rpl_semi_sync_master_tx_wait_time          | 4139  |
| Rpl_semi_sync_master_tx_waits              | 3     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 3     |
+--------------------------------------------+-------+
14 rows in set (0.00 sec)

So after starting slave IO thread, it recovered immediately and semi-synchronous replication enabled automatically.

Reference:

http://yoshinorimatsunobu.blogspot.in/2014/04/semi-synchronous-replication-at-facebook.html
http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html
http://www.percona.com/blog/2012/01/19/how-does-semisynchronous-mysql-replication-work/

Hope this help !!