Racing with MySQL online schema changes

The gist of it

Even as of MySQL 5.7 there are ALTER TABLE operations that cannot be performed online, that is, without blocking writes to the table whose schema is being changed. Altering a large table can take hours or even days, and blocking writes for longer than a few seconds is often unacceptable. Fortunately, there are great external tools that fill this gap, the Percona Toolkit for MySQL being one of them.
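
A quick way to check whether a particular change qualifies is to ask the server for it explicitly; if the requested operation cannot be done in place without blocking writes, the statement fails with an error instead of silently falling back to a blocking table copy. The table and column below are just the ones used later in this post:

mysql> alter table pt_test modify column data varchar(1024), algorithm=inplace, lock=none;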

The principle of their operation is simple and roughly consists of the following steps (sketched in plain SQL right after the list):

– create a new table with the schema matching the old one
– quickly alter the new, empty table
– create a set of on delete/on update/on insert triggers on the original table that track changes and apply them to the new table
– copy data from the old table to the new one
– atomically switch the tables around (using rename)
– drop triggers and the old table
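
Roughly, in plain SQL, the whole dance looks something like the sketch below. Assume a table t with the same two columns as the pt_test table used later in this post; the names, the naive single-statement copy and the trigger bodies are simplified for illustration; real tools copy data in chunks and handle conflicts much more carefully:

mysql> create table _t_new like t;
mysql> alter table _t_new modify column data varchar(1024);
mysql> create trigger t_del after delete on t for each row delete from _t_new where id = old.id;
mysql> create trigger t_upd after update on t for each row replace into _t_new (id, data) values (new.id, new.data);
mysql> create trigger t_ins after insert on t for each row replace into _t_new (id, data) values (new.id, new.data);
mysql> insert ignore into _t_new (id, data) select id, data from t;
mysql> rename table t to _t_old, _t_new to t;
mysql> drop trigger t_del; drop trigger t_upd; drop trigger t_ins;
mysql> drop table _t_old;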

This works very well in practice, but in some cases there exists a very short time window when things can go funky if the altered table has an auto_increment column that’s meaningful to the application. Needless to say, auto-incremented integer primary keys that serve as unique object/entity identifiers are quite common.

The problematic sequence of events is as follows:

1) The online migration tool creates the new table
2) The application inserts and shortly thereafter deletes the newly inserted records from the database
3) The online migration tool creates triggers and copies data to the new table, switches the tables around
4) The application inserts new rows into the database

At time (1), the migration tool creates the new table with the auto_increment value matching the original table.

At (2), before the change-capture triggers exist, new rows appear in the original table and are removed again. The auto_increment counter on the original table advances to reflect those inserts, but the counter on the already-created new table stays where it was.

Once the migration is done at (3), inserts into the new table at (4) are going to assign previously seen IDs to new, different objects/rows.
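
The root cause is that DELETE never rewinds a table’s auto_increment counter, while the copy keeps whatever counter value it got at creation time. A minimal illustration of just that, with made-up table names:

mysql> create table t (id int auto_increment primary key, data varchar(255));
mysql> create table t_copy like t;
mysql> insert into t (data) values ("short-lived");
mysql> delete from t where id = 1;
mysql> show create table t\G
mysql> show create table t_copy\G

The first SHOW CREATE TABLE reports AUTO_INCREMENT=2 even though t is empty again, while t_copy still starts from 1, so the next insert into t_copy would hand out id 1.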

It can be a serious problem if information about the short-lived rows from time (2) has ever reached any external systems: from then on, two different entities are referenced by the same ID! If our system performs change data capture at the MySQL binary log level and we hit this condition, we’re guaranteed to see two different rows being inserted into “the same” table with the same id values.

Fortunately, this race condition is unlikely to be hit in practice, even when looked at from a security point of view.

Proof?

Fortunately it’s quite easy to reproduce this scenario in a sandbox environment, so let’s do just that! We’ll create a test table and modify the pt-online-schema-change utility to guarantee our ugly race. I used version 2.2.16-1 of the tool for the code references below.

First, prepare the test database:

mysql> create database pt_test;
mysql> use pt_test;
mysql> create table pt_test (id int auto_increment primary key, data varchar(255));
mysql> insert into pt_test(data) values ("wat");

Next, modify the create_new_table subroutine in the tool’s source code. We’ll add the insert-delete sequence just after creating the new table:

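# insert a throwaway row into the original table and immediately delete it,
# bumping the original table's auto_increment while the just-created new table keeps its old value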
$cxn->dbh()->do("insert into $orig_tbl->{name}(data) values (\"oops\");");
$cxn->dbh()->do("delete from $orig_tbl->{name} order by id desc limit 1;");

just before the following lines:

print $sql, "\n" if $o->get('print'); # the sql that work
print "Created new table $orig_tbl->{db}.$new_table_name OK.\n";

Run it!

$ pt-online-schema-change u=USERNAME,p=PASSWORD,D=pt_test,t=pt_test --alter='modify column data varchar(1024)' --execute

Let’s see what happens:

mysql> select * from pt_test;
+----+------+
| id | data |
+----+------+
| 1  | wat  |
+----+------+

Let’s add one more row.

mysql> insert into pt_test(data) values ("one more");
mysql> select * from pt_test;
+----+----------+
| id | data     |
+----+----------+
| 1  | wat      |
| 2  | one more |
+----+----------+

The newly added row gets id = 2, but we know that a row with id 2 has already been inserted and deleted before! If we have binary logging enabled, the mysqlbinlog utility will confirm the short existence of the “oops” entry with id 2. Oops…
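
For completeness, with statement-based binary logging you can peek at the evidence without leaving the client; the binlog file name below is made up, use whatever SHOW BINARY LOGS reports (with row-based logging, mysqlbinlog with its verbose row decoding is the tool for the job, as mentioned above):

mysql> show binary logs;
mysql> show binlog events in 'mysql-bin.000001';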

What now?

I cannot think of any way of fixing this problem other than locking the original table for the time it takes to create both the new table *and* the change capture triggers. Transactions are obviously of no use, since they don’t apply to DDL operations.

An alternative is not to rely on auto-increment identifiers generated on the database side at all, and to use GUIDs instead.
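
For reference, a minimal sketch of that approach; whether the identifier is generated with the database’s uuid() as below or in the application code is a separate design decision:

mysql> create table pt_test_guid (id char(36) primary key, data varchar(255));
mysql> insert into pt_test_guid (id, data) values (uuid(), "wat");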