What causes LOST_EVENTS error in the MySQL binary log?

Using MySQL 5.6.13 under CentOS 6.4, I had a replication failure on one master/slave topology because the master binary log contained the following entry, intentionally written by the MySQL server.

$ mysqlbinlog --start-position=244670849 mysql-bin.000029
...
# at 244670849
#131028 19:31:38 server id 39  end_log_pos 244670906 CRC32 0xc5d084ec
# Incident: LOST_EVENTS
RELOAD DATABASE; # Shall generate syntax error
# at 244670906
...

The question is why? I do not know the answer. Searching online indicates this can occur in a MySQL Cluster environment, and can also occur around the use of GRANT statements. Neither of these situations is applicable here. This site runs 4 different master/slave topologies, all on the same version, and this has been the only occurrence.

The message, if taken literally, could indicate that the master server failed to write all events to the master binary log, but instead of reporting any detail about what failed (say, in the MySQL error log), it basically tells you the binary log (i.e. the replication stream) is now incomplete and you need to rebuild your entire replication topology. If this is true, it is a poor explanation, and I have found no MySQL documentation that sheds any light here.

The situation of course does not manifest on the master (for example, when monitoring the MySQL error log), but on the slave, with a message similar to:

mysql> SHOW SLAVE STATUS\G
...
        Relay_Master_Log_File: mysql-bin.000029
          Exec_Master_Log_Pos: 244670849
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 1590
                   Last_Error: The incident LOST_EVENTS occured on the master. Message: error writing to the binary log
...

What appears to happen on the master is that the LOST_EVENTS binary log event is written and the binary log is then flushed (rotated to a new file), as indicated by the file sizes.

-rw-rw---- 1 mysql mysql 1073742236 Oct 27 23:33 mysql-bin.000027
-rw-rw---- 1 mysql mysql 1073741890 Oct 28 11:28 mysql-bin.000028
-rw-rw---- 1 mysql mysql  244670953 Oct 28 19:31 mysql-bin.000029
-rw-rw---- 1 mysql mysql 1073742184 Oct 29 08:55 mysql-bin.000030
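
A quick way to confirm that nothing meaningful follows the incident in that file is to list the events from the incident position onwards on the master (the position is the one reported by mysqlbinlog earlier):

mysql> SHOW BINLOG EVENTS IN 'mysql-bin.000029' FROM 244670849;

This should show only the Incident event and the Rotate event to mysql-bin.000030, which is consistent with mysql-bin.000029 being rotated early at ~244 MB while the surrounding logs run to the 1 GB limit.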

Further analysis of the statements before the event in the master binary log shows nothing suspicious. Analysis of the binary logs at the error and immediately afterwards shows timestamps that are identical to the second, so there is no *obvious* loss of operations, but when there are 100s of transactions per second, second granularity is ineffective.
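
For reference, the statements leading up to the incident can be inspected with mysqlbinlog, stopping at the incident position and decoding row events for readability (the tail length here is arbitrary):

$ mysqlbinlog --verbose --base64-output=DECODE-ROWS \
    --stop-position=244670849 mysql-bin.000029 | tail -n 200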

After confirming there were no events to apply after this position on the slave, and that the master binary logs had been flushed, the solution taken was to repoint the slave to the next valid event.

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000030', MASTER_LOG_POS=120;
mysql> START SLAVE;
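
Here 120 is the position immediately after the Format_desc (format description) event that begins every MySQL 5.6 binary log, i.e. the first replicated event in mysql-bin.000030. A quick sanity check on the master before repointing the slave is:

mysql> SHOW BINLOG EVENTS IN 'mysql-bin.000030' LIMIT 3;

The first row should be the Format_desc event ending at position 120, confirming the position used above.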

Comments

  1. Ammon Sutherland says

    I have had this hit me twice in the last 3 weeks now. No idea why either. No real damage appears to be done, just annoying to have to skip an event and start the slaves again.

  2. Lalit says

    I enabled GTID-based replication, so the CHANGE MASTER TO MASTER_LOG_FILE and MASTER_LOG_POS options will not work because MASTER_AUTO_POSITION = 1.
    In my case, as it's a test server, I deleted the master binary logs and set the master GTID value on the slave (using SET GLOBAL GTID_PURGED), and now the slave is running fine (a rough sketch of this appears below).

    Any other suggestions for a production server, where deleting the binary logs is not an option?
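
For reference, a rough sketch of the GTID-based reset described in the comment above, suitable only for a test environment of the kind mentioned: the UUID and sequence range below are placeholders for the master's executed GTID set, and RESET MASTER on the slave discards its own binary log and GTID state (which is required before GTID_PURGED can be set in 5.6).

mysql> STOP SLAVE;
mysql> RESET MASTER;
mysql> SET GLOBAL GTID_PURGED = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:1-12345';
mysql> START SLAVE;

With MASTER_AUTO_POSITION = 1 already configured, the slave should then request from the master only transactions outside the GTID set it now considers applied.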