March 24, 2025

# MySQL Primary-Replica Replication Error 1594 Analysis

This article analyzes MySQL primary-replica replication Error 1594, exploring causes like relay log corruption and offering troubleshooting methods.

## 1. Failure Background

A customer reported a MySQL primary-replica replication failure. Upon inspection, we found that the SQL thread had stopped with error 1594, while the IO thread was still running normally and continued to fetch binary logs from the primary without issue.

After executing the RESET REPLICA command on the replica and rebuilding replication, primary-replica replication returned to normal. The following is a detailed analysis of the cause of this error.
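For reference, the recovery was performed roughly along the following lines. This is a minimal sketch assuming GTID auto-positioning (connection details are illustrative); note that on 5.7 these statements are actually spelled with the older SLAVE/MASTER keywords.

mysql> STOP REPLICA;
mysql> RESET REPLICA;      -- discards the corrupted relay logs and relay log metadata
mysql> CHANGE MASTER TO MASTER_HOST='xx.xx.xx.xx', MASTER_USER='universe_op', MASTER_PASSWORD='***', MASTER_AUTO_POSITION=1;
mysql> START REPLICA;
mysql> SHOW REPLICA STATUS\G   -- confirm Replica_IO_Running and Replica_SQL_Running are both Yes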

To provide the customer with a comprehensive explanation, we conducted a thorough investigation, analyzing all potential factors that could have contributed to the failure in order to identify the most likely cause.

## 2. Failure Analysis

MySQL Version: 5.7.26, with GTID enabled.

### 2.1 Checking Replica Error Information

mysql> show replica status\G
*************************** 1. row ***************************
               Replica_IO_State: Waiting for master to send event
                  Primary_Host: xx.xx.xx.xx
                  Primary_User: universe_op
                  Primary_Port: 3306
                Connect_Retry: 60
              Primary_Log_File: mysql-bin.xxxxxx
          Read_Primary_Log_Pos: xxxxxx
               Relay_Log_File: mysql-relay.xxxxxx
                Relay_Log_Pos: 243865
        Relay_Primary_Log_File: mysql-bin.xxxxxx
             Replica_IO_Running: Yes
            Replica_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the primary's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the replica's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the primary's or replica's MySQL code. If you want to check the primary's binary log or replica's relay log, you will be able to know their names by issuing 'SHOW REPLICA STATUS' on this replica.
                 Skip_Counter: 0

From the show replica status\G output, it is clear that the SQL thread stopped because of a relay log read failure: it could not parse an event entry in the relay log.

### 2.2 Checking Replica Error Logs

Review the error logs on the replica database at the corresponding time.

2024-12-29T00:22:19.824988+08:00 241186 [ERROR] Error in Log_event::read_log_event(): 'Event crc check failed! Most likely there is event corruption.', data_len: 8060, event_type: 31
2024-12-29T00:22:19.825014+08:00 241186 [ERROR] Error reading relay log event for channel '': replica SQL thread aborted because of I/O error
2024-12-29T00:22:19.825027+08:00 241186 [ERROR] Replica SQL for channel '': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the primary's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the replica's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the primary's or replica's MySQL code. If you want to check the primary's binary log or replica's relay log, you will be able to know their names by issuing 'SHOW REPLICA STATUS' on this replica. Error_code: 1594
2024-12-29T00:22:21.146920+08:00 241186 [ERROR] Error running query, replica SQL thread aborted. Fix the problem, and restart the replica SQL thread with "REPLICA START". We stopped at log 'mysql-bin.053394' position 264587564.

From the error log information, we can extract the following key points:

The presence of an "Event crc check failed" message indicates a CRC check failure for the log event, which typically means the event data is corrupted or the log file is incomplete.

When the replica reads the relay log and encounters a CRC check error, it can cause the replica's SQL thread to stop and report a "Relay Log Read Failure."

### 2.3 Further Investigation of Potential Causes

First, we checked the system logs at the corresponding time and found no abnormal outputs, thus ruling out disk failures and MySQL Out Of Memory (OOM) issues during the error period.
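For reference, the system-level checks were along these lines (log paths vary by distribution, so the commands below are illustrative):

dmesg -T | grep -iE 'i/o error|ata|oom'      # kernel messages: disk I/O errors, OOM killer activity
grep -i 'out of memory' /var/log/messages    # OOM killer traces around the failure window
df -h; df -i                                 # free space and inodes on the MySQL data volume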

We also investigated whether the failure was caused by an oversized transaction exceeding the max_allowed_packet limit and being written in fragments. However, the transaction was calculated to be only about 20 MB, so this was ruled out as a cause.
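The exact method used to size the transaction was not recorded; one way to estimate it from the binary log (file name is illustrative) is to diff the byte offsets of consecutive GTID events:

mysql> SHOW VARIABLES LIKE 'max_allowed_packet';
mysqlbinlog --base64-output=decode-rows -vv mysql-bin.053394 | grep -E '^# at |GTID'
# The gap between a transaction's GTID "# at" offset and the next GTID's "# at" offset
# approximates that transaction's size in bytes.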

Finally, we examined the MySQL service status and binary log files on the primary database at the time of the failure, as well as the network monitoring, and found everything to be normal.
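The integrity of the primary's binary log around the reported position can be verified with mysqlbinlog's checksum verification; an illustrative check (file name taken from the error log above):

mysqlbinlog --verify-binlog-checksum mysql-bin.053394 > /dev/null && echo "binlog CRC OK"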

At this point, it was clear that the problem lay with the relay log itself, which the SQL thread could not parse. Since the customer's environment had not retained the relay log from the time of the error, we could only reproduce the failure in our own environment and compare the information collected there with what had been gathered from the customer's faulty environment.

### 2.4 Reproducing the Failure in a Local Environment

We first checked the primary-replica status in our local environment.

mysql> show replica status\G
*************************** 1. row ***************************
               Replica_IO_State: Waiting for master to send event
                  Primary_Host: 10.186.63.44
                  Primary_User: universe_op
                  Primary_Port: 3306
                Connect_Retry: 60
              Primary_Log_File: mysql-bin.000001
          Read_Primary_Log_Pos: 3527783
               Relay_Log_File: mysql-relay.000003
                Relay_Log_Pos: 10303
        Relay_Primary_Log_File: mysql-bin.000001
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
              Replicate_Do_DB: 

We then manually stopped the SQL thread.

mysql> STOP REPLICA SQL_THREAD;
Query OK, 0 rows affected (0.00 sec)

Checking the replication status again.

mysql> show replica status\G
*************************** 1. row ***************************
               Replica_IO_State: Waiting for master to send event
                  Primary_Host: 10.186.63.44
                  Primary_User: universe_op
                  Primary_Port: 3306
                Connect_Retry: 60
              Primary_Log_File: mysql-bin.000001
          Read_Primary_Log_Pos: 3510430
               Relay_Log_File: mysql-relay.000004
                Relay_Log_Pos: 12695
        Relay_Primary_Log_File: mysql-bin.000001
             Replica_IO_Running: Yes
            Replica_SQL_Running: No

And once more.

mysql> show replica status\G
*************************** 1. row ***************************
               Replica_IO_State: Waiting for master to send event
                  Primary_Host: 10.186.63.44
                  Primary_User: universe_op
                  Primary_Port: 3306
                Connect_Retry: 60
              Primary_Log_File: mysql-bin.000001
          Read_Primary_Log_Pos: 3560144
               Relay_Log_File: mysql-relay.000007
                Relay_Log_Pos: 4675
        Relay_Primary_Log_File: mysql-bin.000001
             Replica_IO_Running: Yes
            Replica_SQL_Running: No

At this point, the environment matched the customer's: the SQL thread was stopped while the IO thread continued to fetch logs normally.

To simulate an SQL thread failure, we used hexedit to corrupt a few bytes in the relay log file.

hexedit mysql-relay.000007
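For repeatability, the same kind of corruption can also be introduced non-interactively, for example by overwriting a few bytes inside an event body with dd; the offset and byte count below are purely illustrative:

dd if=/dev/zero of=mysql-relay.000007 bs=1 seek=244250 count=4 conv=notrunc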

We then started the SQL thread and checked the replication status.

mysql> START REPLICA SQL_THREAD;
Query OK, 0 rows affected (0.02 sec)

mysql> show replica status\G
*************************** 1. row ***************************
               Replica_IO_State: Waiting for master to send event
                  Primary_Host: 10.186.63.44
                  Primary_User: universe_op
                  Primary_Port: 3306
                Connect_Retry: 60
              Primary_Log_File: mysql-bin.000001
          Read_Primary_Log_Pos: 32050487
               Relay_Log_File: mysql-relay.000007
                Relay_Log_Pos: 243865
        Relay_Primary_Log_File: mysql-bin.000001
             Replica_IO_Running: Yes
            Replica_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the primary's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the replica's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the primary's or replica's MySQL code. If you want to check the primary's binary log or replica's relay log, you will be able to know their names by issuing 'SHOW REPLICA STATUS' on this replica.
                 Skip_Counter: 0

Checking the MySQL error information.

2024-12-31T09:34:36.385535+08:00 1448 [ERROR] Error in Log_event::read_log_event(): 'Event crc check failed! Most likely there is event corruption.', data_len: 80, event_type: 31
2024-12-31T09:34:36.385593+08:00 1448 [ERROR] Error reading relay log event for channel '': replica SQL thread aborted because of I/O error
2024-12-31T09:34:36.385608+08:00 1448 [ERROR] Replica SQL for channel '': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the primary's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the replica's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the primary's or replica's MySQL code. If you want to check the primary's binary log or replica's relay log, you will be able to know their names by issuing 'SHOW REPLICA STATUS' on this replica. Error_code: 1594
2024-12-31T09:34:36.386764+08:00 1448 [ERROR] Error running query, replica SQL thread aborted. Fix the problem, and restart the replica SQL thread with "REPLICA START". We stopped at log 'mysql-bin.000001' position 3795113.

Through the above process of elimination, and by comparing the error information reproduced in our local environment with that from the customer's environment, it is evident that the 1594 error was caused by unexpected damage to the relay log file during transmission or writing, which prevented the SQL thread from reading events from the relay log.

### 2.5 Parsing the Relay Log

We then parsed the relay log to check for any error messages and summarize the findings.

Initially, parsing the relay log showed no obvious abnormalities.
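The output below comes from mysqlbinlog; the exact invocation was not recorded, but it was presumably along these lines (-vv decodes row events into the commented pseudo-SQL shown):

mysqlbinlog -vv mysql-relay.000007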

# at 243865
#241231  1:12:35 server id 675216691  end_log_pos 3795178 CRC32 0x886c766d      GTID    last_committed=8108     sequence_number=8109    rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ee2fddba-c6c7-11ef-8e2a-02000aba3f2c:8109'/*!*/;
# at 243930
#241231  1:12:35 server id 675216691  end_log_pos 3795254 CRC32 0x31082f91      Query   thread_id=71    exec_time=0     error_code=0
SET TIMESTAMP=1735578755/*!*/;
BEGIN
/*!*/;
# at 244006
#241231  1:12:35 server id 675216691  end_log_pos 3795412 CRC32 0xd1baa47b      Rows_query
# /*ustats-10-186-63-44*/update universe.u_delay set real_timestamp=now(), logic_timestamp = logic_timestamp + 1 where source = 'ustats'
# at 244164
#241231  1:12:35 server id 675216691  end_log_pos 3795471 CRC32 0x9b859b01      Table_map: `universe`.`u_delay` mapped to number 211
# at 244223
#241231  1:12:35 server id 675216691  end_log_pos 3795551 CRC32 0x69696969      Update_rows: table id 211 flags: STMT_END_F
BINLOG '/T4ongAAANTpOQCAAIYvKnVzdGF0cy0xMC0xODYtNjMtNDQqL3VwZGF0ZSB1bml2ZXJz
ZS51X2RlbGF5IHNldCByZWFsX3RpbWVzdGFtcD1ub3coKSwgbG9naWNfdGltZXN0YW1wID0gbG9n
aWNfdGltZXN0YW1wICsgMSB3aGVyZSBzb3VyY2UgPSAndXN0YXRzJ3ukutE=
g9RyZxMz/T4oOwAAAA/qOQAAANMAAAAAAAEACHVuaXZlcnNlAAd1X2RlbGF5AAMPEggD+AIABgGb
hZs=Zx8z/T4oUAAAAF/qOQAAANMAAAAAAAEAAgAD///4BgB1c3RhdHOZtT4TIkQfAAAAAAAA+AYA
dXN0YXRzmbU+aWlpaWlpaWlpaWlpaWk=
'/*!*/;
### UPDATE `universe`.`u_delay`
### WHERE
###   @1='ustats' /* VARSTRING(760) meta=760 nullable=0 is_null=0 */
###   @2='2024-12-31 01:12:34' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @3=8004 /* LONGINT meta=0 nullable=1 is_null=0 */
### SET
###   @1='ustats' /* VARSTRING(760) meta=760 nullable=0 is_null=0 */
###   @2='2024-12-31 06:37:41' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @3=7595718147998050665 /* LONGINT meta=0 nullable=1 is_null=0 */
# at 244303
#241230 17:35:37 server id 675216691  end_log_pos 3795582 CRC32 0x48e0358c      Xid = 85886
COMMIT/*!*/;

Since the relay log could be parsed normally, we used MySQL's built-in mysqlbinlog to verify the CRC of the events before and after the failure time in the relay log, looking for any abnormal information.
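The verification presumably used mysqlbinlog's --verify-binlog-checksum option, roughly as follows:

mysqlbinlog --verify-binlog-checksum mysql-relay.000007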

# at 243865
#241231  1:12:35 server id 675216691  end_log_pos 3795178 CRC32 0x886c766d      GTID    last_committed=8108     sequence_number=8109    rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ee2fddba-c6c7-11ef-8e2a-02000aba3f2c:8109'/*!*/;
# at 243930
#241231  1:12:35 server id 675216691  end_log_pos 3795254 CRC32 0x31082f91      Query   thread_id=71    exec_time=0     error_code=0
SET TIMESTAMP=1735578755/*!*/;
BEGIN
/*!*/;
# at 244006
# at 244164
#241231  1:12:35 server id 675216691  end_log_pos 3795471 CRC32 0x9b859b01      Table_map: `universe`.`u_delay` mapped to number 211
ERROR: Error in Log_event::read_log_event(): 'Event crc check failed! Most likely there is event corruption.', data_len: 80, event_type: 31
ERROR: Could not read entry at offset 244223: Error in log format or read error.
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.

The CRC error output shows that the checksum verification failed at offset 244223; the relay log read failure occurred because the last transaction was not fully and correctly written to the relay log.

## 3. Summary

When handling MySQL failures, first preserve the necessary evidence (error logs, relay logs, binary logs) for later root-cause analysis, and only then proceed to fix the failure and restore service. The 1594 primary-replica failure is typically caused by one of the following; the database error logs usually point to the specific cause, and the quick checks sketched after this list can help confirm it:

- Relay log file damage: disk failure or an improper shutdown corrupts the relay log file.
- Insufficient disk space: the replica cannot write new events to the relay log.
- Loss of primary binary logs: the primary has purged binary logs that were not yet synchronized, so the log position requested by the replica is no longer valid.
- Permission or path errors: the MySQL process does not have permission to access the relay log file, or the relay log path is misconfigured.
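A few quick checks that map to the causes above (paths and file names are illustrative and should be adapted to the actual environment):

df -h /var/lib/mysql                               # enough free space for the relay logs?
ls -l /var/lib/mysql/mysql-relay.*                 # ownership and permissions on the relay log files
mysqlbinlog --verify-binlog-checksum mysql-relay.000007 > /dev/null && echo "relay log CRC OK"
mysql> SHOW BINARY LOGS;                           -- on the primary: are the binlogs the replica still needs present?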
