MySQL Replication Source
The MySQL Replication Source will replicate all of the row-level changes in the databases on MySQL server.fefe
Setting up MySQL
Enable the binlog
This is done in the MySQL server configuration file and will look similar to this:
server-id = 1
log_bin = mysql-bin
binlog_format = row
binlog_row_image = full
expire_logs_days = 10
Enable GTIDs (optional)
The MySQL server can be configured to use GTID-based replication. Using GTIDs greatly simplifies replication and makes it possible to easily confirm whether masters and slaves are consistent. Note that if you’re using an earlier version before MySQL 5.6.5, you will not be able to enable GTIDs.
gtid_mode = on
enforce_gtid_consistency = on
Set up Session Timeouts (optional)
When initial snapshots of very large databases are executed, it is possible that an established connection will timeout while reading the content of the database tables. We can increase following configs to deal with that:
interactive_timeout = <duration-in-seconds>
wait_timeout = <duration-in-seconds>
Create a MySQL User
A MySQL user must be defined with all the following permissions on any database which wants to be replicated: SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE and REPLICATION CLIENT.
If using a hosted option such as Amazon RDS or Amazon Aurora that do not allow a global read lock, table-level locks are used to create the consistent snapshot. In this case, you need to also grant LOCK_TABLES permissions to the user that you create.
Setting up the JDBC Driver
If it is not already installed, instructions for installing the MySQL JDBC driver can be found on the Hub. The MySQL JDBC driver installed should be version 8 or above.
Plugin Properties
Property | Macro Enabled? | Description |
---|---|---|
Host | No | Required. Hostname of the MySQL server to read from. |
Port | No | Required. Port to use to connect to the MySQL server. |
JDBC Plugin Name | No | Required. Identifier for the MySQL JDBC driver, which is the name used while uploading the MySQL JDBC driver. |
Database Name | No | Required. Name of the database to replicate data from. |
User | No | Required. Username to use to connect to the MySQL server. Actual account used by the source while connecting to the MySQL server will be of the form 'user_name'@'%' where user_name is this field. |
Password | Yes | Required. Password to use to connect to the MySQL server. Note: If you use a macro for the password, it must be in the Secure Store. If it’s not in the secure store, the Replication job fails. For more information, see Using Secure Keys. |
Consumer ID | No | Optional. Unique numeric ID to identify this origin as an event consumer. This number cannot be the same as another replication job that is reading from the server, and it cannot be the same as the server-id for any MySQL slave that is replicating from the server. By default, random number will be used. |
Server Timezone | No | Optional. Time zone of the MySQL server. This is used when converting dates into timestamps. |
Replicate Existing Data | No | Optional. Whether to replicate existing data from the source database. By default, pipeline will replicate the existing data from source tables. If set to false, any existing data in the source tables will be ignored and only changes happening after the pipeline started will be replicated. |
Schema Mapping
For information about data type conversions from supported source databases to the BigQuery destination, see https://cloud.google.com/data-fusion/docs/reference/replication-data-types#mysql .
Schema Evolution
This section lists the data definition language (DDL) operations supported during MySQL replication.
DDL Operation | Supported? |
Create table | Yes (New table is picked up dynamically when no tables are selected in the pipeline config)Â |
Rename table | No |
Truncate table | Yes |
Drop table | Yes |
Add nullable column | Yes |
Add required column | No |
Alter column to make it nullable | Yes |
Alter column to make it required | No |
Alter column type | No |
Rename column | No |
Drop column | No |
Troubleshooting
If the replication job is able to start snapshotting the data, but fails when it switches over to read from the binlog with errors in the log like:
This is most likely caused by the replication user using an incompatible password type. This is especially common starting from MySQL 8 on up. To fix this, run the following command:
to change the user to use a MySQL native password.
Created in 2020 by Google Inc.