Microsoft SQL Server Replication Source
The Microsoft SQL Server CDC Source replicates all of the row-level changes in the tables of a Microsoft SQL Server database.
For information about using this plugin in Replication jobs, see Microsoft SQL Server Best Practices.
Setting up Microsoft SQL Server
Enable CDC on Database
Make sure you have the required permissions to enable CDC; see the SQL Server documentation for details.
You might need to contact your database administrator to obtain permissions to run the commands.
If the table being replicated contains columns of custom (user-defined) data types, the table's owner must grant EXECUTE permissions on those data types to the database user specified in the Replication job.
Enabling CDC for the database 'MyDB' will look similar to this:
-- ====
-- Enable Database for CDC template
-- ====
USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO
Note that CDC cannot be enabled for the master database.
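To confirm that CDC was enabled, you can run a query like the following (here 'MyDB' is the example database name used above); is_cdc_enabled should be 1:
-- ====
-- Verify that CDC is enabled for the database
-- ====
SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = 'MyDB'
GO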
Enable CDC on Table
Enabling CDC for the table 'MyTable' will look similar to this:
-- =========
-- If there is no 'role_name', you can set it as NULL.
-- =========
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = N'MyRole',
@supports_net_changes = 1
GO
For more information about enabling CDC for a SQL Server Table, see SQL Server Documentation.
Note: The user specified in the Replication job must be a member of MyRole in SQL Server. If MyRole is replaced with NULL in the above command, any user can be specified in the Replication job.
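If a role is specified, the database user used in the Replication job must be added as a member of that role. A minimal sketch, assuming SQL Server 2012 or later and using the example names MyRole and MyUser (MyUser is a placeholder for your database user):
USE MyDB
GO
-- Add the Replication job's database user to the gating role
ALTER ROLE MyRole ADD MEMBER MyUser
GO
On older SQL Server versions, sp_addrolemember can be used instead.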
Verify Table CDC Accessibility
Run the following query to make sure your table has CDC access.
-- =========
-- Verify that the connector user has access; this query should not return an empty result
-- =========
EXEC sys.sp_cdc_help_change_data_capture
GO
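In addition, you can list every table in the database that is tracked by CDC:
-- =========
-- List all CDC-tracked tables in the current database
-- =========
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.is_tracked_by_cdc = 1
GO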
Custom Data Types
If the table to be replicated contains columns of user-defined data types, the table owner must grant EXECUTE permissions on those data types to the database user specified in the Replication job.
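For example, assuming a user-defined type named dbo.MyCustomType and a job user named MyUser (both names are placeholders):
USE MyDB
GO
-- Allow the Replication job's user to reference the custom data type
GRANT EXECUTE ON TYPE::dbo.MyCustomType TO MyUser
GO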
Setting up JDBC Driver
If it is not already installed, instructions for installing the Microsoft SQL Server JDBC driver can be found on the Hub.
Plugin Properties
Property | Macro Enabled? | Description |
---|---|---|
Host | No | Required. Hostname of the SQL Server to read from. |
Port | No | Required. Port to use to connect to the SQL Server. |
JDBC Plugin Name | No | Required. Identifier for the SQL Server JDBC driver, which is the name used while uploading the SQL Server JDBC driver. |
Database Name | No | Required. Name of the database to replicate data from. |
User | No | Required. Username to use to connect to the SQL Server. |
Password | Yes | Required. Password to use to connect to the SQL Server. Note: If you use a macro for the password, it must be in the Secure Store. If it’s not in the secure store, the Replication job fails. For more information, see Using Secure Keys. |
Server Timezone | No | Optional. Time zone of the SQL Server. This is used when converting dates into timestamps. |
Replicate Existing Data | No | Optional. Whether to replicate existing data from the source database. By default, the job replicates the existing data from the source tables. If set to false, any existing data in the source tables is ignored, and only changes that happen after the job starts are replicated. |
Schema Mapping
For information about data type conversions from supported source databases to the BigQuery destination, see Replication data types in the Cloud Data Fusion documentation.
Schema Evolution
Due to the way CDC is implemented in SQL Server, schema evolution requires manual coordination and intervention to ensure the job continues to replicate data when the schema is updated.
Schema evolution process
1. Stop the replication job.
2. Apply the schema change to the source table, such as adding a non-required column.
3. Create a new capture instance using the sys.sp_cdc_enable_table procedure with a unique value for the @capture_instance parameter.
4. Apply the same schema change to the BigQuery target table.
5. Resume the replication job stopped in step 1.
6. When the job starts streaming new changes that occur after step 3, you can drop the old capture table using the sys.sp_cdc_disable_table stored procedure with the @capture_instance parameter set to the old capture instance name.
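Creating the new capture instance and later dropping the old one can be sketched as follows. The capture instance names below are examples; by default, the original capture instance for dbo.MyTable is named dbo_MyTable:
USE MyDB
GO
-- Create a second capture instance that reflects the new schema
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = N'MyRole',
@supports_net_changes = 1,
@capture_instance = N'dbo_MyTable_v2'
GO
-- Once the job is streaming from the new instance, drop the old one
EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@capture_instance = N'dbo_MyTable'
GO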
There is a window between step 2, when the database schema is changed, and step 3, when a new capture instance is created, during which all changes are captured by the old capture instance with the old schema. As a result, if a new column is added, any change event produced during this window will not contain data for the new column. If you cannot tolerate such a transition, follow this procedure instead:
1. Suspend all applications that generate database records.
2. Make sure the replication pipeline has processed all existing events.
3. Stop the replication pipeline.
4. Apply the schema change to the source table, such as adding a non-required column.
5. Create a new capture instance using the sys.sp_cdc_enable_table procedure with a unique value for the @capture_instance parameter.
6. Resume the applications suspended in step 1.
7. Apply the same schema change to the BigQuery target table.
8. Resume the replication pipeline stopped in step 3.
9. When the job starts streaming new changes that occur after step 5, you can drop the old capture table using the sys.sp_cdc_disable_table stored procedure with the @capture_instance parameter set to the old capture instance name.
Note: When CDC is enabled on a table, SQL Server blocks some DDL operations, such as renaming columns or changing their data types. For more information, see Handling changes to source tables.
The following table lists the DDL operations supported with the schema evolution procedure described above:
DDL Operation | Supported? |
---|---|
Create table | No |
Rename table | No |
Truncate table | No |
Drop table | No |
Add nullable column | Yes |
Add required column | No |
Alter column to make it nullable | Yes |
Alter column to make it required | No |
Alter column type | No |
Rename column | No |
Drop column | Yes |
Created in 2020 by Google Inc.