Microsoft SQL Server Replication Source

The Microsoft SQL Server CDC Source replicates all row-level changes in the databases on a Microsoft SQL Server instance.

For information about using this plugin in Replication jobs, see Microsoft SQL Server Best Practices.

Setting up Microsoft SQL Server

Enable CDC on Database

  • Make sure you have the required permissions by following this link.

  • You might need to contact your database administrator to obtain permissions to run the commands.

  • If the table that is being replicated contains columns of custom (user-defined) data types, the table's owner must grant EXECUTE permissions for the custom data types to the database user who is specified in the Replication job.

  • Enabling CDC for the database 'MyDB' will look similar to this:

-- ====
-- Enable Database for CDC template
-- ====
USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO

Note that CDC cannot be enabled for the master database.

Enable CDC on Table

Enabling CDC for the table 'MyTable' will look similar to this:

-- =========
-- If there is no 'role_name', you can set it as NULL.
-- =========
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name = N'MyTable',
  @role_name = N'MyRole',
  @supports_net_changes = 1
GO

For more information about enabling CDC for a SQL Server Table, see SQL Server Documentation.

Note: The user specified in the Replication job must be a member of MyRole in SQL Server. If MyRole is replaced with NULL in the above command, any user can be specified in the Replication job.
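
For example, a minimal sketch of adding the job's database user to that gating role, assuming a hypothetical database user named repl_user (on SQL Server 2012 and later):

USE MyDB
GO
-- repl_user is a placeholder; use the database user specified in the Replication job.
ALTER ROLE MyRole ADD MEMBER repl_user;
GO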

Verify Table CDC Accessibility

Run the following query to make sure your table has CDC access.

-- =========
-- Verify that the connector user has access; this query should not return an empty result.
-- =========
EXEC sys.sp_cdc_help_change_data_capture
GO

Custom Data Types

If the table to be replicated contains columns of a user-defined type, the table owner must grant EXECUTE permissions on the custom data types to the database user specified in the Replication job.
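
A minimal sketch of such a grant, assuming a hypothetical user-defined type dbo.MyCustomType and a hypothetical database user repl_user:

USE MyDB
GO
-- Both names below are placeholders for the actual type and the Replication job's user.
GRANT EXECUTE ON TYPE::dbo.MyCustomType TO repl_user;
GO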

Setting up JDBC Driver

If the Microsoft SQL Server JDBC driver is not already installed, instructions for installing it can be found on the Hub.

Plugin Properties

| Property | Macro Enabled? | Description |
| --- | --- | --- |
| Host | No | Required. Hostname of the SQL Server to read from. |
| Port | No | Required. Port to use to connect to the SQL Server. |
| JDBC Plugin Name | No | Required. Identifier for the SQL Server JDBC driver, which is the name used while uploading the SQL Server JDBC driver. |
| Database Name | No | Required. Name of the database to replicate data from. |
| User | No | Required. Username to use to connect to the SQL Server. |
| Password | Yes | Required. Password to use to connect to the SQL Server. Note: If you use a macro for the password, it must be in the Secure Store. If it is not in the Secure Store, the Replication job fails. For more information, see Using Secure Keys. |
| Server Timezone | No | Optional. Time zone of the SQL Server. This is used when converting dates into timestamps. |
| Replicate Existing Data | No | Optional. Whether to replicate existing data from the source database. By default, the pipeline replicates the existing data from the source tables. If set to false, any existing data in the source tables is ignored, and only changes that happen after the pipeline starts are replicated. |
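
For illustration only, a hypothetical set of property values for a Replication job; every value below is a placeholder, and the Password value assumes the secure macro syntax with a Secure Store key named sqlserver-password:

Host: sqlserver.example.com
Port: 1433
JDBC Plugin Name: sqlserver
Database Name: MyDB
User: repl_user
Password: ${secure(sqlserver-password)}
Server Timezone: UTC
Replicate Existing Data: true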

Schema Mapping

For information about data type conversions from supported source databases to the BigQuery destination, see https://cloud.google.com/data-fusion/docs/reference/replication-data-types#sql-server.

Schema Evolution

Due to the way CDC is implemented in SQL Server, schema evolution requires manual coordination and intervention to ensure the job continues to replicate data when the schema is updated.

Schema evolution process

  1. Stop the replication job.

  2. Apply the schema change to the source table, such as adding a non-required (nullable) column.

  3. Create a new capture instance using the sys.sp_cdc_enable_table procedure with a unique value for the @capture_instance parameter (see the sketch after this list).

  4. Apply the same schema change on the BigQuery target table. 

  5. Resume the replication job stopped in step 1.

  6. When the job starts streaming the new changes that occur after step 3, you can drop the old capture instance using the sys.sp_cdc_disable_table stored procedure with the @capture_instance parameter set to the old capture instance name.
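
A minimal sketch of steps 2, 3, and 6, reusing the dbo.MyTable example from above and assuming a hypothetical new column NewColumn and a hypothetical new capture instance name dbo_MyTable_v2 (dbo_MyTable is the default name of the original capture instance):

USE MyDB
GO
-- Step 2: add a nullable column to the source table.
ALTER TABLE dbo.MyTable ADD NewColumn INT NULL;
GO
-- Step 3: create a second capture instance that includes the new column.
EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo',
  @source_name = N'MyTable',
  @role_name = N'MyRole',
  @supports_net_changes = 1,
  @capture_instance = N'dbo_MyTable_v2'
GO
-- Step 6: once the job is streaming from the new instance, drop the old one.
EXEC sys.sp_cdc_disable_table
  @source_schema = N'dbo',
  @source_name = N'MyTable',
  @capture_instance = N'dbo_MyTable'
GO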

There is a period between step 2, when the database schema is changed, and step 3, when the new capture instance is created, during which all changes are captured by the old capture instance with the old schema. This means that if a new column is added, any change event produced during this time will not contain data for the new column. If you cannot tolerate such a transition, follow the procedure outlined below:

  1. Suspend all applications generating database records.

  2. Make sure the replication pipeline has processed all existing events.

  3. Stop the replication pipeline.

  4. Apply the schema change, such as adding a non-required (nullable) column.

  5. Create a new capture instance using the sys.sp_cdc_enable_table procedure with a unique value for the @capture_instance parameter.

  6. Resume applications suspended in step 1.

  7. Apply the same schema change on the BigQuery target table. 

  8. Resume the replication pipeline stopped in step 3.

  9. When the job starts streaming the new changes that occur after step 5, you can drop the old capture instance using the sys.sp_cdc_disable_table stored procedure with the @capture_instance parameter set to the old capture instance name, as shown in the earlier sketch.
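
Before dropping the old capture instance, you can list the capture instances that exist for the table; this is only a convenience check, and both the old and the new instance should appear until the old one is disabled:

USE MyDB
GO
-- List the capture instances configured for dbo.MyTable.
EXEC sys.sp_cdc_help_change_data_capture
  @source_schema = N'dbo',
  @source_name = N'MyTable'
GO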

Note: When CDC is enabled on a table, SQL Server blocks some DDL operations, such as renaming columns or changing their data types. For more information, see Handling changes to source tables.

The table below lists the DDL operations supported with the schema evolution procedure described above.

| DDL Operation | Supported? |
| --- | --- |
| Create table | No |
| Rename table | No |
| Truncate table | No |
| Drop table | No |
| Add nullable column | Yes |
| Add required column | No |
| Alter column to make it nullable | Yes |
| Alter column to make it required | No |
| Alter column type | No |
| Rename column | No |
| Drop column | Yes |
