...
The final target tables will include all the original columns from the source table plus a few additional auxiliary columns, detailed below:
No. | Auxiliary Column | Purpose | Populated for source |
---|---|---|---|
1 | _sequence_num | ... Used to ensure that data is not duplicated or missed in replicator failure scenarios. | All |
2 | _is_deleted | Used for soft deleting rows in BigQuery. This is mandatory for sources generating unordered CDC streams to ensure data consistency, but opt-in otherwise. | Unordered sources or if softDeletes is enabled in target configuration |
3 | _row_id | Used for identifying rows (instead of primary key) to support use case of updates to primary key. | If source plugin supports rowId |
4 | _source_timestamp | Used for ordering for sources generating unordered CDC streams (version <= 6.8) | Unordered sources |
5 | _sort | Used for reliable ordering for sources generating unordered CDC streams (version >= 6.8) | Unordered sources (versions >= 6.8) |
Notes
There is a known issue (CDAP-20210) where the target plugin creates auxiliary columns even when they are not applicable.
Oracle (by Datastream) is the only source plugin that generates an unordered CDC stream today.
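To make the roles of `_sequence_num` and `_is_deleted` more concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the column semantics described in the table, not the plugin's actual implementation. It assumes that each change event is a dictionary, that `_sequence_num` increases with every new event, and that rows are identified by a hypothetical `id` column; the function names `apply_events` and `live_rows` are also made up for this example.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def apply_events(table: Dict[Any, Row], events: Iterable[Row], key: str = "id") -> Dict[Any, Row]:
    """Apply change events to an in-memory table keyed by `key`.

    Assumption (not taken from the plugin): every event carries the row's
    columns, an increasing `_sequence_num`, and an `_is_deleted` flag.
    """
    for event in events:
        current = table.get(event[key])
        # Skip events that were already applied, for example events that
        # are replayed after a replicator failure.
        if current is not None and event["_sequence_num"] <= current["_sequence_num"]:
            continue
        # Soft delete: a delete event keeps the row and only sets the
        # `_is_deleted` flag, so the row is overwritten rather than removed.
        table[event[key]] = dict(event)
    return table


def live_rows(table: Dict[Any, Row]) -> List[Row]:
    """Return only the rows that are not soft deleted."""
    return [row for row in table.values() if not row["_is_deleted"]]


events = [
    {"id": 1, "name": "a", "_sequence_num": 1, "_is_deleted": False},
    {"id": 2, "name": "b", "_sequence_num": 2, "_is_deleted": False},
    # Replayed event: same sequence number as before, so it is ignored.
    {"id": 1, "name": "a", "_sequence_num": 1, "_is_deleted": False},
    # Delete event for row 2: the row stays in the table, flagged as deleted.
    {"id": 2, "name": "b", "_sequence_num": 3, "_is_deleted": True},
]
table = apply_events({}, events)
```

In this sketch, readers of the table would filter on `_is_deleted` (as `live_rows` does) to see only rows that still exist in the source.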
Credentials
If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be provided and can be set to 'auto-detect'. Credentials will be automatically read from the cluster environment.
...