...

The final target tables will include all the original columns from the source table, plus a few additional auxiliary columns, detailed below:

| # | Auxiliary column | Purpose | Populated for sources |
|---|------------------|---------|-----------------------|
| 1 | _sequence_num | ... Used to ensure that data is neither duplicated nor missed in replicator failure scenarios. | All |
| 2 | _is_deleted | Used to soft-delete rows in BigQuery. Mandatory for sources that generate unordered CDC streams, to ensure data consistency; opt-in otherwise. | Unordered sources, or when softDeletes is enabled in the target configuration |
| 3 | _row_id | Used to identify rows (instead of the primary key) so that updates to primary key values are supported. | Sources whose plugin supports rowId |
| 4 | _source_timestamp | Used to order events from sources that generate unordered CDC streams (versions <= 6.8). | Unordered sources |
| 5 | _sort | Used to reliably order events from sources that generate unordered CDC streams (versions >= 6.8). | Unordered sources (versions >= 6.8) |
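
To illustrate how these columns work together, the following is a minimal sketch of how a consumer could apply an unordered CDC stream to an in-memory table. It is not the BigQuery target's actual merge logic: the Event type, the apply_events and visible_rows functions, the high-water-mark replay check, and the use of a plain integer as the sort key are all hypothetical simplifications that stand in for _sequence_num, _sort, _is_deleted and _row_id.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical CDC event; fields stand in for the auxiliary columns."""
    row_id: str                  # plays the role of _row_id
    sequence_num: int            # plays the role of _sequence_num
    sort: int                    # plays the role of _sort (simplified to an int)
    is_deleted: bool             # plays the role of _is_deleted
    data: Optional[dict] = None  # the original source columns


def apply_events(events: List[Event]) -> Dict[str, Event]:
    """Apply an unordered event stream to a table keyed by row_id.

    Deleted rows are kept as tombstones (soft deletes) instead of being removed.
    """
    table: Dict[str, Event] = {}
    high_water_mark = 0  # highest sequence number applied so far (assumption)
    for event in events:
        # Events replayed after a failure carry sequence numbers that were
        # already applied, so they are skipped instead of being written twice.
        if event.sequence_num <= high_water_mark:
            continue
        high_water_mark = event.sequence_num
        current = table.get(event.row_id)
        # Keep whichever version is newest by sort key, so a stale event that
        # arrives late cannot overwrite a newer one (including a tombstone).
        if current is None or event.sort > current.sort:
            table[event.row_id] = event
    return table


def visible_rows(table: Dict[str, Event]) -> Dict[str, Optional[dict]]:
    """Return the rows a reader would see, hiding soft-deleted ones."""
    return {key: ev.data for key, ev in table.items() if not ev.is_deleted}


events = [
    Event("a", 1, 1, False, {"v": 1}),   # insert
    Event("a", 2, 3, True),              # delete (newest in source order)
    Event("a", 3, 2, False, {"v": 2}),   # older update that arrives late
    Event("b", 4, 4, False, {"v": 10}),  # insert of another row
    Event("a", 3, 2, False, {"v": 2}),   # replay after a failure
]
print(visible_rows(apply_events(events)))  # {'b': {'v': 10}}
```

In this sketch, if row "a" were physically deleted instead of tombstoned, the late update with the older sort key would find no existing row and re-create it. Keeping the soft-deleted row with its sort key is one way to see why _is_deleted is mandatory for unordered sources.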

Notes

  • There is a known issue (CDAP-20210) where the target plugin creates auxiliary columns even when they are not applicable.

  • Oracle (by Datastream) is currently the only source plugin that generates an unordered CDC stream.

Credentials

If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be provided and can be set to 'auto-detect'. Credentials will be automatically read from the cluster environment.

...