...

1. _sequence_num
   Purpose: Used to ensure that data is not duplicated or missed in replicator failure scenarios.
   Populated for: All sources.

2. _is_deleted
   Purpose: Used to soft delete rows in BigQuery. This is mandatory for sources that generate unordered CDC streams to ensure data consistency, but optional otherwise.
   Populated for: Unordered sources, or when Enable Soft Deletes is set to Yes in the target configuration.

3. _row_id
   Purpose: Used to identify rows (instead of the primary key) to support updates to primary key values.
   Populated for: Sources whose plugin supports rowId.

4. _source_timestamp
   Purpose: Used for ordering for sources that generate unordered CDC streams (versions 6.8.0 and earlier).
   Populated for: Unordered sources.

5. _sort
   Purpose: Used for reliable ordering for sources that generate unordered CDC streams (versions 6.8.0 and later).
   Populated for: Unordered sources (versions 6.8.0 and later).
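
To illustrate how these columns appear to downstream consumers, the sketch below queries a hypothetical replicated table with the google-cloud-bigquery Python client, excluding soft-deleted rows and ordering by _sequence_num. The project, dataset, and table names are assumptions, not values from this page.

    # Illustrative sketch: reading from a hypothetical replicated table.
    # The project, dataset, and table names below are assumptions.
    from google.cloud import bigquery

    client = bigquery.Client()  # credentials resolved from the environment

    query = """
        SELECT * EXCEPT (_is_deleted, _sequence_num)
        FROM `my-project.my_dataset.customers`
        WHERE _is_deleted IS NOT TRUE   -- skip soft-deleted rows
        ORDER BY _sequence_num DESC     -- most recently applied changes first
        LIMIT 10
    """
    for row in client.query(query).result():
        print(dict(row))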

Notes

  • There is a known issue (CDAP-20210) where the target plugin creates auxiliary columns even when they are not applicable.

  • Oracle (by Datastream) is the only source plugin that generates unordered CDC streams today.

Credentials

If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be provided and can be set to 'auto-detect'. Credentials will be automatically read from the cluster environment.
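
For reference, 'auto-detect' follows the same pattern as Google's Application Default Credentials: on a Dataproc cluster, credentials come from the cluster's service account via the metadata server. A minimal sketch using the google-auth Python library (the library choice is illustrative, not part of the plugin):

    # Sketch of Application Default Credentials resolution. On a Dataproc
    # cluster this returns the cluster service account's credentials from
    # the metadata server; elsewhere it falls back to the key file named
    # by GOOGLE_APPLICATION_CREDENTIALS.
    import google.auth

    credentials, project_id = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    print(project_id)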

...

Property: Project ID
Macro Enabled? No
Description: Required. Project of the BigQuery dataset. When running on a Dataproc cluster, this can be left blank, in which case the project of the cluster is used.

Property: Service Account Key
Macro Enabled? Yes
Description: Required. The contents of the service account key (JSON) to use when interacting with GCS and BigQuery. When running on a Dataproc cluster, this can be left blank, in which case the service account of the cluster is used.

Property: Dataset Name
Macro Enabled? No
Description: Optional. Name of the dataset to be created in BigQuery. By default, the dataset name is the same as the source database name. A valid name can contain letters, numbers, and underscores, with a maximum length of 1024 characters. Any invalid character is replaced with an underscore in the final dataset name, and any characters that exceed the length limit are truncated.
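
The sanitization rule can be made concrete with a short sketch (illustrative only, not the plugin's actual code):

    import re

    # Illustrative sketch of the documented rule: invalid characters are
    # replaced with underscores and the result is truncated to 1024 chars.
    def sanitize_dataset_name(source_db_name: str, max_len: int = 1024) -> str:
        sanitized = re.sub(r"[^A-Za-z0-9_]", "_", source_db_name)
        return sanitized[:max_len]

    print(sanitize_dataset_name("sales-db.prod"))  # -> sales_db_prod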

Property: Encryption Key Name
Macro Enabled? No
Description: Optional. The GCP customer-managed encryption key (CMEK) used to encrypt data written to any bucket, dataset, or table created by the plugin. If the bucket, dataset, or table already exists, this is ignored. More information can be found here.

Property: Location
Macro Enabled? No
Description: Optional. The location where the BigQuery dataset and the GCS staging bucket will be created. For example, ‘us-east1’ for a regional bucket, ‘us’ for a multi-regional bucket. A complete list of available locations can be found at https://cloud.google.com/bigquery/docs/locations. This value is ignored if an existing GCS bucket is specified, as the staging bucket and the BigQuery dataset will be created in the same location as that bucket. Default is us.

Property: Staging Bucket
Macro Enabled? No
Description: Optional. GCS bucket to write change events to before loading them into staging tables. Changes are written to a directory that contains the replication job name and namespace. It is safe to use the same bucket across multiple replication jobs within the same instance. If it is shared by replication jobs across multiple instances, ensure that the namespace and name are unique; otherwise, the behavior is undefined. The bucket must be in the same location as the BigQuery dataset. If not provided, a new bucket named 'df-rbq---' is created for each job. Note that the user must explicitly delete the bucket once the job is deleted.

Property: Load Interval (seconds)
Macro Enabled? No
Description: Optional. Number of seconds to wait before loading a batch of data into BigQuery.

Property: Staging Table Prefix
Macro Enabled? No
Description: Optional. Changes are first written to a staging table before being merged into the final table. Staging table names are generated by prepending this prefix to the target table name. The staging-to-final merge pattern is sketched below.
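
The statement below is an illustrative sketch of that staging-to-final merge, with assumed table names, a '_staging_' prefix, and a primary key column 'id'. It is not the plugin's actual merge logic, which also handles ordering and deletes.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Illustrative MERGE from a staging table into the target table. All
    # table, prefix, and column names here are assumptions.
    merge_sql = """
        MERGE `my-project.my_dataset.customers` AS target
        USING `my-project.my_dataset._staging_customers` AS staging
        ON target.id = staging.id
        WHEN MATCHED THEN
          UPDATE SET name = staging.name
        WHEN NOT MATCHED THEN
          INSERT (id, name) VALUES (staging.id, staging.name)
    """
    client.query(merge_sql).result()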

Property: Require Manual Drop Intervention
Macro Enabled? No
Description: Optional. Whether to require manual administrative action to drop tables and datasets when a drop table or drop database event is encountered. When set to true, the replication job will not delete a table or dataset. Instead, it will fail and retry until the table or dataset no longer exists. If the dataset or table does not already exist, no manual intervention is required, and the event is skipped as normal.

Property: Enable Soft Deletes
Macro Enabled? No
Version Introduced: 6.7.0
Description: Optional. Whether to enable soft deletes. If set to true, when a delete event is received by the target, the ‘_is_deleted’ column for the record is set to true. Otherwise, the record is deleted from the BigQuery table. This configuration is a no-op for sources that generate events out of order; records from such sources are always soft deleted in the BigQuery table. Default is No.
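
For example, with Enable Soft Deletes set to Yes, a delete event for a row is applied as an update that flags the record rather than removing it. A rough SQL equivalent, with assumed table and key names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Rough SQL equivalent of a soft delete (table and key names assumed):
    job = client.query(
        """
        UPDATE `my-project.my_dataset.customers`
        SET _is_deleted = TRUE
        WHERE id = @id
        """,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("id", "INT64", 42)]
        ),
    )
    job.result()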

...