Google Cloud Bigtable Batch Source

Plugin version: 0.22.0

Reads data from Google Cloud Bigtable. Cloud Bigtable is Google's NoSQL Big Data database service. It's the same database that powers many core Google services, including Search, Analytics, Maps, and Gmail.

Credentials

If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be provided and can be set to 'auto-detect'. Credentials will be automatically read from the cluster environment.

If the plugin is not run on a Dataproc cluster, the path to a service account key must be provided. The service account key can be found on the Dashboard in the Cloud Platform Console. Make sure the account key has permission to access Bigtable. The service account key file needs to be available on every node in your cluster and must be readable by all users running the job.
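
For reference, the sketch below (not part of the plugin) shows the two modes described above using the google-auth-library-java client: relying on the environment, as 'auto-detect' does on Dataproc, versus loading an explicit key file. The class name and the choice of this particular library are illustrative assumptions, not the plugin's own code.

    import java.io.FileInputStream;
    import java.io.IOException;

    import com.google.auth.oauth2.GoogleCredentials;

    public class CredentialsCheck {
      public static void main(String[] args) throws IOException {
        GoogleCredentials credentials;
        if (args.length == 0) {
          // 'auto-detect': pick up Application Default Credentials from the
          // environment, e.g. the Dataproc cluster's service account.
          credentials = GoogleCredentials.getApplicationDefault();
        } else {
          // Explicit key file: the same file that must be readable on every
          // node running the job.
          try (FileInputStream keyStream = new FileInputStream(args[0])) {
            credentials = GoogleCredentials.fromStream(keyStream);
          }
        }
        // Refresh a token to confirm the credentials are usable.
        credentials.refreshIfExpired();
        System.out.println("Credentials loaded.");
      }
    }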

Configuration

Properties

Each property below notes whether it is macro-enabled and, where applicable, the version in which it was introduced.

Reference Name (Macro Enabled: No)
Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.

Project ID (Macro Enabled: Yes)
Optional. Google Cloud Project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform Console.
Default is auto-detect.

Instance ID (Macro Enabled: Yes)
Required. Google Cloud Bigtable instance ID.

Table (Macro Enabled: Yes)
Required. Database table name.

Key Alias (Macro Enabled: Yes)
Optional. Name of the record field that holds the row key.

Column Mappings (Macro Enabled: Yes)
Required. Mappings from Bigtable column names to record field names. Column names must be formatted as 'family:qualifier', for example 'cf:user_name' (see the first sketch after this property list).

Bigtable Options (Macro Enabled: Yes)
Optional. Additional connection properties for Bigtable. The full list of allowed properties is available at https://cloud.google.com/bigtable/docs/hbase-client/javadoc/constant-values.

Scan Start Row (Macro Enabled: Yes)
Optional. Row key at which the scan starts (see the second sketch after this property list).

Scan Stop Row (Macro Enabled: Yes)
Optional. Row key at which the scan stops.

Scan Time Range Start (Macro Enabled: Yes)
Optional. Starting timestamp used to filter columns. Inclusive.

Scan Time Range Stop (Macro Enabled: Yes)
Optional. Ending timestamp used to filter columns. Exclusive.

Service Account Type (Macro Enabled: Yes, Introduced in 6.3.0/0.16.0)
Optional. Select one of the following options:

  • File Path. File path where the service account is located.

  • JSON. JSON content of the service account.

Service Account File Path (Macro Enabled: Yes)
Optional. Path on the local file system of the service account key used for authorization. Can be set to 'auto-detect' when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster.
Default is auto-detect.

Service Account JSON (Macro Enabled: Yes, Introduced in 6.3.0/0.16.0)
Optional. Content of the service account.

On Record Error
Required. Strategy used to handle errors while transforming a Bigtable entry into a record. Possible values are:

  • Skip error - Ignores erroneous records.

  • Fail Pipeline - Fails the pipeline due to an erroneous record.

Default is Skip error.

Output Schema (Macro Enabled: Yes)
Required. Specifies the schema of the output records. Only columns defined in the schema will be included in the output records.
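
To make the Key Alias and Column Mappings properties concrete, the first sketch below flattens one scanned row into record fields. The family 'cf', the qualifiers 'user_name' and 'age', and the key alias 'id' are made-up names; the plugin performs this translation internally, so this is an illustration rather than its actual code.

    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MappingSketch {
      // Flattens one row the way Key Alias and Column Mappings describe it.
      public static Map<String, String> toRecord(Result row) {
        Map<String, String> record = new LinkedHashMap<>();
        // Key Alias 'id': the row key becomes an ordinary record field.
        record.put("id", Bytes.toString(row.getRow()));
        // Column Mapping 'cf:user_name' -> 'user_name'.
        record.put("user_name",
            Bytes.toString(row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("user_name"))));
        // Column Mapping 'cf:age' -> 'age'.
        record.put("age",
            Bytes.toString(row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("age"))));
        return record;
      }
    }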

 
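The second sketch illustrates the scan-related properties using the HBase client API that the Bigtable Options property links to. The row keys and timestamps are made-up values, and the plugin builds its scan internally, so this is an assumption about the shape of that scan rather than its implementation.

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanSketch {
      public static Scan buildScan() throws IOException {
        Scan scan = new Scan();
        // Scan Start Row / Scan Stop Row: bound the row keys that are read.
        scan.withStartRow(Bytes.toBytes("row-000"));
        scan.withStopRow(Bytes.toBytes("row-999"));
        // Scan Time Range Start / Stop: keep only cells whose timestamps fall
        // in [start, stop) - start inclusive, stop exclusive.
        scan.setTimeRange(1577836800000L, 1609459200000L);
        return scan;
      }
    }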

Created in 2020 by Google Inc.