Google Cloud BigQuery Argument Setter Action

The Google Cloud BigQuery Argument Setter action plugin was introduced in CDAP 6.2.3.

Plugin version: 0.22.0

Performs a query against a BigQuery table to fetch arguments to set in the pipeline.

This is most commonly used when the structure of a pipeline is static, and its configuration needs to be managed outside the pipeline itself.

Argument names must match the column names in the BigQuery table.
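For example, the sketch below creates and populates such a configuration table with the google-cloud-bigquery Python client. The project, dataset, table, and column names (my-project, pipeline_config, arguments, input_path, output_table) are illustrative assumptions, not values required by the plugin.

```python
# A hypothetical configuration table whose column names match the
# argument names the pipeline expects.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # illustrative project ID

schema = [
    bigquery.SchemaField("feed", "STRING"),
    bigquery.SchemaField("date", "STRING"),
    bigquery.SchemaField("input_path", "STRING"),    # becomes argument `input_path`
    bigquery.SchemaField("output_table", "STRING"),  # becomes argument `output_table`
]
table = client.create_table(
    bigquery.Table("my-project.pipeline_config.arguments", schema=schema),
    exists_ok=True,
)

# One row per run configuration; the plugin's selection conditions pick the row.
client.insert_rows_json(table, [{
    "feed": "marketing",
    "date": "20200427",
    "input_path": "gs://my-bucket/marketing/20200427/",
    "output_table": "my-project.analytics.daily_marketing",
}])
```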

Credentials

If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be provided and can be set to 'auto-detect'. Credentials will be automatically read from the cluster environment.

If the plugin is not run on a Dataproc cluster, the path to a service account key must be provided. The service account key can be found on the Dashboard in the Cloud Platform Console. Make sure the account key has permission to access BigQuery and Google Cloud Storage. The service account key file needs to be available on every node in your cluster and must be readable by all users running the job.
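As an illustration of the auto-detect behavior, the following sketch resolves Application Default Credentials, which is the usual way credentials are picked up from a Dataproc cluster's environment. This is an assumption about the underlying mechanism, not part of the plugin's configuration.

```python
# A minimal sketch, assuming the code runs on a node (for example, a Dataproc
# worker) where Application Default Credentials are available from the
# environment's service account.
import google.auth

credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
print(project_id)  # project inferred from the environment
```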

Configuration

Project ID

Macro Enabled?: Yes
Description: Optional. Google Cloud Project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform Console. This is the project that the BigQuery job will run in. If a temporary bucket needs to be created, the service account must have permission in this project to create buckets. Default is auto-detect.

Dataset Project ID

Macro Enabled?: Yes
Version Introduced: 6.7.0 / 0.20.0
Description: Project the dataset belongs to. This is only required if the dataset is not in the same project that the BigQuery job will run in. If no value is given, it will default to the configured Project ID. The BigQuery Data Viewer role on this project must be granted to the specified service account to read BigQuery data from this project.

Dataset Name

Macro Enabled?: Yes
Description: Required. Dataset the table belongs to. A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to tables and views.

Table

Macro Enabled?: Yes
Description: Required. Table to read from. A table contains individual records organized in rows. Each record is composed of columns (also called fields). Every table is defined by a schema that describes the column names, data types, and other information.

Argument Selection Conditions

Macro Enabled?: Yes
Description: Required. A set of conditions for identifying the arguments to use for a run, for example feed=marketing;date=20200427 (see the sketch after this table).

Arguments Columns

Macro Enabled?: Yes
Description: Required. Comma-separated list of the columns that contain the arguments for this run.

Service Account Type

Macro Enabled?: Yes
Version Introduced: 6.3.0 / 0.16.0
Description: Optional. Select one of the following options:

  • File Path. File path where the service account is located.

  • JSON. JSON content of the service account.

Service Account File Path

Macro Enabled?: Yes
Description: Required. Path on the local file system of the service account key used for authorization. Can be set to 'auto-detect' when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect.

Service Account JSON

Macro Enabled?: Yes
Version Introduced: 6.3.0 / 0.16.0
Description: Optional. JSON content of the service account.
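To make the Argument Selection Conditions and Arguments Columns properties concrete, the sketch below builds a query that is conceptually equivalent to the lookup the plugin performs. The exact SQL the plugin generates may differ, and all names and values (my-project.pipeline_config.arguments, feed=marketing;date=20200427, input_path,output_table) are illustrative.

```python
# A hedged sketch of the lookup implied by the plugin's configuration.
# Property values below mirror the examples used earlier and are not
# required names.
from google.cloud import bigquery

conditions = "feed=marketing;date=20200427"   # Argument Selection Conditions
argument_columns = "input_path,output_table"  # Arguments Columns

where_clause = " AND ".join(
    "{} = '{}'".format(*condition.split("=", 1))
    for condition in conditions.split(";")
)
select_list = ", ".join(col.strip() for col in argument_columns.split(","))

query = (
    f"SELECT {select_list} "
    f"FROM `my-project.pipeline_config.arguments` "
    f"WHERE {where_clause}"
)

client = bigquery.Client(project="my-project")
row = next(iter(client.query(query).result()))

# Each selected column is set as a runtime argument with the same name.
runtime_arguments = {
    name.strip(): row[name.strip()] for name in argument_columns.split(",")
}
print(runtime_arguments)
```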

Related Topics

Reusable Pipelines Best Practices
