...
Property | Macro Enabled? | Version Introduced | Description |
---|---|---|---|
Use Connection | No | 6.7.0/0.20.0 | Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials. |
Connection | Yes | 6.7.0/0.20.0 | Optional. Name of the connection to use. Project and service account information will be provided by the connection. The connection name can also be supplied through a macro. |
Project ID | Yes | | Optional. Google Cloud Project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform Console. This is the project that the BigQuery job will run in. Default is auto-detect. |
Dataset Project ID | Yes | | Optional. Project the dataset belongs to. This is only required if the dataset is not in the same project that the BigQuery job will run in. If no value is given, it defaults to the configured Project ID. |
Service Account Type | Yes | 6.3.0 / 0.16.0 | Optional. Select one of the following options: File Path or JSON. |
Service Account File Path | Yes | | Required. Path on the local file system of the service account key used for authorization. Can be set to 'auto-detect' when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster. Default is auto-detect. |
Service Account JSON | Yes | 6.3.0 / 0.16.0 | Optional. Content of the service account. |
Reference Name | No | | Required. Name used to uniquely identify this sink for lineage, annotating metadata, etc. |
Dataset | Yes | | Optional. Dataset the table belongs to. A dataset is contained within a specific project. Datasets are top-level containers that are used to organize and control access to tables and views. |
Table | Yes | | Required. Table to write to. A table contains individual records organized in rows. Each record is composed of columns (also called fields). Every table is defined by a schema that describes the column names, data types, and other information. |
Temporary Bucket Name | Yes | | Optional. Google Cloud Storage bucket to store temporary data in. It will be automatically created if it does not exist, but will not be automatically deleted. Temporary data will be deleted after it is loaded into BigQuery. If it is not provided, a unique bucket will be created and then deleted after the run finishes. Syntax: gs://bucketname |
GCS Upload Request Chunk Size | Yes | | Optional. GCS upload request chunk size in bytes. Default value is 8388608 bytes. |
Operation | Yes | | Optional. Type of write operation to perform. This can be set to Insert, Update, or Upsert. Default is Insert. |
Table Key | Yes | | Optional. List of fields that determines the relation between tables during Update and Upsert operations. |
Dedupe By | Yes | | Optional. Column names and sort order used to choose which input record to update/upsert when there are multiple input records with the same key. For example, if this is set to 'updated_time desc', then if there are multiple input records with the same key, the one with the largest value for 'updated_time' will be applied. |
Partition Filter | Yes | | Optional. Partition filter that can be used for partition elimination during Update or Upsert operations. Should only be used with Update or Upsert operations on tables where the Require Partition Filter option is enabled. For example, if the table is partitioned and the Partition Filter is '_PARTITIONTIME > "2020-01-01" and _PARTITIONTIME < "2020-03-01"', the update operation will be performed only on the partitions meeting the criteria. |
Truncate Table | Yes | | Optional. Whether or not to truncate the table before writing to it. Note: If you set both Truncate Table and Update Table Schema to True, only Truncate Table will be applied when the pipeline runs; Update Table Schema will be ignored. Default is False. |
Update Table Schema | Yes | | Optional. Whether the BigQuery table schema should be modified when it does not match the schema expected by the pipeline. Compatible changes are applied automatically; incompatible schema changes will result in pipeline failure. Note: If you set both Truncate Table and Update Table Schema to True, only Truncate Table will be applied when the pipeline runs; Update Table Schema will be ignored. Default is False. |
Location | Yes | | Optional. The location where the BigQuery dataset will get created. This value is ignored if the dataset or temporary bucket already exists. Default is US. |
Encryption Key Name | Yes | 6.5.1/0.18.1 | Optional. The GCP customer-managed encryption key (CMEK) used to encrypt data written to any bucket, dataset, or table created by the plugin. If the bucket, dataset, or table already exists, this is ignored. See the Google Cloud CMEK documentation for more information. |
Create Partitioned Table | Yes | | [DEPRECATED] Optional. Whether to create the BigQuery table with time partitioning. This value is ignored if the table already exists. Default is False. |
Partitioning Type | Yes | 6.2.3 / 0.15.3 | Optional. Specifies the partitioning type. Can be Time, Integer, or None. Defaults to Time. This value is ignored if the table already exists. |
Range Start | Yes | 6.2.3 / 0.15.3 | Optional. For integer partitioning, specifies the start of the range. Only used when the table does not already exist and the partitioning type is set to Integer. |
Range End | Yes | 6.2.3 / 0.15.3 | Optional. For integer partitioning, specifies the end of the range. Only used when the table does not already exist and the partitioning type is set to Integer. |
Range Interval | Yes | 6.2.3 / 0.15.3 | Optional. For integer partitioning, specifies the partition interval. Only used when the table does not already exist and the partitioning type is set to Integer. |
Partition Field | Yes | | Optional. Partitioning column for the BigQuery table. Leave blank if the BigQuery table is an ingestion-time partitioned table. |
Require Partition Filter | Yes | | Optional. Whether to create a table that requires a partition filter. This value is ignored if the table already exists. Default is False. |
Clustering Order | Yes | | Optional. List of fields that determines the sort order of the data. Fields must be of type INT, LONG, STRING, DATE, TIMESTAMP, BOOLEAN, or DECIMAL. Tables cannot be clustered on more than 4 fields. This value is only used when the BigQuery table is automatically created, and is ignored if the table already exists. |
Output Schema | Yes | | Required. Schema of the data to write. If a schema is provided, it must be compatible with the table schema in BigQuery. |
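For orientation, here is a minimal sketch of how these properties might appear in a pipeline's plugin configuration. The internal property keys (`referenceName`, `serviceFilePath`, and so on) and all values are illustrative assumptions, not authoritative names; consult your pipeline's exported JSON for the exact keys:

```json
{
  "name": "BigQuery",
  "plugin": {
    "name": "BigQueryTable",
    "type": "batchsink",
    "properties": {
      "referenceName": "bq_sink",
      "project": "auto-detect",
      "dataset": "my_dataset",
      "table": "my_table",
      "serviceFilePath": "auto-detect",
      "operation": "insert",
      "truncateTable": "false"
    }
  }
}
```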
...
`xxxx` is the Dataset Project ID you specified in this plugin. The service account you specified in this plugin does not have permission to read the dataset you specified. You must grant the "BigQuery Data Editor" role on the project identified by the Dataset Project ID to the service account. If you think you already granted the role, check whether you granted it on the wrong project (for example, the one identified by the Project ID).
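Assuming the standard gcloud CLI is available, granting the role could look like the following; the project IDs and the service account email are placeholders to replace with your own values:

```shell
# Grant the BigQuery Data Editor role to the service account on the
# project that owns the dataset (the Dataset Project ID, not the Project ID).
gcloud projects add-iam-policy-binding DATASET_PROJECT_ID \
  --member="serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"
```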