Google Cloud PubSub Sink

Plugin version: 0.22.0

This sink writes to a Google Cloud Pub/Sub topic. Cloud Pub/Sub brings the scalability, flexibility, and reliability of enterprise message-oriented middleware to the cloud. By providing many-to-many, asynchronous messaging that decouples senders and receivers, it allows for secure and highly available communication between independently written applications.

Credentials

If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be provided and can be set to 'auto-detect'. Credentials will be automatically read from the cluster environment.

If the plugin is not run on a Dataproc cluster, the path to a service account key must be provided. The service account key can be found on the Dashboard in the Cloud Platform Console. Make sure the service account has permission to access Google Cloud Pub/Sub. The key file must be available on every node in your cluster and must be readable by all users running the job.
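The two credential modes described above can be sketched as a small pre-flight check. This is an illustrative sketch only, not the plugin's implementation: the function name `load_service_account_key` and the constant `AUTO_DETECT` are hypothetical, and the only behavior taken from this page is that 'auto-detect' means no key file is needed, while any other value must point to a readable key file.

```python
import json
import os
from typing import Optional

# Hypothetical constant mirroring the 'auto-detect' value described above.
AUTO_DETECT = "auto-detect"


def load_service_account_key(path: str = AUTO_DETECT) -> Optional[dict]:
    """Return the parsed key file, or None when credentials come from the environment."""
    if path == AUTO_DETECT:
        # On a Dataproc cluster the environment supplies credentials,
        # so there is no key file to read.
        return None
    if not os.path.isfile(path):
        raise FileNotFoundError(f"service account key not found: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"service account key not readable: {path}")
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Running a check like this on every node before starting a job is one way to catch a missing or unreadable key file early.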

Configuration

Property

Macro Enabled?

Version Introduced

Description

Reference Name

No

 

Required. Name used to uniquely identify this sink for lineage, annotating metadata, etc.

Project ID

Yes

 

Optional. Google Cloud Project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform Console.

Default is auto-detect.

PubSub topic

Yes

 

Required. Name of the Google Cloud Pub/Sub topic to publish to.

Format

Yes

6.4.0 / 0.17.0

Optional. Format of the data to write. Supported formats are avro, blob, tsv, csv, delimited, json, parquet, and text.

Default is text.

Delimiter

Yes

6.4.0 / 0.17.0

Optional. Delimiter to use if the format is 'delimited'. The delimiter will be ignored if the format is anything other than 'delimited'.

Encryption Key Name

Yes

6.5.1 / 0.18.1

Optional. Name of the key used to encrypt data written to any topic created by the plugin. If the topic already exists, this is ignored.

Service Account Type

Yes

6.3.0 / 0.16.0

Optional. Select one of the following options:

  • File Path. File path where the service account key is located.

  • JSON. JSON content of the service account key.

Service Account File Path

Yes

 

Optional. Path on the local file system of the service account key used for authorization. Can be set to 'auto-detect' when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster.

Default is auto-detect.

Service Account JSON

Yes

6.3.0 / 0.16.0

Optional. JSON content of the service account key.

Maximum Batch Count

Yes

 

Optional. Maximum number of messages to publish in a single batch. Messages are published in batches to improve throughput.

Default is 100.

Maximum Batch Size (KB)

Yes

 

Optional. Maximum combined size of messages in kilobytes to publish in a single batch.

Default is 1 KB.

Publish Delay Threshold (ms)

Yes

 

Optional. Maximum amount of time in milliseconds to wait before publishing a batch of messages.

Default is 1 millisecond.

Retry Timeout (seconds)

Yes

 

Optional. Maximum amount of time in seconds to retry publishing failures.

Default is 30 seconds.

Error Threshold

Yes

 

Optional. Maximum number of messages per partition that may fail to publish before the pipeline fails.

Default is 0.
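To illustrate how Maximum Batch Count, Maximum Batch Size (KB), and Publish Delay Threshold (ms) interact, here is a toy model of publisher-side batching. It assumes a batch is published as soon as any one of the three limits is reached, which is how Pub/Sub client batching generally behaves. The class `BatchBuffer` and its methods are hypothetical and are not the plugin's code. A real client enforces the delay with a timer, whereas this sketch only checks it when a message is added.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchBuffer:
    """Toy model of batching thresholds; not the plugin's actual code."""
    max_count: int = 100        # Maximum Batch Count (default 100)
    max_bytes: int = 1 * 1024   # Maximum Batch Size (default 1 KB), in bytes
    max_delay_ms: int = 1       # Publish Delay Threshold (default 1 ms)
    _messages: List[bytes] = field(default_factory=list)
    _size: int = 0
    _first_ms: Optional[int] = None

    def add(self, message: bytes, now_ms: int) -> Optional[List[bytes]]:
        """Buffer one message; return the batch if any threshold is reached."""
        if self._first_ms is None:
            # Remember when the oldest buffered message arrived.
            self._first_ms = now_ms
        self._messages.append(message)
        self._size += len(message)
        if (len(self._messages) >= self.max_count
                or self._size >= self.max_bytes
                or now_ms - self._first_ms >= self.max_delay_ms):
            return self.flush()
        return None

    def flush(self) -> List[bytes]:
        """Return the buffered messages and reset the buffer."""
        batch = self._messages
        self._messages, self._size, self._first_ms = [], 0, None
        return batch
```

Under this model, the small defaults (1 KB and 1 millisecond) mean a batch is usually published long before it reaches 100 messages, so raising the size and delay limits is what allows larger batches to form.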

 

Created in 2020 by Google Inc.