Google Cloud Pub/Sub Streaming Source

Plugin version: 0.22.0

This source reads messages from a Google Cloud Pub/Sub subscription in real time.

Credentials

If the plugin runs on a Dataproc cluster, you don't need to provide a service account key. Set the key to 'auto-detect' and the credentials are read automatically from the cluster environment.

If the plugin doesn't run on a Dataproc cluster, you must provide the path to a service account key. To find the name of the key, go to the Google Cloud console dashboard. The service account must have permission to access Pub/Sub. The key file must be available on every node in your cluster and must be readable by all users running the job.
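The file requirements above can be checked on each node before a job is started. The following is a minimal sketch and not part of the plugin: the function name `check_key_file` is hypothetical, and it only tests the user that runs the script, so it would need to be run as each job user on each node.

```python
import os
from typing import List


def check_key_file(path: str) -> List[str]:
    """Return a list of problems found with the key file on this node."""
    problems: List[str] = []
    # 'auto-detect' means credentials come from the cluster environment,
    # so there is no file to check.
    if path == "auto-detect":
        return problems
    if not os.path.isfile(path):
        problems.append(f"{path} does not exist or is not a regular file")
    elif not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by the current user")
    return problems
```

An empty list means no problem was detected for the current user on the current node. The check says nothing about whether the account has Pub/Sub permissions.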

Configuration

Property

Macro Enabled?

Version Introduced

Description

Reference Name

No

 

Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.

Project ID

Yes

 

Optional. Google Cloud project ID, which uniquely identifies a project. It can be found on the Dashboard in the Google Cloud Platform Console.

Default is auto-detect.

Subscription

Yes

 

Required. Name of the Pub/Sub subscription to subscribe to. If the subscription needs to be created, the topic it will belong to must also be provided. A subscription name must:

  • Not begin with the string goog

  • Start with a letter

  • Contain between 3 and 255 characters

  • Contain only the following characters:

    • Letters: [A-Za-z]

    • Numbers: [0-9]

    • Dashes: -

    • Underscores: _

    • Periods: .

    • Tildes: ~

    • Plus signs: +

    • Percent signs: %

The special characters listed above can be used in resource names without URL-encoding. Any other special characters must be properly encoded when used in URLs. For example, mi-tópico is an invalid subscription name, whereas mi-t%C3%B3pico is valid.
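The naming rules above can be expressed as a short validator. This is a sketch, not the check the plugin or Pub/Sub performs: the function name `is_valid_subscription_name` is hypothetical, and the `goog` prefix is assumed to be matched case-sensitively, exactly as written in the rule.

```python
import re

# A letter first, then 2 to 254 more characters from the allowed set,
# which gives a total length of 3 to 255 characters.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9\-_.~+%]{2,254}")


def is_valid_subscription_name(name: str) -> bool:
    """Check a name against the subscription naming rules listed above."""
    # Assumption: the reserved prefix is compared case-sensitively.
    if name.startswith("goog"):
        return False
    return _NAME_PATTERN.fullmatch(name) is not None
```

With this sketch, `is_valid_subscription_name("mi-tópico")` is rejected because `ó` is outside the allowed set, while the URL-encoded form `mi-t%C3%B3pico` is accepted.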

Topic

Yes

 

Optional. Name of the Pub/Sub topic to subscribe to. If a topic is provided and the given subscription doesn’t exist, the subscription gets created. Only the messages that arrive after the subscription is created are received.
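The interaction between the Subscription and Topic properties can be summarized as a small decision function. This is an illustrative sketch only: the name `plan_subscription` and the returned labels are hypothetical, and the plugin's actual behavior when neither a subscription nor a topic exists may differ from the error raised here.

```python
from typing import Optional


def plan_subscription(subscription_exists: bool, topic: Optional[str]) -> str:
    """Describe what happens for a given subscription/topic combination."""
    if subscription_exists:
        # An existing subscription is used as-is; the topic is not needed.
        return "use-existing"
    if topic:
        # The subscription is created on the topic. Only messages that
        # arrive after creation are received.
        return "create"
    # Assumption: a missing subscription without a topic is an error.
    raise ValueError("subscription does not exist and no topic was provided")
```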

Format

Yes

6.4.0 / 0.17.0

Optional. Format of the data to read. Supported formats are avro, blob, tsv, csv, delimited, json, parquet, and text.

Default is text.

Service Account Type

Yes

6.3.0 / 0.16.0

Optional. Select one of the following options:

  • File Path. Path to the file that contains the service account key.

  • JSON. JSON content of the service account key.

Service Account File Path

Yes

 

Optional. File path on the local file system of the service account key used for authorization. Can be set to 'auto-detect' when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster.

Default is auto-detect.

Service Account JSON

Yes

6.3.0 / 0.16.0

Optional. JSON content of the service account key.
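How the Service Account Type selects between the file path and the JSON content can be sketched as follows. The function `resolve_service_account`, its parameters, and the returned dictionary are hypothetical and use the display labels from this table, not the plugin's internal property names.

```python
import json
from typing import Dict, Optional


def resolve_service_account(
    account_type: str,
    file_path: Optional[str] = None,
    account_json: Optional[str] = None,
) -> Dict[str, str]:
    """Pick the credential source that matches the Service Account Type."""
    if account_type == "File Path":
        # The documented default for the file path is 'auto-detect'.
        return {"source": "file", "value": file_path or "auto-detect"}
    if account_type == "JSON":
        if not account_json:
            raise ValueError("Service Account JSON is empty")
        # json.loads raises a ValueError subclass if the content is malformed.
        json.loads(account_json)
        return {"source": "json", "value": account_json}
    raise ValueError(f"unknown Service Account Type: {account_type}")
```

Only the property that matches the selected type is consulted in this sketch; the other one is ignored.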

Number of Readers

Yes

6.4.0 / 0.17.0

Optional. Number of Pub/Sub reader workers to run in parallel for this source. Each reader requires a worker in the cluster. The default number of readers per Pub/Sub Streaming Source is 1.

 

 

Created in 2020 by Google Inc.