Google Cloud Speech-to-Text Transformation

Plugin version: 0.22.0

The Google Cloud Speech-to-Text transformation converts audio files to text by using Google Cloud Speech-to-Text.

Google Cloud Speech-to-Text enables developers to convert audio to text by applying powerful neural network models.

Credentials

If the plugin is run on a Google Cloud Dataproc cluster, the service account key does not need to be provided and can be set to 'auto-detect'. Credentials will be automatically read from the cluster environment.

If the plugin is not run on a Dataproc cluster, the path to a service account key must be provided. The service account key can be found on the Dashboard in the Cloud Platform Console. Make sure the account key has permission to access Google Cloud Speech-to-Text. The service account key file needs to be available on every node in your cluster and must be readable by all users running the job.

Before You Begin

  1. Ensure that you have enabled the Speech-to-Text API.

  2. Ensure that the source for the pipeline contains the speech file. For example, if the speech file is stored in a Google Cloud Storage bucket, use a Google Cloud Storage source and set the Format to blob.
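
A quick way to see what a blob-format source hands to this transformation is to build a short mono WAV clip in memory and inspect its raw bytes. This sketch uses only the Python standard library and is purely illustrative; it is not part of the plugin:

```python
import io
import wave

# Build a one-second 16 kHz, mono, 16-bit (LINEAR16) WAV clip in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # all supported encodings are mono only
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(16000)    # 16000 Hz is the optimal sampling rate
    w.writeframes(b"\x00\x00" * 16000)  # one second of silence

audio_bytes = buf.getvalue()
# A blob source emits raw bytes like these; WAV data starts with a RIFF header.
print(audio_bytes[:4])  # b'RIFF'
```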

Configuration

Property

Macro Enabled?

Description

Audio Field

Yes

Required. Name of the input field which contains the raw audio data in bytes.

Audio Encoding

Yes

Required. Audio encoding of the data sent in the audio message. All encodings support only 1 channel (mono) audio. Only 'FLAC' and 'WAV' include a header that describes the bytes of audio that follow the header. The other encodings are raw audio bytes with no header.

Default is LINEAR16.
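
Because only 'FLAC' and 'WAV' carry a self-describing header, a pipeline can sniff the first bytes of a record to tell headered audio from raw bytes. The helper below is a hypothetical illustration of that distinction, not plugin code; it relies on the standard WAV (`RIFF`) and FLAC (`fLaC`) magic bytes:

```python
def has_self_describing_header(audio: bytes) -> bool:
    """Return True if the audio bytes begin with a WAV or FLAC header.

    Only 'FLAC' and 'WAV' encodings include a header describing the
    audio that follows; all other encodings are raw bytes with no header.
    """
    return audio.startswith(b"RIFF") or audio.startswith(b"fLaC")

print(has_self_describing_header(b"RIFF\x24\x00\x00\x00WAVE"))  # True
print(has_self_describing_header(b"\x00\x01\x02\x03"))          # False
```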

Sampling Rate

Yes

Required. Sample rate in Hertz of the audio data sent in all 'RecognitionAudio' messages. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that is not possible, use the native sample rate of the audio source instead of re-sampling.
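
The valid range above can be enforced before the pipeline runs. This is a hypothetical validation helper sketching the documented constraint, not part of the plugin:

```python
def check_sample_rate(hertz: int) -> int:
    """Validate a sampling rate for the Speech-to-Text transformation.

    Valid values are 8000-48000 Hz; 16000 Hz is optimal.
    """
    if not 8000 <= hertz <= 48000:
        raise ValueError(f"sample rate {hertz} is outside 8000-48000 Hz")
    return hertz

print(check_sample_rate(16000))  # 16000
```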

Mask Profanity

Yes

Required. Whether to attempt filtering profanities, replacing all but the initial character in each filtered word with asterisks, e.g. "f***". If set to false, profanities won't be filtered out.

Default is false.
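
The masking behavior described above, keeping only the initial character and replacing the rest with asterisks, can be sketched as follows. The function and sample word are hypothetical illustrations of the documented output shape, not plugin code:

```python
def mask_word(word: str) -> str:
    # Keep the initial character and replace the rest with asterisks,
    # mirroring the documented behavior (e.g. "frak" -> "f***").
    return word[0] + "*" * (len(word) - 1) if word else word

print(mask_word("frak"))  # f***
```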

Language

Yes

Required. The language of the supplied audio as a BCP-47 language tag. Example: "en-US". See Language Support for a list of the currently supported language codes.

Default is en-US.

Transcript Parts Field

Yes

Optional. The field to store the transcription parts. It will be an array of records. Each record in the array represents one part of the full audio data and will contain the transcription and confidence for that part.

Transcription Text Field

Yes

Optional. The field to store the transcription of the full audio data. It is generated using the transcription for each part with the highest confidence.

Default is transcript.
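
The full transcript described above is assembled by taking the highest-confidence transcription for each part and joining them in order. The record shape and values below are hypothetical, meant only to sketch that selection logic:

```python
# Hypothetical transcript parts, each with candidate transcriptions
# and their confidence scores.
parts = [
    {"alternatives": [
        {"transcript": "hello world", "confidence": 0.92},
        {"transcript": "hollow world", "confidence": 0.61},
    ]},
    {"alternatives": [
        {"transcript": "from the pipeline", "confidence": 0.88},
    ]},
]

def best_transcript(part: dict) -> str:
    # Pick the highest-confidence alternative for this part.
    return max(part["alternatives"], key=lambda a: a["confidence"])["transcript"]

transcript = " ".join(best_transcript(p) for p in parts)
print(transcript)  # hello world from the pipeline
```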

Service Account Type

Yes

Optional. Select one of the following options:

  • File Path. File path where the service account is located.

  • JSON. JSON content of the service account.

Service Account File Path

Yes

Optional. Path on the local file system of the service account key used for authorization. Can be set to auto-detect when running on a Dataproc cluster. When running on other clusters, the file must be present on every node in the cluster.

Default is auto-detect.

Service Account JSON

Yes

Optional. JSON content of the service account key.

 

Created in 2020 by Google Inc.