Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Writes spreadsheets to specified Google Drive directory via Google Sheets API.

Configuration

Property

Macro Enabled?

Version Introduced

Description

Reference Name

No

Required. Name used to uniquely identify this sink for lineage, annotating metadata, etc.

Directory Identifier

No

Required. Identifier of the destination folder.

This comes after folders/ in the URL. For example, if the URL is

Code Block
https://drive.google.com/drive/folders/1dyUEebJaFnWa3Z4n0BFMVAXQ7mfUH11g?resourcekey=0-XVijrJSp3E3gkdJp20MpCQ

Then the Directory Identifier would be 1dyUEebJaFnWa3Z4n0BFMVAXQ7mfUH11g.

Spreadsheet Name Field

Yes

Optional. Name of the schema field (should be STRING type) which will be used as name of file. Is optional. In the case it is not set Google API will use the value of Default Spreadsheet name property.

Default Spreadsheet Name

Yes

Required. Default spreadsheet file name. Is used if user doesn’t specify schema field with spreadsheet name.

Default is Spreadsheet 1.

Sheet Name Field

Yes

Optional. Name of the schema field (should be STRING type) which will be used as sheet title. Is optional. In the case it is not set Google API will use the value of Default sheet name property.

Default is Sheet 1

Default Sheet Name

Yes

Required. Default sheet title. Is used when user doesn’t specify schema field with sheet title.

Write Schema As First Row

Yes

Required. Toggle that defines if the sink writes out the input schema as first row of an output sheet.

Default is Yes.

Authentication Type

No

Required. Type of authentication used to access Google API.

OAuth2 and Service Account types are available.

Make sure that:

  • Google Drive API and Google Sheets API is enabled in the GCP Project.

  • Google Drive Folder is shared to the service account email used with the required permission.

OAuth2 client credentials can be generated on Google Cloud Credentials Page.

Default is OAuth2.

Client ID

No

Optional. OAuth2 Client ID used to identify the application.

Client Secret

No

Optional. OAuth2 Client Secret used to access the authorization server.

Refresh Token

No

Optional. OAuth2 Refresh Token to acquire new access tokens.

Service Account Type

Yes

6.3.0/1.3.0

Optional. Select one of the following options:

  • File Path. File path where the service account is located.

  • JSON. JSON content of the service account.

Make sure that the Google Drive Folder is shared with the specified service account email. To write files to a Private Google Drive, grant the Editor role to the specified service account. To write files to a Shared Google Drive Folder, grant the Contributor role to the specified service account.

Service Account File Path

Yes

6.3.0/1.3.0

Optional.

Path on the local file system of the service account key used for authorization.

Can be set to 'auto-detect' when running on a Dataproc cluster

. When running on other clusters, the file must be present on every node in the cluster.When set to auto-detect, the GCE VM

which needs to be

configured with the scope required by the Google Drive API. Otherwise, the preview as well as pipeline run will fail with insufficient permission error. The required scopes are

created with the following scopes:

and

When running on other clusters, the file must be present on every node in the cluster.

Default is auto-detect.

Service Account JSON

Yes

1.4.0

Optional. Contents of the service account JSON file. Service Account JSON can be generated on Google Cloud Service Account page.

Threads Number

Yes

Required. Number of threads which send batched API requests. The greater value allows to process records quickly, but requires extended Google Sheets API quota.

Default is 5.

Maximal Buffer Size

Yes

Required. Maximal size in records of the batch API request. The greater value allows to reduce the number of API requests, but causes increase of their size.

Default is 50.

Records Queue Length

Yes

Required. Size of the queue used to receive records and for onwards grouping of them to batched API requests. With the greater value it is more likely that the sink will group received records in the batches of maximal size. Also greater value leads to more memory consumption.

Default is 500.

Maximum Flush Interval

Yes

Required. Number of seconds between the sink tries to get batched requests from the records queue and send them to threads for sending to Sheets API.

Default is 10.

Flush Execution Timeout

Yes

Required. Timeout for single thread to process the batched API request. Be careful, the number of retries and maximal retry time also should be taken into account.

Default is 500.

Minimal Page Extension Size

Yes

Required. Minimal size of sheet extension when default sheet size is exceeded.

Default is 1000.

Merge Data Cells

Yes

Required. Toggle that defines if the sink merges data cells created as result of input arrays flattering.

Default is No.

Skip spreadsheet/sheet name fields

Yes

Required. Toggle that defines if the sink skips spreadsheet/sheet name fields during structure record transforming.

Default is No.

Steps to Generate OAuth2 Credentials

...