Property

Macro Enabled?

Version Introduced

Description

Reference Name

No

Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.

Directory Identifier

No

Required. Identifier of the source folder.

This is the part of the folder URL that follows folders/. For example, if the URL is

https://drive.google.com/drive/folders/1dyUEebJaFnWa3Z4n0BFMVAXQ7mfUH11g?resourcekey=0-XVijrJSp3E3gkdJp20MpCQ

then the Directory Identifier is 1dyUEebJaFnWa3Z4n0BFMVAXQ7mfUH11g.
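
As a minimal sketch of the rule above, the identifier can be pulled out of a folder URL with the Python standard library. The helper name directory_identifier is hypothetical and is not part of the plugin.

```python
from urllib.parse import urlparse


def directory_identifier(folder_url: str) -> str:
    """Return the path segment that follows 'folders/' in a Drive folder URL."""
    # urlparse drops the query string (?resourcekey=...) from the path.
    segments = [s for s in urlparse(folder_url).path.split("/") if s]
    if "folders" not in segments:
        raise ValueError("URL does not contain a 'folders/' segment")
    index = segments.index("folders")
    if index + 1 >= len(segments):
        raise ValueError("URL has no identifier after 'folders/'")
    return segments[index + 1]


url = (
    "https://drive.google.com/drive/folders/"
    "1dyUEebJaFnWa3Z4n0BFMVAXQ7mfUH11g?resourcekey=0-XVijrJSp3E3gkdJp20MpCQ"
)
print(directory_identifier(url))  # prints 1dyUEebJaFnWa3Z4n0BFMVAXQ7mfUH11g
```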

File Metadata Properties

Yes

Optional. Properties that represent file metadata. They are included in the output structured record. Property descriptions can be viewed in the Drive API file reference.

Filter

No

Optional. Filter that can be applied to the files in the selected directory. Filters follow the Google Drive filters syntax.
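
For illustration, a filter in Google Drive search query syntax can be assembled as below. This is a sketch only: the helper names are hypothetical, and it is an assumption that the resulting string can be pasted into the Filter property unchanged. The operators (contains, =, and) and the backslash escaping of single quotes come from the Drive search query syntax.

```python
def quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def name_contains(text: str) -> str:
    return f"name contains {quote(text)}"


def mime_type_is(mime_type: str) -> str:
    return f"mimeType = {quote(mime_type)}"


def all_of(*clauses: str) -> str:
    """Join clauses so that every one of them must match."""
    return " and ".join(clauses)


filter_expression = all_of(name_contains("report"), mime_type_is("text/csv"))
print(filter_expression)  # prints: name contains 'report' and mimeType = 'text/csv'
```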

Modification Date Range

No

Required. Filter that narrows the set of files by modification date range. Select either a predefined range or Custom. For Custom, specify the date range with Start Date and End Date.

Default is lifetime.
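
To show what a custom range amounts to, the sketch below turns a start and end date into a clause on the Drive modifiedTime field. How the plugin itself translates the range is not documented here, so treat this mapping (inclusive whole days, UTC, and an empty clause for lifetime) as an assumption. The function name modified_between is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def modified_between(start: Optional[date], end: Optional[date]) -> str:
    """Build a modifiedTime clause; None on either side leaves that side open."""
    clauses = []
    if start is not None:
        clauses.append(f"modifiedTime >= '{start.isoformat()}T00:00:00'")
    if end is not None:
        # Make the end date inclusive by comparing against the next midnight.
        next_day = end + timedelta(days=1)
        clauses.append(f"modifiedTime < '{next_day.isoformat()}T00:00:00'")
    return " and ".join(clauses)


print(modified_between(date(2024, 1, 1), date(2024, 1, 31)))
print(repr(modified_between(None, None)))  # lifetime: no restriction at all
```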

File Types To Pull

Yes

Required. Types of files to pull from the specified directory. The following values are supported: binary (all non-Google Drive formats), Google Documents, Google Spreadsheets, Google Drawings, Google Presentations, and Google Apps Scripts. For Google Drive formats, specify the export format in the Exporting section.

Default is Binary.

Authentication Type

No

Required. Type of authentication used to access Google API.

OAuth2 and Service Account types are available.

Make sure that:

  • The Google Drive API is enabled in the GCP project.

  • The Google Drive folder is shared with the service account email, with the required permission.

OAuth2 client credentials can be generated on Google Cloud Credentials Page.

For more details on OAuth2, see Google Drive API Documentation.

Default is OAuth2.

Client ID

No

Optional. OAuth2 Client ID used to identify the application.

Client Secret

No

Optional. OAuth2 Client Secret used to access the authorization server.

Refresh Token

No

Optional. OAuth2 Refresh Token to acquire new access tokens.
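
The three OAuth2 properties above fit together as follows: the refresh token, together with the client ID and secret, is exchanged for a short-lived access token at Google's OAuth2 token endpoint. The sketch below only builds that request and does not send it. The credential values are placeholders, and the helper name build_refresh_request is hypothetical; the plugin performs this exchange itself.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id: str, client_secret: str, refresh_token: str) -> Request:
    """Build (but do not send) the form-encoded refresh-token grant request."""
    body = urlencode(
        {
            "grant_type": "refresh_token",
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
        }
    ).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(TOKEN_ENDPOINT, data=body, headers=headers, method="POST")


# Placeholder values, not real credentials.
request = build_refresh_request("my-client-id", "my-client-secret", "my-refresh-token")
print(request.get_method(), request.full_url)
print(sorted(parse_qs(request.data.decode("ascii"))))
```

Sending the request (for example with urllib.request.urlopen) would return a JSON document containing an access_token field when the credentials are valid.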

Service Account Type

Yes

Optional. Make sure that the Google Drive folder is shared with the specified service account email. The Viewer role must be granted to that service account so that it can read files from the folder.

Service Account File Path

Yes

Optional. Path on the local file system of the service account key used for authorization.

Can be set to 'auto-detect' when running on a Dataproc cluster which needs to be created with the following scopes:

When running on other clusters, the file must be present on every node in the cluster.

Default is auto-detect.

Service Account JSON

Yes

Optional. Contents of the service account JSON file. Service Account JSON can be generated on Google Cloud Service Account page.
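
Because the folder has to be shared with the service account's email address, it can help to read that address straight from the key. The sketch below checks a service account JSON document for the fields such keys normally carry and returns client_email. The sample key is made up, and the function name service_account_email is hypothetical.

```python
import json

REQUIRED_KEYS = ("type", "client_email", "private_key")


def service_account_email(key_json: str) -> str:
    """Validate the basic shape of a service account key and return its email."""
    key = json.loads(key_json)
    missing = [name for name in REQUIRED_KEYS if name not in key]
    if missing:
        raise ValueError(f"service account key is missing: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["client_email"]


# Made-up key for illustration; a real key holds an actual private key.
sample_key = json.dumps(
    {
        "type": "service_account",
        "client_email": "reader@example-project.iam.gserviceaccount.com",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    }
)
print(service_account_email(sample_key))
```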

Maximum Partition Size

Yes

Required. Maximum body size of each structured record, in bytes. The default value of 0 means unlimited. Not applicable to files in Google formats.

Default is 0.
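
One way to picture this setting is as a cap on the byte range that each record's body covers. The sketch below computes such ranges for a file of a given size. The plugin's actual splitting logic is not described here, so the function and its behavior are assumptions for illustration only.

```python
from typing import List, Tuple


def partition_ranges(file_size: int, max_partition_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte ranges; 0 means a single unlimited range."""
    if file_size < 0 or max_partition_size < 0:
        raise ValueError("sizes must be non-negative")
    if max_partition_size == 0 or file_size <= max_partition_size:
        return [(0, file_size)]
    return [
        (offset, min(offset + max_partition_size, file_size))
        for offset in range(0, file_size, max_partition_size)
    ]


print(partition_ranges(10, 4))  # prints [(0, 4), (4, 8), (8, 10)]
print(partition_ranges(10, 0))  # prints [(0, 10)]
```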

Body Output Format

Yes

Required. Output format for the file body. “Bytes” and “String” are available.

Default is Bytes.

Google Documents Export Format

Yes

Required. MIME type which is used for Google Documents when converted to structured records.

Default is text/plain.

Google Spreadsheets Export Format

Yes

Required. MIME type which is used for Google Spreadsheets when converted to structured records.

Default is text/csv.

Google Drawings Export Format

Yes

Required. MIME type which is used for Google Drawings when converted to structured records.

Default is image/svg+xml.

Google Presentations Export Format

Yes

Required. MIME type which is used for Google Presentations when converted to structured records.

Default is text/plain.
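
The four export defaults above can be summarized as a lookup from the Google file type to the MIME type requested on export. The sketch below uses the standard Google Workspace MIME type strings as keys. The function name export_mime_type and the overrides parameter are hypothetical and only illustrate how a non-default choice would replace a default.

```python
from typing import Dict, Optional

# Defaults as listed in the table above, keyed by Google Workspace MIME type.
DEFAULT_EXPORT_FORMATS: Dict[str, str] = {
    "application/vnd.google-apps.document": "text/plain",
    "application/vnd.google-apps.spreadsheet": "text/csv",
    "application/vnd.google-apps.drawing": "image/svg+xml",
    "application/vnd.google-apps.presentation": "text/plain",
}


def export_mime_type(
    drive_mime_type: str, overrides: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Return the export MIME type, or None for files that need no export."""
    formats = {**DEFAULT_EXPORT_FORMATS, **(overrides or {})}
    return formats.get(drive_mime_type)


print(export_mime_type("application/vnd.google-apps.spreadsheet"))  # prints text/csv
print(export_mime_type("application/pdf"))  # prints None (binary file, no export)
```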
