Introduction
The Google Drive plugins help users move entire files from a source to a destination. Along the way, users can also run transformations on unstructured data such as images, audio, and video.
User Stories
- As a pipeline developer, I want to move all files from a Google Drive directory to a different destination.
- As a pipeline developer, I want to move all files from a Google Drive directory that satisfy a filter to a different destination.
- As a pipeline developer, I want to pull all images from a Google Drive directory, so that I can process them using image recognition APIs.
- As a pipeline developer, I want to pull all audio and video files from a Google Drive directory, so that I can process them to extract metadata, generate transcripts, or apply other enrichments.
- As a pipeline developer, I want to move all files from an FTP source into Google Drive.
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
Source
Option level | User Facing Name | Type | Description | Optional | Constraints | Default value |
---|---|---|---|---|---|---|
Basic | Directory identifier | String | Identifier of the source folder. | No | | |
Basic | Filter | String | A filter that can be applied to the files in the selected directory. Filters follow the Google Drive Filter Syntax. | Yes | | |
Basic | Modification date range | Select | In addition to the filter specified above, only pull files that were modified within this date range. | Yes | | select |
Basic | Start date | textbox | Only shown when "Modification date range" is set to "Custom". Start of the modification date range. RFC3339 format, default timezone is UTC, e.g., 2012-06-04T12:00:00-08:00. | No | | |
Basic | End date | textbox | Only shown when "Modification date range" is set to "Custom". End of the modification date range. RFC3339 format, default timezone is UTC, e.g., 2012-06-04T12:00:00-08:00. | No | | |
Basic | File properties | Multi-select | Properties to retrieve for each file in the directory. Allowed names are listed in Google Drive API: Files. | Yes | | |
Basic | File types to pull | Multi-select | Types of files to pull from the specified directory. | Yes | | binary |
Authentication | Client ID | String | OAuth2 client id. | No | | |
Authentication | Client secret | String | OAuth2 client secret. | No | | |
Authentication | Refresh token | String | OAuth2 refresh token. | No | | |
Authentication | Access token | String | OAuth2 access token. | No | | |
Advanced | Maximum partition size | Number | Maximum partition size in bytes. The default value of 0 means unlimited. | Yes | | 0 |
Advanced | Body output format | Radio-group | Format of the file body. "Bytes" and "String" values are available. | Yes | | bytes |
Exporting | Google Documents export format | Select | MIME type for Google Documents. Allowed values are listed in Downloading Google Documents. | Yes | | text/plain |
Exporting | Google Spreadsheets export format | Select | MIME type for Google Spreadsheets. | Yes | | text/csv |
Exporting | Google Drawings export format | Select | MIME type for Google Drawings. | Yes | | image/svg+xml |
Exporting | Google Presentations export format | Select | MIME type for Google Presentations. | Yes | | text/plain |
Sink
Option level | User Facing Name | Type | Description | Optional | Constraints |
---|---|---|---|---|---|
Basic | File name field | String | Name of the schema field (should be STRING type) that is used as the file name. If it is not set, files get randomly generated 16-symbol names. | Yes | |
Basic | File body field | String | Name of the schema field (should be BYTES type) that is used as the file body. The minimal input schema contains only this field. | No | |
Basic | Directory identifier | String | Identifier of the destination folder. | No | |
Authentication | Client ID | String | OAuth2 client id. | No | |
Authentication | Client secret | String | OAuth2 client secret. | No | |
Authentication | Refresh token | String | OAuth2 refresh token. | No | |
Authentication | Access token | String | OAuth2 access token. | No | |
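
To illustrate how the source's Directory identifier, Filter, Start date, and End date properties could combine into a single Google Drive search query, here is a minimal sketch. The function names `to_utc_rfc3339` and `build_query` are hypothetical, and the inclusive date bounds are an assumption; the spec does not say whether the range is inclusive. The query terms (`in parents`, `modifiedTime`) follow the Google Drive filter syntax the Filter property refers to.

```python
from datetime import datetime, timezone
from typing import Optional


def to_utc_rfc3339(value: str) -> str:
    """Parse an RFC 3339 timestamp and normalize it to UTC.

    A timestamp without an offset is treated as UTC, matching the
    "default timezone is UTC" note on the Start date / End date properties.
    """
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


def build_query(directory_id: str,
                user_filter: Optional[str] = None,
                start: Optional[str] = None,
                end: Optional[str] = None) -> str:
    """Combine the source properties into one Drive search query (hypothetical helper)."""
    # Escape backslashes and single quotes so the identifier stays one literal.
    safe_id = directory_id.replace("\\", "\\\\").replace("'", "\\'")
    clauses = [f"'{safe_id}' in parents"]
    if user_filter:
        # Parenthesize the user filter so its own and/or terms stay grouped.
        clauses.append(f"({user_filter})")
    if start:
        # Assumption: the range is inclusive on both ends.
        clauses.append(f"modifiedTime >= '{to_utc_rfc3339(start)}'")
    if end:
        clauses.append(f"modifiedTime <= '{to_utc_rfc3339(end)}'")
    return " and ".join(clauses)
```

For example, `build_query("abc", "mimeType contains 'image/'", "2012-06-04T12:00:00-08:00")` restricts the listing to images in folder `abc` modified at or after 20:00 UTC on 2012-06-04.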
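
The source's Maximum partition size property only states a byte limit and that 0 means unlimited. One plausible reading, sketched below under that assumption, is that a file larger than the limit is read as several byte ranges. The helper name `partition_ranges` is hypothetical and is not taken from the plugin.

```python
from typing import List, Tuple


def partition_ranges(file_size: int, max_partition_size: int = 0) -> List[Tuple[int, int]]:
    """Split a file of file_size bytes into inclusive (start, end) byte ranges.

    Assumption: max_partition_size == 0 means "unlimited", so the whole
    file becomes a single partition.
    """
    if file_size < 0 or max_partition_size < 0:
        raise ValueError("sizes must be non-negative")
    if file_size == 0:
        return []
    if max_partition_size == 0:
        return [(0, file_size - 1)]
    return [
        (start, min(start + max_partition_size, file_size) - 1)
        for start in range(0, file_size, max_partition_size)
    ]
```

Inclusive offsets are used here because they map directly onto HTTP `Range: bytes=start-end` requests; a half-open convention would work equally well.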
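
The four export format properties of the source can be read as a lookup from a Google-native file type to the MIME type it is exported as. The sketch below assumes that reading; the `application/vnd.google-apps.*` keys are the MIME types Google Drive reports for native files, the defaults come from the Source table, and the function name `export_mime_type` is hypothetical.

```python
from typing import Mapping, Optional

GOOGLE_DOCUMENT = "application/vnd.google-apps.document"
GOOGLE_SPREADSHEET = "application/vnd.google-apps.spreadsheet"
GOOGLE_DRAWING = "application/vnd.google-apps.drawing"
GOOGLE_PRESENTATION = "application/vnd.google-apps.presentation"

# Defaults as listed in the "Exporting" rows of the Source table.
DEFAULT_EXPORT_FORMATS = {
    GOOGLE_DOCUMENT: "text/plain",
    GOOGLE_SPREADSHEET: "text/csv",
    GOOGLE_DRAWING: "image/svg+xml",
    GOOGLE_PRESENTATION: "text/plain",
}


def export_mime_type(file_mime_type: str,
                     overrides: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the export MIME type for a Google-native file.

    Returns None for regular (binary) files, which are assumed to be
    downloaded as-is without an export step.
    """
    formats = {**DEFAULT_EXPORT_FORMATS, **(overrides or {})}
    return formats.get(file_mime_type)
```

A pipeline that wants PDF output for documents would pass `{GOOGLE_DOCUMENT: "application/pdf"}` as `overrides` and leave the other defaults untouched.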
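
The sink's File name field and File body field rules can be sketched as follows. This is an illustration only: `resolve_file` and `random_file_name` are hypothetical names, and the alphanumeric alphabet is an assumption, since the spec only says the generated names have 16 symbols.

```python
import secrets
import string
from typing import Any, Mapping, Optional, Tuple

NAME_ALPHABET = string.ascii_letters + string.digits  # assumed alphabet
NAME_LENGTH = 16  # "16-symbol" names, per the Sink table


def random_file_name() -> str:
    """Generate a random 16-symbol file name."""
    return "".join(secrets.choice(NAME_ALPHABET) for _ in range(NAME_LENGTH))


def resolve_file(record: Mapping[str, Any],
                 body_field: str,
                 name_field: Optional[str] = None) -> Tuple[str, bytes]:
    """Pick the file name and body out of an input record."""
    body = record[body_field]
    if not isinstance(body, (bytes, bytearray)):
        raise TypeError(f"field '{body_field}' must be of BYTES type")
    if name_field is None:
        # No file name field configured: fall back to a random name.
        return random_file_name(), bytes(body)
    name = record[name_field]
    if not isinstance(name, str):
        raise TypeError(f"field '{name_field}' must be of STRING type")
    return name, bytes(body)
```

With only the body field configured, `resolve_file({"body": b"hi"}, "body")` returns a random 16-symbol name together with the original bytes.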