The Google Drive plugins help users move entire files from a source to a destination. Along the way, users can also run transformations on unstructured data such as images, audio, and video.
User Stories
As a pipeline developer, I want to move all files from a Google Drive directory to a different destination.
As a pipeline developer, I want to move all files from a Google Drive directory that satisfy a filter to a different destination.
As a pipeline developer, I want to pull all images from a Google Drive directory, so that I can process them using image recognition APIs.
As a pipeline developer, I want to pull all audio and video files from a Google Drive directory, so that I can extract metadata, generate transcripts, or apply other enrichments.
As a pipeline developer, I want to move all files from an FTP source into Google Drive.
Plugin Type
Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
Source
Option level
User Facing Name
Type
Description
Optional
Constraints
Default value
Basic
Directory identifier
String
Identifier of the source folder.
No
Filter
String
A filter applied to the files in the selected directory. Filters follow the Google Drive Filter Syntax.
Yes
Modification date range
Select
In addition to the filter specified above, pull only the files that were modified within the selected date range.
Yes
select
Start date
textbox
Shown only when "Modification date range" is set to "Custom". Start of the modification date range, in RFC 3339 format. The default timezone is UTC. Example: 2012-06-04T12:00:00-08:00.
No
End date
textbox
Shown only when "Modification date range" is set to "Custom". End of the modification date range, in RFC 3339 format. The default timezone is UTC. Example: 2012-06-04T12:00:00-08:00.
No
File properties
Multi-select
Properties to retrieve for each file in the directory. Allowed names are listed in the Google Drive API: Files reference.
Yes
File types to pull
Multi-select
Types of files to pull from the specified directory.
Yes
binary
Authentication
Client ID
String
OAuth2 client ID.
No
Client secret
String
OAuth2 client secret.
No
Refresh token
String
OAuth2 refresh token.
No
Access token
String
OAuth2 access token.
No
Advanced
Maximum partition size
Number
Maximum partition size, in bytes. The default value of 0 means unlimited.
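To illustrate how the source properties above could fit together, the following sketch combines the directory identifier, the optional filter, and the optional modification date range into a single Google Drive search query. The function names `to_rfc3339` and `build_drive_query` are hypothetical and are not part of the plugin. The sketch also assumes that the date range is inclusive on both ends and that naive dates are interpreted as UTC, in line with the default timezone stated above.

```python
from datetime import datetime, timezone
from typing import Optional


def to_rfc3339(moment: datetime) -> str:
    """Format a datetime as RFC 3339, treating naive values as UTC."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.isoformat(timespec="seconds")


def build_drive_query(
    directory_id: str,
    file_filter: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> str:
    """Build a Drive search query string (hypothetical helper).

    Assumption: both ends of the date range are inclusive.
    """
    # Escape backslashes and single quotes so the identifier stays a valid literal.
    escaped_id = directory_id.replace("\\", "\\\\").replace("'", "\\'")
    clauses = [f"'{escaped_id}' in parents"]
    if file_filter:
        # Parenthesize the user filter so any "or" inside it cannot leak out.
        clauses.append(f"({file_filter})")
    if start is not None:
        clauses.append(f"modifiedTime >= '{to_rfc3339(start)}'")
    if end is not None:
        clauses.append(f"modifiedTime <= '{to_rfc3339(end)}'")
    return " and ".join(clauses)


if __name__ == "__main__":
    print(build_drive_query(
        "folder-id-placeholder",
        "mimeType contains 'image/'",
        start=datetime.fromisoformat("2012-06-04T12:00:00-08:00"),
    ))
```

Wrapping the user-supplied filter in parentheses keeps it logically separate from the directory and date clauses. Whether the plugin builds its query this way is not stated in this document.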
Sink
File name field
String
Name of the schema field (should be of type STRING) used as the file name. Optional. If it is not set, files are given randomly generated 16-character names.
Yes
File body field
String
Name of the schema field (should be of type BYTES) used as the file body. A minimal input schema contains only this field.
No
Directory identifier
String
Identifier of the destination folder.
No
Authentication
Client ID
String
OAuth2 client ID.
No
Client secret
String
OAuth2 client secret.
No
Refresh token
String
OAuth2 refresh token.
No
Access token
String
OAuth2 access token.
No
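The sink's handling of the file name and file body fields described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation: `record_to_file` and `random_file_name` are hypothetical names, records are modeled as plain dictionaries, and the random names are assumed to be alphanumeric (the document only says they are 16 symbols long).

```python
import secrets
import string
from typing import Any, Mapping, Optional, Tuple

# Assumption: generated names use ASCII letters and digits only.
_NAME_ALPHABET = string.ascii_letters + string.digits


def random_file_name(length: int = 16) -> str:
    """Return a random name of the given length (16 by default)."""
    return "".join(secrets.choice(_NAME_ALPHABET) for _ in range(length))


def record_to_file(
    record: Mapping[str, Any],
    body_field: str,
    name_field: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Extract (file name, file body) from one input record (hypothetical helper)."""
    body = record.get(body_field)
    if not isinstance(body, (bytes, bytearray)):
        raise TypeError(f"field '{body_field}' must hold BYTES data")

    # Assumption: a missing or null name value falls back to a random name,
    # the same as when no name field is configured.
    name = record.get(name_field) if name_field else None
    if name is None:
        name = random_file_name()
    elif not isinstance(name, str):
        raise TypeError(f"field '{name_field}' must hold a STRING value")
    return name, bytes(body)


if __name__ == "__main__":
    print(record_to_file({"body": b"hello", "name": "greeting.txt"}, "body", "name"))
    print(record_to_file({"body": b"hello"}, "body"))
```

Validating the field types before upload surfaces schema mismatches as a clear error instead of a failed write to Google Drive.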
Design / Implementation Tips
Tip #1
Tip #2
Design
Approach(s)
Properties
Security
Limitation(s)
Future Work
Some future work – HYDRATOR-99999
Another future work – HYDRATOR-99999
Test Case(s)
Test case #1
Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.