Google Sheets plugins will allow users to import data from Google Sheets into their pipeline, so that they can transform and enrich with other data sources.
User Storie(s)
As a pipeline developer, I want to import data from Google Sheets, so that I can transform and enrich it using CDAP
As a pipeline developer, I want to move all sheets from a given Google drive directory to a destination
As a pipeline developer, I want to be able to pick certain sheets from a particular sheet to process using CDAP, so that I do not have to process all sheets all the time. I want to be able to specify the sheet using the sheet name or number.
As a pipeline developer, I want to treat the first row of a sheet as a header, so that CDAP can automatically treat it as schema
As a pipeline developer, I want to be able to specify a section at the top of my Sheet as a header, so that it is extracted as metadata, and not as actual data
As a pipeline developer, I want to be able to specify a section at the bottom of my sheet as a footer, so that it is extracted as metadata, and not as actual data
Plugin Type
Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute
Configurables
Source
This section defines properties that are configurable for this plugin.
Section
User Facing Name
Type
Description
Optional
Constraints
Basic
App Id
string
No
Access token
string
No
Directory Id
string
Directory ID is the last part of the URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV
No
Filter
String
A filter that can be applied to the files in the selected directory. Filters follow the Google Drive Filter Syntax
Yes
Modification date range
String
In addition to the filter specified above, also filter files to only pull those that were modified between the date range. Defaults to last year.
Yes
Sheets to pull
multi-select
Select from a list of sheets to pull. Defaults to all.
Yes
Advanced
Header selection
Radio buttons
Choose between No Headers, Treat first row as header, Custom header row. Defaults to Treat first row as header.
Yes
Header row number
Number
Only shown when the header selection is set to Custom header. Accepts the row number of the row to be treated as a header. Defaults to 0.
Yes
Extract metadata
Toggle
Determines if a certain section of the sheet should be treated as metadata. Useful when you want to specify a header or a footer for a sheet. The rows in headers and footers are not available as data records. Instead, they are available in every record as a field called "metadata", which is a record of the specified metadata.
Yes
First header row
Number
The row number of the first row to be treated as header. Defaults to 0.
Yes
Last header row
Number
The row number of the last row to be treated as header.
Yes
First footer row
Number
The row number of the first row to be treated as footer
Yes
Last footer row
Number
The row number of the last row to be treated as footer
Yes
Metadata key cells
KeyValue
Specifies the cells and the corresponding key names for the keys to extract as metadata from the specified metadata sections. Only shown if Capture Metadata is set to true. The cell numbers should be within the header or footer.
E.g. A1 → department, B5 → category
Yes
Metadata value cells
KeyValue
Specifies the cells and the corresponding key names of the values to extract as metadata from the specified metadata sections. Only shown if Capture Metadata is set to true. The cell numbers should be within the header or footer.
E.g. department → A2, category → C5
Yes
Custom footer last row
Number
Only shown when the footer selection is set to Custom footer. Accepts the row number of the last row to be treated as a header.
Yes
Note: The data in the specified header and footer rows should not be available as records to the rest of the pipeline.