Introduction

Google Sheets plugins will allow users to import data from Google Sheets into their pipeline, so that they can transform and enrich with other data sources.

User Storie(s)

As a pipeline developer, I want to import data from Google Sheets, so that I can transform and enrich it using CDAP
As a pipeline developer, I want to move all sheets from a given Google drive directory to a destination
As a pipeline developer, I want to be able to pick certain sheets from a particular sheet to process using CDAP, so that I do not have to process all sheets all the time. I want to be able to specify the sheet using the sheet name or number.
As a pipeline developer, I want to treat the first row of a sheet as a header, so that CDAP can automatically treat it as schema
As a pipeline developer, I want to be able to specify a section at the top of my Sheet as a header, so that it is extracted as metadata, and not as actual data
As a pipeline developer, I want to be able to specify a section at the bottom of my sheet as a footer, so that it is extracted as metadata, and not as actual data

Plugin Type

Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Configurables

Source

This section defines properties that are configurable for this plugin.

Section	User Facing Name	Type	Description	Optional
Basic	App Id	string		No
	Access token	string		No
	Directory Id	string	Directory ID is the last part of the URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV	No
	Filter	String	A filter that can be applied to the files in the selected directory. Filters follow the Google Drive Filter Syntax	Yes
	Modification date range	String	In addition to the filter specified above, also filter files to only pull those that were modified between the date range. Defaults to last year.	Yes
	Sheets to pull	multi-select	Select from a list of sheets to pull. Defaults to all.	Yes
Advanced	Column Names Selection	Radio buttons	Choose between "No column names", "Treat first row as column names", "Custom row as column names" Defaults to "Treat first row as column names".	Yes
	Header Custom row numberfor column names	Number	Only shown when the header selection is set to "Custom headerrow as column names". Accepts the row number of the row to be treated as a headercolumn names. Defaults to 0.	Yes
	Extract metadata	Toggle	Determines if a certain section of the sheet should be treated as metadata. Useful when you want to specify a header or a footer for a sheet. The rows in headers and footers are not available as data records. Instead, they are available in every record as a field called "metadata", which is a record of the specified metadata.	Yes
	First header row	Number	The row number of the first row to be treated as header. Defaults to 0.	Yes
	Last header row	Number	The row number of the last row to be treated as header.	Yes
	First footer row	Number	The row number of the first row to be treated as footer	Yes
	Last footer row	Number	The row number of the last row to be treated as footer	Yes
	Metadata key cells	KeyValue	Specifies the cells and the corresponding key names for the keys to extract as metadata from the specified metadata sections. Only shown if Capture Metadata is set to true. The cell numbers should be within the header or footer. E.g. A1 → department, B5 → category	Yes
	Metadata value cells	KeyValue	Specifies the cells and the corresponding key names of the values to extract as metadata from the specified metadata sections. Only shown if Capture Metadata is set to true. The cell numbers should be within the header or footer. E.g. department → A2, category → C5	Yes
	Custom footer last row	Number	Only shown when the footer selection is set to Custom footer. Accepts the row number of the last row to be treated as a header.	Yes

Note: The data in the specified header and footer rows should not be available as records to the rest of the pipeline.

Sink

User Facing Name	Type	Description	Optional
App Id	string		No
Access token	string		No
Directory Id	string	Directory ID is the last part of the URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV	No
Sheet name	string	Name of the sheet. Defaults to Sheet 1	Yes
Write first row as headers	Toggle	If true, the schema is written as the first row of the sheet. Defaults to True.	Yes
Format for nested data	select	Choose amongst JSON, CSV. Format to serialize complex (nested) data as. Defaults to JSON.	Yes

Note: Incoming records should be written to columns in the sheet

Design / Implementation Tips

Tip #1
Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

Some future work – HYDRATOR-99999
Another future work – HYDRATOR-99999

Test Case(s)

Test case #1
Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data.

Pipeline #1

Pipeline #2

Table of Contents

Table of Contents

style	circle

Checklist

User stories documented
User stories reviewed
Design documented
Design reviewed
Feature merged
Examples and guides
Integration tests
Documentation for feature
Short video demonstrating the feature

Version	Old Version 5	New Version 6
Changes made by	Dmitrii Vitiiuk	Dmitrii Vitiiuk
Saved on	Oct 27, 2019	Oct 27, 2019

Content Comparison

Versions Compared

Key