Versions Compared
Key
- This line was added.
- This line was removed.
- Formatting was changed.
Introduction
Google Sheets plugins will allow users to import data from Google Sheets into their pipeline, so that they can transform and enrich with other data sources.
User Storie(s)
- As a pipeline developer, I want to import data from Google Sheets, so that I can transform and enrich it using CDAP
- As a pipeline developer, I want to move all sheets from a given Google drive directory to a destination
- As a pipeline developer, I want to be able to pick certain sheets from a particular sheet to process using CDAP, so that I do not have to process all sheets all the time. I want to be able to specify the sheet using the sheet name or number.
- As a pipeline developer, I want to treat the first row of a sheet as a header, so that CDAP can automatically treat it as schema
- As a pipeline developer, I want to be able to specify a section at the top of my Sheet as a header, so that it is extracted as metadata, and not as actual data
- As a pipeline developer, I want to be able to specify a section at the bottom of my sheet as a footer, so that it is extracted as metadata, and not as actual data
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
Source
This section defines properties that are configurable for this plugin.
Section | User Facing Name | Type | Description | Optional | Constraints |
---|---|---|---|---|---|
Basic | App Id | string | No | ||
Access token | string | No | |||
Directory Id | string | Directory ID is the last part of the URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV | No | ||
Filter | String | A filter that can be applied to the files in the selected directory. Filters follow the Google Drive Filter Syntax | Yes | ||
Modification date range | String | In addition to the filter specified above, also filter files to only pull those that were modified between the date range. Defaults to last year. | Yes | ||
Sheets to pull | multi-select | Select from a list of sheets to pull. Defaults to all. | Yes | ||
Advanced | Column Names Selection | Radio buttons | Choose between "No column names", "Treat first row as column names", "Custom row as column names" Defaults to "Treat first row as column names". | Yes | |
Header Custom row numberfor column names | Number | Only shown when the header selection is set to "Custom headerrow as column names". Accepts the row number of the row to be treated as a headercolumn names. Defaults to 0. | Yes | ||
Extract metadata | Toggle | Determines if a certain section of the sheet should be treated as metadata. Useful when you want to specify a header or a footer for a sheet. The rows in headers and footers are not available as data records. Instead, they are available in every record as a field called "metadata", which is a record of the specified metadata. | Yes | ||
First header row | Number | The row number of the first row to be treated as header. Defaults to 0. | Yes | ||
Last header row | Number | The row number of the last row to be treated as header. | Yes | ||
First footer row | Number | The row number of the first row to be treated as footer | Yes | ||
Last footer row | Number | The row number of the last row to be treated as footer | Yes | ||
Metadata key cells | KeyValue | Specifies the cells and the corresponding key names for the keys to extract as metadata from the specified metadata sections. Only shown if Capture Metadata is set to true. The cell numbers should be within the header or footer. E.g. A1 → department, B5 → category | Yes | ||
Metadata value cells | KeyValue | Specifies the cells and the corresponding key names of the values to extract as metadata from the specified metadata sections. Only shown if Capture Metadata is set to true. The cell numbers should be within the header or footer. E.g. department → A2, category → C5 | Yes | ||
Custom footer last row | Number | Only shown when the footer selection is set to Custom footer. Accepts the row number of the last row to be treated as a header. | Yes |
Note: The data in the specified header and footer rows should not be available as records to the rest of the pipeline.
Sink
User Facing Name | Type | Description | Optional | Constraints |
---|---|---|---|---|
App Id | string | No | ||
Access token | string | No | ||
Directory Id | string | Directory ID is the last part of the URL, such as https://drive.google.com/drive/folders/0B2kqcwp2ycGZanhSR3JmREw5VTV | No | |
Sheet name | string | Name of the sheet. Defaults to Sheet 1 | Yes | |
Write first row as headers | Toggle | If true, the schema is written as the first row of the sheet. Defaults to True. | Yes | |
Format for nested data | select | Choose amongst JSON, CSV. Format to serialize complex (nested) data as. Defaults to JSON. | Yes |
Note: Incoming records should be written to columns in the sheet
Design / Implementation Tips
- Tip #1
- Tip #2
Design
Approach(s)
Properties
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Table of Contents
Table of Contents style circle
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature