Amazon AuroraDB PostgreSQL plugin
Introduction
Amazon Aurora is a PostgreSQL-compatible database offered as a managed service. Users need to be able to both write to and read from AuroraDB.
Use-case
Users would like to build a batch data pipeline to read a complete table from an Amazon Aurora DB instance and write it to Bigtable.
Users would like to build a batch data pipeline to perform upserts on AuroraDB tables.
Users should get relevant information from the tool tips while configuring the AuroraDB source and AuroraDB sink.
The tool tip for the connection string should be customized for this specific database.
Each tool tip should accurately describe what the field is used for.
Users should get field-level lineage for the source and sink being used.
Reference documentation should be available from the source and sink plugins.
User Stories
Users should be able to install the AuroraDB PostgreSQL source and sink plugins from the Hub.
Each tool tip should accurately describe what the corresponding field does.
Users should get field-level lineage information for the AuroraDB PostgreSQL source and sink.
Users should be able to set up a pipeline without specifying redundant information.
Users should get updated reference documentation for the AuroraDB PostgreSQL source and sink.
Users should be able to read all supported database column types.
Deliverables
Source code in the data-integrations org
Integration test code
Relevant documentation in the source repo and the reference documentation section of the plugin
Relevant links
Data-integrations org: https://github.com/data-integrations/
Field level lineage: https://docs.cdap.io/cdap/6.0.0-SNAPSHOT/en/developer-manual/metadata/field-lineage.html
Integration test repos: https://github.com/caskdata/cdap-integration-tests
Plugin Type
Design / Implementation Tips
Reuse database-commons module from database-plugins repo.
Design
It is suggested to place the plugin code under the database-plugins repository to reuse existing database capabilities.
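To read all supported column types, the source will need a mapping from PostgreSQL column types to record schema types. A minimal sketch of such a mapping, in Python for brevity (the actual plugin is Java and would derive this from `java.sql.Types` via database-commons; the names below are illustrative assumptions, not the database-plugins API):

```python
# Illustrative mapping from PostgreSQL column type names to record schema
# types. This is an assumption for the sketch, not the actual plugin code.
PG_TO_SCHEMA_TYPE = {
    "smallint": "int",
    "integer": "int",
    "bigint": "long",
    "real": "float",
    "double precision": "double",
    "numeric": "decimal",
    "boolean": "boolean",
    "bytea": "bytes",
    "text": "string",
    "character varying": "string",
    "timestamp without time zone": "timestamp",
}

def schema_type_for(pg_type):
    """Return the record schema type for a PostgreSQL column type,
    falling back to string for unmapped types."""
    return PG_TO_SCHEMA_TYPE.get(pg_type.lower(), "string")
```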
Source Properties
| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identifying name used for lineage | Required |
| Driver Name | String | Name of the JDBC driver to use | Required (defaults to postgres) |
| Cluster Endpoint | String | Endpoint of the current master instance of the PostgreSQL cluster | Required |
| Port | Number | Port of the PostgreSQL cluster's master instance | Optional (defaults to 5432) |
| Database | String | Name of the database to connect to | Required |
| Import Query | String | Query used to import data | Valid SQL query |
| Username | String | DB username | Required |
| Password | Password | User password | Required |
| Bounding Query | String | Query that returns the min and max values of the split-by field | Valid SQL query |
| Split-By Field Name | String | Field name used to generate splits | |
| Number of Splits to Generate | Number | Number of splits to generate | |
| Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs used as connection arguments; see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters | |
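To show how the properties above fit together, the sketch below builds the JDBC connection string from the cluster endpoint, port, database, and connection arguments, and derives one query per split from the import query, the split-by field, and the bounds returned by the bounding query. Written in Python for brevity; in the actual plugin, split generation is handled by the Hadoop `DBInputFormat` machinery reused from database-commons, so the function names here are illustrative assumptions:

```python
def jdbc_url(endpoint, port, database, args=None):
    """Build the PostgreSQL JDBC connection string from the source properties."""
    url = "jdbc:postgresql://%s:%d/%s" % (endpoint, port, database)
    if args:
        url += "?" + "&".join("%s=%s" % (k, v) for k, v in args.items())
    return url

def split_queries(import_query, split_by_field, lo, hi, num_splits):
    """Partition the [lo, hi] range (as returned by the bounding query)
    into one WHERE clause per split, appended to the import query."""
    queries = []
    step = (hi - lo + 1) // num_splits or 1
    start = lo
    for i in range(num_splits):
        end = hi if i == num_splits - 1 else start + step - 1
        queries.append("%s WHERE %s BETWEEN %d AND %d"
                       % (import_query, split_by_field, start, end))
        start = end + 1
    return queries
```

For example, with endpoint `host.example.com` (a placeholder), port 5432, and database `mydb`, the URL is `jdbc:postgresql://host.example.com:5432/mydb`.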
Sink Properties
| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identifying name used for lineage | Required |
| Driver Name | String | Name of the JDBC driver to use | Required (defaults to postgres) |
| Cluster Endpoint | String | Endpoint of the current master instance of the PostgreSQL cluster | Required |
| Port | Number | Port of the PostgreSQL cluster's master instance | Optional (defaults to 5432) |
| Database | String | Name of the database to connect to | Required |
| Username | String | DB username | Required |
| Password | Password | User password | Required |
| Connection Arguments | Keyvalue | A list of arbitrary string key/value pairs used as connection arguments; see https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters | |
| Table Name | String | Name of the database table to write to | Required |
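For the upsert use case, PostgreSQL (and therefore Aurora PostgreSQL) supports `INSERT ... ON CONFLICT ... DO UPDATE`. The sketch below, in Python for brevity, shows the shape of statement the sink could issue per record batch; the function and its parameters are illustrative assumptions, not the plugin's actual API:

```python
def upsert_statement(table, columns, key_columns):
    """Build a PostgreSQL upsert for the sink's target table.
    key_columns must be covered by a unique constraint or primary key,
    since ON CONFLICT matches against one."""
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # JDBC-style parameters
    updates = ", ".join(
        "%s = EXCLUDED.%s" % (c, c) for c in columns if c not in key_columns
    )
    return ("INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s"
            % (table, cols, placeholders, ", ".join(key_columns), updates))
```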
Test Case(s)
Test case #1
Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Created in 2020 by Google Inc.