Database plugin enhancements

Introduction

As of 5.1, a single Database source plugin and a single Database sink plugin handle all types of databases. To improve the user experience, these plugins should be split into database-specific plugins (for example, MySQL and Netezza). Each should have a custom logo and tool tips that help users configure that particular database (for example, the format of the connection string).

The core database plugin code should be reused wherever applicable to minimize the total cost of ownership.
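One way to reuse the core code is to keep the shared logic in a common base class. Each database-specific plugin then supplies only what differs, such as its display name and its connection string format, which also feeds the customized tool tip. Below is a minimal sketch under that assumption. `AbstractDatabasePlugin`, `MySqlPlugin`, `PostgresPlugin` and their methods are hypothetical names, not classes from the existing plugin code. The URL formats shown are the standard MySQL and PostgreSQL JDBC formats.

```java
/**
 * Hypothetical sketch: shared logic lives in an abstract base class, and each
 * database-specific plugin supplies only what differs (name, URL format).
 * None of these class names come from the actual plugin code base.
 */
public class DatabasePluginReuseSketch {

  /** Shared core: behavior common to every database plugin. */
  abstract static class AbstractDatabasePlugin {
    /** Display name shown in the UI, e.g. "MySQL". */
    abstract String databaseName();

    /** JDBC URL template with <host>, <port> and <database> placeholders. */
    abstract String connectionStringFormat();

    /** Tool tip text built in the core and reused by every subclass. */
    String connectionStringToolTip() {
      return "JDBC connection string for " + databaseName()
          + ", in the format " + connectionStringFormat();
    }

    /** Fills in the URL template; shared by all subclasses. */
    String buildConnectionString(String host, int port, String database) {
      return connectionStringFormat()
          .replace("<host>", host)
          .replace("<port>", Integer.toString(port))
          .replace("<database>", database);
    }
  }

  /** MySQL plugin: only the database-specific details are defined here. */
  static final class MySqlPlugin extends AbstractDatabasePlugin {
    @Override String databaseName() { return "MySQL"; }
    @Override String connectionStringFormat() { return "jdbc:mysql://<host>:<port>/<database>"; }
  }

  /** PostgreSQL plugin: same shared core, different details. */
  static final class PostgresPlugin extends AbstractDatabasePlugin {
    @Override String databaseName() { return "PostgreSQL"; }
    @Override String connectionStringFormat() { return "jdbc:postgresql://<host>:<port>/<database>"; }
  }

  public static void main(String[] args) {
    AbstractDatabasePlugin mysql = new MySqlPlugin();
    System.out.println(mysql.connectionStringToolTip());
    System.out.println(mysql.buildConnectionString("localhost", 3306, "sales"));
  }
}
```

With this split, adding another database means adding one small subclass, and fixes to the shared logic apply to every plugin at once.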

Use case(s)

  • Users can choose and install source and sink plugins specific to MySQL, Oracle, SQL Server, Netezza, DB2, and PostgreSQL

  • Users should have a customized experience when configuring each DB plugin, including a custom logo specific to the database being used

  • Users should get relevant information from the tool tips:

    • The tool tip for the connection string should be specific to the database

    • Each tool tip should accurately describe what its field is used for

  • Users should get performance comparable to Sqoop, achieved by using the Sqoop libraries for data ingestion and egress in the source and sink plugins

  • Users should not have to specify any redundant configuration (for example, the JDBC type in the source plugin or the columns in the sink plugin)

  • Users should get field-level lineage for the source and sink being used

  • Reference documentation should be updated to account for the changes 

  • All database data types should be supported

  • The source code for each type of database should be kept in a separate repository under the data-integrations org

  • Integration tests for the specific plugins should be added to the test repos

  • Data pipelines that use the source and sink plugins should run on both the MapReduce and Spark engines
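To avoid asking for the sink columns, the plugin could derive them from the input schema it already receives. The following is a minimal sketch assuming the input field names match the target table's column names. `SinkColumnsSketch` and `buildInsert` are hypothetical names. On the source side, the JDBC type could similarly be read from the query's `java.sql.ResultSetMetaData` (for example, `getColumnType`) instead of being configured by the user.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: instead of asking the user to list the sink columns,
 * derive them from the field names of the incoming record schema.
 * Assumes field names map one-to-one onto table column names.
 */
public class SinkColumnsSketch {

  /** Builds a parameterized INSERT statement for the given table and fields. */
  static String buildInsert(String table, List<String> fieldNames) {
    if (fieldNames.isEmpty()) {
      throw new IllegalArgumentException("schema has no fields");
    }
    // Column list comes straight from the schema's field names.
    String columns = String.join(", ", fieldNames);
    // One JDBC "?" placeholder per column, bound later via PreparedStatement.
    String placeholders = String.join(", ", Collections.nCopies(fieldNames.size(), "?"));
    return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
  }

  public static void main(String[] args) {
    List<String> inputSchemaFields = List.of("id", "name", "price");
    // prints "INSERT INTO products (id, name, price) VALUES (?, ?, ?)"
    System.out.println(buildInsert("products", inputSchemaFields));
  }
}
```

The sketch does not quote identifiers. A real implementation would need database-specific quoting (for example, backticks in MySQL) for column names that are reserved words or contain special characters.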

User Stories

Note: The same set of user stories applies to the other databases: Netezza, SQL Server, Oracle, DB2, and PostgreSQL.

  • Users should be able to install MySQL-specific source and sink plugins from the Hub

  • Users should see tool tips that accurately describe what each field does

  • Users should be able to learn the format of the MySQL connection string by hovering over its tool tip

  • Users should get field-level lineage information for the MySQL source and sink

  • Users should get performance comparable to Sqoop when reading data from and writing data to MySQL (within ~15% of the time taken by Sqoop)

  • Users should be able to set up a pipeline without specifying redundant information

  • Users should get updated reference documentation for the MySQL source and sink

  • Users should be able to read all database data types
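The performance criterion can be expressed as a simple acceptance check. This sketch assumes "within ~15%" means the plugin may take at most 15% longer than Sqoop on the same data and cluster. The class and method names are hypothetical.

```java
/**
 * Hypothetical acceptance check for the performance user story, assuming
 * the plugin passes if its run time is at most 15% longer than Sqoop's.
 */
public class SqoopComparisonSketch {

  /** Allowed slowdown relative to Sqoop (0.15 = 15%). */
  static final double TOLERANCE = 0.15;

  static boolean withinTolerance(double pluginSeconds, double sqoopSeconds) {
    if (sqoopSeconds <= 0) {
      throw new IllegalArgumentException("Sqoop time must be positive");
    }
    return pluginSeconds <= sqoopSeconds * (1 + TOLERANCE);
  }

  public static void main(String[] args) {
    // If Sqoop takes 100 s, the plugin may take up to about 115 s.
    System.out.println(withinTolerance(112.0, 100.0)); // true
    System.out.println(withinTolerance(120.0, 100.0)); // false
  }
}
```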

Deliverables 

  • Source code in the data-integrations org

  • Performance test comparison with Sqoop

  • Integration test code 

  • Relevant documentation in the source repo and in the reference documentation section of each plugin

Relevant links 

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Configurables

This section defines properties that are configurable for this plugin. 

User Facing Name | Type | Description | Constraints


Design / Implementation Tips


Design

Approach(s)

Properties

Security

Limitation(s)

Future Work


Test Case(s)


Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2

 

 


Checklist

User stories documented 
User stories reviewed 
Design documented 
Design reviewed 
Feature merged 
Examples and guides 
Integration tests 
Documentation for feature 
Short video demonstrating the feature

Created in 2020 by Google Inc.