Database plugin enhancements
Owned by Sree Raman
Last updated: Jan 11, 2019 by Illia
Introduction
As of CDAP 5.1, a single Database source plugin and a single Database sink plugin handle all supported database types. To improve the user experience, these plugins should be split into database-specific plugins (e.g., MySQL, Netezza), each with its own logo and tooltips that help users configure that database (e.g., the connection string format).
The core database plugin code should be reused wherever applicable to minimize the total cost of ownership.
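A minimal sketch of how the core code could be shared, assuming a hypothetical abstract base class (`AbstractDBSource`) that holds the database-agnostic logic while each database-specific plugin supplies only its display name, JDBC URL prefix, and the connection string template shown in its tooltip. The class and method names are illustrative and are not the existing hydrator-plugins API; the URL formats are the standard ones for the MySQL, PostgreSQL, and SQL Server JDBC drivers.

```java
/** Hypothetical shared base class: database-agnostic logic lives here once. */
abstract class AbstractDBSource {
  /** Display name used in tooltips and error messages. */
  abstract String databaseName();

  /** Prefix every valid JDBC URL for this database must start with. */
  abstract String jdbcUrlPrefix();

  /** Connection string format shown to the user in the tooltip. */
  abstract String connectionStringTemplate();

  /** Shared tooltip wording, customized only by the subclass's template. */
  String connectionStringTooltip() {
    return "JDBC connection string for " + databaseName() + ", in the form " + connectionStringTemplate();
  }

  /** Shared validation reused by every database-specific plugin. */
  void validateConnectionString(String connectionString) {
    if (connectionString == null || !connectionString.startsWith(jdbcUrlPrefix())) {
      throw new IllegalArgumentException(
          "Connection string must start with '" + jdbcUrlPrefix() + "' for " + databaseName());
    }
  }
}

/** Hypothetical MySQL-specific plugin: only database-specific details. */
final class MySqlSource extends AbstractDBSource {
  @Override String databaseName() { return "MySQL"; }
  @Override String jdbcUrlPrefix() { return "jdbc:mysql://"; }
  @Override String connectionStringTemplate() { return "jdbc:mysql://<host>:<port>/<database>"; }
}

/** Hypothetical PostgreSQL-specific plugin. */
final class PostgresSource extends AbstractDBSource {
  @Override String databaseName() { return "PostgreSQL"; }
  @Override String jdbcUrlPrefix() { return "jdbc:postgresql://"; }
  @Override String connectionStringTemplate() { return "jdbc:postgresql://<host>:<port>/<database>"; }
}

/** Hypothetical SQL Server-specific plugin. */
final class SqlServerSource extends AbstractDBSource {
  @Override String databaseName() { return "SQL Server"; }
  @Override String jdbcUrlPrefix() { return "jdbc:sqlserver://"; }
  @Override String connectionStringTemplate() { return "jdbc:sqlserver://<host>:<port>;databaseName=<database>"; }
}
```

With this split, adding Netezza or DB2 would mean adding one small subclass, while validation and tooltip wording stay in the shared code.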
Use case(s)
- Users can choose and install source and sink plugins specific to MySQL, Oracle, SQL Server, Netezza, DB2, and PostgreSQL
- Users should have a customized experience when configuring each database plugin, with a custom logo specific to the database being used
- Users should get relevant information from the tooltips:
  - The tooltip for the connection string should be specific to the database
  - Each tooltip should accurately describe what its field is used for
- Users should get performance comparable to Sqoop, achieved by using the Sqoop libraries for data ingestion and egress in the source and sink plugins
- Users should not have to specify redundant configuration (e.g., the JDBC type in the source plugin, or the columns in the sink plugin)
- Users should get field-level lineage for the source and sink being used
- Reference documentation should be updated to reflect the changes
- All database data types should be supported
- The source code for each database type should be split into its own repository under the data-integrations org
- Integration tests for each specific plugin should be added to the integration test repos
- Data pipelines using these source and sink plugins should run on both the MapReduce and Spark engines
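One way the sink could avoid asking users for a column list is to derive it from the input schema. The sketch below assumes a hypothetical helper (`InsertStatementBuilder`) and assumes the input field names match the target table's column names; it builds a parameterized `INSERT` with one `?` placeholder per field.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: builds the sink's INSERT from the input schema's field names. */
final class InsertStatementBuilder {
  private InsertStatementBuilder() {}

  /**
   * Returns a parameterized INSERT whose column list comes from the input schema,
   * so the user does not have to repeat the columns in the sink configuration.
   */
  static String build(String tableName, List<String> fieldNames) {
    if (fieldNames.isEmpty()) {
      throw new IllegalArgumentException("Input schema must have at least one field");
    }
    String columns = String.join(", ", fieldNames);
    // One JDBC placeholder per column, bound later via PreparedStatement setters.
    String placeholders = String.join(", ", Collections.nCopies(fieldNames.size(), "?"));
    return "INSERT INTO " + tableName + " (" + columns + ") VALUES (" + placeholders + ")";
  }
}
```

A real sink would also need database-specific identifier quoting (for example, backticks in MySQL versus double quotes in ANSI SQL), which is another natural place for a database-specific override.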
User Stories
Note: The same set of user stories applies to the other databases: Netezza, SQL Server, Oracle, DB2, and PostgreSQL.
- Users should be able to install MySQL-specific database source and sink plugins from the Hub
- Users should see tooltips that accurately describe what each field does
- Users should know the format of the MySQL connection string by hovering over the connection string tooltip
- Users should get field-level lineage information for the MySQL source and sink
- Users should get performance comparable to Sqoop when reading data from and writing data to MySQL (within ~15% of the time taken by Sqoop)
- Users should be able to set up a pipeline without specifying redundant information
- Users should get updated reference documentation for the MySQL source and sink
- Users should be able to read all MySQL data types
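Reading all data types comes down to mapping every JDBC type code a driver reports to a pipeline schema type. The sketch below is illustrative only: `JdbcTypeMapper` is hypothetical, and the returned strings stand in for CDAP schema types (a real plugin would build CDAP `Schema` objects, including logical types for decimal, date, time, and timestamp). Database-specific plugins could override cases such as MySQL `INT UNSIGNED`, whose range does not fit in a Java `int`.

```java
import java.sql.Types;

/** Hypothetical mapping from java.sql.Types codes to illustrative schema type names. */
final class JdbcTypeMapper {
  private JdbcTypeMapper() {}

  static String toSchemaType(int sqlType) {
    switch (sqlType) {
      case Types.BIT:
      case Types.BOOLEAN:
        return "boolean";
      case Types.TINYINT:
      case Types.SMALLINT:
      case Types.INTEGER:
        return "int";
      case Types.BIGINT:
        return "long";
      case Types.REAL:
        return "float";
      case Types.FLOAT: // JDBC FLOAT is double precision
      case Types.DOUBLE:
        return "double";
      case Types.NUMERIC:
      case Types.DECIMAL:
        return "decimal";
      case Types.CHAR:
      case Types.VARCHAR:
      case Types.LONGVARCHAR:
      case Types.NCHAR:
      case Types.NVARCHAR:
      case Types.LONGNVARCHAR:
      case Types.CLOB:
      case Types.NCLOB:
        return "string";
      case Types.DATE:
        return "date";
      case Types.TIME:
        return "time";
      case Types.TIMESTAMP:
        return "timestamp";
      case Types.BINARY:
      case Types.VARBINARY:
      case Types.LONGVARBINARY:
      case Types.BLOB:
        return "bytes";
      default:
        // Fail loudly so unsupported types surface at design time, not mid-run.
        throw new UnsupportedOperationException("Unsupported JDBC type code: " + sqlType);
    }
  }
}
```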
Deliverables
- Source code in the data-integrations org
- Performance test comparison with Sqoop
- Integration test code
- Relevant documentation in the source repo and in the reference documentation section of each plugin
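The "within ~15% of Sqoop" target amounts to checking the relative overhead (t_plugin - t_sqoop) / t_sqoop against 0.15 for each benchmark run. A minimal sketch, assuming a hypothetical `SqoopComparison` helper and wall-clock durations in seconds; for example, a hypothetical 110 s plugin run against a 100 s Sqoop run is a 10% overhead and meets the target.

```java
/** Hypothetical helper for evaluating the performance comparison with Sqoop. */
final class SqoopComparison {
  private SqoopComparison() {}

  /** Relative overhead of the plugin versus Sqoop; 0.10 means 10% slower. */
  static double relativeOverhead(double pluginSeconds, double sqoopSeconds) {
    if (sqoopSeconds <= 0) {
      throw new IllegalArgumentException("Sqoop duration must be positive");
    }
    return (pluginSeconds - sqoopSeconds) / sqoopSeconds;
  }

  /** True if the plugin is no more than `tolerance` slower than Sqoop. */
  static boolean withinTarget(double pluginSeconds, double sqoopSeconds, double tolerance) {
    return relativeOverhead(pluginSeconds, sqoopSeconds) <= tolerance;
  }
}
```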
Relevant links
- Existing DB plugin code: https://github.com/caskdata/hydrator-plugins/tree/develop/database-plugins
- Data-integrations org: https://github.com/data-integrations/
- Field level lineage: https://docs.cdap.io/cdap/5.1.0-SNAPSHOT/en/developer-manual/metadata/field-lineage.html
- Integration test repos: https://github.com/caskdata/cdap-integration-tests
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Design / Implementation Tips
- Tip #1
- Tip #2
Design
Approach(s)
Properties
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature
Created in 2020 by Google Inc.