Introduction
As of 5.1, a single Database source plugin and a single Database sink plugin handle all types of databases. To improve the user experience, the plugins should be split into database-specific plugins (e.g., MySQL, Netezza), each with a custom logo and tool tips that help users configure that particular database (e.g., the connection string).
The core database plugin code should be reused wherever applicable to minimize the total cost of ownership.
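One way to picture the reuse described above is a shared base class that holds the common logic, with a thin subclass per database. The following is a minimal sketch only: the class names `AbstractDbConfig`, `MysqlConfig`, and `PostgresConfig` are hypothetical and are not taken from the existing plugin code.

```java
/** Hypothetical shared base: logic common to every database plugin lives here. */
abstract class AbstractDbConfig {
  protected final String host;
  protected final int port;
  protected final String database;

  AbstractDbConfig(String host, int port, String database) {
    this.host = host;
    this.port = port;
    this.database = database;
  }

  /** Each database-specific plugin supplies only its own JDBC URL format. */
  abstract String getConnectionString();

  /** Shared validation that every subclass reuses unchanged. */
  void validate() {
    if (host == null || host.isEmpty()) {
      throw new IllegalArgumentException("host must be specified");
    }
    if (port < 1 || port > 65535) {
      throw new IllegalArgumentException("port out of range: " + port);
    }
  }
}

/** Hypothetical MySQL-specific subclass: only the URL format differs. */
final class MysqlConfig extends AbstractDbConfig {
  MysqlConfig(String host, int port, String database) {
    super(host, port, database);
  }

  @Override
  String getConnectionString() {
    return "jdbc:mysql://" + host + ":" + port + "/" + database;
  }
}

/** Hypothetical Postgres-specific subclass. */
final class PostgresConfig extends AbstractDbConfig {
  PostgresConfig(String host, int port, String database) {
    super(host, port, database);
  }

  @Override
  String getConnectionString() {
    return "jdbc:postgresql://" + host + ":" + port + "/" + database;
  }
}
```

With this shape, adding another database would mean adding one small subclass, while validation and any other shared behavior stay in a single place.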
Use case(s)
- Users can choose and install source and sink plugins specific to MySQL, Oracle, SQL Server, Netezza, DB2, and Postgres.
- Users should have a customized experience when configuring each DB plugin, with a custom logo specific to the database being used.
- Users should get relevant information from the tool tips:
  - The tool tip for the connection string should be customized for the specific database.
  - Each tool tip should accurately describe what its field is used for.
- Users should get performance comparable to Sqoop, achieved by using the Sqoop libraries for data ingestion and egress in the source and sink plugins.
- Users should not have to specify any redundant configuration (e.g., the JDBC type in the source plugin, the columns in the sink plugin).
- Users should get field-level lineage for the source and sink being used.
- Reference documentation should be updated to account for the changes.
- The source code for each type of database should be separated out into repos under the data-integrations org.
- Integration tests for the specific plugins should be added to the test repos.
- A data pipeline using the source and sink plugins should run on both the MapReduce and Spark engines.
User Stories
Note: The same set of user stories applies to the other databases: Netezza, SQL Server, Oracle, DB2, and Postgres.
- Users should be able to install MySQL-specific database source and sink plugins from the Hub.
- Users should have each tool tip accurately describe what its field does.
- Users should be able to learn the format of the MySQL connection string by hovering over the tool tip for the connection string.
- Users should get field-level lineage information for the MySQL source and sink.
- Users should get performance comparable to Sqoop when ingesting data from MySQL and when writing data to MySQL (within ~15% of the time taken by Sqoop).
- Users should be able to set up a pipeline without specifying redundant information.
- Users should get an updated reference document for the MySQL source and sink.
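To illustrate the connection string tool tip story, the sketch below pairs each target database with a commonly used JDBC URL format and derives the tool tip text from it. The enum `DbType` and its methods are hypothetical names introduced for illustration. The URL templates follow the usual driver conventions, but they are assumptions that should be checked against each driver's documentation (Oracle, for example, also accepts a service-name form).

```java
/** Hypothetical catalog of per-database JDBC URL templates used for tool tips. */
enum DbType {
  MYSQL("jdbc:mysql://{host}:{port}/{database}"),
  POSTGRES("jdbc:postgresql://{host}:{port}/{database}"),
  // Oracle thin driver, SID form; here {database} stands for the SID.
  ORACLE("jdbc:oracle:thin:@{host}:{port}:{database}"),
  SQLSERVER("jdbc:sqlserver://{host}:{port};databaseName={database}"),
  DB2("jdbc:db2://{host}:{port}/{database}"),
  NETEZZA("jdbc:netezza://{host}:{port}/{database}");

  private final String template;

  DbType(String template) {
    this.template = template;
  }

  /** Text a database-specific tool tip could show for the connection string field. */
  String toolTip() {
    return "Format: " + template;
  }

  /** Fills the template with concrete values. */
  String connectionString(String host, int port, String database) {
    return template
        .replace("{host}", host)
        .replace("{port}", Integer.toString(port))
        .replace("{database}", database);
  }
}
```

For example, `DbType.MYSQL.toolTip()` yields `Format: jdbc:mysql://{host}:{port}/{database}`, which is the kind of hint a user would see when hovering over the field.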
Deliverables
- Source code in the data-integrations org
- Performance test comparison with Sqoop
- Integration test code
- Relevant documentation in the source repo and in the reference documentation section of each plugin
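For the performance comparison deliverable, the "within ~15% of the time taken by Sqoop" criterion from the user stories can be expressed as a simple check. This is a sketch under stated assumptions: the class `SqoopComparison`, its method name, and the choice to measure wall-clock milliseconds are all hypothetical, and the 15% figure is treated as an exact threshold even though the user story states it approximately.

```java
/** Hypothetical helper that applies the ~15% performance criterion. */
final class SqoopComparison {
  /** Allowed slowdown relative to Sqoop, in percent (assumed exact for this sketch). */
  static final long TOLERANCE_PERCENT = 15;

  private SqoopComparison() {
  }

  /**
   * Returns true when the plugin run took no more than 115% of the Sqoop run time.
   * Integer arithmetic is used so the boundary case is not affected by rounding.
   */
  static boolean isComparable(long pluginMillis, long sqoopMillis) {
    if (pluginMillis < 0 || sqoopMillis <= 0) {
      throw new IllegalArgumentException("run times must be positive");
    }
    return pluginMillis * 100 <= sqoopMillis * (100 + TOLERANCE_PERCENT);
  }
}
```

Under these assumptions, a plugin run of 1,150 ms against a Sqoop run of 1,000 ms passes, while 1,151 ms does not.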
Relevant links
- Existing DB plugin code: https://github.com/caskdata/hydrator-plugins/tree/develop/database-plugins
- Data-integrations org: https://github.com/data-integrations/
- Field level lineage: https://docs.cdap.io/cdap/5.1.0-SNAPSHOT/en/developer-manual/metadata/field-lineage.html
- Integration test repos: https://github.com/caskdata/cdap-integration-tests
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Configurables
This section defines properties that are configurable for this plugin.
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Design / Implementation Tips
- Tip #1
- Tip #2
Design
Approach(s)
Properties
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature