Introduction

This document describes a separate database plugin that supports Jethro Data-specific features and configurations.

Use case

Users can choose and install the Jethro Data plugin.
Users should see the Jethro Data logo on the plugin configuration page for a better experience.
Users should get relevant information from the tool tips:
- Each tool tip should accurately describe what its field is used for.
Users should not have to specify any redundant configuration.
Users should get field-level lineage for the source and sink being used.
Reference documentation should be updated to account for the changes.
The source code for the Jethro Data database plugin should be placed in a repo under data-integrations.org.

User Stories

Users should be able to install the Jethro Data-specific database plugin from the Hub.
Users should have each tool tip accurately describe what each field does.
Users should get field-level lineage information for the Jethro Data plugin.
Users should be able to set up a pipeline without specifying redundant information.
Users should get an updated reference document for the Jethro Data plugin.
Users should be able to read all supported database data types.

Plugin Type

Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Design Tips

Jethro Data JDBC driver download: https://jethro.io/driver-downloads
Jethro Data JDBC driver documentation: http://docs.jethro.io/display/JethroLatest/JDBC+Driver

Design

Jethro Data Overview

Customers use Jethro for interactive BI on Big Data. Jethro is a transparent middle tier that requires no changes to existing applications or data, and it is self-driving, with no maintenance required.
Jethro is compatible with BI tools such as Tableau, Qlik and MicroStrategy, and it is data source agnostic.
Jethro is designed to let thousands of concurrent business users run complicated queries over billions of records at the interactive speed they expect.

Powerful Architecture

Jethro combines two systems to cover the widest range of queries with the highest performance: full indexing and auto cubes. Together they deliver the fastest query performance regardless of query type or repeatability.

Self Driving

Jethro requires no human maintenance or tuning. Indexes, auto cubes and query caches are automatically maintained and kept current by background services.

Source Properties

Section	User Facing Name	Widget Type	Constraints
General	Label	textbox
	Reference Name	textbox	Required
	Driver Name	textbox	Required
	Host	textbox	Required
	Port	textbox	Required
	Instance	textbox	Required
	Import Query	textarea
	Bounding Query	textarea

Credentials	Username	textbox	Required
	Password	password	Required

Advanced	Split-By Field Name	textbox
	Number of Splits to Generate	textbox
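
As an illustration of how the Host, Port and Instance properties above could be combined, the following minimal sketch assembles a JDBC connection string and rejects missing required values. The class name `JethroConnectionString` is hypothetical, and the URL layout `jdbc:JethroData://<host>:<port>/<instance>` is an assumption that should be checked against the Jethro JDBC driver documentation; this is not the plugin's actual implementation.

```java
import java.util.Locale;

// Hypothetical helper, not part of the plugin code base.
final class JethroConnectionString {

  // Assumed URL layout; verify against the Jethro JDBC driver documentation.
  private static final String URL_FORMAT = "jdbc:JethroData://%s:%d/%s";

  private JethroConnectionString() { }

  // Builds the connection string from the three required source properties.
  static String build(String host, String port, String instance) {
    String cleanHost = requireNonEmpty("Host", host);
    String cleanInstance = requireNonEmpty("Instance", instance);
    int portNumber = parsePort(requireNonEmpty("Port", port));
    return String.format(Locale.ROOT, URL_FORMAT, cleanHost, portNumber, cleanInstance);
  }

  // Required properties must be present and non-blank.
  private static String requireNonEmpty(String name, String value) {
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException(name + " is required");
    }
    return value.trim();
  }

  // Port arrives as text from a textbox widget, so it is validated here.
  private static int parsePort(String port) {
    int portNumber;
    try {
      portNumber = Integer.parseInt(port);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Port must be a number: " + port, e);
    }
    if (portNumber < 1 || portNumber > 65535) {
      throw new IllegalArgumentException("Port out of range: " + port);
    }
    return portNumber;
  }

  public static void main(String[] args) {
    // Example values only; host, port and instance are placeholders.
    System.out.println(build("localhost", "9111", "demo"));
  }
}
```

Deriving the connection string from separate fields, rather than asking for a raw JDBC URL, is one way to avoid the redundant configuration mentioned in the use cases.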

Source Data Types Mapping

Jethro Data Types	CDAP Schema Data Types
INTEGER	int
BIGINT	long
FLOAT	float
DOUBLE	double
STRING	string
TIMESTAMP	timestamp-micros
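
The mapping above can be expressed as a simple lookup. The sketch below is a standalone illustration that uses plain strings for the CDAP type names so that it runs without CDAP on the classpath; the class name `JethroTypeMapping` is hypothetical, and a real plugin would presumably return CDAP schema objects instead of strings.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper mirroring the Source Data Types Mapping table.
final class JethroTypeMapping {

  // Jethro type name -> CDAP schema type name, in table order.
  private static final Map<String, String> JETHRO_TO_CDAP = new LinkedHashMap<>();

  static {
    JETHRO_TO_CDAP.put("INTEGER", "int");
    JETHRO_TO_CDAP.put("BIGINT", "long");
    JETHRO_TO_CDAP.put("FLOAT", "float");
    JETHRO_TO_CDAP.put("DOUBLE", "double");
    JETHRO_TO_CDAP.put("STRING", "string");
    JETHRO_TO_CDAP.put("TIMESTAMP", "timestamp-micros");
  }

  private JethroTypeMapping() { }

  // Returns the CDAP type name for a Jethro type, ignoring case and padding.
  static String toCdapType(String jethroType) {
    if (jethroType == null) {
      throw new IllegalArgumentException("Jethro type must not be null");
    }
    String key = jethroType.trim().toUpperCase(Locale.ROOT);
    String cdapType = JETHRO_TO_CDAP.get(key);
    if (cdapType == null) {
      // Types outside the table are rejected rather than guessed.
      throw new IllegalArgumentException("Unsupported Jethro type: " + jethroType);
    }
    return cdapType;
  }

  public static void main(String[] args) {
    // Print every pair from the table.
    for (Map.Entry<String, String> entry : JETHRO_TO_CDAP.entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```

Rejecting unknown types explicitly makes unsupported columns fail with a clear message instead of being silently coerced.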

Approach

Create a jethro-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add Jethro-specific properties to the configuration and add support for Jethro-specific data types. Update the UI widget JSON definitions.

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data.

Releases

Release X.Y.Z

Related Work

Database plugin enhancements

Jethro Data plugin