Jethro Data plugin

Introduction

This is a separate database plugin that supports Jethro Data-specific features and configurations.

Use case

  • Users can choose and install the Jethro Data plugin.
  • Users should see the Jethro Data logo on the plugin configuration page for a better experience.
  • Users should get relevant information from the tool tip:
    • The tool tip should accurately describe what each field is used for.
  • Users should not have to specify any redundant configuration.
  • Users should get field-level lineage for the source and sink being used.
  • Reference documentation should be updated to account for the changes.
  • The source code for the Jethro Data database plugin should be placed in a repo under data-integrations.org.

User Stories

  • Users should be able to install the Jethro Data-specific database plugin from the Hub.
  • Users should have each tool tip accurately describe what each field does.
  • Users should get field-level lineage information for the Jethro Data plugin.
  • Users should be able to set up a pipeline without specifying redundant information.
  • Users should get an updated reference document for the Jethro Data plugin.
  • Users should be able to read all the DB types.

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Design Tips

Design

Jethro Data Overview

Customers use Jethro for interactive BI on Big Data. Jethro is a transparent middle tier that requires no changes to existing applications or data, and it is self-driving, requiring no maintenance.
Jethro is compatible with BI tools such as Tableau, Qlik, and MicroStrategy, and is data-source agnostic.
Jethro lets thousands of concurrent business users run complex queries over billions of records at the interactive speed they expect.

Powerful Architecture

Jethro combines two systems to cover the widest range of queries with the highest performance: full indexing and auto cubes. Together they deliver the fastest query performance regardless of query type or repeatability.

Self Driving

Jethro requires no human maintenance or tuning. Indexes, auto cubes and query caches are automatically maintained and kept current by background services.

Source Properties

Section      User Facing Name              Widget Type  Description  Constraints
General      Label                         textbox
             Reference Name                textbox                   Required
             Driver Name                   textbox                   Required
             Host                          textbox                   Required
             Port                          textbox                   Required
             Instance                      textbox                   Required
             Import Query                  textarea
             Bounding Query                textarea
Credentials  Username                      textbox                   Required
             Password                      password                  Required
Advanced     Split-By Field Name           textbox
             Number of Splits to Generate  textbox
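
The table above does not yet describe how Import Query, Bounding Query, Split-By Field Name, and Number of Splits to Generate interact. The following Python sketch illustrates one plausible reading, assuming the Jethro Data source follows the convention of the generic CDAP database source: the bounding query returns the minimum and maximum of the split-by field, and each split replaces a `$CONDITIONS` placeholder in the import query with its own range. The function names and the `$CONDITIONS` handling here are illustrative assumptions, not the plugin's actual implementation.

```python
def make_ranges(min_value, max_value, num_splits):
    """Divide the inclusive range [min_value, max_value] into at most
    num_splits contiguous half-open ranges (lower, upper)."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = max_value - min_value + 1
    if total <= 0:
        return []  # empty bounds: nothing to read
    count = min(num_splits, total)  # never create more splits than values
    base, extra = divmod(total, count)
    ranges = []
    lower = min_value
    for i in range(count):
        # The first `extra` splits take one additional value each.
        upper = lower + base + (1 if i < extra else 0)
        ranges.append((lower, upper))
        lower = upper
    return ranges


def split_queries(import_query, split_by, min_value, max_value, num_splits):
    """Build one query per split by substituting $CONDITIONS.

    min_value and max_value stand in for the result of the bounding query
    (assumed to be integers in this sketch).
    """
    if "$CONDITIONS" not in import_query:
        raise ValueError("import query must contain $CONDITIONS")
    queries = []
    for lower, upper in make_ranges(min_value, max_value, num_splits):
        condition = f"{split_by} >= {lower} AND {split_by} < {upper}"
        queries.append(import_query.replace("$CONDITIONS", condition))
    return queries


if __name__ == "__main__":
    for query in split_queries(
        "SELECT * FROM sales WHERE $CONDITIONS", "id", 1, 10, 3
    ):
        print(query)
```

Half-open ranges keep the splits non-overlapping, so each row is read by exactly one split. A real implementation would also need to handle non-integer split-by columns, which this sketch leaves out.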

Source Data Types Mapping

Jethro Data Types  CDAP Schema Data Types
INTEGER            int
BIGINT             long
FLOAT              float
DOUBLE             double
STRING             string
TIMESTAMP          timestamp-micros
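
As an illustration of the mapping above, the following Python sketch turns a list of Jethro Data column types into a CDAP-style schema. The function names and the `(name, type)` column format are hypothetical, and the schema is assumed to use the Avro-like JSON layout in which `timestamp-micros` is a logical type over `long`. Nullable handling is omitted.

```python
# Mapping taken directly from the table above.
JETHRO_TO_CDAP = {
    "INTEGER": "int",
    "BIGINT": "long",
    "FLOAT": "float",
    "DOUBLE": "double",
    "STRING": "string",
    "TIMESTAMP": "timestamp-micros",
}


def cdap_field_type(jethro_type):
    """Return the CDAP schema type for a Jethro Data type name."""
    key = jethro_type.strip().upper()
    if key not in JETHRO_TO_CDAP:
        raise ValueError(f"Unsupported Jethro Data type: {jethro_type}")
    cdap_type = JETHRO_TO_CDAP[key]
    if cdap_type == "timestamp-micros":
        # Assumption: logical types are expressed over a primitive type.
        return {"type": "long", "logicalType": "timestamp-micros"}
    return cdap_type


def to_cdap_schema(record_name, columns):
    """Build a record schema from (column_name, jethro_type) pairs."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {"name": name, "type": cdap_field_type(jethro_type)}
            for name, jethro_type in columns
        ],
    }


if __name__ == "__main__":
    print(to_cdap_schema("output", [("id", "INTEGER"), ("created", "TIMESTAMP")]))
```

Rejecting unknown types with an explicit error, rather than defaulting to `string`, makes unsupported columns visible at pipeline design time.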

Action Properties

Section      User Facing Name  Widget Type  Description  Constraints
Basic        Label             textbox
             Driver Name       textbox                   Required
             Host              textbox                   Required
             Port              textbox                   Required
             Instance          textbox                   Required
             Database Command  textarea                  Required
Credentials  Username          textbox                   Required
             Password          password                  Required

Post-Run Action Properties

Section      User Facing Name  Widget Type  Description  Constraints
Basic        Label             textbox
             Run Condition     select                    Required
             Driver Name       textbox                   Required
             Host              textbox                   Required
             Port              textbox                   Required
             Instance          textbox                   Required
             Query             textarea                  Required
Credentials  Username          textbox                   Required
             Password          password                  Required

Approach

Create a jethro-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add Jethro-specific properties to the configuration, add support for Jethro-specific data types, and update the UI widget JSON definitions.
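
To make the Jethro-specific configuration concrete, here is a minimal Python sketch of validating the connection properties marked Required in the property tables and assembling a connection string from Host, Port, and Instance. The property keys (`driver_name`, `host`, and so on) are hypothetical, and the `jdbc:JethroData://host:port/instance` URL format is an assumption that should be checked against the Jethro JDBC driver documentation. The actual plugin would implement this in the database-plugins code base rather than as shown here.

```python
# Hypothetical property keys for the fields marked "Required" in the tables.
REQUIRED_FIELDS = ("driver_name", "host", "port", "instance", "username", "password")


def validate(props):
    """Return a list of human-readable configuration errors (empty if valid)."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = props.get(field)
        if value is None or not str(value).strip():
            errors.append(f"{field} is required")
    port = props.get("port")
    if port is not None and str(port).strip():
        text = str(port).strip()
        if not (text.isdecimal() and 1 <= int(text) <= 65535):
            errors.append("port must be an integer between 1 and 65535")
    return errors


def connection_string(props):
    """Build a JDBC URL from host, port, and instance.

    Assumption: the URL format below matches the Jethro JDBC driver.
    """
    errors = validate(props)
    if errors:
        raise ValueError("; ".join(errors))
    host = str(props["host"]).strip()
    port = int(str(props["port"]).strip())
    instance = str(props["instance"]).strip()
    return f"jdbc:JethroData://{host}:{port}/{instance}"


if __name__ == "__main__":
    config = {
        "driver_name": "jethro",
        "host": "localhost",
        "port": "9111",
        "instance": "demo",
        "username": "user",
        "password": "secret",
    }
    print(connection_string(config))
```

Deriving the connection string from Host, Port, and Instance is one way to spare users from entering a redundant connection string field.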

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data.

Releases

Release X.Y.Z

Related Work

Database plugin enhancements



Created in 2020 by Google Inc.