Oracle database plugin

Introduction

A separate database plugin to support Oracle-specific features and configurations.

Use-Case

  • Users can choose and install Oracle source and sink plugins.

  • Users should see the Oracle logo on the plugin configuration page for a better experience.

  • Users should get relevant information from the tool tip:

    • The tool tip for the connection string should be customized specifically for the Oracle database.

    • The tool tip should accurately describe what each field is used for.

  • Users should not have to specify any redundant configuration (e.g., JDBC type in the source plugin, columns in the sink plugin).

  • Users should get field level lineage for the source and sink that is being used.

  • Reference documentation should be updated to account for the changes.

  • The source code for the Oracle database plugin should be placed in a repo under the data-integrations org.

  • Integration tests for the Oracle database plugin should be placed in a repo under the data-integrations org.

  • The data pipeline using the source and sink plugins should run on both the MapReduce and Spark engines.

User Stories

  • Users should be able to install Oracle-specific database source and sink plugins from the Hub

  • Users should have each tool tip accurately describe what each field does

  • Users should get field level lineage information for the Oracle source and sink

  • Users should be able to set up a pipeline without specifying redundant information

  • Users should get an updated reference document for the Oracle source and sink

  • Users should be able to read all the DB types

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Design Tips

Oracle connector reference: https://www.oracle.com/technetwork/database/application-development/jdbc/downloads/index.html

Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins

Oracle datatypes mappings and conversions:

 

Oracle provides two types of JDBC driver: thin and OCI.

The JDBC Thin driver is a pure Java, Type IV driver. It is lightweight and easy to install. More: https://docs.oracle.com/cd/E11882_01/java.112/e16548/jdbcthin.htm#JJDBC28195

The OCI driver requires native libraries to be installed, but provides additional features such as OCI Connection Pooling and Client Result Cache.

More: https://www.oracle.com/database/technologies/appdev/oci.html, https://docs.oracle.com/cd/E11882_01/java.112/e16548/instclnt.htm#JJDBC28218

Oracle also supports a tnsnames.ora file on the client machine. More: https://docs.oracle.com/database/121/NETRF/tnsnames.htm#NETRF007
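To illustrate how the driver type and the SID or service name could combine into a connection string, here is a minimal sketch. The class and method names (OracleJdbcUrls, build, ConnectionType) are hypothetical and not taken from the plugin source. The URL shapes follow the commonly documented Oracle JDBC forms for the thin driver; reusing the same shapes with the oci prefix is an assumption.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of the plugin source. */
final class OracleJdbcUrls {

  /** Whether the database is identified by SID or by service name. */
  enum ConnectionType { SID, SERVICE_NAME }

  /**
   * Builds a JDBC URL for the given driver type ("thin" or "oci").
   * SID form:          jdbc:oracle:<driver>:@host:port:SID
   * Service name form: jdbc:oracle:<driver>:@//host:port/service
   * Assumption: the oci driver accepts the same two shapes as thin.
   */
  static String build(String driverType, String host, int port,
                      ConnectionType type, String database) {
    String driver = driverType.toLowerCase(Locale.ROOT);
    if (!driver.equals("thin") && !driver.equals("oci")) {
      throw new IllegalArgumentException("Unsupported driver type: " + driverType);
    }
    String prefix = "jdbc:oracle:" + driver + ":@";
    return type == ConnectionType.SID
        ? prefix + host + ":" + port + ":" + database
        : prefix + "//" + host + ":" + port + "/" + database;
  }

  public static void main(String[] args) {
    // prints jdbc:oracle:thin:@localhost:1521:ORCL
    System.out.println(build("thin", "localhost", 1521, ConnectionType.SID, "ORCL"));
    // prints jdbc:oracle:thin:@//localhost:1521/orcl.example.com
    System.out.println(build("thin", "localhost", 1521,
        ConnectionType.SERVICE_NAME, "orcl.example.com"));
  }
}
```

Keeping the driver type as a validated string mirrors the "thin, oci" select values proposed in the property tables; a real implementation might prefer an enum.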

Design

The suggestion is to create a Maven submodule, oracle-plugin, under the database-plugins repo.

 

Sink Properties

| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identified name for lineage | |
| Host | String | Oracle host | Required (defaults to localhost on UI) |
| Port | Number | Port on which Oracle is running | Optional (default 1521) |
| SID | String | SID name to connect to | Required |
| Service name | String | Service name to connect to | Required |
| Username | String | DB username | Required |
| Password | Password | User password | Required |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. List of properties: https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html | |
| Table Name | String | Name of a database table to write to | |
| Driver type | Select | Oracle driver type | Possible values (thin, oci) |

Source Properties

 

| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identified name for lineage | |
| Host | String | Oracle host | Required (defaults to localhost on UI) |
| Port | Number | Port on which Oracle is running | Optional (default 1521) |
| SID | String | SID name to connect to | Required |
| Service name | String | Service name to connect to | Required |
| Import Query | String | Query used to import data | Valid SQL query |
| Username | String | DB username | Required |
| Password | String | User password | Required |
| Bounding Query | String | Returns the max and min of the Split-By Field | Valid SQL query |
| Split-By Field Name | String | Field name which will be used to generate splits | |
| Number of Splits to Generate | Number | Number of splits to generate | |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. List of properties: https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html | |
| Driver type | Select | Oracle driver type | Possible values (thin, oci) |
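The interplay of Bounding Query, Split-By Field Name, and Number of Splits to Generate can be sketched as follows. This is a simplified illustration for a numeric split-by field, not the plugin's actual splitting logic: the class SplitSketch and its method are hypothetical, and the `$CONDITIONS` placeholder in the import query is assumed to follow the convention of the existing generic database source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of range splitting; not the plugin's implementation. */
final class SplitSketch {

  /**
   * Divides the inclusive range [min, max] (as returned by a bounding query)
   * into at most numSplits contiguous ranges and returns one SQL condition
   * per range on the given field.
   */
  static List<String> splitConditions(String field, long min, long max, int numSplits) {
    if (numSplits < 1 || max < min) {
      throw new IllegalArgumentException("numSplits must be >= 1 and max >= min");
    }
    List<String> conditions = new ArrayList<>();
    long span = max - min + 1;                 // number of distinct values
    long splits = Math.min(numSplits, span);   // never create empty splits
    long lo = min;
    for (long i = 0; i < splits; i++) {
      // Spread the remainder over the first (span % splits) ranges.
      long size = span / splits + (i < span % splits ? 1 : 0);
      long hi = lo + size - 1;
      conditions.add(field + " >= " + lo + " AND " + field + " <= " + hi);
      lo = hi + 1;
    }
    return conditions;
  }

  public static void main(String[] args) {
    // Assumed import query shape; $CONDITIONS is replaced once per split.
    String importQuery = "SELECT * FROM employees WHERE $CONDITIONS";
    for (String condition : splitConditions("ID", 1, 10, 3)) {
      System.out.println(importQuery.replace("$CONDITIONS", condition));
    }
  }
}
```

With a bounding query result of min 1 and max 10 and three splits, the sketch yields the ranges 1 to 4, 5 to 7, and 8 to 10, each of which could be read by a separate task.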

 

Action Properties

 

| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Label | String | Label for UI | |
| Host | String | Oracle host | Required (defaults to localhost on UI) |
| Port | Number | Port on which Oracle is running | Optional (default 1521) |
| SID | String | SID name to connect to | Required |
| Service name | String | Service name to connect to | Required |
| Username | String | DB username | Required |
| Password | String | User password | Required |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. List of properties: https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html | |
| Database Command | String | Database command to run | Valid SQL query |
| Driver type | Select | Oracle driver type | Possible values (thin, oci) |

 

Data Types Mapping

| Oracle Data Type | CDAP Schema Data Type | Support | Comment |
| --- | --- | --- | --- |
| VARCHAR2 | Schema.Type.STRING | + | |
| NVARCHAR2 | Schema.Type.STRING | + | |
| VARCHAR | Schema.Type.STRING | + | |
| NUMBER | Schema.LogicalType.DECIMAL | + | |
| FLOAT | Schema.Type.DOUBLE | + | FLOAT(126) by default; the value is represented internally as NUMBER |
| LONG | Schema.Type.STRING | + | Character data of variable length. |
| DATE | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| BINARY_FLOAT | Schema.Type.FLOAT | + | |
| BINARY_DOUBLE | Schema.Type.DOUBLE | + | |
| TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| TIMESTAMP WITH TIME ZONE | Schema.LogicalType.TIMESTAMP_MICROS | * | Currently converted to UTC time, modifying original time zone. |
| TIMESTAMP WITH LOCAL TIME ZONE | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| INTERVAL YEAR TO MONTH | Schema.Type.STRING | + | |
| INTERVAL DAY TO SECOND | Schema.Type.STRING | + | |
| RAW | Schema.Type.BYTES | + | |
| LONG RAW | Schema.Type.BYTES | + | |
| ROWID | Schema.Type.STRING | + | |
| UROWID | Schema.Type.STRING | + | |
| CHAR | Schema.Type.STRING | + | |
| NCHAR | Schema.Type.STRING | + | |
| CLOB | Schema.Type.STRING | + | |
| NCLOB | Schema.Type.STRING | + | |
| BLOB | Schema.Type.BYTES | + | |
| BFILE | Schema.Type.BYTES | * | Deprecated by Oracle in java api, added mapping to Type.BYTES |
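The comment on TIMESTAMP WITH TIME ZONE can be made concrete with a small sketch. It uses only java.time and assumes the timestamp is stored as microseconds since the epoch, which is what a TIMESTAMP_MICROS logical type implies. The class TimestampTzSketch and its methods are hypothetical and do not reflect the plugin's actual conversion code.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch showing why the original time zone is not preserved. */
final class TimestampTzSketch {

  /** Converts a zoned timestamp to microseconds since the epoch (UTC). */
  static long toEpochMicros(OffsetDateTime value) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, value.toInstant());
  }

  /** Reads the stored value back; only UTC can be reconstructed. */
  static OffsetDateTime fromEpochMicros(long micros) {
    return Instant.EPOCH.plus(micros, ChronoUnit.MICROS).atOffset(ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    OffsetDateTime original = OffsetDateTime.parse("2020-01-01T12:00:00+02:00");
    long micros = toEpochMicros(original);
    // The instant is preserved, but the +02:00 offset is gone after the round trip.
    System.out.println(micros);                  // prints 1577872800000000
    System.out.println(fromEpochMicros(micros)); // prints 2020-01-01T10:00Z
  }
}
```

The round trip keeps the same point in time, so comparisons and ordering remain correct; what is lost is the offset the value was originally written with.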

 

 

Approach

Create a module, oracle-plugin, in the database-plugins project, reusing existing database-plugins code where possible. Add Oracle-specific properties to the configuration and add support for Oracle-specific data types. Update the UI widget JSON definitions.

The default driver should be used to connect to Oracle; otherwise, the user should connect via the generic database plugin.

Pipeline Samples



API changes

Deprecated Programmatic APIs

database-plugins is moved to Data Integrations

UI Impact or Changes

Configurable database properties are presented as named text fields instead of arbitrary key/value pairs. The Oracle source and sink appear as separate entries, with the Oracle logo, in the source and sink lists.

Test Scenarios

TODO

Releases

Release X.Y.Z

Related Work

Database plugin enhancements

Future work

MSSQL database plugin

 

Created in 2020 by Google Inc.