Netezza database plugin

Introduction

A separate database plugin to support Netezza-specific features and configurations.

Use-Case

  • Users can choose and install Netezza source and sink plugins.

  • Users should see the Netezza logo on the plugin configuration page for a better experience.

  • Users should get relevant information from the tool tip:

    • The tool tip for the connection string should be customized specifically for the Netezza database.

    • The tool tip should accurately describe what each field is used for.

  • Users should not have to specify any redundant configuration (for example, the JDBC type in the source plugin or the columns in the sink plugin).

  • Users should get field-level lineage for the source and sink being used.

  • Reference documentation should be updated to account for the changes.

  • The source code for the Netezza database plugin should be placed in a repository under the data-integrations organization.

  • Integration tests for the Netezza database plugin should be added to the test repository.

  • Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.

User Stories

  • Users should be able to install Netezza-specific database source and sink plugins from the Hub

  • Users should have each tool tip accurately describe what each field does

  • Users should get field-level lineage information for the Netezza source and sink

  • Users should be able to set up a pipeline without specifying redundant information

  • Users should get updated reference documentation for the Netezza source and sink

  • Users should be able to read all Netezza data types

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Design Tips

Netezza connector reference: https://www.ibm.com/support/knowledgecenter/en/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.container.nav.doc/topics/properties_reference_netezza_connector.html

Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins

Netezza datatypes mappings and conversions: https://www.ibm.com/support/knowledgecenter/en/SSEP7J_10.1.1/com.ibm.swg.ba.cognos.vvm_reference_guide.10.1.1.doc/c_netezzads.html



Design

The proposal is to create a Maven submodule, netezza-plugin, under the database-plugins repository.



Sink Properties

| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Label | String | Label for UI | |
| Reference Name | String | Unique name used to identify this sink for lineage | |
| Host | String | Netezza host | Required (defaults to localhost on UI) |
| Port | Number | Port that Netezza is listening on | Optional (default 5480) |
| Database | String | Name of the database to connect to | Required |
| Username | String | DB username | Required |
| Password | Password | User password | Required |
| Transaction Isolation Level | Select | Transaction isolation level for queries run by this sink | |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see: Netezza connection properties | |
| Table Name | String | Name of the database table to write to | |
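
The Host, Port, and Database properties listed above are what the plugin would combine into a JDBC connection string, so users do not have to type one by hand. The sketch below illustrates that idea only. The class name NetezzaConnectionString and its build method are hypothetical and are not part of the existing database-plugins code; the URL layout jdbc:netezza://host:port/database is the form used by the Netezza JDBC driver.

```java
import java.util.Objects;

/** Hypothetical helper: assembles a Netezza JDBC URL from the Host, Port, and Database properties. */
final class NetezzaConnectionString {
    /** Default port taken from the properties table. */
    static final int DEFAULT_PORT = 5480;

    private NetezzaConnectionString() { }

    /**
     * Builds the connection string. The port may be null because the
     * Port property is optional; the default is used in that case.
     */
    static String build(String host, Integer port, String database) {
        Objects.requireNonNull(host, "host is required");
        Objects.requireNonNull(database, "database is required");
        String trimmedHost = host.trim();
        String trimmedDatabase = database.trim();
        if (trimmedHost.isEmpty() || trimmedDatabase.isEmpty()) {
            throw new IllegalArgumentException("host and database must not be blank");
        }
        int effectivePort = (port == null) ? DEFAULT_PORT : port;
        if (effectivePort < 1 || effectivePort > 65535) {
            throw new IllegalArgumentException("port out of range: " + effectivePort);
        }
        return "jdbc:netezza://" + trimmedHost + ":" + effectivePort + "/" + trimmedDatabase;
    }
}
```

Calling NetezzaConnectionString.build with a host, a null port, and a database name falls back to port 5480, which mirrors the "Optional (default 5480)" constraint.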



Source Properties



| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Label | String | Label for UI | |
| Reference Name | String | Unique name used to identify this source for lineage | |
| Host | String | Netezza host | Required (defaults to localhost on UI) |
| Port | Number | Port that Netezza is listening on | Optional (default 5480) |
| Database | String | Name of the database to connect to | Required |
| Import Query | String | Query used to import data | Valid SQL query |
| Username | String | DB username | Required |
| Password | String | User password | Required |
| Bounding Query | String | Returns the max and min of the Split-By field | Valid SQL query |
| Split-By Field Name | String | Field name which will be used to generate splits | |
| Number of Splits to Generate | Number | Number of splits to generate | |
| Transaction Isolation Level | Select | Transaction isolation level for queries run by this source | |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see: Netezza connection properties | |
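
To make the relationship between Bounding Query, Split-By Field Name, and Number of Splits to Generate concrete, the sketch below divides the min/max range returned by a bounding query into contiguous ranges, one per split. It is an illustration under assumptions, not the plugin's actual splitting code: the SplitPlanner class, its Range type, and the assumption that the Split-By field is an integer column are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turns bounding-query results into per-split ranges. */
final class SplitPlanner {

    /** Closed range [low, high] of Split-By field values handled by one split. */
    static final class Range {
        final long low;
        final long high;

        Range(long low, long high) {
            this.low = low;
            this.high = high;
        }

        /** Renders the range as a SQL condition on the given field. */
        String toCondition(String field) {
            return field + " >= " + low + " AND " + field + " <= " + high;
        }
    }

    private SplitPlanner() { }

    /**
     * Divides [min, max] into at most numSplits contiguous ranges of
     * near-equal size. Throws ArithmeticException if the range is too
     * large to count in a long.
     */
    static List<Range> plan(long min, long max, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        long total = Math.addExact(Math.subtractExact(max, min), 1L);
        int effectiveSplits = (int) Math.min((long) numSplits, total);
        long baseSize = total / effectiveSplits;
        long remainder = total % effectiveSplits;

        List<Range> ranges = new ArrayList<>();
        long low = min;
        for (int i = 0; i < effectiveSplits; i++) {
            // The first `remainder` splits take one extra value each.
            long size = baseSize + (i < remainder ? 1 : 0);
            long high = low + size - 1;
            ranges.add(new Range(low, high));
            low = high + 1;
        }
        return ranges;
    }
}
```

Each resulting condition could then be appended to the import query so that every split reads a disjoint slice of the table.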





Action Properties



| User Facing Name | Type | Description | Constraints |
| --- | --- | --- | --- |
| Label | String | Label for UI | |
| Host | String | Netezza host | Required (defaults to localhost on UI) |
| Port | Number | Port that Netezza is listening on | Optional (default 5480) |
| Database | String | Name of the database to connect to | Required |
| Username | String | DB username | Required |
| Password | String | User password | Required |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see: Netezza connection properties | |
| Database Command | String | Database command to run | Valid SQL query |
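
The Connection Arguments, Username, and Password properties would typically end up in a java.util.Properties object passed to the JDBC driver. The sketch below shows one way to do that. It assumes the key/value widget serializes its entries as key1=value1;key2=value2; that format, the ConnectionArguments class, and its parse method are assumptions made for illustration. The "user" and "password" keys are the standard JDBC property names.

```java
import java.util.Objects;
import java.util.Properties;

/** Hypothetical sketch: converts plugin properties into JDBC connection properties. */
final class ConnectionArguments {

    private ConnectionArguments() { }

    /**
     * Parses a "key1=value1;key2=value2" string (assumed widget format)
     * and adds the credentials under the standard JDBC keys.
     */
    static Properties parse(String rawArguments, String user, String password) {
        Objects.requireNonNull(user, "user is required");
        Objects.requireNonNull(password, "password is required");
        Properties props = new Properties();
        if (rawArguments != null) {
            for (String rawPair : rawArguments.split(";")) {
                String pair = rawPair.trim();
                if (pair.isEmpty()) {
                    continue; // tolerate empty input and trailing separators
                }
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Malformed connection argument: " + pair);
                }
                props.setProperty(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        // Credentials are set last so that they cannot be overridden by an argument.
        props.setProperty("user", user);
        props.setProperty("password", password);
        return props;
    }
}
```

The resulting object can be handed to DriverManager.getConnection(url, properties) together with the connection string before the database command is executed.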



Data Types Mapping

| Netezza Data Type | CDAP Schema Data Type | Support | Comment |
| --- | --- | --- | --- |
| BOOLEAN | Schema.Type.BOOLEAN | + | |
| BYTEINT | Schema.Type.INT | + | |
| CHAR | Schema.Type.STRING | + | |
| DATE | Schema.LogicalType.DATE | + | |
| NUMERIC/DECIMAL | Schema.LogicalType.DECIMAL | + | |
| DOUBLE PRECISION/FLOAT(15) | Schema.Type.DOUBLE | + | |
| FLOAT(N) | Schema.Type.FLOAT/Schema.Type.DOUBLE | + | Can be mapped to FLOAT or DOUBLE, depends on N |
| INTEGER | Schema.Type.INT | + | |
| SMALLINT | Schema.Type.INT | + | |
| BIGINT | Schema.Type.LONG | + | |
| NCHAR | Schema.Type.STRING | + | |
| NVARCHAR | Schema.Type.STRING | + | |
| REAL/FLOAT(6) | Schema.Type.FLOAT | + | |
| TIME | Schema.LogicalType.TIME_MICROS | + | |
| TIMETZ/TIME WITH TIME ZONE | Schema.Type.STRING | + | |
| TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| VARCHAR | Schema.Type.STRING | + | |
| INTERVAL | Schema.Type.STRING | + | |
| VARBINARY | Schema.Type.BYTES | + | |
| ST_GEOMETRY | Schema.Type.BYTES | + | |
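
The mapping above can be expressed as a simple lookup, sketched below. To stay self-contained, the sketch returns the CDAP schema type as a string rather than using the CDAP Schema classes; the NetezzaTypeMapping class and its toCdapType method are hypothetical. The FLOAT(N) threshold is an assumption inferred from the REAL/FLOAT(6) and DOUBLE PRECISION/FLOAT(15) rows: precisions up to 6 are treated as FLOAT and anything larger as DOUBLE.

```java
import java.util.Locale;

/** Hypothetical sketch of the Netezza-to-CDAP type mapping table. */
final class NetezzaTypeMapping {

    private NetezzaTypeMapping() { }

    /**
     * Returns the CDAP schema type name for a Netezza type name.
     * The precision argument is only consulted for FLOAT(N).
     */
    static String toCdapType(String netezzaType, int precision) {
        switch (netezzaType.trim().toUpperCase(Locale.ROOT)) {
            case "BOOLEAN":
                return "Schema.Type.BOOLEAN";
            case "BYTEINT":
            case "SMALLINT":
            case "INTEGER":
                return "Schema.Type.INT";
            case "BIGINT":
                return "Schema.Type.LONG";
            case "REAL":
                return "Schema.Type.FLOAT";
            case "DOUBLE PRECISION":
                return "Schema.Type.DOUBLE";
            case "FLOAT":
                // Assumption: FLOAT(1..6) behaves like REAL, larger N like DOUBLE PRECISION.
                return precision <= 6 ? "Schema.Type.FLOAT" : "Schema.Type.DOUBLE";
            case "NUMERIC":
            case "DECIMAL":
                return "Schema.LogicalType.DECIMAL";
            case "DATE":
                return "Schema.LogicalType.DATE";
            case "TIME":
                return "Schema.LogicalType.TIME_MICROS";
            case "TIMESTAMP":
                return "Schema.LogicalType.TIMESTAMP_MICROS";
            case "CHAR":
            case "VARCHAR":
            case "NCHAR":
            case "NVARCHAR":
            case "INTERVAL":
            case "TIMETZ":
            case "TIME WITH TIME ZONE":
                return "Schema.Type.STRING";
            case "VARBINARY":
            case "ST_GEOMETRY":
                return "Schema.Type.BYTES";
            default:
                throw new IllegalArgumentException("Unsupported Netezza type: " + netezzaType);
        }
    }
}
```

Throwing on unknown types, rather than defaulting to STRING, keeps unsupported columns visible at schema-detection time instead of silently coercing them.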







Approach

Create a netezza-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add Netezza-specific properties to the configuration and add support for Netezza-specific data types. Update the UI widget JSON definitions.

Pipeline Samples



API changes

Deprecated Programmatic APIs

The database-plugins project is moved to Data Integrations.

UI Impact or Changes

Configurable database properties are presented as named text fields instead of arbitrary key/value pairs. The Netezza source and sink appear as separate entries, each with the Netezza logo, in the source and sink lists.

Test Scenarios

TODO

Releases

Release X.Y.Z

Related Work

Database plugin enhancements

Future work

DB2 database plugin

Created in 2020 by Google Inc.