MySQL database plugin

MySQL database plugin

Introduction

A separate database plugin to support MySQL-specific features and configurations.

Use-Case

  • Users can choose and install MySQL source and sink plugins.

  • Users should see MySQL logo on plugin configuration page for better experience.

  • Users should get relevant information from the tool tip:

    • The tool tip for the connection string should be customized specifically to the MySQL database,

    • The tool tip should describe accurately what each field is used for.

  • User should get a performance comparable to Sqoop by utilizing sqoop libraries for the data ingestion and egress.

  • Users should not have to specify any redundant configuration (ex: JDBC type in source plugin, columns in the sink plugin).

  • Users should get field level lineage for the source and sink that is being used.

  • Reference documentation should be updated to account for the changes.

  • The source code for MySQL database plugin should be placed in repo under data-integrations org.

  • Integration tests for MySQL database plugin should be added in the test repo.

  • The data pipeline using source and sink plugins should run on both mapreduce and spark engines.

User Stories

  • User should be able to install MySQL specific database source and sink plugins from the Hub

  • Users should have each tool tip accurately describe what each field does

  • Users should know the format for the MySql connection string by hovering over tool tip for connection string

  • Users should get field level lineage information for the MySQL source and sink 

  • Users should get a performance comparable to Sqoop when ingesting data from mysql and while writing data to MySQL (within ~15% of the time taken for sqoop)

  • Users should be able to setup a pipeline avoiding specifying redundant information

  • Users should get updated reference document for MySQL source and sink

  • Users should be able to read all the DB types

Plugin Type

Batch Source
Batch Sink 
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Design Tips

MySQL Connector/J 8.0 reference: https://dev.mysql.com/doc/connector-j/8.0/en/

Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins

MySQL datatypes mappings and conversions: https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-type-conversions.html



Connecting Securely Using SSL
Configuring Connector/J client to use SSL can be accomplished by the following steps:
1) Import server certificate into the Java default truststore (although tampering the default truststore is not recommended) or by importing it into a custom Java truststore file. Use 'trustCertificateKeyStoreUrl' property to point the driver to the trusted root certificate keystore.
2) Generate the client private key and certificate or use keys and certificate files generated by the MySQL server. Convert the client key and certificate files to a PKCS #12 archive and import the archive into a Java keystore. Use 'clientCertificateKeyStoreUrl' property to point the driver to the client certificate keystore.
3) Use 'clientCertificateKeyStorePassword' and 'trustCertificateKeyStorePassword' properties to specify passwords for the client and trusted certificates keystores.

See:
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html
https://dev.mysql.com/doc/refman/8.0/en/encrypted-connections.html
https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-using-ssl.html

Support of the 'Use ANSI quotes to quote identifiers' property
Should not be specified via JDBC URL parameter since it will override the default SQL_MODE system variable, instead of appending 'ANSI_QUOTES' to it's value. Proper implementation has to read default SQL_MODE(which depends on the version of the MySQL Server), append 'ANSI_QUOTES' and update the value using "SET SESSION sql_mode = 'modes';" statement.

Design

Currently the two major MySQL versions supported are 5 and 8. We suggest using MySQL Connector/J 8.0 since it is backward compatible with older versions of MySQL and supports all the new features of recent releases.

The suggestion is to move database-plugins module from hydrator-plugins repository to database-plugins repo in data-integrations organization as described in Plugins Repo Split. There is existing code in database-plugins that may be reused for MySQL plugin. We suggest creating a multi-module Maven project where existing `database-plugins` will be a common functionality module for all subsequent DB plugins and each plugin for a specific database (in this case MySQL) will depend on it. Having each DB plugin in a dedicated module allows us to create separately deliverable artifacts, so user can upload only those plugins they need.

Sink Properties

User Facing Name

Type

Description

Constraints

User Facing Name

Type

Description

Constraints

Label

String

Label for UI



Reference Name

String

Uniquely identified name for lineage



Host

String

Mysql host

Required (defaults to localhost on UI)

Port

Number

Specific port where mysql running on

Optional

(default 3306)

Database

String

Database name to connect

Required

Username

String

DB username

Required

Password

Password

User password

Required

Transaction Isolation Level

Select

Transaction isolation level for queries run by this sink



Connection Arguments

Keyvalue

A list of arbitrary string tag/value pairs as connection arguments, list of properties

https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html



Table Name

String

Name of a database table to write to



Use SSL

Select

Turns on SSL encryption. The connection will fail if SSL is not available.



Keystore URL

String

URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on host where CDAP Master is running and all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.



Keystore password

String

Password for the client certificates KeyStore.



Truststore URL

String

URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on host where CDAP Master is running and all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.



Truststore password

String

Password for the trusted root certificates KeyStore



Use compression protocol

Boolean

Use zlib compression when communicating with the server. Select this option for WAN connections.



SQL_MODE

String

Override the default SQL_MODE session variable used by the server.



Source Properties



User Facing Name

Type

Description

Constraints

User Facing Name

Type

Description

Constraints

Label

String

Label for UI



Reference Name

String

Uniquely identified name for lineage



Host

String

Mysql host

Required (defaults to localhost on UI)

Port

Number

Specific port where mysql running on

Optional

(default 3306)

Database

String

Database name to connect

Required

Import Query

String

Query for import data

Valid SQL query

Username

String

DB username

Required

Password

String

User password

Required

Bounding Query

String

Returns max and min of split-By Filed

Valid SQL query

Split-By Field Name

String

Field name which will be used to generate splits



Number of Splits to Generate

Number

Number of splits to generate



Transaction Isolation Level

Select

Transaction isolation level for queries run by this sink



Connection Arguments

Keyvalue

A list of arbitrary string tag/value pairs as connection arguments, list of properties https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html



Use SSL

Select

Turns on SSL encryption. The connection will fail if SSL is not available.



Keystore URL

String

URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on host where CDAP Master is running and all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.



Keystore password

String

Password for the client certificates KeyStore.



Truststore URL

String

URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on host where CDAP Master is running and all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.



Truststore password

String

Password for the trusted root certificates KeyStore



Use compression protocol

Boolean

Use zlib compression when communicating with the server. Select this option for WAN connections.



SQL_MODE

String

Override the default SQL_MODE session variable used by the server.



Use ANSI quotes to quote identifiers

Boolean

Treats " as an identifier quote character and not as a string quote character.





Action Properties



User Facing Name

Type

Description

Constraints

User Facing Name

Type

Description

Constraints

Label

String

Label for UI



Host

String

Mysql host

Required (defaults to localhost on UI)

Port

Number

Specific port where mysql running on

Optional

(default 3306)

Database

String

Database name to connect

Required

Username

String

DB username

Required

Password

String

User password

Required

Connection Arguments

Keyvalue

A list of arbitrary string tag/value pairs as connection arguments, list of properties https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html



Database Command

String

Database command to run

Valid SQL query

Use SSL

Select

Turns on SSL encryption. The connection will fail if SSL is not available.



Keystore URL

String

URL to the client certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on host where CDAP Master is running and all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.



Keystore password

String

Password for the client certificates KeyStore.



Truststore URL

String

URL to the trusted root certificate KeyStore (if not specified, use defaults). Must be accessible at the same location on host where CDAP Master is running and all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.



Truststore password

String

Password for the trusted root certificates KeyStore



Use compression protocol

Boolean

Use zlib compression when communicating with the server. Select this option for WAN connections.



SQL_MODE

String

Override the default SQL_MODE session variable used by the server.



Use ANSI quotes to quote identifiers

Boolean

Treats " as an identifier quote character and not as a string quote character.





Data Types Mapping

MySQL Data Type

CDAP Schema Data Type

Support

Comment

MySQL Data Type

CDAP Schema Data Type

Support

Comment

BIT

Schema.Type.BOOLEAN

+



TINYINT

Schema.Type.INT

+



BOOL, BOOLEAN

Schema.Type.BOOLEAN

+



SMALLINT

Schema.Type.INT

+



MEDIUMINT

Created in 2020 by Google Inc.