MySQL Batch Source

Plugin version: 1.10.0

Reads from a MySQL instance using a configurable SQL query. Outputs one record for each row returned by the query.

Use this source whenever you need to read from a MySQL instance. For example, you may want to create daily snapshots of a database table by using this source and writing to Amazon S3.

Configuration

Property

Macro Enabled?

Version Introduced

Description

Use connection

No

6.7.0/1.8.0

Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials.

Connection

Yes

6.7.0/1.8.0

Required. Name of the connection to use. Connection details such as host, port, and credentials are provided by the connection. You can also use the macro function ${conn(connection-name)}.

JDBC Driver Name

No

 

Required. Select the JDBC driver to use.

Default is mysql.

Host

Yes

 

Required. Host that MySQL is running on.

Default is localhost.

Port

Yes

 

Required. Port that MySQL is running on.

Default is 3306.

Username

Yes

 

Optional. User identity for connecting to the specified database.

Password

Yes

 

Optional. Password to use to connect to the specified database.

Connection Arguments

Yes

 

Optional. A list of arbitrary string key/value pairs that are passed to the JDBC driver as connection arguments, for drivers that need additional configuration.

Reference Name

No

 

Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.

Database

Yes

 

Required. MySQL database name.

Import Query

Yes

 

Required. The SELECT query used to import data from the specified table. You can import an arbitrary number of columns, or all columns using *. The query should contain the ‘$CONDITIONS’ string, for example, ‘SELECT * FROM table WHERE $CONDITIONS’. At runtime, ‘$CONDITIONS’ is replaced with a condition that limits the Split-By Field Name field to the range assigned to each split, based on the limits returned by the Bounding Query. The ‘$CONDITIONS’ string is not required if Number of Splits to Generate is set to 1.

Bounding Query

Yes

 

Required if Number of Splits to Generate is greater than 1. The bounding query must return the minimum and maximum values of the Split-By Field Name field, for example, ‘SELECT MIN(id),MAX(id) FROM table’.

Split-By Field Name

Yes

 

Optional. Name of the field used to generate splits. Not required if Number of Splits to Generate is set to 1.

Number of Splits to Generate

Yes

 

Optional. Number of splits to generate.

Default is 1.

Fetch Size

Yes

6.6.0/1.7.0

Optional. The number of rows to fetch at a time per split. A larger fetch size can result in a faster import, at the cost of higher memory usage.

Default is 1000.

Use SSL

No

 

Optional. Turns on SSL encryption. The connection will fail if SSL is not available.

Default is if available.

Keystore URL

No

 

Optional. URL of the client certificate KeyStore. If not specified, the defaults are used. The KeyStore must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.

Keystore Password

No

 

Optional. Password for the client certificate KeyStore.

Truststore URL

No

 

Optional. URL of the trusted root certificate KeyStore. If not specified, the defaults are used. The KeyStore must be accessible at the same location on the host where CDAP Master is running and on all hosts on which at least one HDFS, MapReduce, or YARN daemon role is running.

Truststore Password

No

 

Optional. Password for the trusted root certificate KeyStore.

Use Compression

No

 

Optional. Use zlib compression when communicating with the server. Select this option for WAN connections.

Default is No.

Use ANSI Quotes

No

 

Optional. Treats the double quote character (") as an identifier quote character rather than a string quote character.

Default is No.

SQL_MODE

No

 

Optional. Override the default SQL_MODE session variable used by the server.

Auto Reconnect

No

 

Optional. Whether the driver should try to re-establish stale or dead connections.

Default is No.

Output Schema

No

 

Required. The schema of records output by the source. This schema is used in place of the schema returned by the query. It must match the query's schema, except that it can mark fields as nullable and can contain a subset of the fields.
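The interaction between Import Query, Bounding Query, Split-By Field Name, and Number of Splits to Generate can be illustrated with a minimal Java sketch. It assumes an integer split-by column and divides the [min, max] range returned by the bounding query into contiguous, non-overlapping ranges, substituting one range condition for ‘$CONDITIONS’ in each split's query. The class SplitQueries and its method are hypothetical; the plugin's actual splitting logic may compute boundaries differently, and this sketch ignores rows whose split-by value is NULL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the plugin's implementation: expands $CONDITIONS
// into one integer range condition per split.
class SplitQueries {

  /**
   * Returns one query per split. Each query covers a contiguous,
   * non-overlapping range of splitBy values in [min, max], where min and max
   * come from the bounding query. Rows with a NULL splitBy value are not covered.
   */
  static List<String> build(String importQuery, String splitBy,
                            long min, long max, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be at least 1");
    }
    List<String> queries = new ArrayList<>();
    if (numSplits == 1) {
      // One split reads all rows, so $CONDITIONS (if present) becomes always-true.
      queries.add(importQuery.replace("$CONDITIONS", "(1 = 1)"));
      return queries;
    }
    long step = Math.max(1, (max - min + 1) / numSplits); // values per split
    long lower = min;
    for (int i = 0; i < numSplits && lower <= max; i++) {
      // The last split absorbs the remainder so that max is always included.
      long upper = (i == numSplits - 1) ? max : Math.min(max, lower + step - 1);
      String condition = "(" + splitBy + " >= " + lower
          + " AND " + splitBy + " <= " + upper + ")";
      queries.add(importQuery.replace("$CONDITIONS", condition));
      lower = upper + 1;
    }
    return queries;
  }

  public static void main(String[] args) {
    // Pretend the bounding query "SELECT MIN(id),MAX(id) FROM users" returned 1 and 10.
    List<String> queries =
        build("SELECT * FROM users WHERE $CONDITIONS", "id", 1, 10, 3);
    queries.forEach(System.out::println);
  }
}
```

Running main prints three queries covering id 1 to 3, 4 to 6, and 7 to 10. Letting the last split absorb the remainder keeps the sketch short at the cost of a slightly larger final split.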

Data Type Mapping

MySQL Data Type    CDAP Schema Data Type
BIT                boolean
TINYINT            int
BOOL, BOOLEAN      boolean
SMALLINT           int
MEDIUMINT          double
INT, INTEGER       int
BIGINT             long
FLOAT              float
DOUBLE             double
DECIMAL            decimal
DATE               date
DATETIME           timestamp
TIMESTAMP          timestamp
TIME               time
YEAR               date
CHAR               string
VARCHAR            string
BINARY             bytes
VARBINARY          bytes
TINYBLOB           bytes
TINYTEXT           string
BLOB               bytes
TEXT               string
MEDIUMBLOB         bytes
MEDIUMTEXT         string
LONGBLOB           bytes
LONGTEXT           string
ENUM               string
SET                string
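The Output Schema rule from the configuration table (the output schema must match the query's schema, except that it can mark fields as nullable and can contain a subset of the fields) can be expressed as a small check over the CDAP types listed above. The following Java sketch is hypothetical: it models each schema as a map from field name to a CDAP type name and nullability flag, and it is not the plugin's own validation code.

```java
import java.util.Map;

// Hypothetical sketch of the Output Schema compatibility rule; the plugin's
// real validation may differ. Types are CDAP schema type names such as
// "int", "string", or "timestamp".
class OutputSchemaCheck {

  /** A field's CDAP type name and whether it may be null. */
  record Field(String type, boolean nullable) {}

  /**
   * Returns null if outputSchema is compatible with querySchema, otherwise a
   * message describing the first problem found.
   */
  static String validate(Map<String, Field> querySchema, Map<String, Field> outputSchema) {
    for (Map.Entry<String, Field> entry : outputSchema.entrySet()) {
      String name = entry.getKey();
      Field declared = entry.getValue();
      Field actual = querySchema.get(name);
      if (actual == null) {
        // The output schema may hold a subset of the query's fields, never extra ones.
        return "field '" + name + "' is not returned by the query";
      }
      if (!actual.type().equals(declared.type())) {
        return "field '" + name + "' is declared as " + declared.type()
            + " but the query returns " + actual.type();
      }
      if (actual.nullable() && !declared.nullable()) {
        // Declaring a field nullable is allowed; dropping nullability is not.
        return "field '" + name + "' may be null but is declared non-nullable";
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Query schema: id as int plus three non-nullable strings.
    Map<String, Field> query = Map.of(
        "id", new Field("int", false),
        "name", new Field("string", false),
        "email", new Field("string", false),
        "phone", new Field("string", false));
    // A subset of the fields, with name marked nullable: accepted (prints null).
    System.out.println(validate(query, Map.of(
        "id", new Field("int", false),
        "name", new Field("string", true))));
    // id declared as long instead of int: rejected.
    System.out.println(validate(query, Map.of("id", new Field("long", false))));
  }
}
```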

Example

Suppose you want to read data from a MySQL database named “prod” that is running on “localhost”, port 3306, as the “root” user with the “root” password. Make sure the MySQL JDBC driver is installed. You can also specify the name of a specific driver; otherwise, “mysql” is used. Configure the plugin with:

Property                        Value
Reference Name                  src1
JDBC Driver Name                mysql
Host                            localhost
Port                            3306
Database                        prod
Import Query                    select id, name, email, phone from users
Number of Splits to Generate    1
Username                        root
Password                        root

If the ‘id’ column is a primary key of type int and the other columns are non-nullable varchars, the output records will have this schema:

field_name    type
id            int
name          string
email         string
phone         string
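For the example configuration, the JDBC URL and connection properties implied by the Host, Port, Database, Username, and Password settings can be sketched in Java as follows. The class and method names are hypothetical, the plugin may build its URL differently (for instance, by appending SSL or compression options), and the Connection Arguments string is assumed to use a ‘key1=value1;key2=value2’ format. The sketch only assembles values and does not open a connection.

```java
import java.util.Properties;

// Hypothetical sketch: assembles the JDBC URL and connection properties that
// the example configuration implies. It does not open a connection.
class ExampleConnection {

  static String jdbcUrl(String host, int port, String database) {
    // MySQL Connector/J URL form: jdbc:mysql://host:port/database
    return "jdbc:mysql://" + host + ":" + port + "/" + database;
  }

  /** Parses connection arguments in the assumed "key1=value1;key2=value2" form and adds credentials. */
  static Properties connectionProperties(String user, String password, String connectionArguments) {
    Properties props = new Properties();
    if (connectionArguments != null) {
      for (String pair : connectionArguments.split(";")) {
        if (pair.isBlank()) {
          continue; // tolerate a trailing ';'
        }
        int eq = pair.indexOf('=');
        if (eq <= 0) {
          throw new IllegalArgumentException("Malformed connection argument: " + pair);
        }
        props.setProperty(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
      }
    }
    // Username and Password are optional, so only set them when provided.
    if (user != null) {
      props.setProperty("user", user);
    }
    if (password != null) {
      props.setProperty("password", password);
    }
    return props;
  }

  public static void main(String[] args) {
    String url = jdbcUrl("localhost", 3306, "prod");
    Properties props = connectionProperties("root", "root", null);
    System.out.println(url + " as " + props.getProperty("user"));
    // With a MySQL driver on the classpath, a connection could then be opened
    // with java.sql.DriverManager.getConnection(url, props).
  }
}
```

Running main prints the line “jdbc:mysql://localhost:3306/prod as root”. Passing credentials as “user” and “password” properties follows the convention of DriverManager.getConnection(String, Properties).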



Created in 2020 by Google Inc.