Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note: Although you can change table names with runtime arguments and the JavaScript transformation, you cannot view lineage for the pipeline if you do this. It’s recommended to avoid changing table names in plugins downstream from the source plugins.

Configuration

Property

Macro Enabled?

Description

Reference Name

No

Required. Name used to uniquely identify this source for lineage, annotating metadata, etc.

JDBC Connection String

Yes

Required. The JDBC connection string to the database. For example: jdbc:mysql://HOST/DATABASE.

JDBC Plugin Name

No

Optional. The name of the JDBC plugin to use.

Database User Name

Yes

Optional. The username to use when connecting to the database.

Database User Password

Yes

Optional. The password to use when connecting to the database.

Data Selection Mode

Yes

The operation mode for this plugin. Select one of the following:

  • Allow List of Table Names: Define the list of database tables to include.

  • Block List of Table Names: Define the list of database tables to exclude.

  • Custom SQL Statements: Manually define each of the SQL Statement to execute.

Depending on the selected Data Selection Mode, the properties listed in the rest of the table will be available.

Schema Name Pattern

Yes

Optional. A pattern that defines which schemas should be used to list the tables. Any schema whose name matches the pattern will read. If not specified, all schema will be read. Pattern syntax is specific to the type of database that is being connected to.

Table Name Pattern

Yes

Optional. A pattern that defines which tables should be read from. Any table whose name matches the pattern will read. If not specified, all tables will be read. Pattern syntax is specific to the type of database that is being connected to.

Where Clause

Yes

Optional. Filters which records needs to be consumed from each table: i.e. where updated_at > '2018-08-20 00:00:00'. The where clause will be applied to every table that is being read. Therefore, all the columns that are mentioned in the where clause should be present in each table.

Allow List of Table Names

Yes

Optional. Used in conjunction with Table Name Pattern, this configuration specifies tables to be read. If no value is specified in the white list, all tables matching the Table Name Pattern will be read. By default, reads all tables matching the Table Name Pattern.

Block List of Table Names

Yes

Optional. Used in conjunction with Table Name Pattern, this configuration specifies the tables to be skipped. By default, the black list is empty which means no tables will be skipped.

SQL Statements

Yes

Optional. List of SQL statements to execute. Each statement will be handled as a different partition. When submitting this statements using the API, use a semicolon ; as a separator. If the SQL statement includes a semicolon ; character, you will need to escape it using \;. This option is only displayed when the Data Selection Mode is SQL Statements.

Table Aliases

Yes

Optional. List of aliases to use for the datasets generated by the supplied SQL statements. If supplied, the Table Alias for each SQL statement will be used as the Table Name value for each record. The SQL statements will be matched to a Table Alias based on the order in which they appear on the list. This option is only displayed when the Data Selection Mode is SQL Statements.

Table Name Field

No

Optional. The name of the field that holds the table name. Must not be the name of any table column that will be read.

Default is ‘tablename’.

Enable Auto Commit

Yes

Optional. Whether to enable auto commit for queries run by this source. This setting should only matter if you are using a JDBC driver that does not support a false value for auto commit, or a driver that does not support the commit call. For example, the Hive JDBC driver will throw an exception whenever a commit is called. For drivers like that, this should be set to true.

Defaults to false.

Splits Per Table

Yes

Optional. The number of splits per table.

This option is only displayed when the Data Selection Mode is either Allow List ot Tables or Block List of Tables.

Default is 1.

Query Timeout (Seconds)

No

Optional. The query timeout in seconds.

Transaction Isolation Level

Yes

Optional. The transaction isolation level for queries run by this sink. Defaults to TRANSACTION_SERIALIZABLE. See java.sql.Connection#setTransactionIsolation for more details. The Phoenix jdbc driver will throw an exception if the Phoenix database does not have transactions enabled and this setting is set to true. For drivers like that, this should be set to TRANSACTION_NONE.

Error Handling Mode

No

Optional. How to handle error handling.

Default is fail pipeline.

Custom SQL Statements

When using the Data Selection Mode called SQL Statements, the supplied list of SQL statements will be executed as supplied using the specified database connection.

...

If a Table Alias for a given SQL statement is spacifiedspecified, this value will be returned as the Table Name for the records generated using this SQL statement.

...

This example reads from all tables in the ‘customers’ database on host ‘host123.example.net’:

Property

Value

Reference Name

src1

JDBC Connection String

jdbc:mysql://host123.example.net/customers

JDBC Plugin Name

mysql

Splits Per Table

2

Suppose you have two tables in the ‘customers’ database, where ID column is the primary key in both tables. The first table is named ‘accounts’ and contains:

ID

name

email

0

Samuel

sjax@example.net

1

Alice

a@example.net

2

Bob

b@example.net

3

John

j@example.net

The second is named ‘activity’ and contains:

ID

userid

item

action

0

0

shirt123

view

1

0

carxyz

view

2

0

shirt123

buy

3

0

coffee

view

4

1

cola

buy

5

1

pepsi

buy

You will have 4 splits (2 per each table) with such queries:

...

The output of the the source will be the following records:

ID

name

email

tablename

0

Samuel

sjax@example.net

accounts

1

Alice

a@example.net

accounts

2

Bob

b@example.net

accounts

3

John

j@example.net

accounts

ID

userid

item

action

tablename

0

0

shirt123

view

activity

1

0

carxyz

view

activity

2

0

shirt123

buy

activity

3

0

coffee

view

activity

4

1

cola

buy

activity

5

1

pepsi

buy

activity

Derived Table Names

Only one table per query

...

For example, if the JDBC driver is able to provide the Table Name from a SQL statement ResultSet, the derived table name will look as follows:

SQL Statement

Table Alias

JDBC Driver support

Derived Table Name

SELECT * FROM table1

my_alias_1

Yes

my_alias_1

SELECT * FROM table2

<empty>

Yes

table2

SELECT * FROM users JOIN orders ON …

<empty?

Yes

users_orders

In case the JDBC driver is NOT able to provide the Table Name from a SQL statement ResultSet, the derived table name will look as follows:

SQL Statement

Table Alias

JDBC Driver support

Derived Table Name

SELECT * FROM table1

my_alias_1

No

my_alias_1

SELECT * FROM table2

<empty>

No

sql_statement_2

SELECT * FROM users JOIN orders ON …

<empty?

No

sql_statement_3