Database Batch Source

Plugin version: 2.11.0

Reads from a database using a configurable SQL query. Outputs one record for each row returned by the query.

The source is used whenever you need to read from a database. For example, you may want to create daily snapshots of a database table by using this source and writing to a partitioned table on BigQuery.

Configuration

Property

Macro Enabled?

Version Introduced

Description

Property

Macro Enabled?

Version Introduced

Description

Use Connection

No

6.7.0/2.9.0

Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials.

Connection

Yes

6.7.0/2.9.0

Optional. Name of the connection to use. Project and service account information will be provided by the connection. You also can use the macro function ${conn(connection-name)}.

JDBC Driver name

No

 

Required. Select the JDBC driver to use.

Connection String

Yes

 

Required. JDBC connection string including database name.

Username

Yes

 

Optional. User identity for connecting to the specified database. Required for databases that need authentication. Optional for databases that do not require authentication.

Password

Yes

 

Optional. Password to use to connect to the specified database. Required for databases that need authentication. Optional for databases that do not require authentication.

Connection Arguments

Yes

 

Optional. A list of arbitrary string tag/value pairs as connection arguments. These arguments will be passed to the JDBC driver, as connection arguments, for JDBC drivers that may need additional configurations. This is a semicolon-separated list of key-value pairs, where each pair is separated by a equals ‘=’ and specifies the key and value for the argument. For example, ‘key1=value1;key2=value’ specifies that the connection will be given arguments ‘key1’ mapped to ‘value1’ and the argument ‘key2’ mapped to ‘value2’. 

Reference Name

No

 

Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. Typically, the name of the table name or view name in the Inbound Query.

Import Query

Yes

 

Required. The SELECT query to use to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The Query should contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'. The '$CONDITIONS' string will be replaced by 'splitBy' field limits specified by the bounding query. The '$CONDITIONS' string is not required if numSplits is set to one.

Bounding Query

Yes

 

Optional. Bounding Query should return the min and max of the values of the 'splitBy' field. For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one.

Split-By Field Name

Yes

 

Optional. Field Name which will be used to generate splits. Not required if numSplits is set to one.

Number of Splits to Generate

Yes

 

Optional. Number of splits to generate.

Default is 1.

Fetch Size

Yes

 

Optional. The number of rows to fetch at a time per split. Larger Fetch Size can result in faster import with the trade-off of higher memory usage.

Default is 1000.

Enable Auto-Commit

No

 

Optional. Whether to enable auto-commit for queries run by this source. In most cases, set to false. If you use a JDBC driver that results in an error when the commit operation is run, set to true.

Default is false.

Column Name Case

Yes

 

Optional. Sets the case of the column names returned from the query. Possible options are upper or lower. By default or for any other input, the column names are not modified and the names returned from the database are used as-is. Note that setting this property provides predictability of column name cases across different databases but might result in column name conflicts if multiple column names are the same when the case is ignored.

Default is no change.

Transaction Isolation Level

Yes

 

Optional. The transaction isolation level for queries run by this sink. See java.sql.Connection#setTransactionIsolation for more details. The Phoenix jdbc driver will throw an exception if the Phoenix database does not have transactions enabled and this setting is set to true. For drivers like that, this should be set to TRANSACTION_NONE.

Default is TRANSACTION_SERIALIZABLE.

Pattern To Replace

No

 

Optional. The pattern to replace in the field name in the table, it is typically used with the Replace With config. If Replace With is not set, the pattern will be removed in the field name.

Replace With

No

 

Optional. The string that will be replaced in the field name in the table, it must be used with the Pattern To Replace config.

Output Schema

No

 

Required. Specifies the schema that has to be output. Only columns defined in schema will be included into output record.

Using Third-Party JARs

For information on how to use the JDBC jar to talk to the database source, see Using Third-Party JARs.

Example

This example connects to a database using the specified Connection String, which means it will connect to the prod database of a PostgreSQL instance running on localhost. It will run the Import Query against the users table to read four columns from the table. The column types will be used to derive the record field types output by the source.

Property

Value

Property

Value

Reference Name

database

Plugin Name

postgres

Plugin Type

jdbc

Connection String

jdbc:postgresql://localhost:5432/prod

Import Query

select id,name,email,phone from users where $CONDITIONS

Bounding Query

select min(id),max(id) from users

Split-By Field Name

id

Username

user123

Password

password-abc

For example, if the id column is a primary key of type int and the other columns are non-nullable varchars, output records will have this schema:

field name

type

field name

type

id

int

name

string

email

string

phone

string

Created in 2020 by Google Inc.