Database Batch Source
Plugin version: 2.11.0
Reads from a database using a configurable SQL query. Outputs one record for each row returned by the query.
The source is used whenever you need to read from a database. For example, you may want to create daily snapshots of a database table by using this source and writing to a partitioned table on BigQuery.
Configuration
Property | Macro Enabled? | Version Introduced | Description |
---|---|---|---|
Use Connection | No | 6.7.0/2.9.0 | Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials. |
Connection | Yes | 6.7.0/2.9.0 | Optional. Name of the connection to use. Project and service account information will be provided by the connection. You also can use the macro function ${conn(connection-name)}. |
JDBC Driver name | No |
| Required. Select the JDBC driver to use. |
Connection String | Yes |
| Required. JDBC connection string including database name. |
Username | Yes |
| Optional. User identity for connecting to the specified database. Required for databases that need authentication. Optional for databases that do not require authentication. |
Password | Yes |
| Optional. Password to use to connect to the specified database. Required for databases that need authentication. Optional for databases that do not require authentication. |
Connection Arguments | Yes |
| Optional. A list of arbitrary string tag/value pairs as connection arguments. These arguments will be passed to the JDBC driver, as connection arguments, for JDBC drivers that may need additional configurations. This is a semicolon-separated list of key-value pairs, where each pair is separated by a equals ‘=’ and specifies the key and value for the argument. For example, ‘key1=value1;key2=value’ specifies that the connection will be given arguments ‘key1’ mapped to ‘value1’ and the argument ‘key2’ mapped to ‘value2’. |
Reference Name | No |
| Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. Typically, the name of the table name or view name in the Inbound Query. |
Import Query | Yes |
| Required. The SELECT query to use to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The Query should contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'. The '$CONDITIONS' string will be replaced by 'splitBy' field limits specified by the bounding query. The '$CONDITIONS' string is not required if numSplits is set to one. |
Bounding Query | Yes |
| Optional. Bounding Query should return the min and max of the values of the 'splitBy' field. For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one. |
Split-By Field Name | Yes |
| Optional. Field Name which will be used to generate splits. Not required if numSplits is set to one. |
Number of Splits to Generate | Yes |
| Optional. Number of splits to generate. Default is 1. |
Fetch Size | Yes |
| Optional. The number of rows to fetch at a time per split. Larger Fetch Size can result in faster import with the trade-off of higher memory usage. Default is 1000. |
Enable Auto-Commit | No |
| Optional. Whether to enable auto-commit for queries run by this source. In most cases, set to Default is |
Column Name Case | Yes |
| Optional. Sets the case of the column names returned from the query. Possible options are Default is no change. |
Transaction Isolation Level | Yes |
| Optional. The transaction isolation level for queries run by this sink. See java.sql.Connection#setTransactionIsolation for more details. The Phoenix jdbc driver will throw an exception if the Phoenix database does not have transactions enabled and this setting is set to true. For drivers like that, this should be set to TRANSACTION_NONE. Default is TRANSACTION_SERIALIZABLE. |
Pattern To Replace | No |
| Optional. The pattern to replace in the field name in the table, it is typically used with the Replace With config. If Replace With is not set, the pattern will be removed in the field name. |
Replace With | No |
| Optional. The string that will be replaced in the field name in the table, it must be used with the Pattern To Replace config. |
Output Schema | No |
| Required. Specifies the schema that has to be output. Only columns defined in schema will be included into output record. |
Using Third-Party JARs
For information on how to use the JDBC jar to talk to the database source, see Using Third-Party JARs.
Example
This example connects to a database using the specified Connection String, which means it will connect to the prod
database of a PostgreSQL instance running on localhost
. It will run the Import Query against the users
table to read four columns from the table. The column types will be used to derive the record field types output by the source.
Property | Value |
---|---|
Reference Name |
|
Plugin Name |
|
Plugin Type |
|
Connection String |
|
Import Query |
|
Bounding Query |
|
Split-By Field Name |
|
Username |
|
Password |
|
For example, if the id
column is a primary key of type int
and the other columns are non-nullable varchars
, output records will have this schema:
field name | type |
---|---|
id | int |
name | string |
string | |
phone | string |
Created in 2020 by Google Inc.