PostgreSQL Batch Source
Plugin version: 1.10.0
Reads from a PostgreSQL database using a configurable SQL query. Outputs one record for each row returned by the query. For example, you might create daily snapshots of a database table by using this source and writing to a Redshift table.
Configuration
Property | Macro Enabled? | Version Introduced | Description |
---|---|---|---|
Use connection | No | 6.7.0/1.8.0 | Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials. |
Connection | Yes | 6.7.0/1.8.0 | Required. Name of the connection to use. Project and service account information will be provided by the connection. You can also use the macro function ${conn(connection-name)}. |
JDBC Driver Name | No | | Required. Select the JDBC driver to use. Default is postgresql. |
Host | Yes | | Required. Host that PostgreSQL is running on. Default is localhost. |
Port | Yes | | Required. Port that PostgreSQL is running on. Default is 5432. |
Username | Yes | | Optional. User identity for connecting to the specified database. |
Password | Yes | | Optional. Password to use to connect to the specified database. |
Connection Arguments | Yes | | Optional. A list of arbitrary string key/value pairs as connection arguments. These arguments will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations. |
Database | Yes | | Required. PostgreSQL database name. |
Reference Name | No | | Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. |
Import Query | Yes | | Required. The SELECT query to use to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The Query should contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'. The '$CONDITIONS' string will be replaced by Split-By Field Name field limits specified by the bounding query. The '$CONDITIONS' string is not required if Number of Splits to Generate is set to 1. |
Bounding Query | Yes | | Optional. Bounding Query should return the min and max of the values of the Split-By Field Name field. For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if Number of Splits to Generate is set to 1. |
Split-By Field Name | Yes | | Optional. Field Name which will be used to generate splits. Not required if Number of Splits to Generate is set to 1. |
Number of Splits to Generate | Yes | | Optional. Number of splits to generate. Default is 1. |
Fetch Size | Yes | 6.6.0/1.7.0 | Optional. The number of rows to fetch at a time per split. Larger Fetch Size can result in faster import with the trade-off of higher memory usage. Default is 1000. |
Connection Timeout | No | | Optional. The timeout value used for socket connect operations. If connecting to the server takes longer than this value, the connection is broken. The timeout is specified in seconds, and a value of zero means that it is disabled. Default is 100. |
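The interaction between Import Query, Bounding Query, Split-By Field Name, and Number of Splits to Generate can be illustrated with a small sketch. This is not the plugin's actual splitting code. The function names `split_conditions` and `split_queries` are hypothetical, the split field is assumed to be an integer column, and the exact form of the generated conditions is an assumption. The sketch only shows the idea: the min and max returned by the bounding query are divided into ranges, and each range replaces '$CONDITIONS' in the import query.

```python
def split_conditions(lo, hi, num_splits, field):
    """Return one WHERE-clause fragment per split, together covering [lo, hi].

    lo and hi stand for the MIN and MAX returned by the bounding query.
    Illustrative only: assumes an integer split field.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    span = hi - lo + 1
    num_splits = min(num_splits, span)  # avoid creating empty splits
    conditions = []
    start = lo
    for i in range(num_splits):
        # Spread any remainder over the first few splits.
        size = span // num_splits + (1 if i < span % num_splits else 0)
        end = start + size - 1
        conditions.append(f"{field} >= {start} AND {field} <= {end}")
        start = end + 1
    return conditions


def split_queries(import_query, lo, hi, num_splits, field):
    """Substitute each split's condition for the $CONDITIONS placeholder."""
    return [
        import_query.replace("$CONDITIONS", condition)
        for condition in split_conditions(lo, hi, num_splits, field)
    ]


if __name__ == "__main__":
    for query in split_queries("SELECT * FROM table WHERE $CONDITIONS", 1, 10, 3, "id"):
        print(query)
```

With a single split there is nothing to substitute, which is consistent with '$CONDITIONS', the bounding query, and the split field all being optional when Number of Splits to Generate is 1.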
Example
You want to read data from a PostgreSQL database named "prod" that is running on "localhost", port 5432, as the "postgres" user with the "postgres" password. Ensure that the driver for PostgreSQL is installed. You can also provide a driver name for a specific driver; otherwise "postgresql" is used. Then configure the plugin with:
Property | Value |
---|---|
Reference Name | |
Driver Name | postgresql |
Host | localhost |
Port | 5432 |
Database | prod |
Import Query | |
Number of Splits to Generate | |
Username | postgres |
Password | postgres |
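As a rough illustration of how the Host, Port, Database, and Connection Arguments properties relate to a JDBC connection, the sketch below builds a PostgreSQL JDBC URL of the standard `jdbc:postgresql://host:port/database` form. The helper name `postgres_jdbc_url` is hypothetical, and appending the connection arguments as URL query parameters is an assumption made for illustration; the plugin may pass them to the driver in a different way.

```python
from urllib.parse import urlencode


def postgres_jdbc_url(host="localhost", port=5432, database="", connection_arguments=None):
    """Build a PostgreSQL JDBC URL from the connection properties.

    Hypothetical helper: the defaults mirror the documented defaults for
    Host and Port. Database has no default and must be given.
    """
    if not database:
        raise ValueError("database is required")
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if connection_arguments:
        # Assumption: arbitrary key/value pairs are appended as URL parameters.
        url += "?" + urlencode(connection_arguments)
    return url


if __name__ == "__main__":
    # Values taken from the example: database "prod" on localhost, port 5432.
    print(postgres_jdbc_url(database="prod"))
```

The username and password are deliberately left out of the URL in this sketch, since credentials are usually supplied to the driver separately.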
For example, if the 'id' column is a primary key of type int and the other columns are non-nullable varchars, output records will have this schema:
field name | type |
---|---|
id | int |
name | string |
| string |
phone | string |
Data Type Mapping
All PostgreSQL-specific data types are mapped to string. They can have multiple input formats and one 'canonical' output form. Refer to the PostgreSQL data types documentation for the proper formats.
PostgreSQL Data Type | CDAP Schema Data Type | Comments |
---|---|---|
bigint | long | |
bigserial | long | |
bit(n) | string | string with '0' and '1' chars exact n length |
bit varying(n) | string | string with '0' and '1' chars max n length |
boolean | boolean | |
bytea | bytes | |
character | string | |
character varying | string | |
double precision | double | |
integer | int | |
numeric(precision, scale)/decimal(precision, scale) | decimal | |
real | float | |
smallint | int | |
smallserial | int | |
serial | int | |
text | string | |
date | date | |
time [ (p) ] [ without time zone ] | time | |
time [ (p) ] with time zone | string | |
timestamp [ (p) ] [ without time zone ] | timestamp | |
timestamp [ (p) ] with time zone | timestamp | stored in UTC format in database |
xml | string | |
tsquery | string | |
tsvector | string | |
uuid | string | |
box | string | |
cidr | string | |
circle | string | |
inet | string | |
interval | string | |
json | string | |
jsonb | string | |
line | string | |
lseg | string | |
macaddr | string | |
macaddr8 | string | |
money | string | |
path | string | |
point | string | |
polygon | string | |
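The mapping table can also be read as a simple lookup. The sketch below is an illustration only and not part of the plugin: the names `CDAP_TYPES`, `STRING_TYPES`, and `cdap_type` are hypothetical, and the normalization (lowercasing, dropping length or precision arguments) is an assumption about how a type name might be written. bigint is treated as long here because it is a 64-bit type, like bigserial.

```python
import re

# Non-string mappings from the table (illustrative lookup, not plugin code).
CDAP_TYPES = {
    "bigint": "long",
    "bigserial": "long",
    "boolean": "boolean",
    "bytea": "bytes",
    "double precision": "double",
    "integer": "int",
    "smallint": "int",
    "smallserial": "int",
    "serial": "int",
    "numeric": "decimal",
    "decimal": "decimal",
    "real": "float",
    "date": "date",
    "time": "time",
    "time without time zone": "time",
    "timestamp": "timestamp",
    "timestamp without time zone": "timestamp",
    "timestamp with time zone": "timestamp",
}

# Types that the table maps to string.
STRING_TYPES = {
    "bit", "bit varying", "character", "character varying", "text",
    "time with time zone", "xml", "tsquery", "tsvector", "uuid", "box",
    "cidr", "circle", "inet", "interval", "json", "jsonb", "line", "lseg",
    "macaddr", "macaddr8", "money", "path", "point", "polygon",
}


def cdap_type(pg_type):
    """Look up the CDAP schema type for a PostgreSQL type name."""
    # Drop arguments such as (n) or (precision, scale), then tidy whitespace.
    name = re.sub(r"\(.*?\)", "", pg_type.lower())
    name = " ".join(name.split())
    if name in CDAP_TYPES:
        return CDAP_TYPES[name]
    if name in STRING_TYPES:
        return "string"
    raise ValueError(f"type not listed in the mapping table: {pg_type}")


if __name__ == "__main__":
    for pg in ("numeric(10, 2)", "bit varying(8)", "timestamp(3) with time zone"):
        print(pg, "->", cdap_type(pg))
```

Raising an error for unlisted types is a design choice of this sketch; it avoids guessing a mapping that the table does not state.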
Created in 2020 by Google Inc.