PostgreSQL Batch Source

Plugin version: 1.10.0

Reads from a PostgreSQL database using a configurable SQL query. Outputs one record for each row returned by the query. For example, you might want to create daily snapshots of a database table by using this source and writing to a Redshift table.

Configuration

| Property | Macro Enabled? | Version Introduced | Description |
|---|---|---|---|
| Use connection | No | 6.7.0/1.8.0 | Optional. Whether to use a connection. If a connection is used, you do not need to provide the credentials. |
| Connection | Yes | 6.7.0/1.8.0 | Required. Name of the connection to use. Project and service account information will be provided by the connection. You can also use the macro function ${conn(connection-name)}. |
| JDBC Driver Name | No | | Required. Select the JDBC driver to use. Default is postgresql. |
| Host | Yes | | Required. Host that PostgreSQL is running on. Default is localhost. |
| Port | Yes | | Required. Port that PostgreSQL is running on. Default is 5432. |
| Username | Yes | | Optional. User identity for connecting to the specified database. |
| Password | Yes | | Optional. Password to use to connect to the specified database. |
| Connection Arguments | Yes | | Optional. A list of arbitrary string key/value pairs as connection arguments. These arguments are passed to the JDBC driver for drivers that need additional configuration. |
| Database | Yes | | Required. PostgreSQL database name. |
| Reference Name | No | | Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. |
| Import Query | Yes | | Required. The SELECT query to use to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The query should contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'. The '$CONDITIONS' string will be replaced by the Split-By Field Name field limits specified by the bounding query. The '$CONDITIONS' string is not required if Number of Splits to Generate is set to 1. |
| Bounding Query | Yes | | Optional. The bounding query should return the minimum and maximum values of the Split-By Field Name field. For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if Number of Splits to Generate is set to 1. |
| Split-By Field Name | Yes | | Optional. Name of the field used to generate splits. Not required if Number of Splits to Generate is set to 1. |
| Number of Splits to Generate | Yes | | Optional. Number of splits to generate. Default is 1. |
| Fetch Size | Yes | 6.6.0/1.7.0 | Optional. The number of rows to fetch at a time per split. A larger Fetch Size can result in faster import, with the trade-off of higher memory usage. Default is 1000. |
| Connection Timeout | No | | Optional. The timeout value used for socket connect operations. If connecting to the server takes longer than this value, the connection is broken. The timeout is specified in seconds, and a value of zero means that it is disabled. Default is 100. |
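
The way Import Query, Bounding Query, Split-By Field Name, and Number of Splits to Generate fit together can be illustrated with a short Python sketch. This is not the plugin's implementation: the function names `split_conditions` and `expand_import_query` are hypothetical, and the sketch assumes an integer split field whose range is divided into contiguous ranges of near-equal width.

```python
def split_conditions(field, lo, hi, num_splits):
    """Divide the inclusive integer range [lo, hi] into contiguous conditions.

    lo and hi stand for the values a bounding query would return.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be less than lo")
    total = hi - lo + 1
    base, extra = divmod(total, num_splits)
    conditions = []
    start = lo
    for i in range(num_splits):
        # The first `extra` splits take one additional value each.
        width = base + (1 if i < extra else 0)
        if width == 0:
            break  # fewer distinct values than requested splits
        end = start + width - 1
        conditions.append(f"{field} >= {start} AND {field} <= {end}")
        start = end + 1
    return conditions


def expand_import_query(import_query, conditions):
    """Substitute each condition for the $CONDITIONS placeholder."""
    return [import_query.replace("$CONDITIONS", c) for c in conditions]


if __name__ == "__main__":
    # Hypothetical bounds, as if returned by SELECT MIN(id),MAX(id) FROM table.
    lo, hi = 1, 10
    template = "SELECT * FROM table WHERE $CONDITIONS"
    for query in expand_import_query(template, split_conditions("id", lo, hi, 3)):
        print(query)
```

Each resulting query reads a disjoint slice of the table, which is why the placeholder and the bounding query are only needed when more than one split is requested.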

Example

Suppose you want to read data from a PostgreSQL database named "prod" that is running on "localhost", port 5432, as the "postgres" user with the "postgres" password. Ensure that the driver for PostgreSQL is installed. You can also provide the driver name for a specific driver; otherwise, "postgresql" will be used. Then configure the plugin with:

| Property | Value |
|---|---|
| Reference Name | src1 |
| Driver Name | postgresql |
| Host | localhost |
| Port | 5432 |
| Database | prod |
| Import Query | select id, name, email, phone from users |
| Number of Splits to Generate | 1 |
| Username | postgres |
| Password | postgres |
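
For orientation, the Host, Port, and Database values above correspond to a PostgreSQL JDBC URL of the form `jdbc:postgresql://host:port/database`. The following Python sketch builds such a URL. The helper name `build_jdbc_url` is hypothetical, and appending Connection Arguments as URL query parameters is an assumption made for illustration; the plugin may pass them to the driver in another way.

```python
from urllib.parse import quote, urlencode


def build_jdbc_url(host, port, database, connection_arguments=None):
    """Build a PostgreSQL JDBC URL from the connection properties."""
    url = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    if connection_arguments:
        # Assumption: arguments are appended as URL-encoded key=value pairs.
        url += "?" + urlencode(connection_arguments)
    return url


if __name__ == "__main__":
    print(build_jdbc_url("localhost", 5432, "prod"))
    print(build_jdbc_url("localhost", 5432, "prod", {"ssl": "true"}))
```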

With this query, if the 'id' column is a primary key of type int and the other columns are non-nullable varchars, output records will have this schema:

| Field name | Type |
|---|---|
| id | int |
| name | string |
| email | string |
| phone | string |
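
The same schema can be written out as an Avro-style JSON record, which is a common way to express record schemas. The sketch below is illustrative only: the record name `output` and the helpers `field` and `record_schema` are hypothetical, and the nullable-union convention shown is the Avro one, assumed here for columns that allow NULL.

```python
import json


def field(name, type_name, nullable=False):
    """Describe one field; a nullable field is a union of its type and "null"."""
    return {"name": name, "type": [type_name, "null"] if nullable else type_name}


def record_schema(name, fields):
    """Wrap a list of field descriptions in a record schema."""
    return {"type": "record", "name": name, "fields": fields}


# Non-nullable columns from the example query.
schema = record_schema(
    "output",
    [
        field("id", "int"),
        field("name", "string"),
        field("email", "string"),
        field("phone", "string"),
    ],
)

if __name__ == "__main__":
    print(json.dumps(schema, indent=2))
```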

Data Type Mapping

All PostgreSQL-specific data types are mapped to string. They can have multiple input formats and one 'canonical' output form. Refer to the PostgreSQL data types documentation to determine the proper formats.

| PostgreSQL Data Type | CDAP Schema Data Type | Comments |
|---|---|---|
| bigint | long | |
| bigserial | long | |
| bit(n) | string | String of '0' and '1' characters, exactly n characters long |
| bit varying(n) | string | String of '0' and '1' characters, at most n characters long |
| boolean | boolean | |
| bytea | bytes | |
| character | string | |
| character varying | string | |
| double precision | double | |
| integer | int | |
| numeric(precision, scale)/decimal(precision, scale) | decimal | |
| real | float | |
| smallint | int | |
| smallserial | int | |
| serial | int | |
| text | string | |
| date | date | |
| time [ (p) ] [ without time zone ] | time | |
| time [ (p) ] with time zone | string | |
| timestamp [ (p) ] [ without time zone ] | timestamp | |
| timestamp [ (p) ] with time zone | timestamp | Stored in UTC format in the database |
| xml | string | |
| tsquery | string | |
| tsvector | string | |
| uuid | string | |
| box | string | |
| cidr | string | |
| circle | string | |
| inet | string | |
| interval | string | |
| json | string | |
| jsonb | string | |
| line | string | |
| lseg | string | |
| macaddr | string | |
| macaddr8 | string | |
| money | string | |
| path | string | |
| point | string | |
| polygon | string | |
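
The mapping can also be expressed as a small lookup, which may help when checking an expected output schema by hand. This Python sketch is not part of the plugin; the names `NON_STRING_TYPES`, `STRING_TYPES`, `normalize`, and `cdap_type` are hypothetical. It treats bigint as long, since bigint is an 8-byte integer.

```python
import re

# Types that map to something other than string.
NON_STRING_TYPES = {
    "bigint": "long", "bigserial": "long", "boolean": "boolean",
    "bytea": "bytes", "double precision": "double", "integer": "int",
    "numeric": "decimal", "decimal": "decimal", "real": "float",
    "smallint": "int", "smallserial": "int", "serial": "int",
    "date": "date", "time": "time", "timestamp": "timestamp",
    "timestamp with time zone": "timestamp",
}

# Types that map to string.
STRING_TYPES = {
    "bit", "bit varying", "character", "character varying", "text",
    "time with time zone", "xml", "tsquery", "tsvector", "uuid", "box",
    "cidr", "circle", "inet", "interval", "json", "jsonb", "line", "lseg",
    "macaddr", "macaddr8", "money", "path", "point", "polygon",
}


def normalize(pg_type):
    """Lowercase, drop modifiers such as (n) or (p, s), and the default suffix."""
    name = re.sub(r"\([^)]*\)", "", pg_type.lower())
    name = re.sub(r"\s+", " ", name).strip()
    suffix = " without time zone"
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def cdap_type(pg_type):
    """Return the CDAP schema type for a PostgreSQL type name."""
    name = normalize(pg_type)
    if name in NON_STRING_TYPES:
        return NON_STRING_TYPES[name]
    if name in STRING_TYPES:
        return "string"
    raise KeyError(f"type not covered by this sketch: {pg_type}")


if __name__ == "__main__":
    for t in ("numeric(10, 2)", "timestamp (3) with time zone", "jsonb"):
        print(t, "->", cdap_type(t))
```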

 

Created in 2020 by Google Inc.