Google CloudSQL PostgreSQL Sink
Writes records to a CloudSQL PostgreSQL table, one row per record. For example, suppose you periodically build a recommendation model for products on your online store. The model is stored in a GCS bucket, and you want to export the contents of the bucket to a CloudSQL PostgreSQL table from which it can be served to your users.
Column names are auto-detected from the input schema.
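To illustrate how schema field names line up with table columns, the shell sketch below builds a parameterized INSERT statement from a table name and a list of field names. The `build_insert` helper is hypothetical and is not the plugin's internal implementation. It assumes the table's column names match the input schema's field names and only shows the one-field-per-column, one-row-per-record correspondence.

```shell
# Hypothetical helper: build_insert TABLE FIELD...
# Prints a parameterized INSERT whose column list is taken, in order,
# from the schema field names, with one "?" placeholder per field.
build_insert() {
  table=$1
  shift
  cols=""
  params=""
  for field in "$@"; do
    if [ -z "$cols" ]; then
      # First field: start both lists.
      cols=$field
      params="?"
    else
      # Later fields: append with a comma separator.
      cols="$cols, $field"
      params="$params, ?"
    fi
  done
  printf 'INSERT INTO %s (%s) VALUES (%s)\n' "$table" "$cols" "$params"
}

build_insert users id name email phone
# prints: INSERT INTO users (id, name, email, phone) VALUES (?, ?, ?, ?)
```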
Configuration
Property | Macro Enabled? | Version Introduced | Description |
---|---|---|---|
Use Connection | No | 6.7.0/1.8.0 | Optional. Whether to use an existing connection. If you use a connection, connection-related properties do not appear in the plugin properties. |
Connection | Yes | 6.7.0/1.8.0 | Optional. Name of the connection to use. Project and service account information will be provided by the connection. You can also use the macro function |
JDBC Driver Name | No | | Required. Name of the JDBC driver to use. Default is cloudsql-postgresql. |
CloudSQL Instance Type | No | | Optional. Whether the CloudSQL instance to connect to is private or public. Default is Public. |
Connection Name | Yes | | Required. The CloudSQL instance to connect to, in the format <PROJECT_ID>:<REGION>:<INSTANCE_NAME>. It can be found on the instance overview page. |
Port | Yes | 6.9.0/1.10.5 | Optional. Port that PostgreSQL is running on. |
Username | Yes | | Optional. User identity for connecting to the specified database. |
Password | Yes | | Optional. Password to use to connect to the specified database. |
Connection Arguments | Yes | | Optional. A list of arbitrary string key/value pairs that are passed to the JDBC driver as connection arguments, for drivers that need additional configuration. |
Reference Name | No | | Required. Name used to uniquely identify this sink for lineage, annotating metadata, etc. |
Database | Yes (6.9.0/1.10.5) | | Required. CloudSQL PostgreSQL database name. |
Table Name | Yes | | Required. Name of the table to export to. |
Transaction Isolation Level | Yes | | Transaction isolation level for queries run by this sink. Default is TRANSACTION_READ_COMMITTED. |
Connection Timeout | Yes | | The timeout, in seconds, for socket connect operations. If connecting to the server takes longer than this value, the connection is broken. A value of zero disables the timeout. |
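The Connection Name must follow the <PROJECT_ID>:<REGION>:<INSTANCE_NAME> format; one way to look it up is `gcloud sql instances describe INSTANCE_NAME --format="value(connectionName)"`. The hypothetical `split_connection_name` helper below is a minimal sketch for sanity-checking a value before entering it in the plugin. It splits from the right, assuming that only the project ID may itself contain a colon (as domain-scoped project IDs such as `example.com:my-project` do).

```shell
# Hypothetical helper: split_connection_name CONNECTION_NAME
# Succeeds and prints the three parts if the value looks like
# <PROJECT_ID>:<REGION>:<INSTANCE_NAME>; fails otherwise.
split_connection_name() {
  # Require at least two colons.
  case $1 in
    *:*:*) ;;
    *) return 1 ;;
  esac
  instance=${1##*:}   # text after the last colon
  rest=${1%:*}        # everything before the last colon
  region=${rest##*:}  # text after the second-to-last colon
  project=${rest%:*}  # remainder; may contain a colon (domain-scoped project)
  # Reject empty parts such as ":us-central1:prod-pg" or "p::prod-pg".
  [ -n "$project" ] && [ -n "$region" ] && [ -n "$instance" ] || return 1
  printf 'project=%s region=%s instance=%s\n' "$project" "$region" "$instance"
}

split_connection_name my-project:us-central1:prod-pg
# prints: project=my-project region=us-central1 instance=prod-pg
```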
Data Type Mapping
All PostgreSQL-specific data types are mapped to string and can have multiple input formats and one ‘canonical’ output form. Refer to the PostgreSQL data types documentation for the proper formats.
PostgreSQL Data Type | CDAP Schema Data Type |
---|---|
bigint | long |
bigserial | long |
bit(n) | string |
bit varying(n) | string |
boolean | boolean |
bytea | bytes |
character | string |
character varying | string |
double precision | double |
integer | int |
numeric(precision, scale)/decimal(precision, scale) | decimal |
real | float |
smallint | int |
smallserial | int |
serial | int |
text | string |
date | date |
time [ (p) ] [ without time zone ] | time |
time [ (p) ] with time zone | string |
timestamp [ (p) ] [ without time zone ] | timestamp |
timestamp [ (p) ] with time zone | timestamp |
xml | string |
tsquery | string |
tsvector | string |
uuid | string |
box | string |
cidr | string |
circle | string |
inet | string |
interval | string |
json | string |
jsonb | string |
line | string |
lseg | string |
macaddr | string |
macaddr8 | string |
money | string |
path | string |
point | string |
polygon | string |
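The mapping above can be read as a simple lookup. The `pg_to_cdap` function below is a hypothetical sketch, not part of the plugin, that returns the CDAP schema type for a PostgreSQL type name spelled as in the table; anything not listed explicitly falls through to string, as most rows of the table do. It assumes the spelled-out type names, so aliases such as `int4` or `float8` would wrongly fall through to string.

```shell
# Hypothetical lookup: pg_to_cdap "POSTGRESQL TYPE"
# Prints the CDAP schema type from the mapping table above.
pg_to_cdap() {
  # Normalize to lower case so "BIGINT" and "bigint" map the same way.
  t=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $t in
    bigint|bigserial) echo long ;;
    integer|smallint|smallserial|serial) echo int ;;
    boolean) echo boolean ;;
    bytea) echo bytes ;;
    "double precision") echo double ;;
    real) echo float ;;
    numeric*|decimal*) echo decimal ;;
    date) echo date ;;
    # time with time zone maps to string, so test it before plain time.
    "time with time zone"|time\(*\)" with time zone") echo string ;;
    time|"time without time zone"|time\(*\)|time\(*\)" without time zone") echo time ;;
    # timestamp maps to timestamp with or without time zone.
    timestamp*) echo timestamp ;;
    # Every other type in the table (bit, text, json, uuid, money, ...) is string.
    *) echo string ;;
  esac
}

pg_to_cdap "numeric(10, 2)"
# prints: decimal
```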
Examples
Connecting to a public CloudSQL PostgreSQL instance
You want to write output records to the “users” table of a CloudSQL PostgreSQL database named “prod”, as the “postgres” user with the “postgres” password. Get the latest version of the CloudSQL socket factory jar with driver and dependencies, then configure the plugin with:
Property | Value |
---|---|
Reference Name | sink1 |
JDBC Driver Name | cloudsql-postgresql |
Database | prod |
CloudSQL Instance Type | Public |
Connection Name | [PROJECT_ID]:[REGION]:[INSTANCE_NAME] |
Table Name | users |
Username | postgres |
Password | postgres |
Connecting to a private CloudSQL PostgreSQL instance
If you want to connect to a private CloudSQL PostgreSQL instance, create a Compute Engine VM that runs the CloudSQL Proxy Docker image, using the following commands:
# Set the environment variables
export PROJECT=[project_id]
export REGION=[vm-region]
export ZONE=$(gcloud compute zones list --filter="name=${REGION}" --limit 1 --uri --project=${PROJECT} | sed 's/.*\///')
export SUBNET=[vpc-subnet-name]
export NAME=[gce-vm-name]
export POSTGRESQL_CONN=[postgresql-instance-connection-name]
# Create a Compute Engine VM that runs the proxy on PostgreSQL's port 5432
gcloud beta compute --project=${PROJECT} instances create ${NAME} \
  --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
  --metadata=startup-script="docker run -d -p 0.0.0.0:5432:5432 gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy -instances=${POSTGRESQL_CONN}=tcp:0.0.0.0:5432" \
  --maintenance-policy=MIGRATE \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --image=cos-69-10895-385-0 --image-project=cos-cloud
Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:
# Get the VM internal IP