Google CloudSQL MySQL Batch Source

Reads from a Cloud SQL for MySQL instance using a configurable SQL query. Outputs one record for each row returned by the query. For example, you can create daily snapshots of a database table by using this source and writing to a partitioned table in BigQuery.

Configuration

| Property | Macro Enabled? | Version Introduced | Description |
| --- | --- | --- | --- |
| Use Connection | No | 6.7.0/1.8.0 | Optional. Whether to use an existing connection. If you use a connection, connection-related properties do not appear in the plugin properties. |
| Connection | Yes | 6.7.0/1.8.0 | Optional. Name of the connection to use. Project and service account information will be provided by the connection. You can also use the macro function ${conn(connection_name)}. |
| JDBC Driver Name | No | | Required. Name of the JDBC driver to use. Default is cloudsql-mysql. |
| CloudSQL Instance Type | No | | Optional. Whether the CloudSQL instance to connect to is private or public. Default is Public. |
| Connection Name | Yes (6.10.0) | | Required. The CloudSQL instance to connect to, in the format <PROJECT_ID>:<REGION>:<INSTANCE_NAME>. It can be found on the instance Overview page. |
| Port | Yes | 6.9.0/1.10.5 | Optional. Port that MySQL is running on. Default is 3306. |
| Username | Yes | | Optional. User identity for connecting to the specified database. |
| Password | Yes | | Optional. Password to use to connect to the specified database. |
| Connection Arguments | Yes | | Optional. A list of arbitrary string key/value pairs. These are passed to the JDBC driver as connection arguments, for drivers that need additional configuration. |
| Reference Name | No | | Required. Name used to uniquely identify this source for lineage, annotating metadata, etc. |
| Database | Yes (6.10.0/1.10.5) | | Required. MySQL database name. |
| Import Query | Yes | | Required. The SELECT query to use to import data from the specified table. You can specify an arbitrary number of columns to import, or import all columns using *. The query should contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'. The '$CONDITIONS' string will be replaced by the Split Column limits returned by the Bounding Query. The '$CONDITIONS' string is not required if Number of Splits is set to 1. |
| Bounding Query | Yes | | Query that returns the minimum and maximum values of the Split Column. For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if Number of Splits is set to 1. |
| Split Column | Yes | | Name of the field used to generate splits. Not required if Number of Splits is set to 1. |
| Number of Splits | Yes | | Number of splits to generate. |
| Fetch Size | Yes | 6.6.0/1.7.0 | Optional. The number of rows to fetch at a time per split. A larger Fetch Size can result in a faster import, with the trade-off of higher memory usage. Default is 1000. |
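The way Import Query, Bounding Query, Split Column, and Number of Splits fit together can be illustrated with a short sketch. The Python below is not the plugin's implementation. It is a simplified model, assuming an integer split column and evenly sized inclusive ranges, of how the `$CONDITIONS` placeholder might be filled in for each split. The function name `split_queries`, the form of the generated predicates, and the `1 = 1` substitution for a single split are all assumptions made for illustration.

```python
def split_queries(import_query, split_column, lower, upper, num_splits):
    """Return one query per split by substituting $CONDITIONS.

    Simplified model (an assumption, not the plugin's code): the inclusive
    range [lower, upper], as returned by the Bounding Query, is cut into
    num_splits contiguous integer ranges.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    if num_splits == 1:
        # $CONDITIONS is optional here; if present, replace it with a no-op.
        return [import_query.replace("$CONDITIONS", "1 = 1")]
    if "$CONDITIONS" not in import_query:
        raise ValueError("import query must contain $CONDITIONS when num_splits > 1")

    total = upper - lower + 1
    queries = []
    start = lower
    for i in range(num_splits):
        # Spread the remainder over the first ranges so sizes differ by at most 1.
        size = total // num_splits + (1 if i < total % num_splits else 0)
        if size == 0:
            continue  # more splits requested than distinct values
        end = start + size - 1
        condition = f"{split_column} >= {start} AND {split_column} <= {end}"
        queries.append(import_query.replace("$CONDITIONS", condition))
        start = end + 1
    return queries


# Bounds as they might come back from 'SELECT MIN(id),MAX(id) FROM table'.
for query in split_queries("SELECT * FROM table WHERE $CONDITIONS", "id", 1, 100, 4):
    print(query)
```

Each generated query covers a disjoint slice of the split column, which is why a column with roughly uniform values (such as an auto-increment primary key) tends to give evenly sized splits.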

Data Type Mapping

| MySQL Data Type | CDAP Schema Data Type |
| --- | --- |
| BIT | boolean |
| TINYINT | int |
| BOOL, BOOLEAN | boolean |
| SMALLINT | int |
| MEDIUMINT | double |
| INT, INTEGER | int |
| BIGINT | long |
| FLOAT | float |
| DOUBLE | double |
| DECIMAL | decimal |
| DATE | date |
| DATETIME | timestamp |
| TIMESTAMP | timestamp |
| TIME | time |
| YEAR | date |
| CHAR | string |
| VARCHAR | string |
| BINARY | bytes |
| VARBINARY | bytes |
| TINYBLOB | bytes |
| TINYTEXT | string |
| BLOB | bytes |
| TEXT | string |
| MEDIUMBLOB | bytes |
| MEDIUMTEXT | string |
| LONGBLOB | bytes |
| LONGTEXT | string |
| ENUM | string |
| SET | string |
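When planning a pipeline it can help to predict the output type of each column before running it. The sketch below transcribes the mapping table above into a Python dictionary and adds a hypothetical helper, `cdap_type`, that looks up a MySQL column type. It only performs the table lookup; any special handling the plugin may apply to lengths, precision, or modifiers such as UNSIGNED is not modeled here.

```python
import re

# Transcribed from the mapping table above (MySQL type -> CDAP schema type).
MYSQL_TO_CDAP = {
    "BIT": "boolean", "TINYINT": "int", "BOOL": "boolean", "BOOLEAN": "boolean",
    "SMALLINT": "int", "MEDIUMINT": "double", "INT": "int", "INTEGER": "int",
    "BIGINT": "long", "FLOAT": "float", "DOUBLE": "double", "DECIMAL": "decimal",
    "DATE": "date", "DATETIME": "timestamp", "TIMESTAMP": "timestamp",
    "TIME": "time", "YEAR": "date", "CHAR": "string", "VARCHAR": "string",
    "BINARY": "bytes", "VARBINARY": "bytes", "TINYBLOB": "bytes",
    "TINYTEXT": "string", "BLOB": "bytes", "TEXT": "string",
    "MEDIUMBLOB": "bytes", "MEDIUMTEXT": "string", "LONGBLOB": "bytes",
    "LONGTEXT": "string", "ENUM": "string", "SET": "string",
}


def cdap_type(mysql_type):
    """Look up the CDAP type for a MySQL column type such as 'varchar(255)'."""
    # Keep only the leading type keyword; drop '(255)', 'UNSIGNED', and so on.
    match = re.match(r"\s*([A-Za-z]+)", mysql_type)
    if match is None:
        raise ValueError(f"cannot parse column type: {mysql_type!r}")
    name = match.group(1).upper()
    if name not in MYSQL_TO_CDAP:
        raise ValueError(f"no mapping listed for {name}")
    return MYSQL_TO_CDAP[name]


for column_type in ("int(11)", "varchar(255)", "datetime", "MEDIUMINT"):
    print(column_type, "->", cdap_type(column_type))
```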

Examples

Connecting to a public CloudSQL MySQL instance

Suppose you want to read data from a CloudSQL MySQL database named “prod” as the “root” user with the password “root”. Get the latest version of the CloudSQL socket factory JAR with driver and dependencies, then configure the plugin with:

| Property | Value |
| --- | --- |
| Reference Name | src1 |
| Driver Name | cloudsql-mysql |
| Database | prod |
| CloudSQL Instance Type | Public |
| Connection Name | [PROJECT_ID]:[REGION]:[INSTANCE_NAME] |
| Import Query | "select id, name, email, phone from users;" |
| Number of Splits | 1 |
| Username | root |
| Password | root |

For example, if the ‘id’ column is a primary key of type int and the other columns are non-nullable varchars, output records will have this schema:

| Field Name | Type |
| --- | --- |
| id | int |
| name | string |
| email | string |
| phone | string |
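The same schema can be written out in the Avro-style JSON form that CDAP tooling commonly displays. The sketch below is an illustration under assumptions: the record name `output`, the helper `record_schema`, and the representation of a nullable field as a union with `"null"` are not taken from this page.

```python
import json


def record_schema(fields, name="output"):
    """Build an Avro-style record schema.

    fields: list of (field_name, cdap_type, nullable) tuples.
    The record name and the nullable-as-union form are assumptions.
    """
    return {
        "type": "record",
        "name": name,
        "fields": [
            {"name": field, "type": [kind, "null"] if nullable else kind}
            for field, kind, nullable in fields
        ],
    }


# All four columns are non-nullable in the example, so each type is a plain name.
schema = record_schema([
    ("id", "int", False),
    ("name", "string", False),
    ("email", "string", False),
    ("phone", "string", False),
])
print(json.dumps(schema, indent=2))
```

If one of the varchar columns allowed NULL values, passing `True` as the third element would produce a union type for that field instead of a plain `string`.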

Connecting to a private CloudSQL MySQL instance

If you want to connect to a private CloudSQL MySQL instance, create a Compute Engine VM that runs the CloudSQL Proxy Docker image using the following commands:

    # Set the environment variables
    export PROJECT=[project_id]
    export REGION=[vm-region]
    export ZONE=`gcloud compute zones list --filter="name=${REGION}" --limit 1 --uri --project=${PROJECT}| sed 's/.*\///'`
    export SUBNET=[vpc-subnet-name]
    export NAME=[gce-vm-name]
    export MYSQL_CONN=[mysql-instance-connection-name]

    # Create a Compute Engine VM
    gcloud beta compute --project=${PROJECT} instances create ${NAME} \
      --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
      --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy -instances=${MYSQL_CONN}=tcp:0.0.0.0:3306" \
      --maintenance-policy=MIGRATE \
      --scopes=https://www.googleapis.com/auth/cloud-platform \
      --image=cos-69-10895-385-0 --image-project=cos-cloud

Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:

    # Get the VM internal IP
    export IP=`gcloud compute instances describe ${NAME} --zone ${ZONE} | grep "networkIP" | awk '{print $2}'`

    # Promote the VM internal IP to static IP
    gcloud compute addresses create mysql-proxy --addresses ${IP} --region ${REGION} --subnet ${SUBNET}

Get the latest version of the CloudSQL socket factory JAR with driver and dependencies, and then configure the plugin with:

| Property | Value |
| --- | --- |
| Reference Name | src1 |
| Driver Name | cloudsql-mysql |
| Database | prod |
| CloudSQL Instance Type | Private |
| Connection Name | [PROJECT_ID]:[REGION]:[INSTANCE_NAME] |
| Import Query | "select id, name, email, phone from users;" |
| Number of Splits | 1 |
| Username | root |
| Password | root |
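Before deploying a pipeline, property values like the ones above can be checked against the rules stated in the Configuration section. The following sketch is a hypothetical pre-flight check, not part of the plugin: the function `validate_properties`, the sample connection name, and the default of 1 for a missing Number of Splits are assumptions. It checks only the required properties, the three-part Connection Name format, and the split-related requirements.

```python
REQUIRED = ("Reference Name", "Database", "Connection Name", "Import Query")


def validate_properties(props):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"{key} is required" for key in REQUIRED if not props.get(key)]

    # Expect PROJECT_ID:REGION:INSTANCE_NAME. At least three non-empty parts are
    # accepted because some project IDs may themselves contain a colon.
    parts = props.get("Connection Name", "").split(":")
    if len(parts) < 3 or not all(parts):
        problems.append("Connection Name must look like PROJECT_ID:REGION:INSTANCE_NAME")

    # Assumption: treat a missing Number of Splits as 1.
    splits = int(props.get("Number of Splits", 1))
    if splits > 1:
        if "$CONDITIONS" not in props.get("Import Query", ""):
            problems.append("Import Query must contain $CONDITIONS when Number of Splits > 1")
        for key in ("Bounding Query", "Split Column"):
            if not props.get(key):
                problems.append(f"{key} is required when Number of Splits > 1")
    return problems


# Values from the example table; the connection name is a made-up placeholder.
example = {
    "Reference Name": "src1",
    "Driver Name": "cloudsql-mysql",
    "Database": "prod",
    "CloudSQL Instance Type": "Private",
    "Connection Name": "my-project:us-central1:my-instance",
    "Import Query": "select id, name, email, phone from users;",
    "Number of Splits": "1",
    "Username": "root",
    "Password": "root",
}
print(validate_properties(example))
```

Raising Number of Splits above 1 without adding `$CONDITIONS`, a Bounding Query, and a Split Column makes the check report all three omissions.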

 

 

Created in 2020 by Google Inc.