Amazon AuroraDB MySQL plugin

Introduction

Amazon Aurora is a MySQL-compatible database offered as a service. Users need to read from and write to AuroraDB.

Use-case

  • Users would like to build a batch data pipeline to read a complete table from an Amazon Aurora DB instance and write it to BigTable.
  • Users would like to build a batch data pipeline to perform upserts on AuroraDB tables.
  • Users should get relevant information from the tool tip while configuring the AuroraDB source and AuroraDB sink
    • The tool tip for the connection string should be customized for the specific database.
    • Each tool tip should accurately describe what its field is used for.
  • Users should get field level lineage for the source and sink that is being used
  • Reference documentation should be available from the source and sink plugins
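
The upsert use-case above can be illustrated with MySQL's INSERT ... ON DUPLICATE KEY UPDATE statement. The following is only a minimal sketch of how such a statement could be generated for a given table. The function name build_upsert_sql and the choice of this statement form are assumptions made for illustration, not a description of the plugin's actual implementation.

```python
from typing import Sequence


def build_upsert_sql(table: str, columns: Sequence[str]) -> str:
    """Build a parameterized MySQL upsert statement (hypothetical helper).

    Assumes the table has a primary or unique key covering some of the
    columns, and that table and column names come from trusted configuration
    (they are not escaped here).
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(columns)
    # One JDBC-style "?" placeholder per column.
    placeholders = ", ".join("?" for _ in columns)
    # On a key collision, overwrite each column with the incoming value.
    updates = ", ".join(f"{c} = VALUES({c})" for c in columns)
    return (
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
```

A sink could prepare this statement once and execute it in batches, binding one record per row of parameters.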

User Stories

  • Users should be able to install AuroraDB MySQL source and sink plugins from the Hub
  • Users should have each tool tip accurately describe what each field does
  • Users should get field level lineage information for the AuroraDB MySQL source and sink 
  • Users should be able to set up a pipeline without specifying redundant information
  • Users should get updated reference documentation for AuroraDB MySQL source and sink
  • Users should be able to read all the DB types

Deliverables 

  • Source code in data integrations org
  • Integration test code 
  • Relevant documentation in the source repo and reference documentation section in plugin

Relevant links 

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Design / Implementation Tips

Design

  • It is suggested to place the plugin code in the database-plugin repository to reuse existing database capabilities.

Source Properties

| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identified name for lineage | Required |
| Driver Name | String | Name of JDBC driver to use | Required (defaults to mysql) |
| Cluster endpoint | String | URL of the current master instance of the MySQL cluster | Required |
| Port | Number | Port of the MySQL cluster's master instance | Optional (defaults to 3306) |
| Database | String | Name of the database to connect to | Required |
| Import Query | String | Query used to import data | Valid SQL query |
| Username | String | DB username | Required |
| Password | String | User password | Required |
| Bounding Query | String | Returns the max and min of the Split-By field | Valid SQL query |
| Split-By Field Name | String | Field name which will be used to generate splits | |
| Number of Splits to Generate | Number | Number of splits to generate | |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html | |
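
The Bounding Query, Split-By Field Name and Number of Splits to Generate properties work together: the bounding query yields the minimum and maximum of the split-by field, and that range is divided into the requested number of splits. The following is a minimal sketch of that idea for an integer field. The helper names generate_splits and split_condition are hypothetical, and the even-division strategy is an assumption rather than the plugin's confirmed behavior.

```python
from typing import List, Tuple


def generate_splits(min_value: int, max_value: int, num_splits: int) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_value, max_value] into contiguous
    inclusive sub-ranges (hypothetical helper).

    min_value and max_value stand for the result of the bounding query.
    Fewer than num_splits ranges are returned if the range is too small.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    total = max_value - min_value + 1
    num_splits = min(num_splits, total)
    # Spread the remainder over the first few splits so sizes differ by at most 1.
    base, extra = divmod(total, num_splits)
    splits = []
    start = min_value
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        splits.append((start, end))
        start = end + 1
    return splits


def split_condition(field: str, bounds: Tuple[int, int]) -> str:
    """Render one split as a SQL predicate on the split-by field."""
    low, high = bounds
    return f"{field} >= {low} AND {field} <= {high}"
```

For example, generate_splits(1, 10, 3) returns [(1, 4), (5, 7), (8, 10)], and each pair can be turned into a predicate that restricts the import query to one split.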


Sink Properties

| User Facing Name | Type | Description | Constraints |
|---|---|---|---|
| Label | String | Label for UI | |
| Reference Name | String | Uniquely identified name for lineage | Required |
| Driver Name | String | Name of JDBC driver to use | Required (defaults to mysql) |
| Host | String | URL of the current master instance of the MySQL cluster | Required |
| Port | Number | Port of the MySQL cluster's master instance | Optional (defaults to 3306) |
| Database | String | Name of the database to connect to | Required |
| Username | String | DB username | Required |
| Password | Password | User password | Required |
| Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html | |
| Table Name | String | Name of a database table to write to | Required |
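
Both the source and the sink take a host (cluster endpoint), port, database and optional connection arguments. One way these could be combined into a MySQL Connector/J connection string is sketched below. The function name build_jdbc_url is hypothetical, and appending the connection arguments to the URL is an assumption; an implementation could equally pass them to the driver as properties.

```python
from typing import Dict, Optional
from urllib.parse import quote


def build_jdbc_url(host: str, database: str, port: int = 3306,
                   connection_arguments: Optional[Dict[str, str]] = None) -> str:
    """Assemble a jdbc:mysql:// URL from the plugin properties (hypothetical helper).

    The port defaults to 3306, matching the default in the tables above.
    """
    url = f"jdbc:mysql://{host}:{port}/{database}"
    if connection_arguments:
        # Percent-encode keys and values so special characters do not break the URL.
        query = "&".join(
            f"{quote(key, safe='')}={quote(value, safe='')}"
            for key, value in connection_arguments.items()
        )
        url = f"{url}?{query}"
    return url
```

With a placeholder endpoint such as example-cluster.example.com and database sales, the call build_jdbc_url("example-cluster.example.com", "sales") returns jdbc:mysql://example-cluster.example.com:3306/sales.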


Future Work

  • Amazon AuroraDB PostgreSQL plugin

Test Case(s)

  • Test case #1
  • Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data. 

Pipeline #1

Pipeline #2


Created in 2020 by Google Inc.