Introduction

CDAP offers change data capture (CDC) via three different approaches:

GoldenGate for Oracle
LogMiner for Oracle
Change Tracking for SQL Server

All of these CDC mechanisms are supported through real-time data pipelines, and the plugins are available from the Hub. The CDC solution currently runs on Spark 1.x and has experimental support for BigTable.

Use case(s)

The scope of work involves making the CDC solution work with Spark 2.x and able to write to BigTable. Performance numbers (throughput and latency) should be published for these two destinations with all three CDC approaches.
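As an illustration of how the throughput and latency numbers could be derived from a test run, here is a minimal sketch. It assumes each change event is recorded with two timestamps, one for the commit at the source database and one for the acknowledged write at the sink. The `ChangeEvent` type, its field names, and the `summarize` helper are hypothetical and are not part of the CDC plugins or of CDAP.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical record produced by a test harness)."""
    committed_at: float  # seconds: when the change was committed at the source
    written_at: float    # seconds: when the sink acknowledged the write


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(events):
    """Compute throughput and end-to-end latency statistics for one run."""
    if not events:
        raise ValueError("no events to summarize")
    # End-to-end latency per event: sink write time minus source commit time.
    latencies = sorted(e.written_at - e.committed_at for e in events)
    # Measurement window: first source commit to last sink write.
    window = max(e.written_at for e in events) - min(e.committed_at for e in events)
    return {
        "events": len(events),
        "throughput_per_sec": len(events) / window if window > 0 else float("inf"),
        "latency_mean_sec": mean(latencies),
        "latency_p50_sec": percentile(latencies, 50),
        "latency_p99_sec": percentile(latencies, 99),
    }
```

Calling `summarize` once per combination of CDC approach and destination would produce comparable rows for a published results table. Reporting percentiles alongside the mean is a design choice made here so that occasional slow writes remain visible.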

ETL developers should be able to set up real-time pipelines that write data to BigTable/BigQuery
Users should get field-level lineage for the source and sink that are being used
Reference documentation for the CDC plugins should be updated to account for the changes
The solution should run with all versions of Spark 2.x
Integration tests for the specific plugins should be added to the test repos

Deliverables

Source code in the cask-solutions/cdc repo
Performance tests for the three approaches with BigTable
Integration test code
Relevant documentation in the source repo and in the reference documentation section of each plugin

Relevant links

Existing CDC plugin code: https://github.com/cask-solutions/cdc
Experimental CDC for BigTable in a branch: https://github.com/cask-solutions/cdc/tree/bigtable-cdc-sink
Field-level lineage: https://docs.cdap.io/cdap/5.1.0-SNAPSHOT/en/developer-manual/metadata/field-lineage.html

Plugin Type

Batch Source
Batch Sink
Real-time Source
Real-time Sink
Action
Post-Run Action
Aggregate
Join
Spark Model
Spark Compute

Configurables

This section defines properties that are configurable for this plugin.

User Facing Name	Type	Description	Constraints

Design / Implementation Tips

Tip #1
Tip #2

Design

Approach(s)

Properties

Security

Limitation(s)

Future Work

Some future work – HYDRATOR-99999
Another future work – HYDRATOR-99999

Test Case(s)

Test case #1
Test case #2

Sample Pipeline

Please attach one or more sample pipeline(s) and associated data.

Pipeline #1

Pipeline #2

Checklist

User stories documented
User stories reviewed
Design documented
Design reviewed
Feature merged
Examples and guides
Integration tests
Documentation for feature
Short video demonstrating the feature

CDC Solution Enhancements