CDAP offers change data capture (CDC) through three approaches:
GoldenGate for Oracle
LogMiner for Oracle
Change Tracking for SQL Server
All three CDC mechanisms are supported in realtime data pipelines, and their plugins are available from the Hub. The CDC solution currently runs on Spark 1.x and has only experimental support for BigTable.
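To make "writing CDC data to BigTable" concrete, here is a minimal Java (17+) sketch of how a stream of change events might be applied to a row-keyed sink. It assumes a hypothetical `ChangeEvent` record (operation, row key, changed columns) and uses an in-memory `TreeMap` as a stand-in for a BigTable table. It is not the record schema or API of the CDC plugins.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CdcApplySketch {
    // Hypothetical operation types; the real CDC plugins define their own record schema.
    enum Op { INSERT, UPDATE, DELETE }

    // Hypothetical change event: operation, row key, and the column values written by the change.
    record ChangeEvent(Op op, String rowKey, Map<String, String> columns) {}

    // Applies events in commit order to an in-memory stand-in for a row-keyed sink such as BigTable.
    // TreeMap keeps rows and columns sorted, which also makes the printed output deterministic.
    static Map<String, Map<String, String>> apply(List<ChangeEvent> events) {
        Map<String, Map<String, String>> table = new TreeMap<>();
        for (ChangeEvent e : events) {
            switch (e.op()) {
                // Insert and update are both treated as upserts: merge the changed columns into the row.
                case INSERT, UPDATE -> table.computeIfAbsent(e.rowKey(), k -> new TreeMap<>()).putAll(e.columns());
                // Delete removes the whole row.
                case DELETE -> table.remove(e.rowKey());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<ChangeEvent> events = List.of(
                new ChangeEvent(Op.INSERT, "cust#1", Map.of("name", "Ada", "city", "London")),
                new ChangeEvent(Op.INSERT, "cust#2", Map.of("name", "Bob", "city", "Paris")),
                new ChangeEvent(Op.UPDATE, "cust#1", Map.of("city", "Cambridge")),
                new ChangeEvent(Op.DELETE, "cust#2", Map.of()));
        System.out.println(apply(events)); // prints {cust#1={city=Cambridge, name=Ada}}
    }
}
```

Ordering matters here: the events must be applied in source commit order, otherwise a late-arriving update could overwrite a newer value or resurrect a deleted row.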
Use case(s)
The scope of work is to make the CDC solution run on Spark 2.x and to write to BigTable. Performance numbers (throughput and latency) should be published for both destinations (BigTable and BigQuery) with each of the three CDC approaches.
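Throughput and latency can be defined in several ways. One minimal sketch, assuming per-record source commit and sink write timestamps are available (an assumption, not something the CDC plugins are known to emit), takes throughput as records divided by the measurement window and latency as a nearest-rank percentile of per-record end-to-end delay. The names `CdcPerfSketch`, `throughput`, and `percentile`, and all timestamps, are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

public class CdcPerfSketch {
    // Throughput: records delivered per second over the measurement window.
    static double throughput(long records, long windowMillis) {
        return records * 1000.0 / windowMillis;
    }

    // Nearest-rank percentile (0 < p <= 100) of per-record latencies in milliseconds.
    static long percentile(long[] latenciesMillis, double p) {
        long[] sorted = latenciesMillis.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical timestamps (ms): when each change committed at the source
        // and when the pipeline wrote it to the sink.
        long[] commitTs = {1000, 1010, 1020, 1030, 1040};
        long[] sinkTs = {1250, 1300, 1280, 1500, 1330};

        long[] latencies = new long[commitTs.length];
        for (int i = 0; i < commitTs.length; i++) {
            latencies[i] = sinkTs[i] - commitTs[i]; // end-to-end latency per record
        }
        // Window: from the first source commit to the last sink write.
        long window = Arrays.stream(sinkTs).max().getAsLong() - Arrays.stream(commitTs).min().getAsLong();

        System.out.println(String.format(Locale.ROOT, "throughput=%.1f rec/s p50=%d ms p99=%d ms",
                throughput(latencies.length, window), percentile(latencies, 50), percentile(latencies, 99)));
        // prints throughput=10.0 rec/s p50=290 ms p99=470 ms
    }
}
```

Nearest-rank is used because it always returns an observed latency; whichever percentile definition is chosen, the published numbers should state it so results for the three CDC approaches are comparable.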
ETL developers should be able to set up realtime pipelines to write data to BigTable/BigQuery
Users should get field-level lineage for the source and sink being used
Reference documentation for the CDC plugins should be updated to account for the changes
The solution should run on all Spark 2.x versions
Integration tests for the specific plugins should be added to the test repositories
Deliverables
Source code in the cask-solution/cdc repository
Performance tests for all three CDC approaches with BigTable
Integration test code
Relevant documentation in the source repository and in the reference documentation section of each plugin