Update hydrator sources and sinks to enable tracking of external datasets

Description

We'll need to update all hydrator sources/sinks (both batch and realtime) to enable tracking of external datasets.

Release Notes

Introduced a 'referenceName' property (used for lineage, annotation metadata) in all external sources/sinks; it must be set whenever such a plugin is used.

Activity

Poorna Chandra
April 20, 2016, 6:25 PM
Edited

The change is basically for hydrator plugins to define a "tracking name" configuration parameter and to use the new APIs that take a name and an alias:

  • BatchSourceContext.setInput(Input input)

  • BatchSinkContext.setOutput(Output output)
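As a rough illustration of the change described above, here is a minimal, self-contained sketch of a source registering its input under a configured name plus an alias. The types below (Input, RecordingSourceContext, SourceConfig) are hypothetical stand-ins written for this example, not the real plugin API classes; only the setInput(Input) call shape is taken from this issue. Using the stage name as the alias is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceNameSketch {

    // Hypothetical stand-in for the Input type passed to setInput(Input).
    static final class Input {
        final String name;
        final String alias;

        private Input(String name, String alias) {
            this.name = name;
            this.alias = alias;
        }

        // Creates an input whose alias defaults to its name.
        static Input of(String name) {
            return new Input(name, name);
        }

        // Returns a copy of this input registered under a different alias.
        Input alias(String newAlias) {
            return new Input(name, newAlias);
        }
    }

    // Hypothetical stand-in for a batch source context; it only records inputs.
    static final class RecordingSourceContext {
        final List<Input> inputs = new ArrayList<>();

        void setInput(Input input) {
            inputs.add(input);
        }
    }

    // Hypothetical plugin config carrying the name used to track the external dataset.
    static final class SourceConfig {
        final String referenceName;

        SourceConfig(String referenceName) {
            if (referenceName == null || referenceName.trim().isEmpty()) {
                throw new IllegalArgumentException("referenceName must be set");
            }
            this.referenceName = referenceName;
        }
    }

    // What a source's prepare step might do: register the external input
    // under the configured name, with the stage name as an assumed alias.
    static void prepareRun(SourceConfig config, String stageName, RecordingSourceContext context) {
        context.setInput(Input.of(config.referenceName).alias(stageName));
    }

    public static void main(String[] args) {
        RecordingSourceContext context = new RecordingSourceContext();
        prepareRun(new SourceConfig("ordersDb"), "source1", context);
        Input registered = context.inputs.get(0);
        System.out.println(registered.name + " (alias: " + registered.alias + ")");
    }
}
```

A sink would follow the same pattern with setOutput(Output). Rejecting an empty name in the config constructor is one possible way to enforce that the property is always set.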

Poorna Chandra
April 20, 2016, 6:27 PM

Also, it would be good to add an integration test after the sources/sinks are updated. We need to test that lineage gets recorded for a hydrator pipeline with an external source and an external sink.

Andreas Neumann
April 20, 2016, 8:18 PM

I am not sure whether "tracking" is the best term here. This will be used to create a dataset that uniquely refers to the source/sink for various purposes: lineage, metadata, audit, and perhaps even use in programs.

Perhaps "reference name" is better?

Assignee

Gokul Gunasekaran

Reporter

Poorna Chandra

Labels

None

Docs Impact

None

UX Impact

None

Components

Fix versions

Priority

Blocker