Introduction
Metadata is data about data, in other words, data that describes other data. There are many kinds of meta data, including:
...
- associate meta data with a file
- associate meta data with a field of a dataset's schema
- retrieve meta data for non-CDAP entities
- search meta data for non-CDAP entities
- retrieve the change history for all meta data of an entity (and its sub-entities)
Lineage
- File to file lineage
- Field lineage
- collect per plugin/transform/directive
- present as graph or similar navigable UI
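As an illustration of what field lineage could look like as a navigable graph, here is a minimal sketch in Python. It is not CDAP's API: the `FieldLineage` class, its `record` and `upstream` methods, and the operation and field names are all hypothetical, chosen only to show how per-operation records can be combined into a graph that answers "which fields feed this field?".

```python
from collections import defaultdict


class FieldLineage:
    """Hypothetical in-memory field lineage graph (not a CDAP class)."""

    def __init__(self):
        # Maps an output field to the set of (input field, operation) pairs feeding it.
        self._parents = defaultdict(set)

    def record(self, operation, inputs, outputs):
        """Record that `operation` derived every field in `outputs` from `inputs`."""
        for out in outputs:
            for inp in inputs:
                self._parents[out].add((inp, operation))

    def upstream(self, field):
        """Return all fields that directly or transitively feed `field`."""
        seen = set()
        stack = [field]
        while stack:
            current = stack.pop()
            for parent, _operation in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Example: one record per plugin/transform/directive (names are made up).
lineage = FieldLineage()
lineage.record("parse", ["body"], ["first", "last"])
lineage.record("concat", ["first", "last"], ["full_name"])
print(sorted(lineage.upstream("full_name")))  # ['body', 'first', 'last']
```

Collecting one record per operation keeps each plugin's contribution small, and the graph traversal can then serve a UI that lets users walk from a field back to its origins.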
Pipeline
- propagate meta data from source to sink
- map input files to output files 1:1
- conditional processing based on meta data
- explicitly set meta data for an entity
- associate processing metrics as meta data for the sink
- define meta data based on condition
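Several of the pipeline items above (propagation from source to sink, processing metrics as meta data, and conditional meta data) can be pictured together in a small sketch. This is an assumption-laden illustration in Python, not a CDAP interface: the `propagate` function, the `metrics.` key prefix, and the rule format are all hypothetical.

```python
def propagate(source_meta, metrics=None, rules=()):
    """Build sink meta data from source meta data (hypothetical helper).

    source_meta: dict of properties on the source
    metrics:     optional dict of processing metrics to attach to the sink
    rules:       iterable of (condition, key, value); the property is set
                 on the sink only when condition(sink_meta) is true
    """
    sink_meta = dict(source_meta)  # copy, so the source's meta data is untouched
    for name, value in (metrics or {}).items():
        sink_meta[f"metrics.{name}"] = value
    for condition, key, value in rules:
        if condition(sink_meta):
            sink_meta[key] = value
    return sink_meta


# Example with made-up properties and a made-up rule.
source = {"owner": "sales", "contains_pii": "true"}
sink = propagate(
    source,
    metrics={"records_out": 1200},
    rules=[(lambda m: m.get("contains_pii") == "true", "access", "restricted")],
)
print(sink)
# {'owner': 'sales', 'contains_pii': 'true', 'metrics.records_out': 1200, 'access': 'restricted'}
```

Copying the source properties first and applying rules last means a condition can depend on both propagated properties and the attached metrics.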
Integrations
- query meta data for an entity from an external meta data system
- publish meta data to an external meta data system
- all meta data operations via message bus
- batch import/export of meta data (only changes)
- authorization for meta data through Ranger/Sentry/external auth provider
Current Roadmap
5.0:
- Field-level meta data: Metadata Custom Entities and Authorization
- Field-level lineage
5.1:
- File/Partition/custom entity meta data
- Integration with external meta data systems
5.2:
- Metadata provenance
- Operational metadata
- Catalog of all data by metadata