Introduction

Metadata is data about data: data that describes other data. There are many kinds of metadata, including:

...

  • associate metadata with a file
  • associate metadata with a field of a dataset's schema
  • retrieve metadata for non-CDAP entities
  • search metadata for non-CDAP entities
  • retrieve the change history of all metadata for an entity (and its sub-entities)
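The change-history requirement can be illustrated with a small Python sketch. It assumes, purely for illustration, that each entity is identified by a hierarchical path tuple so that a sub-entity (such as a dataset field) shares its parent's prefix; the `MetadataChange` and `MetadataHistory` names are hypothetical and not part of any CDAP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataChange:
    """One recorded change to a single metadata key (hypothetical record shape)."""
    timestamp: int
    entity: tuple  # hierarchical path, e.g. ("dataset", "orders", "field", "zip")
    key: str
    old: object
    new: object


class MetadataHistory:
    """Append-only change log; sub-entities share their parent's path prefix."""

    def __init__(self):
        self._current = {}  # (entity, key) -> latest value
        self._log = []      # every change, in the order it was recorded

    def set(self, timestamp, entity, key, value):
        """Record a new value and remember what it replaced (None if unset)."""
        old = self._current.get((entity, key))
        self._current[(entity, key)] = value
        self._log.append(MetadataChange(timestamp, entity, key, old, value))

    def history(self, entity):
        """Return changes for `entity` and all its sub-entities, in recording order."""
        n = len(entity)
        return [c for c in self._log if c.entity[:n] == entity]
```

Using tuples rather than joined strings keeps prefix matching exact: `("dataset", "orders")` matches its own fields but not a sibling such as `("dataset", "orders2")`.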
Lineage
  • File-to-file lineage
  • Field lineage
    • collect per plugin, transform, or directive
    • present as a graph or similar navigable UI
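Collecting field lineage per stage and presenting it as a graph can be sketched in Python as follows. The sketch assumes each plugin, transform, or directive reports which input fields it read and which output fields it wrote; `FieldOperation`, `build_lineage_graph`, and `trace_to_sources` are hypothetical names chosen for illustration, not an existing API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FieldOperation:
    """Hypothetical field-level operation reported by one plugin, transform, or directive."""
    stage: str     # illustrative stage name
    inputs: list   # fields read by the operation
    outputs: list  # fields written by the operation


def build_lineage_graph(operations):
    """Map every output field to the set of fields it was directly derived from."""
    parents = defaultdict(set)
    for op in operations:
        for out in op.outputs:
            parents[out].update(op.inputs)
    return parents


def trace_to_sources(graph, target):
    """Walk the graph backwards and return the root fields that `target` depends on."""
    sources, seen, stack = set(), set(), [target]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = graph.get(current)  # .get() does not insert into the defaultdict
        if parents:
            stack.extend(parents)
        else:
            sources.add(current)  # no recorded parents: treat as an origin field
    return sources
```

The `seen` set keeps the backward walk from revisiting fields that several outputs share. The graph is only as precise as the operations reported: a stage that declares every input as feeding every output makes all of its outputs appear to depend on all of its inputs.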
Pipeline
  • propagate metadata from source to sink
  • map input files to output files 1:1
  • conditional processing based on metadata
  • explicitly set metadata for an entity
  • associate processing metrics with the sink as metadata
  • define metadata based on a condition
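A minimal Python sketch of source-to-sink propagation, conditional metadata, and processing metrics recorded on the sink. The `Entity` class, the `propagate` function, and the `metrics.` key prefix are assumptions made for illustration; they do not describe how a CDAP pipeline actually stores metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical stand-in for a pipeline source or sink and its metadata."""
    name: str
    metadata: dict = field(default_factory=dict)


def propagate(source, sink, condition=None, metrics=None):
    """Copy source metadata to the sink, keeping only entries that pass `condition`,
    then record processing metrics on the sink under an assumed "metrics." prefix."""
    for key, value in source.metadata.items():
        if condition is None or condition(key, value):
            sink.metadata[key] = value
    for name, value in (metrics or {}).items():
        sink.metadata[f"metrics.{name}"] = str(value)
    return sink
```

For example, `propagate(src, sink, condition=lambda k, v: k != "pii", metrics={"records.out": 1200})` copies everything except the `pii` tag and adds `metrics.records.out = "1200"` to the sink.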
Integrations
  • query metadata for an entity from an external metadata system
  • publish metadata to an external metadata system
  • perform all metadata operations via a message bus
  • batch import/export of metadata (changes only)
  • authorization for metadata through Ranger/Sentry/an external auth provider
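Exporting only changes in batches is commonly done with a checkpoint: each export emits the changes newer than the previous checkpoint and returns a new one. The Python sketch below assumes each change is a dict with an integer `timestamp` and that timestamps increase strictly; the `export_changes` name and the JSON-lines format are illustrative choices, not an existing interface.

```python
import json


def export_changes(changes, since):
    """Serialize changes newer than the `since` checkpoint as JSON lines.

    Returns the exported text and the checkpoint to pass to the next export,
    so each batch carries only changes made after the previous one.
    """
    newer = sorted((c for c in changes if c["timestamp"] > since),
                   key=lambda c: c["timestamp"])
    text = "\n".join(json.dumps(c, sort_keys=True) for c in newer)
    checkpoint = newer[-1]["timestamp"] if newer else since
    return text, checkpoint
```

Each JSON line could equally be published as one message on a message bus; if timestamps can collide, a sequence number would be a safer checkpoint than a timestamp.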

Current Roadmap

5.0:
5.1:
  • File, partition, and custom entity metadata
  • Integration with external metadata systems
5.2:
  • Metadata provenance
  • Operational metadata
  • Catalog of all data by metadata