Metadata User Guide

Metadata is a core capability of CDAP. It shows how datasets and programs relate to each other and helps you understand the impact of a change before you make it.

The metadata features provide visibility into the impact of changes, along with an audit trail of dataset access by programs and applications. Together, they help you identify trusted data sources and track sensitive data.

CDAP captures metadata on entities and objects from many sources, including metadata specified by users. CDAP's container model aggregates a wide variety of machine-generated metadata and automatically associates it with datasets. This gives developers and data scientists flexibility when innovating and building solutions on Hadoop, without having to maintain compliance and governance separately for every application.

CDAP metadata consists of properties (a list of key-value pairs) and tags (a list of keys). It can be used to annotate artifacts, applications, programs, datasets, views, and custom entities.

Using the CDAP Metadata Microservices, you can set, retrieve, and delete these metadata annotations.
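As a rough illustration of the set, retrieve, and delete operations, the sketch below builds request descriptions for property annotations without sending them. The path layout (`/v3/namespaces/.../metadata/properties`), the helper names, and the sample entity `PurchaseHistory` are assumptions made for illustration; check them against the Metadata Microservices reference for your CDAP version before use.

```python
import json
from urllib.parse import quote

# Assumed base path for namespaced entities; verify against the
# Metadata Microservices reference for your CDAP version.
BASE = "/v3/namespaces"


def metadata_path(namespace, entity_type, entity_id, kind, key=None):
    """Build a metadata path for 'properties' or 'tags' (assumed layout)."""
    if kind not in ("properties", "tags"):
        raise ValueError("kind must be 'properties' or 'tags'")
    parts = [
        BASE,
        quote(namespace, safe=""),
        quote(entity_type, safe=""),
        quote(entity_id, safe=""),
        "metadata",
        kind,
    ]
    if key is not None:
        parts.append(quote(key, safe=""))
    return "/".join(parts)


def set_properties(namespace, entity_type, entity_id, properties):
    """Describe a request that sets properties (JSON map in the body)."""
    path = metadata_path(namespace, entity_type, entity_id, "properties")
    return ("POST", path, json.dumps(properties))


def get_properties(namespace, entity_type, entity_id):
    """Describe a request that retrieves an entity's properties."""
    path = metadata_path(namespace, entity_type, entity_id, "properties")
    return ("GET", path, None)


def delete_property(namespace, entity_type, entity_id, key):
    """Describe a request that deletes a single property by key."""
    path = metadata_path(namespace, entity_type, entity_id, "properties", key)
    return ("DELETE", path, None)
```

Each helper returns a `(method, path, body)` tuple, so the same descriptions can be passed to whichever HTTP client and authentication setup a deployment uses.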

Metadata keys, values, and tags must use only CDAP-supported characters and are each limited to 50 characters in length. The entire metadata object associated with a single entity is limited to 10K bytes in size.
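The length and size limits can be checked on the client before an annotation is submitted. The following is a minimal sketch, not CDAP's own validation: it assumes "10K bytes" means 10 * 1024 bytes and that the object size is measured as UTF-8 encoded JSON, and it does not check the supported character set. The function name `validate_metadata` is hypothetical.

```python
import json

MAX_ITEM_CHARS = 50            # per-key, per-value, and per-tag limit from the guide
MAX_OBJECT_BYTES = 10 * 1024   # assumption: "10K bytes" read as 10 * 1024 bytes


def validate_metadata(properties, tags):
    """Return a list of limit violations; an empty list means none were found.

    properties: dict mapping string keys to string values
    tags: iterable of string tags
    """
    problems = []
    for key, value in properties.items():
        if len(key) > MAX_ITEM_CHARS:
            problems.append(f"property key too long: {key!r}")
        if len(value) > MAX_ITEM_CHARS:
            problems.append(f"property value too long for key {key!r}")
    for tag in tags:
        if len(tag) > MAX_ITEM_CHARS:
            problems.append(f"tag too long: {tag!r}")

    # Assumption: the object size is approximated by its UTF-8 JSON encoding.
    payload = json.dumps(
        {"properties": properties, "tags": sorted(tags)},
        ensure_ascii=False,
    ).encode("utf-8")
    if len(payload) > MAX_OBJECT_BYTES:
        problems.append(
            f"metadata object is {len(payload)} bytes, over {MAX_OBJECT_BYTES}"
        )
    return problems
```

Returning a list of violations rather than raising on the first one lets a caller report every problem with an annotation at once.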

  • System Metadata: In addition to the properties and tags that users add, CDAP annotates entities with system properties and tags (system metadata) by default.

  • Discovery and Lineage: Metadata can be used to tag different CDAP components so that they are easy to discover, identify, and manage. Lineage shows, for a specified time range, all data access of an entity and where that access originated.

  • Audit Logging: Provides a chronological ledger containing evidence of operations or changes on CDAP entities.

Created in 2020 by Google Inc.