...

Metadata consists of properties (a list of key-value pairs) or tags (a list of keys). Metadata and their use are described in the Metadata and Lineage section.

The Metadata Microservices are divided into these sections:

  • Metadata properties

  • Metadata tags

  • Searching metadata

  • Viewing lineage

  • Field-level lineage

  • Metadata for a run of a program

Metadata keys, values, and tags must conform to the CDAP alphanumeric extra extended character set, and are limited to 50 characters in length. The entire metadata object associated with a single entity is limited to 10K bytes in size.
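The length limits above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of CDAP: `validate_metadata` is a hypothetical helper, it reads "10K bytes" as 10,240 bytes, and it approximates the object size by the UTF-8 length of a JSON encoding, which may differ from how CDAP measures it. It does not check the character set.

```python
import json

# Limits stated in this section. The character-set rule is not checked here.
MAX_ITEM_LENGTH = 50          # keys, values, and tags
MAX_OBJECT_BYTES = 10 * 1024  # assumption: "10K bytes" read as 10,240 bytes


def validate_metadata(properties: dict[str, str], tags: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the limits are respected."""
    problems = []
    for key, value in properties.items():
        if len(key) > MAX_ITEM_LENGTH:
            problems.append(f"property key too long: {key!r}")
        if len(value) > MAX_ITEM_LENGTH:
            problems.append(f"property value too long for key {key!r}")
    for tag in tags:
        if len(tag) > MAX_ITEM_LENGTH:
            problems.append(f"tag too long: {tag!r}")
    # Assumption: approximate the object size by its UTF-8 encoded JSON form.
    size = len(json.dumps({"properties": properties, "tags": tags}).encode("utf-8"))
    if size > MAX_OBJECT_BYTES:
        problems.append(f"metadata object is {size} bytes, over {MAX_OBJECT_BYTES}")
    return problems
```

For example, `validate_metadata({"type": "production"}, ["sales"])` returns an empty list.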

...

All methods or endpoints described in this API have a base URL (typically http://<host>:11015 or https://<host>:10443) that precedes the resource identifier, as described in the Microservices Conventions. These methods return a status code, as listed in the Microservices Status Codes.

Note: Datasets are deprecated and will be removed in CDAP 7.0.0.

Metadata Properties

Annotating Properties

...

Parameter

Description

namespace-id

Namespace ID.

entity-details

Hierarchical key-value representation of the entity.

app-id

Name of the application.

program-type

One of mapreduce, spark, workflows, services, or workers.

program-id

Name of the program.

artifact-id

Name of the artifact.

artifact-version

Version of the artifact.

dataset-id

Name of the dataset.

field-name

Name of the field.

HTTP Responses

Status Codes

Description

200 OK

The properties were set.

Note: When using this API, properties can be added to the metadata of the specified entity only in the user scope.
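A request that annotates properties can be prepared with the Python standard library. This sketch assumes a request line of the form `PUT /v3/namespaces/<namespace-id>/<entity-details>/metadata/properties` with a JSON object of string pairs as the body; the request line is elided in this section, so confirm it against the full reference. `build_set_properties_request` and the sample entity path `apps/PurchaseHistory` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote


def build_set_properties_request(base_url, namespace_id, entity_details, properties):
    """Build (but do not send) a request that annotates user-scope properties.

    Assumed: path /v3/namespaces/<namespace-id>/<entity-details>/metadata/properties
    and a PUT whose body is a JSON object of string-to-string pairs.
    """
    url = (f"{base_url.rstrip('/')}/v3/namespaces/{quote(namespace_id)}/"
           f"{entity_details.strip('/')}/metadata/properties")
    body = json.dumps(properties).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"})


req = build_set_properties_request(
    "http://localhost:11015", "default", "apps/PurchaseHistory", {"type": "production"})
```

Passing `req` to `urllib.request.urlopen` would send it; a 200 OK indicates that the properties were set.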

...

Parameter

Description

namespace-id

Namespace ID.

entity-details

Hierarchical key-value representation of the entity.

app-id

Name of the application.

program-type

One of mapreduce, spark, workflows, services, or workers.

program-id

Name of the program.

artifact-id

Name of the artifact.

artifact-version

Version of the artifact.

dataset-id

Name of the dataset.

field-name

Name of the field.

scope

Optional scope filter. If not specified, properties in the user and system scopes are returned. Otherwise, only properties in the specified scope are returned.

...

Status Codes

Description

200 OK

The requested properties were returned as a JSON string in the body of the response. The body can be empty if there are no properties associated with the entity or if the entity does not exist.
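Retrieving properties or tags with the optional scope filter can be sketched the same way. The path ending in `/metadata/properties` or `/metadata/tags`, the `scope` query-parameter form, and the uppercase values `USER` and `SYSTEM` are assumptions for illustration; `build_get_metadata_request` is a hypothetical helper.

```python
import urllib.request
from urllib.parse import quote, urlencode

VALID_SCOPES = {"USER", "SYSTEM"}  # assumed spelling of the two scopes


def build_get_metadata_request(base_url, namespace_id, entity_details,
                               kind="properties", scope=None):
    """Build a GET request for an entity's properties or tags.

    Assumed path: /v3/namespaces/<namespace-id>/<entity-details>/metadata/<kind>
    """
    if kind not in ("properties", "tags"):
        raise ValueError(f"unknown metadata kind: {kind}")
    url = (f"{base_url.rstrip('/')}/v3/namespaces/{quote(namespace_id)}/"
           f"{entity_details.strip('/')}/metadata/{kind}")
    if scope is not None:
        scope = scope.upper()
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        # Without a scope, both user and system metadata are returned.
        url += "?" + urlencode({"scope": scope})
    return urllib.request.Request(url, method="GET")
```

Since an empty body can mean either "no properties" or "no such entity", a caller that needs to tell the two apart has to check the entity separately.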

Deleting Properties

To delete all user metadata properties for an application, dataset, or other entity (including custom entities), submit an HTTP DELETE request:

...

Parameter

Description

namespace-id

Namespace ID.

entity-details

Hierarchical key-value representation of the entity.

app-id

Name of the application.

program-type

One of mapreduce, spark, workflows, services, or workers.

program-id

Name of the program.

artifact-id

Name of the artifact.

artifact-version

Version of the artifact.

dataset-id

Name of the dataset.

field-name

Name of the field.

key

Metadata property key.

HTTP Responses

Status Codes

Description

200 OK

The method was called successfully: the properties were deleted or, when a specific key was given, the key was either deleted or not present, or the entity itself was not present.
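Deleting all user properties, or a single key, can be sketched as a DELETE request. The path layout (`.../metadata/properties` with an optional `/<key>` segment, and the analogous `.../metadata/tags/<tag>` form) is an assumption, since the request lines are elided here; `build_delete_metadata_request` is a hypothetical helper.

```python
import urllib.request
from urllib.parse import quote


def build_delete_metadata_request(base_url, namespace_id, entity_details,
                                  kind="properties", item=None):
    """Build a DELETE request for user-scope metadata.

    With item=None, all user properties (or tags) of the entity are targeted;
    otherwise only the given property key (or tag).
    Assumed path: /v3/namespaces/<namespace-id>/<entity-details>/metadata/<kind>[/<item>]
    """
    if kind not in ("properties", "tags"):
        raise ValueError(f"unknown metadata kind: {kind}")
    url = (f"{base_url.rstrip('/')}/v3/namespaces/{quote(namespace_id)}/"
           f"{entity_details.strip('/')}/metadata/{kind}")
    if item is not None:
        # Percent-encode the key or tag so it stays a single path segment.
        url += "/" + quote(item, safe="")
    return urllib.request.Request(url, method="DELETE")
```

Because 200 OK is returned even when the key or the entity is absent, a successful status alone does not confirm that anything was removed; reading the metadata back is one way to check.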

...

Parameter

Description

namespace-id

Namespace ID.

entity-details

Hierarchical key-value representation of the entity.

app-id

Name of the application.

program-type

One of mapreduce, spark, workflows, services, or workers.

program-id

Name of the program.

artifact-id

Name of the artifact.

artifact-version

Version of the artifact.

dataset-id

Name of the dataset.

field-name

Name of the field.

HTTP Responses

Status Codes

Description

200 OK

The tags were set.

Note: When using this API, tags can be added to the metadata of the specified entity only in the user scope.
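Tags are plain keys, so the request body for adding them is a list rather than a map. This sketch assumes a request line of the form `POST /v3/namespaces/<namespace-id>/<entity-details>/metadata/tags` with a JSON array of strings as the body; confirm against the full reference. `build_add_tags_request` is hypothetical, and dropping duplicate tags on the client is a design choice of this sketch.

```python
import json
import urllib.request
from urllib.parse import quote


def build_add_tags_request(base_url, namespace_id, entity_details, tags):
    """Build (but do not send) a request that adds user-scope tags.

    Assumed: path /v3/namespaces/<namespace-id>/<entity-details>/metadata/tags
    and a POST whose body is a JSON array of strings.
    """
    # Remove duplicates while keeping the caller's order.
    unique_tags = list(dict.fromkeys(tags))
    url = (f"{base_url.rstrip('/')}/v3/namespaces/{quote(namespace_id)}/"
           f"{entity_details.strip('/')}/metadata/tags")
    return urllib.request.Request(
        url, data=json.dumps(unique_tags).encode("utf-8"), method="POST",
        headers={"Content-Type": "application/json"})


req = build_add_tags_request(
    "http://localhost:11015", "default", "datasets/purchases", ["sales", "daily", "sales"])
```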

...

Parameter

Description

namespace-id

Namespace ID.

entity-details

Hierarchical key-value representation of the entity.

app-id

Name of the application.

program-type

One of mapreduce, spark, workflows, services, or workers.

program-id

Name of the program.

artifact-id

Name of the artifact.

artifact-version

Version of the artifact.

dataset-id

Name of the dataset.

field-name

Name of the field.

scope

Optional scope filter. If not specified, tags in the user and system scopes are returned. Otherwise, only tags in the specified scope are returned.

...

Status Codes

Description

200 OK

The requested tags were returned as a JSON string in the body of the response. The body can be empty if there are no tags associated with the entity or if the entity does not exist.

Removing Tags

To delete all user metadata tags for an application, dataset, or other entity (including custom entities), submit an HTTP DELETE request:

...

Parameter

Description

namespace-id

Namespace ID.

entity-details

Hierarchical key-value representation of the entity.

app-id

Name of the application.

program-type

One of mapreduce, spark, workflows, services, or workers.

program-id

Name of the program.

artifact-id

Name of the artifact.

artifact-version

Version of the artifact.

dataset-id

Name of the dataset.

field-name

Name of the field.

tag

Metadata tag.

HTTP Responses

Status Codes

Description

200 OK

The method was called successfully: the tags were deleted or, when a specific tag was given, the tag was either deleted or not present, or the entity itself was not present.

Note: When using this API, only tags in the user scope can be deleted.

...

Parameter

Description

namespace-id

Namespace ID.

query

Query term, as described below. Query terms are case-insensitive.

entity-type

Restricts the search to either all or specified entity types: all, artifact, app, dataset, program, or view.

option

Options for controlling cursors, limits, offsets, the inclusion of hidden and custom entities, and sorting:

  • sort: The sorting order for the results being returned. By default, search results are sorted by their relative weights for the specified search query. Specify the sort order as the field name followed by the sort direction (either asc or desc), separated by a space. URL-encoded, for example: &sort=creation-time+asc. Applicable only when the search query is *.

  • offset: The number of search results to skip before including them in the returned results. Default is 0.

  • limit: The number of metadata search entities to return in the results. By default, there is no limit.

  • cursor: The cursor to move to in the search results: a value returned in the cursors field of the response to a previous metadata search request. Applicable only when the search query is *.

  • numCursors: The number of chunks of search results, each of size limit, to fetch after the first chunk of size limit. This parameter can be used to roughly estimate the total number of results that match the search query. Used only when the search query is *.

  • showHidden: By default, metadata search hides entities whose name starts with an _ (underscore) from the search results. Set this to true to include these hidden entities in search results. Default is false.

  • showCustom: By default, metadata search hides custom entities from the search results for backward compatibility. Set this to true to include these custom entities in search results. Default is false.

  • entityScope: The scope of entities for the metadata search. By default, all entities are returned. Set this to USER to include only user entities, or to SYSTEM to include only system entities.

Format for an option: &<option-name>=<option-value>
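Assembling the query string by hand is error-prone, especially for the space in the sort option. The following Python sketch builds it with `urllib.parse.urlencode`, which encodes spaces as `+` as in the example above. Passing entity types as repeated `target` parameters is an assumption, and rejecting sort, cursor, and numCursors for queries other than `*` is a client-side choice of this sketch; `build_search_query` is hypothetical.

```python
from urllib.parse import urlencode

# Options that, per the list above, apply only when the query is "*".
STAR_ONLY_OPTIONS = {"sort", "cursor", "numCursors"}


def build_search_query(query, entity_types=(), **options):
    """Return the query string for a metadata search request."""
    misplaced = set(options) & STAR_ONLY_OPTIONS
    if misplaced and query != "*":
        raise ValueError(f"{sorted(misplaced)} only apply when the query is '*'")
    params = [("query", query)]
    # Assumption: entity types are sent as repeated "target" parameters.
    params += [("target", t) for t in entity_types]
    for name, value in options.items():
        # Booleans are sent as lowercase true/false.
        if isinstance(value, bool):
            value = str(value).lower()
        params.append((name, value))
    # Keep "*" and ":" readable; spaces become "+" (quote_plus is the default).
    return urlencode(params, safe="*:")
```

For example, `build_search_query("*", ["dataset"], sort="creation-time asc", limit=10)` yields `query=*&target=dataset&sort=creation-time+asc&limit=10`.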

...

Status Codes

Description

200 OK

Entity ID and metadata of entities that match the query and entity type(s) are returned in the body of the response.

Query Terms

CDAP supports prefix-based search of metadata properties and tags across both user and system scopes. Search the metadata of entities by using either a complete name or a partial name followed by an asterisk (*).

Search for properties and tags by specifying one of:

  • A complete property key-value pair, separated by a colon, such as type:production

  • A complete property key with a partial value, such as type:prod*

  • The tags key with a complete or partial value, such as tags:production or tags:prod*, to search for tags only

  • A complete or partial value, such as prod*; this returns matches in both properties and tags

  • Multiple search terms separated by a space, such as type:prod* author:joe; this returns entities whose metadata matches any of the terms.
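The term forms above can be illustrated with a small in-memory matcher. This is only a local model of the documented semantics (case-insensitive, trailing `*` as a prefix, `tags:` restricted to tags, bare values checked against both, space-separated terms combined with OR); it is not CDAP's search implementation, whose indexing may match differently. `matches`, `search`, and the sample entities are hypothetical.

```python
def matches(term, properties, tags):
    """Return True if one query term matches an entity's properties or tags."""
    term = term.lower()

    def value_matches(pattern, value):
        value = value.lower()
        if pattern.endswith("*"):
            return value.startswith(pattern[:-1])  # prefix match
        return value == pattern

    if ":" in term:
        key, pattern = term.split(":", 1)
        if key == "tags":
            return any(value_matches(pattern, t) for t in tags)
        return any(k.lower() == key and value_matches(pattern, v)
                   for k, v in properties.items())
    # A bare value is checked against property values and tags alike.
    return (any(value_matches(term, v) for v in properties.values())
            or any(value_matches(term, t) for t in tags))


def search(query, entities):
    """Return the names of entities matching any of the space-separated terms."""
    terms = query.split()
    return sorted(name for name, (props, tags) in entities.items()
                  if any(matches(t, props, tags) for t in terms))


entities = {
    "purchases": ({"type": "Production", "author": "ann"}, ["sales"]),
    "logs": ({"type": "staging", "author": "joe"}, []),
    "scratch": ({"type": "test"}, ["prod-copy"]),
}
```

With these entities, `search("type:prod* author:joe", entities)` returns `["logs", "purchases"]`, while `search("tags:prod*", entities)` returns only `["scratch"]`.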

Since CDAP also annotates entities with system metadata by default, as described in System Metadata, the following special search queries are also supported:

  • Artifacts or applications containing a specific plugin: plugin:<plugin-name>

  • Programs with a specific mode: batch or realtime

  • Applications with a specific program type:

    • service:<service-name>

    • mapreduce:<mapreduce-name>

    • spark:<spark-name>

    • worker:<worker-name>

    • workflow:<workflow-name>

  • Datasets or views with a schema field:

    • field name only: <field-name>

    • field name with a type: <field-name>:<field-type>, where field-type can be:

      • simple types: int, long, boolean, float, double, bytes, string, enum

      • complex types: array, map, record, union
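The special query forms above can be composed with small helpers that reject unsupported program or field types before a search is sent. The helper names are hypothetical; the accepted values are exactly those listed above.

```python
SIMPLE_TYPES = {"int", "long", "boolean", "float", "double", "bytes", "string", "enum"}
COMPLEX_TYPES = {"array", "map", "record", "union"}
PROGRAM_TYPES = {"service", "mapreduce", "spark", "worker", "workflow"}


def plugin_term(plugin_name):
    """Artifacts or applications containing a specific plugin."""
    return f"plugin:{plugin_name}"


def program_term(program_type, name):
    """Applications containing a program of the given type and name."""
    if program_type not in PROGRAM_TYPES:
        raise ValueError(f"unsupported program type: {program_type}")
    return f"{program_type}:{name}"


def schema_field_term(field_name, field_type=None):
    """Datasets or views with a schema field, optionally of a given type."""
    if field_type is None:
        return field_name
    if field_type not in SIMPLE_TYPES | COMPLEX_TYPES:
        raise ValueError(f"unsupported field type: {field_type}")
    return f"{field_name}:{field_type}"
```

Terms can then be joined with spaces into one query, for example `" ".join([plugin_term("Table"), schema_field_term("price", "double")])`, which gives `plugin:Table price:double`.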

...

To view the lineage of a dataset, submit an HTTP GET request:

...

Parameter

Description

namespace-id

Namespace ID.

entity-type

datasets

entity-id

Name of the dataset.

start-ts

Starting time-stamp of lineage (inclusive), in seconds. Supports now, now-1h, etc. syntax.

end-ts

Ending time-stamp of lineage (exclusive), in seconds. Supports now, now-1h, etc. syntax.

levels

Number of levels of lineage output to return. Defaults to 10. Determines how far back the provenance of the data in the lineage chain is calculated.

collapse

An optional set of collapse types (any of access, run, or component) by which to collapse the lineage output. By default, lineage output is not collapsed. Multiple collapse parameters are supported.

rollup

An optional rollup type used to roll up the lineage output. By default, lineage output is not rolled up. Currently, the only supported value is workflow.
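Because several collapse parameters may be given, the query string repeats the `collapse` key. This sketch builds it with `urlencode(..., doseq=True)` and validates the documented collapse and rollup values. The query-parameter names `start`, `end`, `levels`, `collapse`, and `rollup` are assumptions (the request line is elided here), and `build_lineage_query` is hypothetical.

```python
from urllib.parse import urlencode

COLLAPSE_TYPES = {"access", "run", "component"}
ROLLUP_TYPES = {"workflow"}


def build_lineage_query(start_ts, end_ts, levels=10, collapse=(), rollup=None):
    """Return the query string for a dataset lineage request."""
    unknown = set(collapse) - COLLAPSE_TYPES
    if unknown:
        raise ValueError(f"unknown collapse types: {sorted(unknown)}")
    if rollup is not None and rollup not in ROLLUP_TYPES:
        raise ValueError(f"unsupported rollup: {rollup}")
    # Assumed parameter names; levels defaults to 10 as documented.
    params = {"start": start_ts, "end": end_ts, "levels": levels}
    if collapse:
        # doseq=True turns the list into collapse=access&collapse=run
        params["collapse"] = list(collapse)
    if rollup:
        params["rollup"] = rollup
    return urlencode(params, doseq=True)
```

For example, `build_lineage_query("now-1h", "now", collapse=["access", "run"])` gives `start=now-1h&end=now&levels=10&collapse=access&collapse=run`.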

...

For more information about collapsing lineage output, see the section Collapsing Lineage Output below.

...

Parameter

Description

namespace-id

Namespace ID.

dataset-id

Name of the dataset.

start-ts

Starting time-stamp (inclusive), in seconds. Supports now, now-1h, etc. syntax.

end-ts

Ending time-stamp (exclusive), in seconds. Supports now, now-1h, etc. syntax.

prefix

Optional prefix. When provided, only fields that have the given prefix are returned.

includeCurrent

Optional flag. When set to true, the current fields of the dataset are included irrespective of whether they have any lineage information.

...

Status Codes

Description

200 OK

The fields of the dataset are returned as a list of strings in the body of the response.

400 BAD REQUEST

Failure to parse the time range provided.
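A 400 response signals a time range that could not be parsed, so it can help to resolve the `now` / `now-1h` forms locally first. The following is a minimal client-side sketch, not CDAP's parser: the supported units (`s`, `m`, `h`, `d`), the `+` offset, and rejecting empty ranges are assumptions of this sketch, and `resolve_time` and `check_range` are hypothetical helpers.

```python
import re
import time

# Assumed units for offsets such as "now-1h".
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_PATTERN = re.compile(r"now(?:([+-])(\d+)([smhd]))?")


def resolve_time(spec, now=None):
    """Convert "now", "now-1h", or plain epoch seconds to epoch seconds."""
    now = int(time.time()) if now is None else now
    spec = spec.strip()
    if spec.isdigit():
        return int(spec)
    match = _PATTERN.fullmatch(spec)
    if match is None:
        raise ValueError(f"cannot parse time: {spec!r}")
    sign, amount, unit = match.groups()
    if sign is None:
        return now
    offset = int(amount) * _UNITS[unit]
    return now - offset if sign == "-" else now + offset


def check_range(start, end, now=None):
    """Resolve both ends; start is inclusive and end exclusive, so start < end."""
    now = int(time.time()) if now is None else now
    start_s, end_s = resolve_time(start, now), resolve_time(end, now)
    if start_s >= end_s:
        raise ValueError("start must be earlier than end")
    return start_s, end_s
```

Fixing `now` makes the result reproducible: `check_range("now-1h", "now", now=10000)` returns `(6400, 10000)`.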

Field Lineage Summary

Gets the field lineage summary for a specified field of a dataset. The field lineage summary consists of the sets of datasets and their respective fields used to compute the specified field of a dataset:

...

Parameter

Description

namespace-id

Namespace ID.

dataset-id

Name of the dataset.

field-name

Name of the field.

start-ts

Starting time-stamp (inclusive), in seconds. Supports now, now-1h, etc. syntax.

end-ts

Ending time-stamp (exclusive), in seconds. Supports now, now-1h, etc. syntax.

direction

incoming: returns the set of datasets and fields that participated in the computation of the given field. outgoing: returns the set of datasets and fields whose computation the given field participated in. both: returns both the incoming and the outgoing sets.

...

Status Codes

Description

200 OK

The field lineage summary for the specified field is returned in the body of the response.

400 BAD REQUEST

Failure to parse the time range provided.
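The difference between the incoming and outgoing directions can be pictured as walking a field-level graph backward or forward. This toy Python model is only an illustration: the edge list is hypothetical, and whether CDAP's summary reports every intermediate hop or only end-point fields is not stated here; this sketch returns every reachable field.

```python
from collections import deque

# Hypothetical edges: (source dataset, field) -> (target dataset, field).
EDGES = [
    (("raw", "first"), ("people", "name")),
    (("raw", "last"), ("people", "name")),
    (("people", "name"), ("report", "label")),
]


def reachable(start, edges, direction):
    """Return (dataset, field) pairs linked to start, following edges transitively.

    direction: "incoming" (what fed start), "outgoing" (what start fed), or "both".
    """
    if direction not in ("incoming", "outgoing", "both"):
        raise ValueError(f"unknown direction: {direction}")
    if direction == "both":
        return reachable(start, edges, "incoming") | reachable(start, edges, "outgoing")
    adjacency = {}
    for src, dst in edges:
        # Walk edges backward for incoming, forward for outgoing.
        a, b = (dst, src) if direction == "incoming" else (src, dst)
        adjacency.setdefault(a, []).append(b)
    seen, queue = set(), deque([start])
    while queue:  # breadth-first traversal
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```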

Field Lineage Operations

Gets the details of operations responsible for computation of a specified field of a dataset for a specified range of time:

...

Parameter

Description

namespace-id

Namespace ID.

dataset-id

Name of the dataset.

field-name

Name of the field.

start-ts

Starting time-stamp (inclusive), in seconds. Supports now, now-1h, etc. syntax.

end-ts

Ending time-stamp (exclusive), in seconds. Supports now, now-1h, etc. syntax.

direction

incoming: returns the operations that participated in the computation of the given field. outgoing: returns the operations in which the given field participated. both: returns both the incoming and the outgoing operations.

...

Status Codes

Description

200 OK

The operations responsible for the computation of the specified field are returned in the body of the response.

400 BAD REQUEST

Failure to parse the time range provided.

Metadata for Custom Entities

...

Custom entities are represented as hierarchical key-value pairs and can optionally have an explicitly defined type.

If a type is not specified, the last key in the hierarchy is considered the type.
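The type rule can be expressed in a few lines if the hierarchy is kept as an ordered list of key-value pairs. `entity_type` and the sample hierarchy are hypothetical; the sketch only encodes the rule stated above.

```python
from typing import Optional


def entity_type(hierarchy: list[tuple[str, str]], explicit_type: Optional[str] = None) -> str:
    """Return the type of a custom entity given its ordered (key, value) hierarchy."""
    if explicit_type:
        return explicit_type
    if not hierarchy:
        raise ValueError("a custom entity needs at least one key-value pair")
    # Without an explicit type, the last key in the hierarchy is the type.
    return hierarchy[-1][0]


hierarchy = [("namespace", "default"), ("dataset", "sales"), ("field", "price")]
```

Here `entity_type(hierarchy)` is `"field"`, while `entity_type(hierarchy, "column")` returns the explicit type `"column"`.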

...