Note: Datasets are deprecated and will be removed in CDAP 7.0.0.

The CDAP Dataset Microservices allow you to interact with datasets through HTTP. You can list, create, update, delete, and truncate datasets, retrieve their properties and metadata, and find which applications and programs use them.

All methods or endpoints described in this API have a base URL (typically http://<host>:11015 or https://<host>:10443) that precedes the resource identifier, as described in the Microservices Conventions. These methods return a status code, as listed in the Microservices Status Codes.

Listing all Datasets

You can list all datasets in CDAP by issuing an HTTP GET request to the URL:

...

{
  "name": "cdap.user.purchases",
  "type": "io.cdap.cdap.api.dataset.lib.ObjectStore",
  "description": "Purchases Dataset",
  "properties": {
    "schema": "...",
    "type": "..."
  },
  "datasetSpecs": {
    ...
  }
}
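The entry above shows a single dataset. The following minimal Python sketch summarizes a list response. It assumes the endpoint returns a JSON array of objects shaped like that entry, each with at least "name" and "type" keys. The helper name summarize_datasets is hypothetical.

```python
import json


def summarize_datasets(response_body: str) -> dict[str, str]:
    """Map each dataset name to its type.

    Assumption: the list endpoint returns a JSON array of objects that each
    carry "name" and "type" keys, like the example entry above.
    """
    entries = json.loads(response_body)
    return {entry["name"]: entry["type"] for entry in entries}
```

For the entry above, this yields a mapping from "cdap.user.purchases" to "io.cdap.cdap.api.dataset.lib.ObjectStore".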

Creating a Dataset

You can create a dataset by issuing an HTTP PUT request to the URL:

...

Parameter              Description
namespace-id           Namespace ID
dataset-name           Name of the new dataset
type-name              Type of the new dataset
properties             Dataset properties, as a map of String to String
description            Dataset description
principal (optional)   Kerberos principal with which the dataset should be created; once the dataset has been created, its principal cannot be changed

HTTP Responses

Status Codes    Description
200 OK          Requested dataset was successfully created
403 Forbidden   The dataset already exists with a different Kerberos principal
404 Not Found   Requested dataset type was not found
409 Conflict    A dataset with the same name already exists

...

HTTP Request   PUT /v3/namespaces/default/data/datasets/mydataset
Body           {"typeName":"io.cdap.cdap.api.dataset.table.Table", "properties":{"dataset.table.ttl":"3600"}, "description":"My Dataset Description", "principal":"user/example.net@EXAMPLEKDC.NET"}
Description    Creates a dataset named mydataset of type Table in the namespace default, with the time-to-live property (dataset.table.ttl) set to 3600 seconds (1 hour), the description My Dataset Description, and the owning principal user/example.net@EXAMPLEKDC.NET.
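The example request above can be sketched in Python with only the standard library. The code builds the request without sending it. BASE_URL is a placeholder for the base URL described in the Microservices Conventions. The helper names build_create_request and describe_create_status are hypothetical. Setting a JSON Content-Type header is a reasonable default, not a documented requirement. The status messages mirror the HTTP Responses table for this endpoint.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Placeholder: substitute the base URL of your CDAP instance.
BASE_URL = "http://localhost:11015"


def build_create_request(namespace: str, dataset: str, type_name: str,
                         properties: dict[str, str], description: str,
                         principal: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT request that creates a dataset."""
    body = {
        "typeName": type_name,
        "properties": properties,  # map of String to String
        "description": description,
    }
    if principal is not None:
        # Optional; the principal cannot be changed after creation.
        body["principal"] = principal
    path = "/v3/namespaces/{}/data/datasets/{}".format(
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(dataset, safe=""),
    )
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Messages taken from the HTTP Responses table for this endpoint.
CREATE_STATUS_MESSAGES = {
    200: "Requested dataset was successfully created",
    403: "The dataset already exists with a different Kerberos principal",
    404: "Requested dataset type was not found",
    409: "A dataset with the same name already exists",
}


def describe_create_status(status: int) -> str:
    """Translate a status code from the create call into a message."""
    return CREATE_STATUS_MESSAGES.get(status, f"Unexpected status code {status}")
```

To send the request, pass it to urllib.request.urlopen. Error statuses raise urllib.error.HTTPError, and the exception's code attribute can be passed to describe_create_status.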

Properties of an Existing Dataset

You can retrieve the properties with which a dataset was created or last updated by issuing an HTTP GET request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
dataset-name   Name of the existing dataset

HTTP Responses

Status Codes    Description
200 OK          Properties of the requested dataset were successfully returned
404 Not Found   Requested dataset instance was not found

...

Note that this returns the original properties that were submitted when the dataset was created or updated. You can use these properties to create a clone of the dataset, or as a basis for updating some properties of this dataset without modifying the remaining properties.
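The second use can be sketched in Python. The sketch assumes the retrieved properties have been decoded into a flat String-to-String map. It also assumes, as the note implies, that an update request should carry every property you want to keep. The changed keys are overlaid on the current values to produce the full map. The helper name merge_properties is hypothetical.

```python
def merge_properties(current: dict[str, str],
                     changes: dict[str, object]) -> dict[str, str]:
    """Return the full property map with `changes` applied to `current`.

    The input map is not modified. Changed values are converted to strings,
    because dataset properties are a map of String to String.
    """
    merged = dict(current)
    merged.update({key: str(value) for key, value in changes.items()})
    return merged
```

Building a new map leaves the retrieved properties intact, so they can still serve as the basis for a clone.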

Metadata of an Existing Dataset

You can retrieve the metadata with which a dataset was created by issuing an HTTP GET request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
dataset-name   Name of the existing dataset

HTTP Responses

Status Codes    Description
200 OK          Metadata for the requested dataset instance was successfully returned
404 Not Found   Requested dataset instance was not found

...

{
  "spec": {
    "name": "ownedDataset",
    "type": "datasetType1",
    "originalProperties": {},
    "properties": {},
    "datasetSpecs": {}
  },
  "type": {
    "name": "datasetType1",
    "modules": [
      {
        "name": "module1",
        "className": "io.cdap.cdap.data2.datafabric.dataset.service.TestModule1",
        "jarLocationPath": "/path/data/module1/archive/module1.jar",
        "types": [
          "datasetType1"
        ],
        "usesModules": [],
        "usedByModules": []
      }
    ]
  },
  "principal": "user/example.net@EXAMPLEKDC.NET"
}
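The following minimal Python sketch pulls the most useful fields out of a metadata response. The field names come from the example above. Treating "principal" and "modules" as optional is an assumption, since a dataset created without a principal may not report one. The helper name summarize_metadata is hypothetical.

```python
import json


def summarize_metadata(response_body: str) -> dict:
    """Extract name, type, module classes, and principal from a metadata response."""
    metadata = json.loads(response_body)
    dataset_type = metadata["type"]
    # Assumption: "modules" may be missing; default to an empty list.
    modules = dataset_type.get("modules", [])
    return {
        "name": metadata["spec"]["name"],
        "type": dataset_type["name"],
        "module_classes": [module["className"] for module in modules],
        # Assumption: "principal" may be absent; None signals that case.
        "principal": metadata.get("principal"),
    }
```

For the example response above, this reports the dataset ownedDataset of type datasetType1, implemented by io.cdap.cdap.data2.datafabric.dataset.service.TestModule1 and owned by user/example.net@EXAMPLEKDC.NET.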

Updating an Existing Dataset

You can update the properties of an existing dataset by issuing an HTTP PUT request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
dataset-name   Name of the existing dataset

HTTP Responses

Status Codes    Description
200 OK          Requested dataset was successfully updated
404 Not Found   Requested dataset instance was not found

...

HTTP Request   PUT /v3/namespaces/default/data/datasets/mydataset/properties
Body           {"dataset.table.ttl":"7200"}
Description    Updates the time-to-live property (dataset.table.ttl) of the dataset mydataset, of type Table, in the namespace default to 7200 seconds (2 hours).
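The update example can be sketched in Python in the same way. The code builds the PUT request to the properties path without sending it. BASE_URL is a placeholder, build_update_properties_request is a hypothetical helper name, and the JSON Content-Type header is a reasonable default rather than a documented requirement. If other properties must be preserved, start from the properties retrieved for the dataset and change only the keys you need.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: substitute the base URL of your CDAP instance.
BASE_URL = "http://localhost:11015"


def build_update_properties_request(namespace: str, dataset: str,
                                    properties: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates dataset properties."""
    path = "/v3/namespaces/{}/data/datasets/{}/properties".format(
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(dataset, safe=""),
    )
    return urllib.request.Request(
        BASE_URL + path,
        # The body is a flat JSON object of String-to-String properties.
        data=json.dumps(properties).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```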

Deleting a Dataset

You can delete a dataset by issuing an HTTP DELETE request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
dataset-name   Dataset name

HTTP Responses

Status Codes    Description
200 OK          Dataset was successfully deleted
404 Not Found   Dataset named dataset-name could not be found

...

HTTP Request   DELETE /v3/namespaces/default/data/datasets/mydataset
Description    Deletes the dataset mydataset in the namespace default.
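The delete example can be sketched in Python. BASE_URL is a placeholder. The helper names build_delete_request and send_request are hypothetical. send_request returns the status code rather than raising, so a 404 Not Found can be handled like any other result.

```python
import urllib.error
import urllib.parse
import urllib.request

# Placeholder: substitute the base URL of your CDAP instance.
BASE_URL = "http://localhost:11015"


def build_delete_request(namespace: str, dataset: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a single dataset."""
    path = "/v3/namespaces/{}/data/datasets/{}".format(
        urllib.parse.quote(namespace, safe=""),
        urllib.parse.quote(dataset, safe=""),
    )
    return urllib.request.Request(BASE_URL + path, method="DELETE")


def send_request(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send a request and return its HTTP status code.

    urlopen raises HTTPError for error statuses such as 404 Not Found;
    this helper catches it and returns the code instead.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Connection failures, such as an unreachable host, still raise urllib.error.URLError. They are deliberately not converted into status codes, because no HTTP status was received.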

Deleting all Datasets

If the property enable.unrecoverable.reset in cdap-site.xml is set to true, you can delete all datasets in a namespace by issuing an HTTP DELETE request to the URL:

...

Parameter      Description
namespace-id   Namespace ID

HTTP Responses

Status Codes    Description
200 OK          All datasets were successfully deleted
403 Forbidden   The property that enables unrecoverable methods (enable.unrecoverable.reset) is not set to true
409 Conflict    Programs are currently running in the namespace

...

Use this method with extreme caution: there is no way to recover the deleted datasets.
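Because the call is irreversible, client code may want to guard it. The following Python sketch uses hypothetical helper names. confirm_namespace requires the operator to retype the namespace before proceeding. explain_delete_all_status turns the status codes from the table above into messages. The request itself is omitted here; use the URL given above.

```python
def confirm_namespace(namespace: str, typed: str) -> bool:
    """Proceed only if the operator retyped the namespace name exactly."""
    return typed == namespace


# Messages based on the HTTP Responses table for deleting all datasets.
DELETE_ALL_MESSAGES = {
    200: "All datasets were successfully deleted",
    403: "enable.unrecoverable.reset is not set to true in cdap-site.xml",
    409: "Programs are currently running in the namespace",
}


def explain_delete_all_status(status: int) -> str:
    """Translate a status code from the delete-all call into a message."""
    return DELETE_ALL_MESSAGES.get(status, f"Unexpected status code {status}")
```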

Truncating a Dataset

You can truncate a dataset by issuing an HTTP POST request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
dataset-name   Dataset name

HTTP Responses

Status Codes    Description
200 OK          Dataset was successfully truncated

Datasets used by an Application

You can retrieve a list of datasets used by an application by issuing an HTTP GET request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
app-id         Application ID

HTTP Responses

Status Codes    Description
200 OK          Request was successful

Datasets used by a Program

You can retrieve a list of datasets used by a program by issuing an HTTP GET request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
app-id         Application ID
program-type   Program type, one of mapreduce, services, spark, or workflows
program-id     Program ID

HTTP Responses

Status Codes    Description
200 OK          Request was successful
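A client that builds this URL from user input could check program-type against the four values listed above before sending the request. The following Python sketch uses a hypothetical helper name, check_program_type. Matching is exact, because the table does not say whether the server accepts other spellings or capitalizations.

```python
# The four program types listed in the parameter table above.
VALID_PROGRAM_TYPES = frozenset({"mapreduce", "services", "spark", "workflows"})


def check_program_type(program_type: str) -> str:
    """Return program_type unchanged if it is one of the listed values."""
    if program_type not in VALID_PROGRAM_TYPES:
        allowed = ", ".join(sorted(VALID_PROGRAM_TYPES))
        raise ValueError(f"Unknown program type {program_type!r}; expected one of: {allowed}")
    return program_type
```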

Programs using a Dataset

You can retrieve a list of programs that are using a dataset by issuing an HTTP GET request to the URL:

...

Parameter      Description
namespace-id   Namespace ID
dataset-id     Dataset ID

HTTP Responses

Status Codes    Description
200 OK          Request was successful

...