Note: Datasets are deprecated and will be removed in CDAP 7.0.0.
The CDAP Dataset Microservices allow you to interact with datasets through HTTP. You can list, create, delete, and truncate datasets.
All methods or endpoints described in this API have a base URL (typically http://<host>:11015 or https://<host>:10443) that precedes the resource identifier, as described in the Microservices Conventions. These methods return a status code, as listed in the Microservices Status Codes.
You can list all datasets in CDAP by issuing an HTTP GET request to the URL:
GET /v3/namespaces/<namespace-id>/data/datasets
Parameter | Description |
---|---|
namespace-id | Namespace ID |
The response body will contain a JSON-formatted list of the existing datasets:
{ "name": "cdap.user.purchases", "type": "io.cdap.cdap.api.dataset.lib.ObjectStore", "description": "Purchases Dataset", "properties": { "schema":"...", "type":"..." }, "datasetSpecs": { ... } }
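As a rough illustration of the listing call, the sketch below builds the GET request without sending it and pulls the dataset names out of a response body. The host `localhost`, the function names, and the assumption that the response is a JSON array of objects shaped like the one above are illustrative and not taken from CDAP itself.

```python
import json
import urllib.request

# Assumption: a CDAP router reachable locally on the typical non-SSL port.
BASE_URL = "http://localhost:11015"


def list_datasets_request(namespace, base_url=BASE_URL):
    """Build (but do not send) the GET request that lists datasets in a namespace."""
    url = f"{base_url}/v3/namespaces/{namespace}/data/datasets"
    return urllib.request.Request(url, method="GET")


def dataset_names(response_body):
    """Return the "name" field of every entry in the response.

    Assumption: the body is a JSON array of dataset objects.
    """
    return [entry["name"] for entry in json.loads(response_body)]
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and feed the decoded body to `dataset_names`.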
You can create a dataset by issuing an HTTP PUT request to the URL:
PUT /v3/namespaces/<namespace-id>/data/datasets/<dataset-name>
with JSON-formatted name of the dataset type, properties, and description in a body:
{ "typeName": "<type-name>", "properties":{ "<properties>" }, "description": "Dataset Description", "principal": "user/example.net@EXAMPLEKDC.NET" }
Parameter | Description |
---|---|
namespace-id | Namespace ID |
dataset-name | Name of the new dataset |
typeName | Type of the new dataset |
properties | Dataset properties, map of String to String |
description | Dataset description |
principal | Kerberos principal with which the dataset should be created; once a dataset has been created, this cannot be changed (optional) |
Status Codes | Description |
---|---|
| Requested dataset was successfully created |
| The dataset already exists with a different Kerberos principal |
| Requested dataset type was not found |
| Dataset with the same name already exists |
Example
HTTP Request | |
---|---|
Body | |
Description | Creates a dataset named mydataset of the type |
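The create call described above could be assembled along these lines. This is a sketch under assumptions: the function name and the `Content-Type` header are illustrative, and `principal` is left out of the body unless given, since the parameter table marks it optional.

```python
import json
import urllib.request


def create_dataset_request(base_url, namespace, dataset_name, type_name,
                           properties=None, description="", principal=None):
    """Build (but do not send) the PUT request that creates a dataset."""
    body = {
        "typeName": type_name,
        # Properties are a map of String to String.
        "properties": dict(properties or {}),
        "description": description,
    }
    if principal is not None:
        # Optional; cannot be changed once the dataset exists.
        body["principal"] = principal
    url = f"{base_url}/v3/namespaces/{namespace}/data/datasets/{dataset_name}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
    )
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything reaches the server.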
You can retrieve the properties with which a dataset was created or last updated by issuing an HTTP GET request to the URL:
GET /v3/namespaces/<namespace-id>/data/datasets/<dataset-name>/properties
Parameter | Description |
---|---|
namespace-id | Namespace ID |
dataset-name | Name of the existing dataset |
Status Codes | Description |
---|---|
| Properties of the requested dataset were successfully returned |
| Requested dataset instance was not found |
The response, if successful, will contain the JSON-formatted properties:
{ "key1":"value1", "key2":"value2", ... }
Note that this returns the original properties that were submitted when the dataset was created or updated. You can use these properties to create a clone of the dataset, or as a basis for updating some properties of this dataset without modifying the remaining properties.
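One way to use the returned properties for a clone is sketched below: the properties response is wrapped into a create-request body for a new dataset. The helper names are hypothetical, and the caller is assumed to already know the type name of the source dataset.

```python
import json


def properties_url(base_url, namespace, dataset_name):
    """URL of the properties resource for an existing dataset."""
    return f"{base_url}/v3/namespaces/{namespace}/data/datasets/{dataset_name}/properties"


def clone_body(properties_response, type_name, description=""):
    """Turn a properties response into the JSON body of a create request.

    The original properties are copied unchanged, so the new dataset is
    configured like the source one.
    """
    properties = json.loads(properties_response)
    body = {
        "typeName": type_name,
        "properties": properties,
        "description": description,
    }
    return json.dumps(body).encode("utf-8")
```

The resulting bytes can be sent in a PUT request to the URL of the new dataset name.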
You can retrieve the metadata with which a dataset was created by issuing an HTTP GET request to the URL:
GET /v3/namespaces/<namespace-id>/data/datasets/<dataset-name>
Parameter | Description |
---|---|
namespace-id | Namespace ID |
dataset-name | Name of the existing dataset |
Status Codes | Description |
---|---|
| Metadata for the requested dataset instance was successfully returned |
| Requested dataset instance was not found |
The response body will contain JSON-formatted metadata of the existing dataset:
{ "spec": { "name": "ownedDataset", "type": "datasetType1", "originalProperties": {}, "properties": {}, "datasetSpecs": {} }, "type": { "name": "datasetType1", "modules": [ { "name": "module1", "className": "io.cdap.cdap.data2.datafabric.dataset.service.TestModule1", "jarLocationPath": "/path/data/module1/archive/module1.jar", "types": [ "datasetType1" ], "usesModules": [], "usedByModules": [] } ] }, "principal": "user/example.net@EXAMPLEKDC.NET" }
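Because the metadata document is nested, a small helper can make it easier to read. The sketch below extracts a few fields from a response shaped like the example above; the function name and the choice of fields are illustrative, and `principal` is treated as possibly absent.

```python
import json


def summarize_metadata(response_body):
    """Pick the dataset name, type, module names, and principal out of a metadata response."""
    meta = json.loads(response_body)
    return {
        "name": meta["spec"]["name"],
        "type": meta["type"]["name"],
        "modules": [module["name"] for module in meta["type"].get("modules", [])],
        # Assumption: datasets created without a principal omit this field.
        "principal": meta.get("principal"),
    }
```

For the example response shown above, the summary would name the dataset `ownedDataset`, its type `datasetType1`, and the single module `module1`.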
You can update an existing dataset's table and properties by issuing an HTTP PUT request to the URL:
PUT /v3/namespaces/<namespace-id>/data/datasets/<dataset-name>/properties
with JSON-formatted properties in the body:
{ "key1":"value1", "key2":"value2", ... }
Notes:
The dataset must already exist.
The properties given in this request replace all existing properties; that is, if you have set other properties for this table, such as time-to-live (dataset.table.ttl), you must also include those properties in the update request.
You can retrieve the existing properties with the Properties of an Existing Dataset request and use them as the basis for constructing your request.
Once a dataset has been created, the principal cannot be changed.
Parameter | Description |
---|---|
namespace-id | Namespace ID |
dataset-name | Name of the existing dataset |
Status Codes | Description |
---|---|
| Requested dataset was successfully updated |
| Requested dataset instance was not found |
Example
HTTP Request | |
---|---|
Body | |
Description | For the mydataset of type |
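Since an update replaces every existing property, a client that wants to change only a few values has to merge them into the current set first. A minimal sketch of that merge, with hypothetical helper names and an assumed JSON content type, might look like this:

```python
import json
import urllib.request


def merged_properties(existing, changes):
    """Return the full property map to submit: existing values overlaid with changes."""
    merged = dict(existing)  # copy so the caller's dict is not modified
    merged.update(changes)
    return merged


def update_properties_request(base_url, namespace, dataset_name, properties):
    """Build (but do not send) the PUT request that replaces a dataset's properties."""
    url = f"{base_url}/v3/namespaces/{namespace}/data/datasets/{dataset_name}/properties"
    return urllib.request.Request(
        url,
        data=json.dumps(properties).encode("utf-8"),
        method="PUT",
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
    )
```

Here `existing` would come from the properties GET request described earlier, so that settings such as `dataset.table.ttl` survive the update.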
You can delete a dataset by issuing an HTTP DELETE request to the URL:
DELETE /v3/namespaces/<namespace-id>/data/datasets/<dataset-name>
Parameter | Description |
---|---|
namespace-id | Namespace ID |
dataset-name | Dataset name |
Status Codes | Description |
---|---|
| Dataset was successfully deleted |
| Dataset named dataset-name could not be found |
Example
HTTP Request | DELETE /v3/namespaces/default/data/datasets/mydataset |
---|---|
Description | Deletes the dataset mydataset in the namespace default |
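The same delete could be issued from a script. The sketch below only builds the request; the function name and the `localhost` base URL in the usage note are assumptions.

```python
import urllib.request


def delete_dataset_request(base_url, namespace, dataset_name):
    """Build (but do not send) the DELETE request for a single dataset."""
    url = f"{base_url}/v3/namespaces/{namespace}/data/datasets/{dataset_name}"
    return urllib.request.Request(url, method="DELETE")
```

Calling `delete_dataset_request("http://localhost:11015", "default", "mydataset")` yields a request for the dataset `mydataset` in the namespace `default`; pass it to `urllib.request.urlopen` to send it.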
If the property enable.unrecoverable.reset in cdap-site.xml is set to true, you can delete all datasets in a namespace by issuing an HTTP DELETE request to the URL:
DELETE /v3/unrecoverable/namespaces/<namespace-id>/datasets
Parameter | Description |
---|---|
namespace-id | Namespace ID |
Status Codes | Description |
---|---|
| All datasets were successfully deleted |
403 Forbidden | Property to enable unrecoverable methods is not enabled |
409 Conflict | Programs are currently running in the namespace |
This request succeeds only if no programs in the namespace are running.
If the property enable.unrecoverable.reset in cdap-site.xml is not set to true, this operation returns status code 403 Forbidden. If at least one program is running, it returns status code 409 Conflict.
This method must be exercised with extreme caution, as there is no recovery from it.
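A client calling this endpoint may want to turn the two failure codes into readable messages. The following sketch builds the request and maps 403 and 409 to the causes described above; the function names are hypothetical, and any other status is reported without interpretation.

```python
import urllib.request


def delete_all_datasets_request(base_url, namespace):
    """Build (but do not send) the DELETE request that removes every dataset in a namespace."""
    url = f"{base_url}/v3/unrecoverable/namespaces/{namespace}/datasets"
    return urllib.request.Request(url, method="DELETE")


def explain_failure(status_code):
    """Map the documented failure codes of this endpoint to their causes."""
    if status_code == 403:
        return "enable.unrecoverable.reset is not set to true in cdap-site.xml"
    if status_code == 409:
        return "at least one program is still running in the namespace"
    # Other codes are not interpreted here.
    return f"unexpected status code {status_code}"
```

When sending with `urllib.request.urlopen`, these codes arrive as `urllib.error.HTTPError`, whose `code` attribute can be passed to `explain_failure`.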
You can truncate a dataset by issuing an HTTP POST request to the URL:
POST /v3/namespaces/<namespace-id>/data/datasets/<dataset-name>/admin/truncate
This will clear the existing data from the dataset. This cannot be undone.
Parameter | Description |
---|---|
namespace-id | Namespace ID |
dataset-name | Dataset name |
Status Codes | Description |
---|---|
| Dataset was successfully truncated |
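Because truncation cannot be undone, a script might require an explicit confirmation before it even builds the request. The sketch below shows one such guard; the `confirm` flag and the function name are design choices of this example, not part of the CDAP API.

```python
import urllib.request


def truncate_dataset_request(base_url, namespace, dataset_name, confirm=False):
    """Build (but do not send) the POST request that truncates a dataset.

    Raises ValueError unless confirm=True, as a guard against accidental data loss.
    """
    if not confirm:
        raise ValueError("truncation is irreversible; pass confirm=True to proceed")
    url = (f"{base_url}/v3/namespaces/{namespace}"
           f"/data/datasets/{dataset_name}/admin/truncate")
    # The request carries no body; the method is set explicitly to POST.
    return urllib.request.Request(url, method="POST")
```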
You can retrieve a list of datasets used by an application by issuing an HTTP GET request to the URL:
GET /v3/namespaces/<namespace-id>/apps/<app-id>/datasets
Parameter | Description |
---|---|
namespace-id | Namespace ID |
app-id | Application ID |
Status Codes | Description |
---|---|
| Request was successful |
You can retrieve a list of datasets used by a program by issuing an HTTP GET request to the URL:
GET /v3/namespaces/<namespace-id>/apps/<app-id>/<program-type>/<program-id>/datasets
Parameter | Description |
---|---|
namespace-id | Namespace ID |
app-id | Application ID |
program-type | Program type, one of |
program-id | Program ID |
Status Codes | Description |
---|---|
| Request was successful |
You can retrieve a list of programs that are using a dataset by issuing an HTTP GET request to the URL:
GET /v3/namespaces/<namespace-id>/data/datasets/<dataset-id>/programs
Parameter | Description |
---|---|
namespace-id | Namespace ID |
dataset-id | Dataset ID |
Status Codes | Description |
---|---|
| Request was successful |
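The three usage lookups above differ only in their paths, so they can be captured as small URL builders. This is a sketch with hypothetical function names; identifiers are percent-encoded as a precaution, and the accepted `program_type` values are not listed here, so the caller must supply a valid one.

```python
from urllib.parse import quote


def _segment(value):
    """Percent-encode a single path segment, including any slashes."""
    return quote(value, safe="")


def app_datasets_url(base_url, namespace, app_id):
    """URL listing the datasets used by an application."""
    return f"{base_url}/v3/namespaces/{_segment(namespace)}/apps/{_segment(app_id)}/datasets"


def program_datasets_url(base_url, namespace, app_id, program_type, program_id):
    """URL listing the datasets used by one program of an application."""
    return (f"{base_url}/v3/namespaces/{_segment(namespace)}/apps/{_segment(app_id)}"
            f"/{_segment(program_type)}/{_segment(program_id)}/datasets")


def dataset_programs_url(base_url, namespace, dataset_id):
    """URL listing the programs that use a dataset."""
    return (f"{base_url}/v3/namespaces/{_segment(namespace)}"
            f"/data/datasets/{_segment(dataset_id)}/programs")
```

Each URL is meant for an HTTP GET request, for example through `urllib.request.urlopen`.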