Query Microservices (Deprecated)

Warning: This topic is no longer supported.

Use the CDAP Query Microservices to submit SQL-like queries over datasets. Queries are processed asynchronously; to obtain query results, perform these steps:

  • first, submit the query;

  • then poll for the query's status until it is finished;

  • once finished, retrieve the result schema and the results;

  • finally, close the query to free the resources that it holds.

Additional details on querying can be found in the Data Exploration.

All methods or endpoints described in this API have a base URL (typically http://<host>:11015 or https://<host>:10443) that precedes the resource identifier, as described in the Microservices Conventions. These methods return a status code, as listed in the Microservices Status Codes.

Submitting a Query

To submit a SQL query, post the query string to the queries URL:

POST /v3/namespaces/<namespace-id>/data/explore/queries

Parameter

Description

Parameter

Description

namespace-id

Namespace ID

The body of the request must contain a JSON string of the form:

{ "query": "<SQL-query-string>" }

where SQL-query-string is the actual SQL query. If you are running a version of Hive that uses reserved keywords, and a column in your query is a Hive reserved keyword, you must enclose the column name in backticks.

For example:

{ "query": "select `date` from stream_events" }

HTTP Responses

Status Codes

Description

Status Codes

Description

200 OK

The query execution was successfully initiated, and the body will contain the query-handle used to identify the query in subsequent requests

400 Bad Request

The query is not well-formed or contains an error, such as a nonexistent table name

Comments

If the query execution was successfully initiated, the body of the response will contain a handle that can be used to identify the query in subsequent requests:

Example

HTTP Request

PUT /v3/namespaces/default/data/explore/queries

HTTP Body

{"query":"SELECT * FROM dataset_mydataset LIMIT 5"}

HTTP Response

{"handle":"57cf1b01-8dba-423a-a8b4-66cd29dd75e2"}

Description

Submit a query in the namespace default to get the first 5 entries from the dataset, mydataset in the namespace default

Status of a Query

The status of a query is obtained using a HTTP GET request to the query's URL:

Note: This endpoint is not namespaced, as all query-handles are globally unique.

Parameter

Description

Parameter

Description

query-handle

Handle obtained when the query was submitted

HTTP Responses

Status Codes

Description

Status Codes

Description

200 OK

The query exists and the body contains its status

404 Not Found

The query handle does not match any current query

Comments

If the query exists, the body will contain the status of its execution and whether the query has a results set:

Status can be one of the following: INITIALIZEDRUNNINGFINISHEDCANCELEDCLOSEDERRORUNKNOWN, and PENDING.

Example

HTTP Request

GET /v3/data/explore/queries/57cf1b01-8dba-423a-a8b4-66cd29dd75e2/status

HTTP Response

{"status":"FINISHED","hasResults":true}

Description

Retrieve the status of the query in the namespace default which has the handle 57cf1b01-8dba-423a-a8b4-66cd29dd75e2

Obtaining the Result Schema

If the query's status is FINISHED and it has results, you can obtain the schema of the results:

Note: This endpoint is not namespaced, as all query-handles are globally unique.

Parameter

Description

Parameter

Description

query-handle

Handle obtained when the query was submitted

HTTP Responses

Status Codes

Description

Status Codes

Description

200 OK

The query was successfully received and the query schema was returned in the body

400 Bad Request

The query is not well-formed or contains an error, or the query status is not FINISHED

404 Not Found

The query handle does not match any current query

Comments

The query's result schema is returned in a JSON body as a list of columns, each given by its name, type and position; if the query has no result set, this list is empty:

The type of each column is a data type as defined in the Hive language manual.

Example

HTTP Request

GET /v3/namespaces/default/data/explore/queries/57cf1b01-8dba-423a-a8b4-66cd29dd75e2/schema

HTTP Response

[{"name":"dataset_mydataset.key","type":"array<tinyint>","position":1}, {"name":"dataset_mydataset.value","type":"array<tinyint>","position":2}]

Description

Retrieve the schema of the result of the query in the namespace default which has the handle 57cf1b01-8dba-423a-a8b4-66cd29dd75e2

Retrieving Query Results

Query results can be retrieved in batches after the query is finished, optionally specifying the batch size in the body of the request:

The body of the request can contain a JSON string specifying the batch size:

If the batch size is not specified, the default is 20.

Parameter

Description

Parameter

Description

query-handle

Handle obtained when the query was submitted

HTTP Responses

Status Codes

Description

Status Codes

Description

200 OK

The event was successfully received and the result of the query was returned in the body

404 Not Found

The query handle does not match any current query

Comments

The results are returned in a JSON body as a list of columns, each given as a structure containing a list of column values:

The value at each position has the type that was returned in the result schema for that position. For example, if the returned type was INT, then the value will be an integer literal, whereas for STRING or VARCHAR the value will be a string literal.

Repeat the query to retrieve subsequent results. If all results of the query have already been retrieved, then the returned list is empty.

Example

HTTP Request

POST /v3/namespaces/default/data/explore/queries/57cf1b01-8dba-423a-a8b4-66cd29dd75e2/next

HTTP Response

[

  {"columns": [10, 5]},

  {"columns": [ 20, 27]},

  {"columns": [ 50, 6]},

  {"columns": [ 90, 30]},

  {"columns": [ 95, 91]}

]

Description

Retrieve the results of the query which has the handle 57cf1b01-8dba-423a-a8b4-66cd29dd75e2

Closing a Query

The query can be closed by issuing an HTTP DELETE against its URL:

This frees all resources that are held by this query.

Parameter

Description

Parameter

Description

query-handle

Handle obtained when the query was submitted

HTTP Responses

Status Codes

Description

Status Codes

Description

200 OK

The query was closed

400 Bad Request

The query was not in a state that could be closed; either wait until it is finished, or cancel it

404 Not Found

The query handle does not match any current query

Example

HTTP Request

DELETE /v3/namespaces/default/data/explore/queries/57cf1b01-8dba-423a-a8b4-66cd29dd75e2

Description

Close the query in the namespace default which has the handle 57cf1b01-8dba-423a-a8b4-66cd29dd75e2

List of Queries

To return a list of queries, use:

Parameter

Description

Parameter

Description

namespace-id

Namespace ID

limit

Optional number indicating how many results to return in the response; by default, 50 results are returned

cursor

Optional string specifying if the results returned should be in the forward or reverse direction; should be one of next or prev

offset

Optional offset for pagination; returns the results that are greater than offset if the cursor is next or results that are less than offset if cursor is prev

Comments

The results are returned as a JSON array, with each element containing information about a query:

Example

HTTP Request

GET /v3/namespaces/default/data/explore/queries

HTTP Response

[ {

 "timestamp": 1411266478717,

 "statement": "SELECT * FROM dataset_mydataset",

 "status": "FINISHED",

 "query_handle": "57cf1b01-8dba-423a-a8b4-66cd29dd75e2",

 "has_results": true,

 "is_active": false

} ]

Description

Retrieves all queries

Count of Active Queries

To return the count of active queries, use:

Parameter

Description

Parameter

Description

namespace-id

Namespace ID

The results are returned in the body as a JSON string:

Download Query Results

To download the results of a query, use:

The results of the query are returned in CSV format.

Note: this endpoint is not namespaced, as all query-handles are globally unique.

Parameter

Description

Parameter

Description

query-handle

Handle obtained when the query was submitted or via a list of queries

Comments

The query results can be downloaded only once. The Microservices will return a Status Code 409 Conflict if results for the query-handle are attempted to be downloaded again.

HTTP Responses

Status Codes

Description

Status Codes

Description

200 OK

The HTTP call was successful

404 Not Found

The query handle does not match any current query

409 Conflict

The query results were already downloaded

Enabling and Disabling Querying

Querying (or exploring) of datasets can be enabled and disabled using these endpoints.

Exploration of data in CDAP is governed by a combination of enabling the CDAP Explore Service and then creating datasets that are explorable. The CDAP Explore Service is enabled by a setting in the CDAP configuration file (explore.enabled in cdap-site.xml file).

Datasets, that were created while the Explore Service was not enabled, can, once the service is enabled and CDAP restarted, be enabled for exploration by using these endpoints.

You can also use these endpoints to disable exploration of a specific dataset. The dataset will still be accessible programmatically; it just won't respond to queries or be available for exploration using the CDAP UI.

For datasets:

Each of these endpoints returns a query handle that can be used to submit requests tracking the status of the query.

Parameter

Description

Parameter

Description

namespace-id

Namespace ID

dataset-name

Name of the dataset

table-name

Name of the table

HTTP Responses

Status Codes

Description

Status Codes

Description

200 OK

The query execution was successfully initiated, and the body will contain the query-handle used to identify the query in subsequent requests

404 Not Found

The query is not well-formed or contains an error such as a nonexistent table name

Comments

If the request was successful, the body will contain a query handle that can be used to identify the query in subsequent requests, such as a status request:

Example

HTTP Request

POST /v3/namespaces/default/data/explore/datasets/logEventStream_converted/disable

HTTP Response

{"handle":"57cf1b01-8dba-423a-a8b4-66cd29dd75e2"}

Description

Submits a request in the namespace default to disable the dataset logEventStream_converted from being explored. The handle can be used to check the status.

 

Created in 2020 by Google Inc.