Query Microservices (Deprecated)
Warning: This topic is no longer supported.
Use the CDAP Query Microservices to submit SQL-like queries over datasets. Queries are processed asynchronously; to obtain query results, perform these steps:
first, submit the query;
then poll for the query's status until it is finished;
once finished, retrieve the result schema and the results;
finally, close the query to free the resources that it holds.
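The four steps above can be sketched as a single Python function. This is a minimal sketch, not a CDAP client. `send` is a hypothetical transport callable that takes an HTTP method, a path, and an optional JSON body, and returns the parsed JSON response. Several details are not reproduced on this page and are assumptions here: the non-namespaced paths (`/v3/data/explore/queries/<handle>/status`, `/schema`, `/next`, and a DELETE on the handle URL), and the response fields `handle`, `status`, and `hasResults`.

```python
import time
from typing import Any, Callable, Optional

# send(method, path, body) -> parsed JSON response (None for an empty body).
# Hypothetical transport; plug in urllib, requests, or a test double.
Send = Callable[[str, str, Optional[dict]], Any]

# Assumption: once a query reaches one of these states it will not progress further.
TERMINAL = {"FINISHED", "CANCELED", "CLOSED", "ERROR"}

def run_query(send: Send, namespace: str, sql: str,
              poll_seconds: float = 1.0, batch_size: int = 20):
    """Submit, poll, fetch schema and results, then close (sketch)."""
    # 1. Submit the query; the handle field name "handle" is an assumption.
    handle = send("POST", f"/v3/namespaces/{namespace}/data/explore/queries",
                  {"query": sql})["handle"]
    base = f"/v3/data/explore/queries/{handle}"  # assumed, not namespaced
    try:
        # 2. Poll the status until the query reaches a terminal state.
        while True:
            status = send("GET", f"{base}/status", None)
            if status["status"] in TERMINAL:
                break
            time.sleep(poll_seconds)
        if status["status"] != "FINISHED":
            raise RuntimeError(f"query ended in state {status['status']}")
        if not status.get("hasResults"):
            return [], []
        # 3. Fetch the schema, then results in batches until an empty batch.
        schema = send("GET", f"{base}/schema", None)
        rows = []
        while True:
            batch = send("POST", f"{base}/next", {"size": batch_size})
            if not batch:
                break
            rows.extend(batch)
        return schema, rows
    finally:
        # 4. Always close the query so the server releases its resources.
        send("DELETE", base, None)
```

Closing in a `finally` clause releases server resources even when the query fails. A production client would also bound the polling loop, since a query that stays in a non-terminal state such as UNKNOWN would otherwise be polled forever.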
Additional details on querying can be found in the Data Exploration documentation.
All methods or endpoints described in this API have a base URL (typically http://<host>:11015 or https://<host>:10443) that precedes the resource identifier, as described in the Microservices Conventions. These methods return a status code, as listed in the Microservices Status Codes.
Submitting a Query
To submit a SQL query, post the query string to the queries URL:
POST /v3/namespaces/<namespace-id>/data/explore/queries
Parameter | Description |
---|---|
namespace-id | Namespace ID |
The body of the request must contain a JSON string of the form:
{
"query": "<SQL-query-string>"
}
where SQL-query-string is the actual SQL query. If you are running a version of Hive that uses reserved keywords and a column in your query is a Hive reserved keyword, you must enclose the column name in backticks.
For example:
{
"query": "select `date` from stream_events"
}
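A minimal sketch of building the submit request with Python's standard library. The host and port follow the typical base URL given above. Setting `Content-Type: application/json` and reading the handle from a `handle` field of the response are assumptions. The request is only built here; the commented lines show how it could be sent.

```python
import json
import urllib.request

def build_submit_request(base_url: str, namespace: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a query."""
    url = f"{base_url}/v3/namespaces/{namespace}/data/explore/queries"
    # Body of the form {"query": "<SQL-query-string>"}.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},  # assumed header
    )

# Backticks protect a column whose name is a Hive reserved keyword.
req = build_submit_request("http://localhost:11015", "default",
                           "select `date` from stream_events")
# Sending it could look like this (response field name is an assumption):
#   with urllib.request.urlopen(req) as resp:
#       handle = json.load(resp)["handle"]
```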
HTTP Responses
Status Codes | Description |
---|---|
| The query execution was successfully initiated, and the body will contain the query-handle used to identify the query in subsequent requests |
| The query is not well-formed or contains an error, such as a nonexistent table name |
Comments
If the query execution was successfully initiated, the body of the response will contain a handle that can be used to identify the query in subsequent requests:
Example
HTTP Request | |
---|---|
HTTP Body | |
HTTP Response | |
Description | Submit a query in the namespace default to retrieve the first 5 entries from the dataset mydataset |
Status of a Query
The status of a query is obtained using an HTTP GET request to the query's URL:
Note: This endpoint is not namespaced, as all query-handles are globally unique.
Parameter | Description |
---|---|
| Handle obtained when the query was submitted |
HTTP Responses
Status Codes | Description |
---|---|
| The query exists and the body contains its status |
| The query handle does not match any current query |
Comments
If the query exists, the body will contain the status of its execution and whether the query has a results set:
Status can be one of the following: INITIALIZED, RUNNING, FINISHED, CANCELED, CLOSED, ERROR, UNKNOWN, and PENDING.
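Polling can be made more robust with a deadline and exponential backoff. The sketch below assumes the status body is a JSON object with `status` and `hasResults` fields; the body format is not shown on this page, so those field names are assumptions. It also assumes FINISHED, CANCELED, CLOSED, and ERROR are terminal, based only on the state names. `get_status` is a hypothetical callable that performs one status request.

```python
import time
from typing import Callable, Tuple

# States listed in the documentation.
ALL_STATES = {"INITIALIZED", "RUNNING", "FINISHED", "CANCELED",
              "CLOSED", "ERROR", "UNKNOWN", "PENDING"}
# Assumption: these states mean the query will not progress further.
TERMINAL_STATES = {"FINISHED", "CANCELED", "CLOSED", "ERROR"}

def wait_for_query(get_status: Callable[[], dict], timeout: float = 60.0,
                   initial_delay: float = 0.5, max_delay: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> Tuple[str, bool]:
    """Poll until the query reaches a terminal state; return (status, hasResults)."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        body = get_status()  # e.g. {"status": "RUNNING", "hasResults": False}
        status = body["status"]
        if status not in ALL_STATES:
            raise ValueError(f"unexpected query status: {status}")
        if status in TERMINAL_STATES:
            return status, bool(body.get("hasResults", False))
        # Give up if the next sleep would overshoot the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"query still {status} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff between polls
```

Injecting `sleep` keeps the function easy to test without real delays.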
Example
HTTP Request | |
---|---|
HTTP Response | |
Description | Retrieve the status of the query in the namespace default which has the handle |
Obtaining the Result Schema
If the query's status is FINISHED and it has results, you can obtain the schema of the results:
Note: This endpoint is not namespaced, as all query-handles are globally unique.
Parameter | Description |
---|---|
| Handle obtained when the query was submitted |
HTTP Responses
Status Codes | Description |
---|---|
| The query was successfully received and the query schema was returned in the body |
| The query is not well-formed or contains an error, or the query status is not FINISHED |
| The query handle does not match any current query |
Comments
The query's result schema is returned in a JSON body as a list of columns, each given by its name, type and position; if the query has no result set, this list is empty:
The type of each column is a data type as defined in the Hive language manual.
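A minimal sketch of reading the schema body in Python. It assumes each column is a JSON object with `name`, `type`, and `position` keys; the key names are inferred from the description above. Sorting by position gives the order in which values appear in each result row.

```python
import json
from typing import List, Tuple

def ordered_columns(schema_json: str) -> List[Tuple[str, str]]:
    """Return (name, type) pairs ordered by column position.

    An empty list means the query has no result set.
    """
    columns = json.loads(schema_json)
    ordered = sorted(columns, key=lambda c: c["position"])
    return [(c["name"], c["type"]) for c in ordered]
```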
Example
HTTP Request | |
---|---|
HTTP Response | |
Description | Retrieve the schema of the result of the query in the namespace default which has the handle |
Retrieving Query Results
Query results can be retrieved in batches after the query has finished:
The body of the request can contain a JSON string specifying the batch size:
If the batch size is not specified, the default is 20.
Parameter | Description |
---|---|
| Handle obtained when the query was submitted |
HTTP Responses
Status Codes | Description |
---|---|
| The request was successfully received and the results of the query were returned in the body |
| The query handle does not match any current query |
Comments
The results are returned in a JSON body as a list of columns, each given as a structure containing a list of column values:
The value at each position has the type that was returned in the result schema for that position. For example, if the returned type was INT, then the value will be an integer literal, whereas for STRING or VARCHAR the value will be a string literal.
Repeat the request to retrieve subsequent batches of results. If all results of the query have already been retrieved, the returned list is empty.
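The batching rule (keep requesting until an empty list comes back) can be sketched as a Python generator. `fetch_batch` is a hypothetical callable that sends one results request with the given batch size. The row shape `{"columns": [...]}`, and pairing each value with a schema column name by position, are assumptions.

```python
from typing import Any, Callable, Dict, Iterator, List

def iter_rows(fetch_batch: Callable[[int], List[dict]], names: List[str],
              batch_size: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield each result row as a dict keyed by column name.

    fetch_batch(size) stands in for one results request; an empty list
    means all results have already been retrieved.
    """
    while True:
        batch = fetch_batch(batch_size)
        if not batch:
            return
        for row in batch:
            # Assumed row shape: {"columns": [value_1, value_2, ...]},
            # with values in the same positions as the result schema.
            yield dict(zip(names, row["columns"]))
```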
Example
HTTP Request | |
---|---|
HTTP Response | |
Description | Retrieve the results of the query which has the handle 57cf1b01-8dba-423a-a8b4-66cd29dd75e2 |
Closing a Query
The query can be closed by issuing an HTTP DELETE against its URL:
This frees all resources that are held by this query.
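Because an unclosed query keeps holding server resources, a client can tie closing to a Python context manager. `submit` and `close` are hypothetical callables supplied by the caller; the latter would issue the HTTP DELETE. This is a sketch of the pattern, not part of any CDAP library.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

@contextmanager
def open_query(submit: Callable[[str], str], close: Callable[[str], None],
               sql: str) -> Iterator[str]:
    """Submit a query and guarantee that it is closed afterwards.

    submit(sql) returns the query handle; close(handle) issues the DELETE.
    """
    handle = submit(sql)
    try:
        yield handle
    finally:
        close(handle)  # frees server-side resources even if an error occurred
```

A query that is still running cannot be closed, so code inside the `with` block should wait for the query to finish (or cancel it) before the block exits.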
Parameter | Description |
---|---|
| Handle obtained when the query was submitted |
HTTP Responses
Status Codes | Description |
---|---|
| The query was closed |
| The query was not in a state that could be closed; either wait until it is finished, or cancel it |
| The query handle does not match any current query |
Example
HTTP Request | |
---|---|
Description | Close the query in the namespace default which has the handle |
List of Queries
To return a list of queries, use:
Parameter | Description |
---|---|
| Namespace ID |
| Optional number indicating how many results to return in the response; by default, 50 results are returned |
| Optional string specifying if the results returned should be in the forward or reverse direction; should be one of |
| Optional offset for pagination; returns the results that are greater than offset if the cursor is |
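A sketch of building the list URL with optional pagination parameters. Only `offset` and `cursor` are named in the descriptions above. Two details are assumptions to verify against your CDAP version: the name `limit` for the result count, and the use of a GET on the same `.../data/explore/queries` path used for submission. Parameters left as `None` are omitted so the server defaults apply (for example, 50 results).

```python
from typing import Optional
from urllib.parse import urlencode

def list_queries_url(base_url: str, namespace: str, limit: Optional[int] = None,
                     cursor: Optional[str] = None, offset: Optional[int] = None) -> str:
    """Build the URL for listing queries, including only the parameters given.

    The parameter name "limit" and the path are assumptions.
    """
    url = f"{base_url}/v3/namespaces/{namespace}/data/explore/queries"
    params = {k: v for k, v in (("limit", limit), ("cursor", cursor), ("offset", offset))
              if v is not None}
    return f"{url}?{urlencode(params)}" if params else url
```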
Comments
The results are returned as a JSON array, with each element containing information about a query:
Example
HTTP Request | |
---|---|
HTTP Response | |
Description | Retrieves all queries |
Count of Active Queries
To return the count of active queries, use:
Parameter | Description |
---|---|
| Namespace ID |
The results are returned in the body as a JSON string:
Download Query Results
To download the results of a query, use:
The results of the query are returned in CSV format.
Note: This endpoint is not namespaced, as all query-handles are globally unique.
Parameter | Description |
---|---|
| Handle obtained when the query was submitted or via a list of queries |
Comments
The query results can be downloaded only once. If the results for a query-handle are downloaded a second time, the Microservices return a status code 409 Conflict.
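Because results can be downloaded only once, a client should keep its own copy of the CSV and treat a 409 Conflict as "already consumed" rather than retrying. The sketch below parses the CSV body with Python's `csv` module and maps the status code. It makes no assumption about whether the first row is a header, and the `AlreadyDownloaded` exception is hypothetical.

```python
import csv
import io
from typing import List

class AlreadyDownloaded(Exception):
    """Hypothetical error for a 409 Conflict: results were already downloaded."""

def check_download_status(status: int) -> None:
    """Raise if the download call did not succeed."""
    if status == 409:
        raise AlreadyDownloaded("results for this query-handle were already downloaded")
    if status >= 400:
        raise RuntimeError(f"download failed with HTTP status {status}")

def parse_download(csv_text: str) -> List[List[str]]:
    """Split downloaded query results (CSV) into rows of string fields."""
    return list(csv.reader(io.StringIO(csv_text)))
```

With `urllib`, a 409 surfaces as `urllib.error.HTTPError`, whose `code` attribute can be passed to `check_download_status`.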
HTTP Responses
Status Codes | Description |
---|---|
| The HTTP call was successful |
| The query handle does not match any current query |
409 Conflict | The query results were already downloaded |
Enabling and Disabling Querying
Querying (or exploring) of datasets can be enabled and disabled using these endpoints.
Exploration of data in CDAP is governed by a combination of enabling the CDAP Explore Service and then creating datasets that are explorable. The CDAP Explore Service is enabled by a setting in the CDAP configuration file (explore.enabled in the cdap-site.xml file).
Datasets that were created while the Explore Service was disabled can be enabled for exploration using these endpoints, once the service has been enabled and CDAP has been restarted.
You can also use these endpoints to disable exploration of a specific dataset. The dataset will still be accessible programmatically; it just won't respond to queries or be available for exploration using the CDAP UI.
For datasets:
Each of these endpoints returns a query handle that can be used to submit requests tracking the status of the query.
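A minimal sketch of building an enable or disable request with `urllib`. The path `.../data/explore/datasets/<dataset-name>/enable` (or `/disable`) and the use of POST are assumptions, because the endpoint listing is not reproduced on this page. The request is built but not sent.

```python
import urllib.request

def build_explore_toggle(base_url: str, namespace: str, dataset: str,
                         enable: bool) -> urllib.request.Request:
    """Build (but do not send) a request that enables or disables exploration.

    The path layout and the POST method are assumptions.
    """
    action = "enable" if enable else "disable"
    url = (f"{base_url}/v3/namespaces/{namespace}/data/explore/"
           f"datasets/{dataset}/{action}")
    return urllib.request.Request(url, method="POST")
```

The response to either request carries a query handle, which can be polled with the status endpoint like any other query.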
Parameter | Description |
---|---|
| Namespace ID |
| Name of the dataset |
| Name of the table |
HTTP Responses
Status Codes | Description |
---|---|
| The query execution was successfully initiated, and the body will contain the query-handle used to identify the query in subsequent requests |
| The query is not well-formed or contains an error such as a nonexistent table name |
Comments
If the request was successful, the body will contain a query handle that can be used to identify the query in subsequent requests, such as a status request:
Example
HTTP Request | |
---|---|
HTTP Response | |
Description | Submits a request in the namespace default to disable exploration of the dataset logEventStream_converted. The handle in the response can be used to check the status. |
Created in 2020 by Google Inc.