Metrics Microservices
Use the CDAP Metrics Microservices to retrieve the metrics created and saved by CDAP.
As applications process data, CDAP collects metrics about each application’s behavior and performance. Some of these metrics are common to every application, such as the number of events processed and the number of data operations performed; these are called system (or CDAP) metrics.
Other metrics are user-defined and differ from application to application.
All methods or endpoints described in this API have a base URL (typically http://<host>:11015 or https://<host>:10443) that precedes the resource identifier, as described in the Microservices Conventions. These methods return a status code, as listed in the Microservices Status Codes.
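As a minimal sketch (Python standard library only), the base URL and a resource path can be combined into a POST request as follows. The host `localhost` and the helper `make_metrics_request` are hypothetical placeholders. The request is built but not sent, so you can inspect it before passing it to `urllib.request.urlopen`.

```python
import urllib.request

# Hypothetical base URL: substitute your CDAP host for "localhost".
BASE_URL = "http://localhost:11015"


def make_metrics_request(path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build, but do not send, a POST request for a metrics resource path (hypothetical helper)."""
    if not path.startswith("/"):
        path = "/" + path  # accept paths written without a leading slash
    # All parameters travel in the query string here, so the body can be empty.
    return urllib.request.Request(base_url.rstrip("/") + path, data=b"", method="POST")


req = make_metrics_request("/v3/metrics/search?target=tag")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req).read() returns the response body.
```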
Metrics Data
Metrics data is identified by a combination of context and name.
A metrics context consists of a collection of tags. Each tag is composed of a tag name and a tag value.
Metrics contexts are hierarchical: they are rooted in the CDAP instance and extend through namespaces and applications down to the individual components.
For example, the metrics context:
namespace:default app:PurchaseHistory spark:PurchaseTracker
is a context that identifies a Spark program. Its parent context, namespace:default app:PurchaseHistory, identifies the parent application.
Each level of the context is described by a pair composed of a tag name and a value:
namespace:default (tag name: namespace, value: default)
app:PurchaseHistory (tag name: app, value: PurchaseHistory)
spark:PurchaseTracker (tag name: spark, value: PurchaseTracker)
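To make the tag structure concrete, here is a small sketch that splits a context into its tag pairs and derives the parent context. It assumes the context is written as whitespace-separated `name:value` tokens, as in the example above. `parse_context` and `parent_context` are hypothetical helpers, not part of CDAP.

```python
def parse_context(context: str) -> list[tuple[str, str]]:
    """Split 'namespace:default app:X' into [('namespace', 'default'), ('app', 'X')]."""
    pairs = []
    for token in context.split():
        name, sep, value = token.partition(":")
        if not (sep and name and value):
            raise ValueError(f"malformed tag: {token!r}")
        pairs.append((name, value))
    return pairs


def parent_context(context: str) -> str:
    """Drop the last, most specific tag to obtain the enclosing context."""
    return " ".join(f"{n}:{v}" for n, v in parse_context(context)[:-1])


ctx = "namespace:default app:PurchaseHistory spark:PurchaseTracker"
print(parse_context(ctx))
print(parent_context(ctx))  # namespace:default app:PurchaseHistory
```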
A metric name is either generated by CDAP and prefixed with system, or set by a developer when writing an application and prefixed with user.
The system metrics vary depending on the context; lists of common system metrics for different contexts are given under Available System Metrics.
User metrics are defined by the application developer and thus are completely dependent on what the developer sets.
In both cases, a search using this API returns all metrics available in a given context.
Available Contexts
The context of a metric is typically nested within a hierarchy of contexts. For example, the Spark context is enclosed in the application context, which in turn is enclosed in the namespace context. A metric can always be queried (and aggregated) relative to any enclosing context.
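Because a metric can be queried relative to any enclosing context, it can help to list those contexts explicitly. Assuming the tags are written from broadest to most specific, as in the examples on this page, the enclosing contexts are simply the prefixes of the tag list. `enclosing_contexts` is a hypothetical helper.

```python
def enclosing_contexts(context: str) -> list[str]:
    """List every context enclosing `context`, broadest first, ending with itself (hypothetical helper)."""
    tags = context.split()  # each tag is a name:value token
    return [" ".join(tags[:i]) for i in range(1, len(tags) + 1)]


for c in enclosing_contexts("namespace:default app:PurchaseHistory spark:PurchaseTracker"):
    print(c)
# namespace:default
# namespace:default app:PurchaseHistory
# namespace:default app:PurchaseHistory spark:PurchaseTracker
```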
System Metric | Context |
---|---|
All Mappers of a MapReduce | |
All Reducers of a MapReduce | |
One Run of a MapReduce | |
One MapReduce | |
All MapReduce programs of an application | |
One service | |
All services of an application | |
One Spark program | |
All Spark programs of an application | |
One worker | |
All workers of an application | |
All components of an application | |
All components of all applications | |
Dataset metrics are available at the dataset level, but they can also be queried down to the worker, service, Mapper, or Reducer level:
Dataset Metric | Context |
---|---|
A single dataset in the context of a specific application | |
A single dataset | |
All datasets | |
Available System Metrics
Note: A user metric may have the same name as a system metric. They are distinguished by prepending the respective prefix when querying: user or system.
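Since a bare name can exist in both scopes, a client should always send the prefixed form. The sketch below assumes the prefix and name are joined by a dot, as in `system.process.events.processed` and `user.connections.count` elsewhere on this page. `qualify` and `split_scope` are hypothetical helpers.

```python
SCOPES = ("system", "user")


def qualify(metric: str, scope: str) -> str:
    """Prepend the scope prefix, e.g. ('names.bytes', 'user') -> 'user.names.bytes'."""
    if scope not in SCOPES:
        raise ValueError(f"scope must be one of {SCOPES}, got {scope!r}")
    return f"{scope}.{metric}"


def split_scope(metric: str) -> tuple[str, str]:
    """Split 'system.process.events.processed' into ('system', 'process.events.processed')."""
    scope, sep, name = metric.partition(".")
    if not sep or scope not in SCOPES:
        raise ValueError(f"not a prefixed metric name: {metric!r}")
    return scope, name


print(qualify("names.bytes", "user"))  # user.names.bytes
print(split_scope("system.process.events.processed"))
```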
Dataset Metrics
These metrics are available in a dataset context:
Dataset Metric | Description |
---|---|
| Number of bytes written |
| Operations (reads and writes) performed |
| Read operations performed |
| Write operations performed |
Mapper or Reducer Metrics
These metrics are available in a Mapper or Reducer context (specify whether a Mapper or Reducer context is desired, as shown above):
Mapper or Reducer Metrics | Description |
---|---|
| A number from 0 to 100 indicating the progress of the Map or Reduce phase |
| Number of entries read in by the Map or Reduce phase |
| Number of entries written out by the Map or Reduce phase |
Service Metrics
These metrics are available in a service context:
Service Metrics | Description |
---|---|
| Number of requests made to the service |
| Number of successful requests completed by the service |
| Number of failures seen by the service |
Spark Metrics
These metrics are available in a Spark context, where <spark-id> depends on the Spark program being queried:
Spark Metrics | Description |
---|---|
| Disk space used by the Block Manager |
| Maximum memory given to the Block Manager |
| Memory used by the Block Manager |
| Memory remaining to the Block Manager |
| Number of active jobs |
| Total number of jobs |
| Number of failed stages |
| Number of running stages |
| Number of waiting stages |
Request and Response Metrics
These metrics are available for services, in either the system services component context or the user services context:
Request and Response Metrics | Description |
---|---|
| Number of requests received for the service |
| Number of successful responses sent |
| Number of |
Application Logging Metrics
These metrics are available for every application context:
Application Logging Metrics | Description |
---|---|
| Number of |
System Services Logging Metrics
These logging metrics are available for system services, in the system component context:
System Services Logging Metrics | Description |
---|---|
| Number of |
System Services Metric Processor Metrics
These processing metrics are available for system services, in the system component context:
System Services Metric Processor Metrics | Description |
---|---|
| Number of metrics processed by the metric processor instance |
| Metrics processing delay, in milliseconds: the difference between the current time and the timestamp of the last metric |
Transaction Metrics
These metrics are available for the CDAP transaction service:
Transaction Metrics | Description |
---|---|
| Number of |
| Time taken (in milliseconds) to start |
| Number of transaction edits added to the write-ahead log |
| Number of transactions in a specified transaction state |
| Time taken (in milliseconds) to perform a specified transaction state update |
| Number of transactions of a specified type that are active |
Transactional Messaging System Metrics
These metrics are available for the CDAP transactional messaging service:
Transactional Messaging System Metrics | Description |
---|---|
| Number of message persist requests |
| Number of successful message persist requests |
| Number of failed message persist requests |
| Number of messages in the queue that are persisted in one batch |
| Number of entries requested to be added to the messaging cache |
| Number of entries added to the messaging cache |
| Number of entries removed from the messaging cache |
| Number of times the cache reduce-weight logic was executed while adding entries to the cache. Ideally, this number should be very small for the cache to perform well. |
| Number of times the cache reduce-weight logic was executed while scanning the cache. Ideally, this number should be relatively small and steady over time. |
| Number of scan requests on the messaging cache |
| The current weight of the cache, measured in bytes |
YARN Cluster Metrics
These metrics are available for the YARN cluster resources:
YARN Cluster Metrics | Description |
---|---|
| Size (in megabytes) of total, available, or used cluster memory |
| Number of total, available, or used cluster virtual cores |
CDAP Program Metrics
These metrics are available for CDAP programs:
Program Metrics | Version Introduced | Description |
---|---|---|
| | | Time taken by the program to transition from provisioning to starting |
| | 6.6.0 | Time taken by the program to transition from provisioning to running |
| | | Time taken by the program to transition from provisioning to any completed state |
| | | Number of successful program runs |
| | | Number of failed program runs |
| | | Number of killed program runs |
| | | Number of rejected program runs |
| | 6.6.0 | Number of top-level program launch requests in the system |
| | 6.6.0 | Number of running top-level programs in the system |
User Metrics
Pipeline Connection Metrics
These metrics are available for pipeline connections:
Pipeline Connection Metrics | Version Introduced | Description |
---|---|---|
| 6.7.0 | Number of create connection requests. |
| 6.7.0 | Number of delete connection requests. |
| 6.7.0 | Number of get connection requests. |
| 6.7.0 | Number of browse connection requests. |
| 6.7.0 | Number of sample connection requests. |
| 6.7.0 | Number of specification connection requests. |
| 6.7.0 | Number of upload file connection requests. |
Query Tips
Global Count Example
POST /v3/metrics/query?target=metric&metric=user.connections.count
Group-By Connection Type Example
POST /v3/metrics/query?target=metric&metric=user.connections.count&groupBy=tpe
Searches and Queries
The process of retrieving a metric involves these steps:
Obtain (usually through a search) the correct context for a metric;
Obtain (usually through a search within the context) the available metrics;
Query for a specific metric, supplying the context and any parameters.
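These three steps can be chained as in the following sketch. It assumes that a tag search returns a JSON list of objects with `name` and `value` fields and that a metric search returns a JSON list of metric names; neither response shape is shown in this section. `post` is a hypothetical stand-in for an HTTP client, which lets the hypothetical `fake_post` exercise the flow offline.

```python
import json
from typing import Callable

# Hypothetical transport: post(path, params) sends a POST and returns the body text.
Post = Callable[[str, list], str]


def retrieve_metric(post: Post, parent: list[str], child: str, metric: str):
    """Find a context, find a metric in it, then query it (hypothetical helper)."""
    # Step 1: search one level below `parent` and confirm `child` is a known context.
    tags = [("tag", t) for t in parent]
    found = json.loads(post("/v3/metrics/search", [("target", "tag")] + tags))
    if child not in {f"{c['name']}:{c['value']}" for c in found}:
        raise LookupError(f"context {child!r} not found under {parent}")
    tags.append(("tag", child))
    # Step 2: list the metrics available within that context.
    names = json.loads(post("/v3/metrics/search", [("target", "metric")] + tags))
    if metric not in names:
        raise LookupError(f"metric {metric!r} not available")
    # Step 3: query the metric, here as a single aggregated value.
    return json.loads(post("/v3/metrics/query", tags + [("metric", metric), ("aggregate", "true")]))


def fake_post(path, params):
    """Canned responses in the assumed shapes; the query echoes what it received."""
    if path.endswith("/search") and ("target", "tag") in params:
        return json.dumps([{"name": "app", "value": "PurchaseHistory"}])
    if path.endswith("/search"):
        return json.dumps(["system.process.events.processed"])
    return json.dumps({"path": path, "params": params})


result = retrieve_metric(fake_post, ["namespace:default"], "app:PurchaseHistory",
                         "system.process.events.processed")
print(result["params"])
```

Swapping `fake_post` for a function that sends real requests to your CDAP instance runs the same steps against live data.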
Search for Contexts
To search for the available contexts, perform an HTTP POST request:
POST /v3/metrics/search?target=tag[&tag=<context>]
The optional <context> defines a metrics context to search within. If it is not provided, the search is performed across all data. The contexts returned can in turn be used to search for lower-level contexts.
You can also search in a given context across all values of one or more tags by specifying * as the value of a tag. See the examples below for its use.
Parameter | Description |
---|---|
context | Metrics context to search within. If not provided, the search is performed across all contexts. Consists of a collection of tags. |
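Here is a sketch of building the context-search path, including the `*` wildcard. It assumes that each tag of the context is passed as a separate `tag` query parameter. `context_search_path` and the `run:*` context in the usage lines are illustrative.

```python
from typing import Optional
from urllib.parse import urlencode


def context_search_path(context: Optional[list[str]] = None) -> str:
    """Path for POST /v3/metrics/search?target=tag[&tag=<context>] (hypothetical helper)."""
    params = [("target", "tag")]
    # Assumption: each tag in the context is sent as its own `tag` parameter.
    params += [("tag", t) for t in (context or [])]
    # Leave ':' and the '*' wildcard unescaped so the path stays readable.
    return "/v3/metrics/search?" + urlencode(params, safe=":*")


print(context_search_path())
print(context_search_path(["namespace:default", "app:PurchaseHistory", "run:*"]))
```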
Examples
HTTP Method | |
---|---|
Returns | |
Description | Returns all first-level tags; in this case, two namespaces. |
HTTP Method | |
Returns | |
Description | Returns all tags of the given parent context; in this case, all entities in the default namespace. |
HTTP Method | |
Returns | |
Description | Queries all available contexts within the PurchaseHistory application, for any run. |
Search for Metrics
To search for the available metrics within a given context, perform an HTTP POST request:
POST /v3/metrics/search?target=metric&tag=<context>
Parameter | Description |
---|---|
context | Metrics context to search within. Consists of a collection of tags. |
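The following sketch builds the metric-search path, rejecting an empty context because the context is required here. It then sorts the returned names by prefix. It assumes, without confirmation from this section, that the response is a JSON list of metric names. `metric_search_path` and `group_by_scope` are hypothetical helpers.

```python
from urllib.parse import urlencode


def metric_search_path(context: list[str]) -> str:
    """Path for POST /v3/metrics/search?target=metric&tag=<context> (hypothetical helper)."""
    if not context:
        raise ValueError("a context is required when searching for metrics")
    # Assumption: each tag in the context is sent as its own `tag` parameter.
    params = [("target", "metric")] + [("tag", t) for t in context]
    return "/v3/metrics/search?" + urlencode(params, safe=":*")


def group_by_scope(names: list[str]) -> dict[str, list[str]]:
    """Sort returned metric names into buckets keyed by their prefix (system, user, ...)."""
    groups: dict[str, list[str]] = {"system": [], "user": []}
    for name in names:
        groups.setdefault(name.split(".", 1)[0], []).append(name)
    return groups


print(metric_search_path(["namespace:default", "app:SportResults", "service:UploadService"]))
print(group_by_scope(["system.process.events.processed", "user.names.bytes"]))
```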
Example
HTTP Method | |
---|---|
Returns | |
Description | Returns all metrics in the context of the application PurchaseHistory of the default namespace; in this case, returns a list of system and (possibly) user-defined metrics. |
HTTP Method | |
Returns | |
Description | Returns all metrics in the context of the service UploadService of the application SportResults of the default namespace; in this case, returns a list of system and user-defined metrics. |
Querying a Metric
Once you know the context and the metric to query, you can formulate a request for the metrics data.
In general, a metrics query is performed by making an HTTP POST request, with parameters supplied either in the URL or in the body of the request. If you submit the parameters in the body, you can make multiple queries with a single request.
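This section does not show the request body format. The sketch below assumes a layout modeled on CDAP's batch-query examples: a JSON object keyed by an arbitrary query ID, where each value holds `tags`, `metrics`, `groupBy`, and `timeRange`. Verify these field names against your CDAP version before relying on them. `batch_query_body` is a hypothetical helper, and its output would be sent as the body of a POST to /v3/metrics/query.

```python
import json


def batch_query_body(queries: dict) -> str:
    """Serialize several named queries into one JSON request body (assumed shape)."""
    body = {}
    for query_id, q in queries.items():
        body[query_id] = {
            "tags": dict(q.get("tags", {})),        # context, as tag name -> value
            "metrics": list(q["metrics"]),          # one or more metric names
            "groupBy": list(q.get("groupBy", [])),  # optional grouping tags
            # Assumption: aggregation is requested inside a "timeRange" object.
            "timeRange": dict(q.get("timeRange", {"aggregate": "true"})),
        }
    return json.dumps(body, indent=2)


print(batch_query_body({
    "events": {"tags": {"namespace": "default", "app": "PurchaseHistory"},
               "metrics": ["system.process.events.processed"]},
    "bytes": {"tags": {"namespace": "default"}, "metrics": ["user.names.bytes"]},
}))
```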
Metric parameters include:
tag values for filtering by context;
metric names (multiple metric names can be queried in each request);
a time range, or aggregate=true for an aggregated result; and
tag values for grouping results (optional).
To query a metric within a given context, perform an HTTP POST request:
Parameter | Description |
---|---|
| Metrics context to search within, a collection of tags |
| Metric(s) being queried, a collection of metric names |
| A time range, or aggregate=true for an aggregated result |
| Tag list by which to group results (optional) |
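Here is a sketch of assembling the query parameters into a URL. The `tag`, `metric`, `aggregate`, and `groupBy` parameter names appear on this page. The `start` and `end` names for a time range are an assumption, as is the helper `metric_query_path` itself. The Query Tips examples also include `target=metric`; add it if your CDAP version expects it.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def metric_query_path(context: Sequence[str], metrics: Sequence[str],
                      start: Optional[str] = None, end: Optional[str] = None,
                      aggregate: bool = False, group_by: Sequence[str] = ()) -> str:
    """Path for a metrics query with parameters in the URL (hypothetical helper)."""
    if not metrics:
        raise ValueError("at least one metric name is required")
    if aggregate and (start is not None or end is not None):
        raise ValueError("use either a time range or aggregate=true, not both")
    params = [("tag", t) for t in context]       # filter by context
    params += [("metric", m) for m in metrics]   # several metrics allowed
    if aggregate:
        params.append(("aggregate", "true"))
    else:
        # Assumption: the time range is given as `start` and `end` parameters.
        if start is not None:
            params.append(("start", start))
        if end is not None:
            params.append(("end", end))
    params += [("groupBy", g) for g in group_by]  # optional grouping tags
    return "/v3/metrics/query?" + urlencode(params, safe=":*")


print(metric_query_path(["namespace:default", "app:PurchaseHistory"],
                        ["system.process.events.processed"], aggregate=True))
print(metric_query_path([], ["user.connections.count"], aggregate=True, group_by=["tpe"]))
```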
Query Examples
HTTP Method | |
---|---|
Returns | |
Description | Using a system metric, system.process.events.processed |
HTTP Method | |
Returns | |
Description | Querying the user-defined metric names.bytes for a specific run ID |
HTTP Method | |
Returns | |
Description | Using the user-defined metric names.bytes in a service's handler, queried before any data has been entered; returns an empty series |
HTTP Method | |
Returns | |
Description | Using the user-defined metric names.bytes in a service's handler |
Query Results
Results from a query are returned as a JSON string, in the format:
Name |
---|