...
The context of a metric is typically nested within a hierarchy of contexts. For example, the Spark context is enclosed in the application context, which in turn is enclosed in the namespace context. A metric can always be queried (and aggregated) relative to any enclosing context.
System Metric | Context |
---|---|
All Mappers of a MapReduce |
|
All Reducers of a MapReduce |
|
One Run of a MapReduce |
|
One MapReduce |
|
All MapReduce of an application |
|
One service |
|
All services of an application |
|
One Spark program |
|
All Spark programs of an application |
|
One worker |
|
All workers of an application |
|
All components of an application |
|
All components of all applications |
|
Dataset metrics are available at the dataset level, but they can also be queried down to the worker, service, Mapper, or Reducer level:
Dataset Metric | Context |
---|---|
A single dataset in the context of a specific application |
|
A single dataset |
|
All datasets |
|
Available System Metrics
Note: A user metric may have the same name as a system metric. They are distinguished by prepending the respective prefix when querying: user or system.
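As a small illustration of the prefix rule above, the following sketch builds a fully qualified metric name. The helper name qualify_metric is hypothetical, and joining the prefix to the name with a dot is an assumption based on names such as system.process.events.processed that appear elsewhere in this document.

```python
# Hypothetical helper: not part of any CDAP client library.
VALID_PREFIXES = ("user", "system")


def qualify_metric(prefix: str, name: str) -> str:
    """Return the metric name with its scope prefix prepended."""
    if prefix not in VALID_PREFIXES:
        raise ValueError(f"prefix must be one of {VALID_PREFIXES}, got {prefix!r}")
    # Assumption: prefix and name are joined with a dot.
    return f"{prefix}.{name}"


print(qualify_metric("system", "process.events.processed"))
print(qualify_metric("user", "names.bytes"))
```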
These metrics are available in a dataset context:
Dataset Metric | Description |
---|---|
| Number of bytes written |
| Operations (reads and writes) performed |
| Read operations performed |
| Write operations performed |
These metrics are available in a Mappers or Reducers context (specify whether a Mapper or Reducer context is desired, as shown above):
Mappers or Reducers Metric | Description |
---|---|
| A number from 0 to 100 indicating the progress of the Map or Reduce phase |
| Number of entries read in by the Map or Reduce phase |
| Number of entries written out by the Map or Reduce phase |
These metrics are available in a service context:
Service Metric | Description |
---|---|
| Number of requests made to the service |
| Number of successful requests completed by the service |
| Number of failures seen by the service |
These metrics are available in a Spark context, where <spark-id> depends on the Spark program being queried:
Spark Metric | Description |
---|---|
| Disk space used by the Block Manager |
| Maximum memory given to the Block Manager |
| Memory used by the Block Manager |
| Memory remaining to the Block Manager |
| Number of active jobs |
| Total number of jobs |
| Number of failed stages |
| Number of running stages |
| Number of waiting stages |
These metrics are available for services, in either the system services component context or the user services context:
Request and Response Metric | Description |
---|---|
| Number of requests received for the service |
| Number of successful responses sent |
| Number of |
These metrics are available for every application context:
Application Logging Metric | Description |
---|---|
| Number of |
These logging metrics are available for system services, in the system component context:
System Services Logging Metric | Description |
---|---|
| Number of |
These processing metrics are available for system services, in the system component context:
System Services Metric Processor Metric | Description |
---|---|
| Number of metrics processed by metric processor instance |
| Metrics processing delay in milliseconds: the difference between the last metric's timestamp and the current time |
These metrics are available for the CDAP transaction service:
Transaction Metric | Description |
---|---|
| Number of |
| Time taken (in milliseconds) to start |
| Number of transaction edits added to the write-ahead log |
| Number of transactions in a specified transaction state |
| Time taken (in milliseconds) to perform a specified transaction state update |
| Number of transactions of a specified type that are active |
These metrics are available for the CDAP transactional messaging service:
Transactional Messaging System Metric | Description |
---|---|
| Number of message persist requests |
| Number of message persist requests succeeded |
| Number of message persist requests failed |
| Number of messages in the queue that are persisted in one batch |
| Number of entries requested to add to the messaging cache |
| Number of entries added to the messaging cache |
| Number of entries removed from the messaging cache |
| Number of times that the cache reduce weight logic was executed while adding entries to the cache. This number ideally should be very small for the cache to have good performance. |
| Number of times that the cache reduce weight logic was executed while scanning the cache. This number ideally should be relatively small and steady over time. |
| Number of scan requests on the messaging cache |
| The current weight of the cache, measured in bytes |
These metrics are available for the YARN cluster resources:
YARN Cluster Metric | Description |
---|---|
| Size (in megabytes) of total, available, or used cluster memory |
| Number of total, available, or used cluster virtual cores |
Searches and Queries
The process of retrieving a metric involves these steps:
...
You can also define the query to search in a given context across all values of one or more tags provided in the context by specifying * as a value for a tag. See the examples below for its use.
Parameter | Description |
---|---|
| Metrics context to search within. If not provided, the search is performed across all contexts. Consists of a collection of tags. |
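A context with a wildcard tag can be encoded as repeated tag parameters. The sketch below is illustrative only: the helper name tag_params is hypothetical, the name:value tag form follows the tag=namespace:default style used in this document, and the tag name run is an assumption.

```python
from urllib.parse import urlencode


def tag_params(context: dict) -> str:
    """Encode a context as repeated tag=<name>:<value> query parameters."""
    pairs = [("tag", f"{name}:{value}") for name, value in context.items()]
    # Keep ':' and '*' readable instead of percent-encoding them.
    return urlencode(pairs, safe=":*")


# '*' asks for all values of the (assumed) run tag within the given application.
print(tag_params({"namespace": "default", "app": "PurchaseHistory", "run": "*"}))
```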
Examples
HTTP Method |
|
---|---|
Returns |
|
Description | Returns all first-level tags; in this case, two namespaces. |
|
|
HTTP Method |
|
Returns |
|
Description | Returns all tags of the given parent context; in this case, all entities in the default namespace. |
HTTP Method |
|
| |
Returns |
|
---|---|
Description | Queries all available contexts within the PurchaseHistory application for any run. |
Search for Metrics
To search for the available metrics within a given context, perform an HTTP POST request:
Code Block |
---|
POST /v3/metrics/search?target=metric&tag=<context> |
Parameter | Description |
---|---|
| Metrics context to search within. Consists of a collection of tags. |
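The search request above could be prepared from Python roughly as follows. This is a sketch under stated assumptions: the base URL http://localhost:11015 is a placeholder, the helper name metric_search_request is hypothetical, and the request object is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def metric_search_request(base_url: str, context: dict) -> Request:
    """Build (but do not send) a POST request that searches for metrics."""
    pairs = [("target", "metric")]
    pairs += [("tag", f"{name}:{value}") for name, value in context.items()]
    url = f"{base_url}/v3/metrics/search?{urlencode(pairs, safe=':*')}"
    return Request(url, method="POST")


# Placeholder host and port; substitute the address of your own instance.
req = metric_search_request(
    "http://localhost:11015",
    {"namespace": "default", "app": "PurchaseHistory"},
)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; any authentication headers your installation requires would need to be added first.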
Example
HTTP Method |
|
---|---|
Returns |
|
Description | Returns all metrics in the context of the application PurchaseHistory of the default namespace; in this case, returns a list of system and (possibly) user-defined metrics. |
|
|
HTTP Method |
|
Returns |
|
Description | Returns all metrics in the context of the service UploadService of the application SportResults of the default namespace; in this case, returns a list of system and user-defined metrics. |
Querying a Metric
Once you know the context and the metric to query, you can formulate a request for the metrics data.
...
Code Block |
---|
POST /v3/metrics/query?tag=<context>&metric=<metric>&<time-range>[&groupBy=<tags>] |
Parameter | Description |
---|---|
| Metrics context to search within, a collection of tags |
| Metric(s) being queried, a collection of metric names |
| A time range or |
| Tag list by which to group results (optional) |
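To show how the parameters fit together, here is a minimal sketch that assembles the query path. The function name metrics_query_path is hypothetical; repeating the metric parameter once per metric name and joining groupBy tags with commas are assumptions, not details confirmed by this page.

```python
from urllib.parse import urlencode


def metrics_query_path(context, metrics, time_range=None, group_by=None):
    """Assemble the path and query string for a metrics query."""
    pairs = [("tag", f"{name}:{value}") for name, value in context.items()]
    # Assumption: one metric=<name> parameter per metric queried.
    pairs += [("metric", metric) for metric in metrics]
    pairs += list((time_range or {}).items())
    if group_by:
        # Assumption: multiple groupBy tags are comma-separated.
        pairs.append(("groupBy", ",".join(group_by)))
    return "/v3/metrics/query?" + urlencode(pairs, safe=":*,")


print(metrics_query_path(
    {"namespace": "default", "app": "CountRandom"},
    ["system.process.events.processed"],
    {"start": "now-1h", "end": "now", "resolution": "1m"},
))
```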
Query Examples
HTTP Method |
|
---|---|
Returns |
|
Description | Using a System metric, system.process.events.processed |
|
|
HTTP Method |
|
Returns |
|
Description | Querying the User-defined metric names.bytes by its run-ID |
|
|
HTTP Method |
|
Returns |
|
Description | Using a User-defined metric, names.bytes in a service's Handler, called before any data entered, returning an empty series |
|
|
HTTP Method |
|
Returns |
|
Description | Using a User-defined metric, names.bytes in a service's Handler |
Query Results
Results from a query are returned as a JSON string, in the format:
Code Block |
---|
{"startTime":<start-time>, "endTime":<end-time>, "series":<series-array>} |
Name | Description |
---|---|
| Start time, in seconds, with 0 being from the beginning of the query records |
| End time, in seconds |
| An array of metric results, which can be one time series, multiple time series, or none (an empty array) |
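A client might unpack the three top-level fields as in the sketch below. The sample payload is invented for illustration, the helper name parse_query_result is hypothetical, and the structure of the individual entries inside the series array is deliberately left untouched because it is not described here.

```python
import json


def parse_query_result(body: str):
    """Return (start_time, end_time, series) from a query response body."""
    result = json.loads(body)
    # An absent or empty "series" means the metric has no value.
    series = result.get("series", [])
    return result["startTime"], result["endTime"], series


# Invented sample payload with an empty series array.
sample = '{"startTime": 0, "endTime": 1429475995, "series": []}'
start, end, series = parse_query_result(sample)
if not series:
    print(f"no data between {start} and {end}")
```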
If a particular metric has no value, a query will return an empty array in the "series" of the results, such as:
...
In a query, the optional groupBy parameter defines a list of tags whose values are used to build multiple time series. All data points that have the same values in the tags specified in the groupBy parameter will form a single time series. You can define multiple tags for grouping by providing a list, similar to a tag combination list.
Tag List | Description |
---|---|
| Retrieves the time series for each application |
| Retrieves a time series for each App and Spark combination. |
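The grouping rule can be pictured with a small local simulation. This is not server code: the data points, tag values, and the function name group_points are invented, and the sketch only collects points into series without modelling how the server combines points that share a timestamp.

```python
from collections import defaultdict


def group_points(points, group_by):
    """Collect data points into one series per combination of groupBy tag values."""
    series = defaultdict(list)
    for point in points:
        key = tuple(point["tags"].get(tag) for tag in group_by)
        series[key].append((point["time"], point["value"]))
    return dict(series)


# Invented data points for illustration.
points = [
    {"tags": {"app": "A", "spark": "s1"}, "time": 1, "value": 5},
    {"tags": {"app": "A", "spark": "s2"}, "time": 2, "value": 7},
    {"tags": {"app": "B", "spark": "s1"}, "time": 3, "value": 3},
]

print(group_points(points, ["app"]))           # one series per application
print(group_points(points, ["app", "spark"]))  # one series per app and Spark combination
```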
An example method (re-formatted to fit):
...
By default, queries without a time range retrieve a value based on aggregate=true.
Parameter | Description |
---|---|
| Total aggregated value for the metric since the application was deployed. If the metric is a gauge type, the aggregate will return the latest value set for the metric. |
| Time range defined by start and end times, where the times are either in seconds since the start of the Epoch, or a relative time, using |
| Number of time intervals since start with length of time interval defined by resolution. If |
| Time resolution in seconds, minutes or hours; or if "auto", one of |
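The difference between aggregating a counter and a gauge, as described for the aggregate option, can be sketched locally. The function name aggregate_value and the (time, value) tuple layout are hypothetical, and returning 0 for an empty input is a choice made for this sketch only.

```python
def aggregate_value(points, is_gauge=False):
    """Aggregate (time, value) pairs: total for counters, latest value for gauges."""
    if not points:
        return 0
    if is_gauge:
        # A gauge reports the most recently set value.
        return max(points, key=lambda point: point[0])[1]
    # Other metrics report the running total.
    return sum(value for _, value in points)


samples = [(10, 4), (30, 9), (20, 6)]
print(aggregate_value(samples))                  # total of all values
print(aggregate_value(samples, is_gauge=True))   # value at the latest time
```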
With a specific time range, a resolution can be included to retrieve a series of data points for a metric. By default, 1 second resolution is used. Acceptable values are noted above. If resolution=auto, the resolution is determined by the difference between the start and end times:
if (endTime - startTime) > 36000 seconds (ten hours), resolution will be 1 hour;
if (endTime - startTime) > 600 seconds (ten minutes), resolution will be 1 minute;
otherwise, resolution will be 1 second.
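The auto rule above amounts to two threshold checks. A minimal sketch, assuming times in seconds; the function name auto_resolution and the returned labels 1h, 1m, and 1s are illustrative choices:

```python
def auto_resolution(start_time: int, end_time: int) -> str:
    """Pick a resolution from the length of the queried time range, in seconds."""
    span = end_time - start_time
    if span > 36000:   # more than ten hours
        return "1h"
    if span > 600:     # more than ten minutes
        return "1m"
    return "1s"


print(auto_resolution(0, 86400))  # a full day
print(auto_resolution(0, 3600))   # one hour
print(auto_resolution(0, 30))     # thirty seconds
```

Note that the comparisons are strict, so a range of exactly ten hours still resolves to 1 minute and a range of exactly ten minutes to 1 second.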
Time Range | Description |
---|---|
| The last 30 seconds. The start time is given in seconds relative to the current time. You can apply simple math, using |
| From |
| The same as before, the count given as a number of time intervals, each 1 second. |
| From |
| From |
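For intuition about relative times, the sketch below resolves an expression such as now-1h against a given current time. It is a client-side illustration, not the server's parser: the function name resolve_time is hypothetical, and the supported unit letters (s, m, h) and the + operator are assumptions, since only the form now-1h appears in this document.

```python
import re

# Assumed unit letters; only "h" (as in now-1h) is shown in this document.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_RELATIVE = re.compile(r"^now(?:([+-])(\d+)([smh]))?$")


def resolve_time(expr: str, now: int) -> int:
    """Resolve an absolute (epoch seconds) or relative (now-<n><unit>) time."""
    if expr.isdigit():
        return int(expr)
    match = _RELATIVE.match(expr)
    if match is None:
        raise ValueError(f"unsupported time expression: {expr!r}")
    sign, amount, unit = match.groups()
    if sign is None:
        return now
    offset = int(amount) * UNIT_SECONDS[unit]
    return now + offset if sign == "+" else now - offset


current = 1429475995  # arbitrary "current time" in epoch seconds
print(resolve_time("now-1h", current))
print(resolve_time("now", current))
```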
Example:
Code Block |
---|
POST /v3/metrics/query?tag=namespace:default&tag=app:CountRandom&metric=system.process.events.processed&start=now-1h&end=now&resolution=1m |
...