
...

The context of a metric is typically enclosed in a hierarchy of contexts. For example, the Spark context is enclosed in the application context, which in turn is enclosed in the namespace context. A metric can always be queried (and aggregated) relative to any enclosing context.

System Metric	Context
All Mappers of a MapReduce	`namespace:<namespace-id> app:<app-id> mapreduce:<mapreduce-id> tasktype:m`
All Reducers of a MapReduce	`namespace:<namespace-id> app:<app-id> mapreduce:<mapreduce-id> tasktype:r`
One Run of a MapReduce	`namespace:<namespace-id> app:<app-id> mapreduce:<mapreduce-id> run:<run-id>`
One MapReduce	`namespace:<namespace-id> app:<app-id> mapreduce:<mapreduce-id>`
All MapReduce programs of an application	`namespace:<namespace-id> app:<app-id> mapreduce:*`
One service	`namespace:<namespace-id> app:<app-id> service:<service-id>`
All services of an application	`namespace:<namespace-id> app:<app-id> service:*`
One Spark program	`namespace:<namespace-id> app:<app-id> spark:<spark-id>`
All Spark programs of an application	`namespace:<namespace-id> app:<app-id> spark:*`
One worker	`namespace:<namespace-id> app:<app-id> worker:<worker-id>`
All workers of an application	`namespace:<namespace-id> app:<app-id> worker:*`
All components of an application	`namespace:<namespace-id> app:<app-id>`
All components of all applications	`namespace:<namespace-id> app:*`
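
The hierarchy of contexts can be illustrated with a short sketch. The helper name `enclosing_contexts` and the example tag values are hypothetical and not part of the API; the sketch only shows how dropping trailing tags yields each enclosing context.

```python
def enclosing_contexts(tags):
    """Return the given context followed by each context that encloses it.

    `tags` is an ordered list of (name, value) pairs, outermost tag first.
    This is an illustration only, not part of the metrics API.
    """
    contexts = []
    for end in range(len(tags), 0, -1):
        # Keep the first `end` tags; fewer tags means a wider context.
        contexts.append(" ".join(f"{name}:{value}" for name, value in tags[:end]))
    return contexts


# Hypothetical example: a Spark program inside an application inside a namespace.
spark_context = [
    ("namespace", "default"),
    ("app", "PurchaseHistory"),
    ("spark", "PurchaseTracker"),
]
for context in enclosing_contexts(spark_context):
    print(context)
```

Each printed line is a context relative to which the same metric could be queried and aggregated.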

Dataset metrics are available at the dataset level, but they can also be queried down to the worker, service, Mapper, or Reducer level:

Dataset Metric	Context
A single dataset in the context of a specific application	`namespace:<namespace-id> dataset:<dataset-id> app:<app-id>`
A single dataset	`namespace:<namespace-id> dataset:<dataset-id>`
All datasets	`namespace:<namespace-id> dataset:*`

Available System Metrics

Note: A user metric may have the same name as a system metric. They are distinguished by prepending the respective prefix, `user` or `system`, when querying.

These metrics are available in a dataset context:

Dataset Metric	Description
`system.dataset.store.bytes`	Number of bytes written
`system.dataset.store.ops`	Operations (reads and writes) performed
`system.dataset.store.reads`	Read operations performed
`system.dataset.store.writes`	Write operations performed

These metrics are available in a Mappers or Reducers context (specify whether a Mapper or Reducer context is desired, as shown above):

Mappers or Reducers Metric	Description
`system.process.completion`	A number from 0 to 100 indicating the progress of the Map or Reduce phase
`system.process.entries.in`	Number of entries read in by the Map or Reduce phase
`system.process.entries.out`	Number of entries written out by the Map or Reduce phase

These metrics are available in a service context:

Service Metric	Description
`system.requests.count`	Number of requests made to the service
`system.response.successful.count`	Number of successful requests completed by the service
`system.response.server.error.count`	Number of failures seen by the service

These metrics are available in a Spark context, where <spark-id> depends on the Spark program being queried:

Spark Metric	Description
`system.<spark-id>.driver.BlockManager.disk.diskSpaceUsed_MB`	Disk space used by the Block Manager
`system.<spark-id>.driver.BlockManager.memory.maxMem_MB`	Maximum memory given to the Block Manager
`system.<spark-id>.driver.BlockManager.memory.memUsed_MB`	Memory used by the Block Manager
`system.<spark-id>.driver.BlockManager.memory.remainingMem_MB`	Memory remaining to the Block Manager
`system.<spark-id>.driver.DAGScheduler.job.activeJobs`	Number of active jobs
`system.<spark-id>.driver.DAGScheduler.job.allJobs`	Total number of jobs
`system.<spark-id>.driver.DAGScheduler.stage.failedStages`	Number of failed stages
`system.<spark-id>.driver.DAGScheduler.stage.runningStages`	Number of running stages
`system.<spark-id>.driver.DAGScheduler.stage.waitingStages`	Number of waiting stages

These metrics are available for services, in either the system services component context or the user services context:

Request and Response Metric	Description
`system.request.received`	Number of requests received for the service
`system.response.successful`	Number of successful responses sent
`system.response.{server-error, client-error}`	Number of `server-error` or `client-error` responses sent

These metrics are available for every application context:

Application Logging Metric	Description
`system.app.log.{error, info, warn}`	Number of `error`, `info`, or `warn` log messages logged by an application or applications

These logging metrics are available for system services, in the system component context:

System Services Logging Metric	Description
`system.services.log.{error, info, warn}`	Number of `error`, `info`, or `warn` log messages logged by a system service or system services

These processing metrics are available for system services, in the system component context:

System Services Metric Processor Metric	Description
`metrics.<metric.processor.id>.process.count`	Number of metrics processed by the metric processor instance
`metrics.<metric.processor.id>.process.delay.ms`	Metrics processing delay in milliseconds: the difference between the last metric's timestamp and the current time

These metrics are available for the CDAP transaction service:

Transaction Metric	Description
`system.start.{short, long}`	Number of `short` or `long` transactions started
`system.start.{short, long}.latency`	Time taken (in milliseconds) to start `short` or `long` transactions
`system.wal.append.count`	Number of transaction edits added to the write-ahead log
`system.{canCommit, commit, committed, inprogress, invalidate, abort}`	Number of transactions in a specified transaction state
`system.{canCommit, commit, committed, inprogress, invalidate, abort}.latency`	Time taken (in milliseconds) to perform a specified transaction state update
`system.{invalid, committing, committed, inprogress}.size`	Number of transactions of a specified type that are active

These metrics are available for the CDAP transactional messaging service:

Transactional Messaging System Metric	Description
`system.persist.requested`	Number of message persist requests
`system.persist.success`	Number of message persist requests that succeeded
`system.persist.failure`	Number of message persist requests that failed
`system.persist.queue.size`	Number of messages in the queue that are persisted in one batch
`system.cache.add.requests`	Number of entries requested to add to the messaging cache
`system.cache.entries.added`	Number of entries added to the messaging cache
`system.cache.entries.removed`	Number of entries removed from the messaging cache
`system.cache.add.reduce.weight`	Number of times that the cache reduce weight logic was executed while adding entries to the cache. This number ideally should be very small for the cache to have good performance.
`system.cache.scan.reduce.weight`	Number of times that the cache reduce weight logic was executed while scanning the cache. This number ideally should be relatively small and steady over time.
`system.cache.scan.requests`	Number of scan requests on the messaging cache
`system.cache.weight`	The current weight of the cache, measured in bytes

These metrics are available for the YARN cluster resources:

YARN Cluster Metric	Description
`system.resources.{total, available, used}.memory`	Size (in megabytes) of total, available, or used cluster memory
`system.resources.{total, available, used}.vcores`	Number of total, available, or used cluster virtual cores

Searches and Queries

The process of retrieving a metric involves these steps:

...

You can also define the query to search in a given context across all values of one or more tags provided in the context by specifying * as a value for a tag. See the examples below for its use.

Parameter	Description
`context` [Optional]	Metrics context to search within. If not provided, the search is performed across all contexts. Consists of a collection of tags.

Examples

HTTP Method	`POST /v3/metrics/search?target=tag`
Returns	`[{"name":"namespace","value":"default"},{"name":"namespace","value":"system"}]`
Description	Returns all first-level tags; in this case, two namespaces.

HTTP Method	`POST /v3/metrics/search?target=tag&tag=namespace:default`
Returns	`[{"name":"app","value":"PurchaseHistory"},{"name":"component","value":"gateway"},{"name":"dataset","value":"frequentCustomers"},{"name":"dataset","value":"history"},{"name":"dataset","value":"purchases"},{"name":"dataset","value":"userProfiles"}]`
Description	Returns all tags of the given parent context; in this case, all entities in the default namespace.

HTTP Method	`POST /v3/metrics/search?target=tag&tag=namespace:default&tag=app:PurchaseHistory&tag=run:*`
Returns	`[{"name":"spark","value":"PurchaseTracker"}]`
Description	Queries all available contexts within the PurchaseHistory application for any run.
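
The tag searches above all follow the same URL pattern, which can be assembled programmatically. The following is a minimal sketch; the function name `search_path` is hypothetical, and only the path and query parameters shown in the examples are used.

```python
from urllib.parse import urlencode


def search_path(target, context=()):
    """Build the path and query string for a metrics search request.

    `target` is "tag" or "metric"; `context` is an ordered sequence of
    (tag-name, tag-value) pairs. A value of "*" matches all values of a tag.
    """
    params = [("target", target)]
    params += [("tag", f"{name}:{value}") for name, value in context]
    # Keep ':' and '*' unescaped so the result reads like the examples above.
    return "/v3/metrics/search?" + urlencode(params, safe=":*")


print(search_path("tag"))
print(search_path("tag", [("namespace", "default")]))
print(search_path("tag", [("namespace", "default"),
                          ("app", "PurchaseHistory"),
                          ("run", "*")]))
```

The resulting string is sent as an HTTP POST to the server hosting the API.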

Search for Metrics

To search for the available metrics within a given context, perform an HTTP POST request:

Code Block
POST /v3/metrics/search?target=metric&tag=<context>

Parameter	Description
`context`	Metrics context to search within. Consists of a collection of tags.

Examples

HTTP Method	`POST /v3/metrics/search?target=metric&tag=namespace:default&tag=app:PurchaseHistory`
Returns	`["system.process.events.in","system.process.events.processed","system.process.instance", "system.process.tuples.attempt.read","system.process.tuples.read"]`
Description	Returns all metrics in the context of the application PurchaseHistory of the default namespace; in this case, returns a list of system and (possibly) user-defined metrics.

HTTP Method	`POST /v3/metrics/search?target=metric&tag=namespace:default&tag=app:SportResults&tag=service:UploadService`
Returns	`["system.dataset.store.ops","system.dataset.store.reads","system.requests.count", "system.response.successful.count", "user.uploads.completed"]`
Description	Returns all metrics in the context of the service UploadService of the application SportResults of the default namespace; in this case, returns a list of system and user-defined metrics.
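
Because a metric search can return system and user-defined metrics together, a client may want to separate them by prefix. The sketch below does this for the second response shown above; the helper name `split_by_scope` is hypothetical.

```python
import json


def split_by_scope(metric_names):
    """Group fully qualified metric names by their leading prefix.

    The prefix is the text before the first '.', for example "system" or "user".
    """
    groups = {}
    for name in metric_names:
        scope = name.partition(".")[0]
        groups.setdefault(scope, []).append(name)
    return groups


# Response body copied from the UploadService example above.
response_body = (
    '["system.dataset.store.ops","system.dataset.store.reads",'
    '"system.requests.count", "system.response.successful.count", '
    '"user.uploads.completed"]'
)
groups = split_by_scope(json.loads(response_body))
print(groups["system"])
print(groups["user"])
```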

Querying a Metric

Once you know the context and the metric to query, you can formulate a request for the metrics data.

...

Code Block
POST /v3/metrics/query?tag=<context>&metric=<metric>&<time-range>[&groupBy=<tags>]

Parameter	Description
`context`	Metrics context to search within, a collection of tags
`metric`	Metric(s) being queried, a collection of metric names
`time-range`	A time range, or `aggregate=true` for the total since the application was deployed
`tags` (optional)	Tag list by which to group results

Query Examples

HTTP Method	`POST /v3/metrics/query?tag=namespace:default&tag=app:HelloWorld&metric=system.process.events.processed&aggregate=true`
Returns	`{"startTime":0,"endTime":1429327964,"series":[{"metricName":"system.process.events.processed","grouping":{},"data":[{"time":0,"value":1}]}]}`
Description	Using a System metric, system.process.events.processed

HTTP Method	`POST /v3/metrics/query?tag=namespace:default&tag=app:HelloWorld&tag=run:13ac3a50-a435-49c8-a752-83b3c1e1b9a8&metric=user.names.bytes&aggregate=true`
Returns	`{"startTime":0,"endTime":1429328212,"series":[{"metricName":"user.names.bytes","grouping":{},"data":[{"time":0,"value":8}]}]}`
Description	Querying the User-defined metric names.bytes by its run-ID

HTTP Method	`POST /v3/metrics/query?tag=namespace:default&tag=app:HelloWorld&metric=user.names.bytes`
Returns	`{"startTime":0,"endTime":1429475995,"series":[]}`
Description	Using a User-defined metric, names.bytes, in a service's Handler, queried before any data was entered, which returns an empty series

HTTP Method	`POST /v3/metrics/query?tag=namespace:default&tag=app:HelloWorld&metric=user.names.bytes`
Returns	`{"startTime":0,"endTime":1429477901,"series":[{"metricName":"user.names.bytes","grouping":{},"data":[{"time":0,"value":44}]}]}`
Description	Using a User-defined metric, names.bytes in a service's Handler
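
A query request like those above could be prepared from Python as sketched here. The base URL `http://localhost:11015` and the function name `build_query_request` are assumptions made for illustration; substitute the host and port of your own installation. The request object is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: host and port of the server exposing the metrics API.
BASE_URL = "http://localhost:11015"


def build_query_request(context, metrics, time_range=None, group_by=()):
    """Prepare (but do not send) a POST request for /v3/metrics/query.

    context:    ordered (tag-name, tag-value) pairs
    metrics:    metric names, including the "system." or "user." prefix
    time_range: dict of time parameters; defaults to aggregate=true
    group_by:   optional tag names for the groupBy parameter
    """
    params = [("tag", f"{name}:{value}") for name, value in context]
    params += [("metric", metric) for metric in metrics]
    params += list((time_range or {"aggregate": "true"}).items())
    params += [("groupBy", tag) for tag in group_by]
    url = f"{BASE_URL}/v3/metrics/query?" + urlencode(params, safe=":*")
    return Request(url, method="POST")


request = build_query_request(
    [("namespace", "default"), ("app", "HelloWorld")],
    ["system.process.events.processed"],
)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.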

Query Results

Results from a query are returned as a JSON string, in the format:

Code Block
{"startTime":<start-time>, "endTime":<end-time>, "series":<series-array>}

Name	Description
`start-time`	Start time, in seconds, with 0 being from the beginning of the query records
`end-time`	End time, in seconds
`series-array`	An array of metric results, which can be one time series, multiple time series, or none (an empty array)
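
As an illustration of this format, the following sketch parses a result string into a list of series. The helper name `read_series` is hypothetical; the field names (`series`, `metricName`, `grouping`, `data`, `time`, `value`) are taken from the example responses in the Query Examples section.

```python
import json


def read_series(result_json):
    """Return a list of (metric name, grouping, [(time, value), ...]) tuples."""
    result = json.loads(result_json)
    return [
        (
            series["metricName"],
            series["grouping"],
            [(point["time"], point["value"]) for point in series["data"]],
        )
        for series in result["series"]
    ]


# Response bodies copied from the Query Examples section.
with_data = (
    '{"startTime":0,"endTime":1429327964,"series":[{"metricName":'
    '"system.process.events.processed","grouping":{},'
    '"data":[{"time":0,"value":1}]}]}'
)
without_data = '{"startTime":0,"endTime":1429475995,"series":[]}'

print(read_series(with_data))
print(read_series(without_data))
```

A list is returned rather than a dictionary keyed by metric name because a grouped query can return several series for the same metric.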

If a particular metric has no value, a query will return an empty array in the "series" of the results, such as:

...

In a query, the optional groupBy parameter defines a list of tags whose values are used to build multiple time series. All data points that have the same values in tags specified in the groupBy parameter will form a single time series. You can define multiple tags for grouping by providing a list, similar to a tag combination list.

Tag List	Description
`groupBy=app`	Retrieves the time series for each application
`groupBy=app&groupBy=spark`	Retrieves a time series for each App and Spark combination.
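
The grouping rule can be pictured with a small client-side sketch. This is not the server's implementation; the data point layout (a dict with `tags`, `time`, and `value`) and the function name `group_points` are hypothetical and chosen only to show how points with the same tag values end up in one time series.

```python
from collections import defaultdict


def group_points(points, group_by):
    """Split data points into time series keyed by the values of the groupBy tags."""
    series = defaultdict(list)
    for point in points:
        # Points sharing the same values for every groupBy tag share a key.
        key = tuple(point["tags"].get(tag) for tag in group_by)
        series[key].append((point["time"], point["value"]))
    return dict(series)


# Hypothetical data points from two applications.
points = [
    {"tags": {"app": "A", "spark": "S1"}, "time": 1, "value": 10},
    {"tags": {"app": "A", "spark": "S2"}, "time": 1, "value": 20},
    {"tags": {"app": "B", "spark": "S1"}, "time": 1, "value": 30},
    {"tags": {"app": "A", "spark": "S1"}, "time": 2, "value": 11},
]

print(group_points(points, ["app"]))
print(group_points(points, ["app", "spark"]))
```

Grouping by `app` alone produces one series per application, while grouping by `app` and `spark` produces one series per combination.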

An example method (re-formatted to fit):

...

By default, queries without a time range retrieve a value based on aggregate=true.

Parameter	Description
`aggregate=true`	Total aggregated value for the metric since the application was deployed. If the metric is a gauge type, the aggregate will return the latest value set for the metric.
`start=<time>&end=<time>`	Time range defined by start and end times, where the times are either in seconds since the start of the Epoch, or a relative time, using `now` and times added to it.
`count=<count>`	Number of time intervals since start with length of time interval defined by resolution. If `count=60` and `resolution=1s`, the time range would be 60 seconds in length.
`resolution=[1s\|1m\|1h\|auto]`	Time resolution in seconds, minutes or hours; or if "auto", one of `{1s, 1m, 1h}` is used based on the time difference.

With a specific time range, a resolution can be included to retrieve a series of data points for a metric. By default, 1 second resolution is used. Acceptable values are noted above. If resolution=auto, the resolution will be determined based on a time difference calculated between the start and end times:

If (endTime - startTime) > 36000 seconds (ten hours), the resolution will be 1 hour;
otherwise, if (endTime - startTime) > 600 seconds (ten minutes), the resolution will be 1 minute;
otherwise, the resolution will be 1 second.
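
The selection rule for `resolution=auto` can be written out as a short function. This is a sketch of the rule as described here, not the server's code; the function name `auto_resolution` is hypothetical.

```python
def auto_resolution(start_time, end_time):
    """Pick the resolution used for resolution=auto; times are in seconds."""
    difference = end_time - start_time
    if difference > 36000:   # more than ten hours
        return "1h"
    if difference > 600:     # more than ten minutes
        return "1m"
    return "1s"


print(auto_resolution(1385625600, 1385629200))  # a one-hour range
```

Note that the comparisons are strict, so a range of exactly ten minutes still uses 1 second resolution under this reading of the rule.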

Time Range	Description
`start=now-30s&end=now`	The last 30 seconds. The start time is given in seconds relative to the current time. You can apply simple math, using `now` for the current time, `s` for seconds, `m` for minutes, `h` for hours and `d` for days. For example: `now-5d-12h` is 5 days and 12 hours ago.
`start=1385625600&` `end=1385629200`	From `Thu, 28 Nov 2013 08:00:00 GMT` to `Thu, 28 Nov 2013 09:00:00 GMT`, both given as seconds since the start of the Epoch.
`start=1385625600&` `count=3600&` `resolution=1s`	The same as before, the count given as a number of time intervals, each 1 second.
`start=1385625600&` `end=1385629200&` `resolution=1m`	From `Thu, 28 Nov 2013 08:00:00 GMT` to `Thu, 28 Nov 2013 09:00:00 GMT`, with 1 minute resolution, will return 61 data points with metrics aggregated for each minute.
`start=1385625600&` `end=1385632800&` `resolution=1h`	From `Thu, 28 Nov 2013 08:00:00 GMT` to `Thu, 28 Nov 2013 10:00:00 GMT`, with 1 hour resolution, will return 3 data points with metrics aggregated for each hour.
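
To make the time arithmetic concrete, the sketch below resolves a time expression to seconds since the Epoch and derives an end time from `count` and `resolution`. The function names are hypothetical, and accepting `+` as well as `-` in relative expressions is an assumption; the server performs this resolution itself, so the code is only an aid for checking what a given range means.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def resolve_time(expression, now):
    """Resolve an absolute or `now`-relative time expression to epoch seconds."""
    if expression.isdigit():
        return int(expression)  # already seconds since the Epoch
    match = re.fullmatch(r"now((?:[+-]\d+[smhd])*)", expression)
    if match is None:
        raise ValueError(f"unsupported time expression: {expression!r}")
    total = now
    for sign, amount, unit in re.findall(r"([+-])(\d+)([smhd])", match.group(1)):
        delta = int(amount) * UNIT_SECONDS[unit]
        total += delta if sign == "+" else -delta
    return total


def end_from_count(start, count, resolution):
    """Return the end time implied by `count` intervals of `resolution` (e.g. "1s")."""
    return start + count * int(resolution[:-1]) * UNIT_SECONDS[resolution[-1]]


now = 1385629200
print(resolve_time("now-30s", now))
print(resolve_time("now-5d-12h", now))
print(end_from_count(1385625600, 3600, "1s"))
```

The last call shows why `start=1385625600&count=3600&resolution=1s` covers the same range as `start=1385625600&end=1385629200`.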

Example:

Code Block
POST /v3/metrics/query?tag=namespace:default&tag=app:CountRandom&metric=system.process.events.processed&start=now-1h&end=now&resolution=1m

...
