
Provide option to do aggregation in metrics API

Description

Problem:

Today the UI uses the metrics endpoint without specifying any resolution (e.g. for plugin metrics in a pipeline). This results in a large number of data points, for realtime pipelines in particular, that the UI cannot visualize in a meaningful way.

Suggestion

If the metrics endpoint could accept additional inputs, say the desired number of data points, or aggregate automatically based on the number of data points, then the UI would get back a fixed number of data points (say 20 max?) for visualization. This would provide a better view of the trend of incoming and outgoing data for plugins in a pipeline.

Release Notes

None

Activity

Yaojie Feng
June 27, 2019, 11:06 PM

After some discussion, the following change to the metrics API will be made: 

The existing query params "count" and "aggregate" will be used to specify more aggregation options. The "count" will work as a limit, telling the metrics query the number of data points it is expected to return. The "aggregate" was previously a flag; when set to true, it means query from the total resolution table. Two more aggregation options, SUM and LATEST, will be added, and the supported options are:

TRUE: if the aggregation is set to TRUE, the metrics query will always use the total resolution table for the result, which means the start and end time are ignored, and the number of data points for any metric name and tags will be 1.

FALSE: if the aggregation is set to FALSE, the metrics query will not do any aggregation on the data points. The resolution will be determined based on the start and end time, and the number of data points returned will be based on the count specified in the query. If the number of data points > count, the latest data points will be returned; otherwise, all the data points will be returned. If no count is specified, all the data points will be returned.

SUM: if the aggregation is set to SUM, the metrics query will partition the data points into approximately count partitions, depending on whether there is a remainder. Each partition will be aggregated to the sum over its interval. If no count is specified, or count is greater than or equal to the number of data points, all the data points will be returned. For example, if the metrics query result has 100 data points and the count is 10, then each 10 of the data points will be aggregated to one single point based on the aggregation option. If there are 100 points and the count is 9, then each 11 of the data points will be aggregated to one single point, and the one remaining data point will also become one single point, so the result will have 10 data points.

LATEST: if the aggregation is set to LATEST, the metrics query will behave similarly to SUM. The only difference is that the data points in each partitioned interval will be aggregated to the last data point.
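A minimal sketch of the SUM partitioning as described above (plain Python; the function name is hypothetical, not the actual metrics query code). Note that with a remainder, the leftover points form one extra trailing partition, so the result can have count + 1 points:

```python
def aggregate_sum(points, count):
    """Initial SUM proposal: partition into groups of floor(len/count)
    points; any remainder forms one extra trailing partition."""
    if count is None or count >= len(points):
        return list(points)
    size = len(points) // count          # e.g. 100 // 9 == 11
    buckets = [points[i:i + size] for i in range(0, len(points), size)]
    return [sum(b) for b in buckets]

# 100 points, count 10: exactly 10 partitions of 10
print(len(aggregate_sum(list(range(100)), 10)))   # 10
# 100 points, count 9: nine partitions of 11 plus one leftover point
print(len(aggregate_sum(list(range(100)), 9)))    # 10, one more than requested
```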

Andreas Neumann
June 27, 2019, 11:33 PM

One possible improvement: if N data points are requested and K data points exist in the given resolution, and K is not divisible by N (e.g. 100 and 9), then your above approach would return N+1 points, because of the remainder in the division. If we round up instead of down, each "bucket" is slightly larger, and the number of points returned remains within the limit of what was requested.

Example:

  • 10 requested, 50 data points found: 50/10 = 5 values in each bucket, 10 returned

  • 10 requested, 55 data points found: 55/10 = 5.5, round up to get 6 values in each bucket, 10 returned (the last one with only one data point).

I think that would be a little nicer in terms of API semantics. 

Other than that, looks good.
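The round-up scheme suggested above can be sketched as follows (plain Python; the helper name is hypothetical):

```python
import math

def bucket_sizes_round_up(total, requested):
    """Round the bucket size up, so at most `requested` buckets come back."""
    size = math.ceil(total / requested)
    # Walk the points in steps of `size`; the last bucket may be smaller.
    return [min(size, total - i) for i in range(0, total, size)]

print(bucket_sizes_round_up(50, 10))   # ten buckets of 5
print(bucket_sizes_round_up(55, 10))   # nine buckets of 6, last bucket holds 1
```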

Yaojie Feng
June 28, 2019, 1:52 AM

I think rounding up or down alone will not satisfy all use cases.
For example, if the number of data points is 100 and the count is 70, then the partition size is 2 (rounding up), which will only give 50 data points back.
Rounding down, with 100 data points and a count of 70, the partition size is 1, so we would return all 100 data points.
I think the correct logic should be as follows: depending on whether there is a remainder R, the first R partitions will each aggregate one more data point. Assuming the partition size (rounded down) is X and the count is K, the first R partitions will consume (X + 1) * R data points, and the remaining (K - R) partitions will consume X * (K - R), which in total is (X + 1) * R + X * (K - R) = X * R + R + X * K - X * R = X * K + R = N, the total number of data points.
For example, if there are 100 points and the count is 8, since the remainder is 4, the first 4 intervals will each have 100 / 8 + 1 = 13 data points aggregated, the remaining 4 intervals will each have 12 data points aggregated, and 8 data points will be returned.
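The remainder-spreading scheme above can be sketched like this (plain Python; the function name is hypothetical):

```python
def partition_sizes(total, count):
    """Spread the remainder R over the first R partitions, so every data
    point is consumed and exactly `count` partitions come back."""
    base, rem = divmod(total, count)     # base = floor(total/count), rem = R
    return [base + 1] * rem + [base] * (count - rem)

sizes = partition_sizes(100, 8)          # remainder is 4
print(sizes)                             # [13, 13, 13, 13, 12, 12, 12, 12]
print(len(sizes), sum(sizes))            # 8 100
```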

Yaojie Feng
July 2, 2019, 8:00 PM

We have reached an agreement on how to deal with the remainder:

If there is a remainder R, the first R data points will be discarded.

For example, if the metrics query result has 100 data points and the count is 10, then each 10 of the data points will be aggregated to one single point based on the aggregation option. If there are 100 points and the count is 8, since the remainder is 4, the first 4 data points will be discarded. Each bucket will have 100 / 8 = 12 data points aggregated, covering 96 data points in total.
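The agreed-upon behavior can be sketched as follows (plain Python; function and parameter names are hypothetical, not the actual metrics query code):

```python
def aggregate_final(points, count, combine=sum):
    """Agreed behavior: drop the first R = len(points) % count points,
    then aggregate `count` equal buckets of len(points) // count each."""
    size, rem = divmod(len(points), count)
    trimmed = points[rem:]               # discard the first R data points
    return [combine(trimmed[i:i + size]) for i in range(0, len(trimmed), size)]

# 100 points, count 8: remainder 4 discarded, eight buckets of 12 (96 points)
result = aggregate_final(list(range(100)), 8)
print(len(result))    # 8
```

The LATEST option would pass something like `combine=lambda bucket: bucket[-1]` so each bucket collapses to its last data point.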


Assignee

Yaojie Feng

Reporter

Ajai Narayanan

Labels

Docs Impact

None

UX Impact

None

Components

Fix versions

Priority

Major