Overview

...

In DefaultStore, each time a program run stops, its start time and stop time are recorded in the index table under the row key <namespace-id>:<inverted-start-time>:<stop-time>:<program-run-id>.

Read path:

Given a query for active runs in a given namespace <query-namespace> within the time range [start, end), the scan starts at the row key <query-namespace>:0, since the earliest start time of the active runs within [start, end) is not known, and stops at the row key <query-namespace>:<end>. A possible optimization, discussed in the next section, obtains a larger start row key from the earliest start time of the runs in the given time range. After getting the runs with program_start_time < end, a row key filter is applied to keep only the runs with program_stop_time > start. The corresponding run records are then read from the AppMetadataStore using these run IDs.
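The two conditions in the read path (program_start_time < end from the scan, program_stop_time > start from the row key filter) amount to an interval-overlap test. The following is a minimal sketch of that test only, assuming the row keys have already been decoded; the IndexEntry type, the active_runs function, and the use of plain integer timestamps are illustrative assumptions, not part of DefaultStore.

```python
from typing import Iterable, List, NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical decoded form of one index-table row key."""
    namespace: str
    start_time: int  # program start time (already un-inverted)
    stop_time: int   # program stop time
    run_id: str


def active_runs(entries: Iterable[IndexEntry], namespace: str,
                start: int, end: int) -> List[str]:
    """Return the IDs of runs in `namespace` that were active in [start, end)."""
    return [
        e.run_id
        for e in entries
        if e.namespace == namespace
        and e.start_time < end    # condition satisfied by the scan
        and e.stop_time > start   # condition applied by the row key filter
    ]


entries = [
    IndexEntry("ns1", 10, 20, "stopped-before-range"),
    IndexEntry("ns1", 10, 60, "overlaps-range-start"),
    IndexEntry("ns1", 55, 70, "inside-range"),
    IndexEntry("ns1", 100, 120, "started-after-range"),
    IndexEntry("ns2", 55, 70, "other-namespace"),
]
result = active_runs(entries, "ns1", start=50, end=100)
```

Only the run IDs returned here would then be looked up in the metadata store.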

...

In the cloud environment, there is a requirement to show node hours. This requirement can be satisfied by periodically emitting a metrics heartbeat to TMS for active runs. With this approach, we get metrics similar to the records in the ActiveProgramRunHistoryTable defined in Approach 2. A subscriber will persist the messages to a table. Since the longest query time range for the Ops dashboard is 7 days, the table will have a TTL somewhat longer than 7 days, such as one month. The table can contain either run IDs or copies of program run records. Saving copies of program run records eliminates the need for a separate query for run records from AppMetadataStore, but with long-running programs this can occupy more space.

Row key design:

<timestamp-rounded-to-hour(alternatively, rounded to date)>:<invertedStartTime>:<stopTime>:<namespace>:<program-run>

The table will be salted to avoid hotspots on the region servers.
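As an illustration of the row key layout above, here is a minimal sketch that assembles the key as a string. Several details are assumptions made for the example: timestamps are epoch seconds, the inverted start time is computed as the maximum signed 64-bit value minus the start time, and the function name heartbeat_row_key is hypothetical. The salt prefix is omitted because the salting scheme is not specified here.

```python
LONG_MAX = 2**63 - 1       # assumed basis for inverting the start time
SECONDS_PER_HOUR = 3600


def heartbeat_row_key(timestamp: int, start_time: int, stop_time: int,
                      namespace: str, program_run: str) -> str:
    """Build <rounded-timestamp>:<invertedStartTime>:<stopTime>:<namespace>:<program-run>."""
    # Round the heartbeat timestamp down to the start of its hour.
    rounded = timestamp - timestamp % SECONDS_PER_HOUR
    # Invert the start time so that later starts sort first (assumed convention).
    inverted_start = LONG_MAX - start_time
    return f"{rounded}:{inverted_start}:{stop_time}:{namespace}:{program_run}"


key = heartbeat_row_key(timestamp=7325, start_time=1000, stop_time=5000,
                        namespace="ns1", program_run="run-1")
```

Rounding to the date instead of the hour would only change the rounding step (86400 seconds instead of 3600).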

The read path of this approach will be almost the same as in Approach 2, except that the range read around the query time range will depend on the period at which the metrics are emitted. For instance, if the metric is emitted every 30 minutes and the query time range is 7:30am - 9am, then the scanning range will be 7:00am - 9am.
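One reading consistent with this example is that the scan start is moved back by one emit period, so that the last heartbeat emitted before the query start is still included, while the scan end is left unchanged. The sketch below shows that arithmetic only; the function name scan_range and the use of datetime values are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple


def scan_range(query_start: datetime, query_end: datetime,
               emit_period: timedelta) -> Tuple[datetime, datetime]:
    """Widen the query range backwards by one heartbeat emit period."""
    return query_start - emit_period, query_end


scan_start, scan_end = scan_range(
    datetime(2018, 1, 1, 7, 30),   # query starts at 7:30am
    datetime(2018, 1, 1, 9, 0),    # query ends at 9:00am
    timedelta(minutes=30),         # heartbeat emitted every 30 minutes
)
```

With a 30-minute emit period and a query starting at 7:30am, the scan starts at 7:00am.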


Performance Experiment with Approach 1 optimization vs. no optimization

...