Operations Dashboard
API Requirements
Graph
Information Provided:
List of namespaces
Start Time
End Time
Time Resolution
Information Needed:
Memory Usage over time per namespace, cluster, and max available
Core Usage over time per namespace, cluster, and max available
Run counts bucketed by the time resolution (within each aggregate, we should be able to identify pipeline vs. custom app):
Manual start
Scheduled start
Status (RUNNING, COMPLETED, FAILED)
Delay between STARTING and RUNNING
If the start time and end time are in the future, show the scheduled apps
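The bucketing requirement above can be sketched as follows. This is a minimal illustration, not the actual CDAP implementation; the run-record fields (`start_ts`, `is_pipeline`) and the function name are assumptions. Given a start time, end time, and time resolution (all in seconds), it counts runs per bucket, separating pipelines from custom apps as the aggregate requires:

```python
from collections import Counter

def bucket_runs(runs, start, end, resolution):
    """Bucket run records into fixed-width time buckets.

    Each bucket counts pipeline runs and custom-app runs separately,
    mirroring the requirement that the aggregate distinguish the two.
    Field names on the run records are illustrative assumptions.
    """
    buckets = {t: Counter() for t in range(start, end, resolution)}
    for run in runs:
        ts = run["start_ts"]
        if start <= ts < end:
            bucket = start + ((ts - start) // resolution) * resolution
            kind = "pipeline" if run.get("is_pipeline") else "custom_app"
            buckets[bucket][kind] += 1
    return buckets
```

The same shape extends naturally to the other bucketed aggregates (manual vs. scheduled start, status, STARTING-to-RUNNING delay) by counting on a different key per run record.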
Details when Graph Time Range is Clicked
Information Provided:
List of namespaces
Start Time
End Time
Information Needed:
Entity Details:
Namespace
App Name
Program Type
Program Name
Parent Artifact
Duration
User
Start Method (time schedule, trigger, manual)
Status
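One way to model the entity details returned by this call is a flat record per program run, as in the sketch below. The field names and types are assumptions for illustration, not the actual CDAP API schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record shape for one row in the details view.
@dataclass
class ProgramRunDetail:
    namespace: str
    app_name: str
    program_type: str               # e.g. "workflow", "spark", "mapreduce"
    program_name: str
    parent_artifact: Optional[str]  # may be unknown for pre-5.0.0 runs
    duration_sec: int
    user: str
    start_method: str               # "time schedule" | "trigger" | "manual"
    status: str                     # RUNNING, COMPLETED, FAILED
```

Making `parent_artifact` optional reflects the upgrade case noted under Answered Questions, where runs from older CDAP versions lack that information.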
Reports View
Information Provided:
List of namespaces
List of statuses
Start Time
End Time
Information Needed:
Entity Details:
Namespace
App Name
Program Type
Program Name
Parent Artifact
Duration
User
Start Method
Status
Runtime Arguments
Memory Usage
Number of CPUs
Number of Containers
Number of Log Warnings
Number of Log Errors
Number of records out
Summary Counts:
Runs per namespace
Time range
Pipelines (Realtime vs Batch), custom apps
Durations: min, max, average
Last Started: Oldest and Newest
List of users & count per user
List of start methods & count per method
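The summary counts above can be computed in one pass over the run records. The sketch below assumes each record carries `duration`, `user`, and `start_method` fields; these names are illustrative, not the real API schema:

```python
from collections import Counter

def summarize(runs):
    """Compute the Reports View summary counts over a list of run records.

    Returns total run count, min/max/average duration, and per-user and
    per-start-method counts. Record field names are assumptions.
    """
    durations = [r["duration"] for r in runs]
    return {
        "runs": len(runs),
        "duration_min": min(durations),
        "duration_max": max(durations),
        "duration_avg": sum(durations) / len(durations),
        "per_user": Counter(r["user"] for r in runs),
        "per_start_method": Counter(r["start_method"] for r in runs),
    }
```

Last-started oldest/newest and the pipeline vs. custom-app split would follow the same pattern, keyed on a start timestamp or an app-type field.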
Answered Questions:
1. For an older version of CDAP upgraded to 5.0.0 that lacks some information (e.g. program start method, program parent artifact), that information will be displayed as unknown.
2. For a future timeline, the design should be updated to grey out the statuses and manually started runs in the graph. Only time-trigger schedules will be displayed.
4. How should the runs list be displayed (Batch vs. Realtime vs. Custom Apps; collapsed by workflow? What about programs started outside a workflow?): at the frontend, users can choose to expand a custom app to show details of the different programs in the app.
5. In the Dashboard view, we need to limit the time window to a fixed range, such as 24 hours, in order to display in real time.
6. After the user selects the options and clicks Generate Report, a (Spark?) job will be launched. If the job takes less than a specific time (10 sec?) to finish, the UI will directly display the report; otherwise, the UI will ask the user to wait for the report. When the job finishes, a permalink will be produced, accessible only to the user who generated it. If the user chooses to share the report with others, a different link will be generated that is viewable by other users.
7. The report will only contain programs that are readable by the user who generates the report.
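The report-generation flow in item 6 can be sketched as a simple client-side polling loop. Everything here is an assumption for illustration: the `poll_status`/`fetch_report` callables stand in for whatever real endpoints the job exposes, and the 10-second quick-display threshold is the tentative value from the notes:

```python
import time

QUICK_THRESHOLD = 10.0  # seconds; the tentative "10 sec?" cutoff from item 6

def wait_for_report(poll_status, fetch_report, poll_interval=0.01):
    """Poll a (hypothetical) report job until it completes.

    Returns (report, direct), where direct is True when the job finished
    within QUICK_THRESHOLD, meaning the UI can display the report
    immediately instead of asking the user to wait.
    """
    start = time.monotonic()
    while poll_status() != "COMPLETED":
        time.sleep(poll_interval)
    elapsed = time.monotonic() - start
    return fetch_report(), elapsed < QUICK_THRESHOLD
```

Permalink handling (owner-only link vs. a separately generated shareable link) would sit on top of this, keyed off the finished job's identifier.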
Action Items:
1. Feasibility of features (core & memory usage, start methods for programs): need to modify the Twill ApplicationMaster to get container information. For MapReduce and Spark, how to get container information is TBD.
2. Need to clarify, in the Memory Usage chart, the difference between Namespace(s) Usage and App Usage.
3. When zooming in to a resolution of one hour, can multiple hours be selected? In each row, what are Detail and Summary?
4. Is it feasible to get resource usage for each namespace?
Created in 2020 by Google Inc.