...
- As a developer of a Workflow, I want the ability to specify that the output of a particular program (MapReduce/Spark) in the Workflow is temporary, so that the Workflow system can clean it up. (CDAP-3969)
- As a developer of a MapReduce or Spark program, I should be able to run it on its own as well as inside a Workflow with its output specified as temporary. When the program is run on its own, the output should not be cleaned up. (CDAP-3969)
- A MapReduce program can output to multiple datasets. As a developer of a Workflow, I want the ability to selectively specify some of the output datasets of the MapReduce program as transient. (CDAP-3969)
- I should be able to make a transient dataset non-transient for a particular run of the Workflow. (CDAP-3969)
- I should be able to make a non-transient dataset transient for a particular run of the Workflow. (CDAP-3969)
- As a developer of a Workflow, I want the ability to specify functionality that will be executed when the Workflow finishes successfully. (CDAP-4075)
- As a developer of a Workflow, I want the ability to specify functionality that will be executed when the Workflow fails at any point. I want access to the cause of the failure and the node at which the Workflow failed. (CDAP-4075)
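
The CDAP-3969 stories above imply a simple resolution rule for which datasets get cleaned up. The sketch below illustrates that rule only and is not the CDAP API. The class name, the method `datasetsToCleanUp`, and the dataset names are all hypothetical. It assumes that per-run overrides take precedence over the configured transient flags, and that nothing is cleaned up when a program runs outside a Workflow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the transient-dataset user stories; not part of CDAP.
public class TransientDatasetSketch {

    /**
     * Returns the names of the datasets to delete after a run.
     *
     * @param configured     dataset name -> transient flag set by the Workflow developer
     * @param runOverrides   dataset name -> transient flag supplied for this run only
     * @param insideWorkflow false when the program is run on its own
     */
    static Set<String> datasetsToCleanUp(Map<String, Boolean> configured,
                                         Map<String, Boolean> runOverrides,
                                         boolean insideWorkflow) {
        Set<String> result = new TreeSet<>();
        if (!insideWorkflow) {
            // A standalone run never cleans up its output.
            return result;
        }
        // Per-run overrides win over the configured flags (an assumption of this sketch).
        Map<String, Boolean> effective = new HashMap<>(configured);
        effective.putAll(runOverrides);
        for (Map.Entry<String, Boolean> entry : effective.entrySet()) {
            if (Boolean.TRUE.equals(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> configured = new HashMap<>();
        configured.put("intermediateOutput", true);  // transient by configuration
        configured.put("finalReport", false);        // kept by configuration

        Map<String, Boolean> overrides = new HashMap<>();
        overrides.put("intermediateOutput", false);  // keep it for this run
        overrides.put("finalReport", true);          // clean it up for this run

        System.out.println(datasetsToCleanUp(configured, new HashMap<>(), true)); // [intermediateOutput]
        System.out.println(datasetsToCleanUp(configured, overrides, true));       // [finalReport]
        System.out.println(datasetsToCleanUp(configured, overrides, false));      // []
    }
}
```

Keeping the override map separate from the configured flags means that a single run can flip a dataset in either direction without changing the Workflow definition.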
Approach for CDAP-3969 (WIP)
Consider again the Workflow mentioned in the use case above.
...