...
- As a developer of the Workflow, I want the ability to specify that a particular dataset used in the Workflow is local to a particular run, so that the Workflow system can clean it up after the run completes. (CDAP-3969)
- As a developer of the Workflow, I should be able to specify whether the local datasets created by the Workflow run should be deleted after the Workflow run finishes. This way I can debug using them once the Workflow run has failed. (CDAP-3969)
Terence: If it is retained, how does the user gain access to and interact with those local datasets? As per the definition, local datasets "should be hidden from the normal list dataset calls and only visible from the Workflow run level UI page for exploring/debugging purpose". [Sagar: When the dataset is created in the Workflow driver, the driver can add a special property, say .dataset.workflow.internal, to the dataset. While listing the datasets, we can filter out those that have this property, so that the list dataset call does not show them. On the Workflow run level UI page, we can explicitly call "/data/datasets/localdatasetname" so that the dataset can be explored further, provided that the dataset itself is explorable. A local dataset can also be accessed from other applications in the same namespace, since its name would simply be datasetName.<workflow_run_id>.]
- I want the ability to delete the local datasets generated for a particular Workflow run. (CDAP-3969)
- I should be able to specify whether to keep the local dataset even after the Workflow run is finished. (CDAP-3969)
Andreas: This seems identical to 2. [Sagar: Oh yes. Striking it.]
- As a developer of the Workflow, I want the ability to specify functionality (such as sending an email) that will be executed when the Workflow finishes successfully. (CDAP-4075)
- As a developer of the Workflow, I want the ability to specify functionality (such as sending an email) that will be executed when the Workflow fails at any point. I want access to the cause of the failure and the node at which the Workflow failed. (CDAP-4075)
- As a developer of the Workflow, if the Workflow fails, I want the ability to instruct the Workflow system not to delete the local datasets, for triage purposes. [Sagar: This is a good point. Will add the design for it.]
...
- Datasets local to the Workflow should not be listed by the list dataset API. This will simply hide the local datasets from the user. However, it is still possible for the user to get access to the local datasets and use them inside other applications.
Andreas: How? And what do you mean by "inside the applications"? Other applications? How will they do that? [Sagar: How? Need to figure that out. A possible way is through some scoping. The user can always query the REST endpoint that returns the local datasets associated with a particular Workflow run and use them in the application, right?]
Sagar: When the dataset is created in the Workflow driver, the driver can add a special property, say .dataset.workflow.internal, to the dataset. While listing the datasets, we can filter out those that have this property, so that the list dataset call does not show them. On the Workflow run level UI page, we can explicitly call "/data/datasets/localdatasetname" so that the dataset can be explored further, provided that the dataset itself is explorable. A local dataset can also be accessed from other applications in the same namespace, since its name would simply be datasetName.<workflow_run_id>.
API to list the local datasets, if they are available:
GET <base-url>/namespaces/{namespace}/apps/{app-id}/workflows/{workflow-id}/runs/{run-id}/localdatasets
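The hiding scheme proposed in the discussion above (tag each local dataset with a marker property, name it datasetName.<workflow_run_id>, and filter on the marker when listing) could be sketched roughly as follows. This is only an illustration in Python, not the CDAP implementation: the in-memory registry dictionary and all function names are hypothetical, and the property key is the one floated in the discussion, which may change.

```python
# Property key suggested in the discussion; the final key may differ.
LOCAL_DATASET_PROPERTY = ".dataset.workflow.internal"


def local_dataset_name(dataset_name, run_id):
    """Build the run-scoped name proposed above: datasetName.<workflow_run_id>."""
    return f"{dataset_name}.{run_id}"


def create_local_dataset(registry, dataset_name, run_id, properties=None):
    """Register a local dataset and tag it with the marker property.

    `registry` is a hypothetical stand-in (dict of name -> properties)
    for the real dataset service.
    """
    props = dict(properties or {})
    props[LOCAL_DATASET_PROPERTY] = "true"
    name = local_dataset_name(dataset_name, run_id)
    registry[name] = props
    return name


def list_datasets(registry):
    """Normal list call: skip every dataset that carries the marker property."""
    return sorted(
        name for name, props in registry.items()
        if LOCAL_DATASET_PROPERTY not in props
    )


def list_local_datasets(registry, run_id):
    """Run-level list call: only marked datasets whose name ends in this run id."""
    suffix = "." + run_id
    return sorted(
        name for name, props in registry.items()
        if LOCAL_DATASET_PROPERTY in props and name.endswith(suffix)
    )
```

Note that filtering only hides the datasets from the list call. Because the run-scoped name is predictable, another application in the same namespace can still open a local dataset by name, which is the concern Andreas raises.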
We will need a way to delete the local datasets created for a particular Workflow run if the user set dataset.*.keep.local for that run:
DELETE <base-url>/namespaces/{namespace}/apps/{app-id}/workflows/{workflow-id}/runs/{run-id}/localdatasets
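For illustration, a client could build the list and delete calls against the proposed localdatasets endpoint along these lines. This is a minimal sketch using only the Python standard library. The helper names are hypothetical, the base URL in the usage note is a placeholder, and the response format is not defined in this design, so the sketch only constructs the requests and does not send them.

```python
from urllib.parse import quote
from urllib.request import Request


def local_datasets_url(base_url, namespace, app_id, workflow_id, run_id):
    """Fill in the path template proposed above for a single Workflow run."""
    # Percent-encode each path segment so unusual ids cannot break the path.
    parts = [quote(p, safe="") for p in (namespace, app_id, workflow_id, run_id)]
    return "{}/namespaces/{}/apps/{}/workflows/{}/runs/{}/localdatasets".format(
        base_url.rstrip("/"), *parts
    )


def list_local_datasets_request(base_url, namespace, app_id, workflow_id, run_id):
    """GET request that would list the local datasets of the run."""
    url = local_datasets_url(base_url, namespace, app_id, workflow_id, run_id)
    return Request(url, method="GET")


def delete_local_datasets_request(base_url, namespace, app_id, workflow_id, run_id):
    """DELETE request that would remove the local datasets retained for the run."""
    url = local_datasets_url(base_url, namespace, app_id, workflow_id, run_id)
    return Request(url, method="DELETE")
```

A request built this way, for example delete_local_datasets_request("http://example.com", "default", "MyApp", "MyWorkflow", "run-1"), can be passed to urllib.request.urlopen once the endpoint exists.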
...