program status goes from stopped to running to stopped
Activity

Chengfeng Mao September 7, 2017 at 7:04 PM (edited)
Merged: https://github.com/caskdata/cdap/pull/9342
Fixed, since the program status REST APIs now all return statuses from run records.

Sreevatsan Raman November 19, 2016 at 1:23 AM
This will be fixed when we use state updates using tx messaging.

Albert Shau October 19, 2016 at 5:12 PM
I'm not sure why status is kept in 2 places, run records and in-memory in the ProgramRuntimeService. Seems like a great way to encourage race conditions like this.
Fixed
Details
Created October 19, 2016 at 4:51 PM
Updated September 7, 2017 at 8:19 PM
Resolved September 7, 2017 at 7:04 PM
I've encountered this in unit tests where I do something like:
and then get an exception that the program can't be started because it's already running. On closer look, it appears that ProgramLifecycleService.getExistingAppProgramStatus() will return a status of stopped, then some time later return a status of running, then transition back to stopped. It is hard to reproduce on fast machines and also on really slow ones.
The first 'stopped' is returned while there is still runtime info about the program, with its state as 'killed' and its status as 'stopped'. The status changes to 'running' once the runtime info is removed: in that case, if the program type is mapreduce or spark and there are runs with status 'running', the method reports the program as 'running'. This logic is meant to handle programs that run as part of a workflow, but it does not account for this race condition. The status goes back to 'stopped' once the run records have been updated to no longer be 'running'.
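To make the sequence easier to follow, here is a minimal sketch of the lookup as described above. It is a simplified model, not CDAP's actual code: the class StatusRaceSketch, the ProgramType enum, the getStatus and observeStopSequence methods, and the plain-string states are all hypothetical names introduced for illustration. It also assumes that any in-memory state other than 'RUNNING' is reported as 'STOPPED'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the status lookup described in this issue.
// None of these names are taken from the CDAP code base.
class StatusRaceSketch {

    enum ProgramType { MAPREDUCE, SPARK, SERVICE }

    // runtimeState: state held in memory, or null once the runtime info is removed.
    // runRecordStatuses: statuses of the program's persisted run records.
    static String getStatus(String runtimeState, ProgramType type,
                            List<String> runRecordStatuses) {
        if (runtimeState != null) {
            // Assumption: a 'KILLED' (or any non-running) state maps to 'STOPPED'.
            return runtimeState.equals("RUNNING") ? "RUNNING" : "STOPPED";
        }
        // No runtime info: fall back to run records for program types that may
        // have been started by a workflow.
        boolean mayRunInWorkflow =
            type == ProgramType.MAPREDUCE || type == ProgramType.SPARK;
        if (mayRunInWorkflow && runRecordStatuses.contains("RUNNING")) {
            return "RUNNING";
        }
        return "STOPPED";
    }

    // Replays the three moments after a stop request, in order.
    static List<String> observeStopSequence() {
        List<String> observed = new ArrayList<>();
        // 1. Runtime info still present with state 'KILLED'; run record not yet updated.
        observed.add(getStatus("KILLED", ProgramType.MAPREDUCE, Arrays.asList("RUNNING")));
        // 2. Runtime info removed; run record still says 'RUNNING'.
        observed.add(getStatus(null, ProgramType.MAPREDUCE, Arrays.asList("RUNNING")));
        // 3. Run record finally updated.
        observed.add(getStatus(null, ProgramType.MAPREDUCE, Arrays.asList("KILLED")));
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(observeStopSequence()); // prints [STOPPED, RUNNING, STOPPED]
    }
}
```

In this model the spurious 'RUNNING' appears only in the window between removing the runtime info and updating the run record, which would be consistent with the issue being timing-dependent.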