Workflow remains 'RUNNING' even if YARN app is killed or MR job completes successfully

Description

I hit this endpoint:
/v2/apps/cdap-conversion/workflows/StreamConversionWorkflow/runs

It returns:
{"runid":"2ba7f996-0690-4a14-bed1-808c2f8f671d","start":1421541780,"status":"RUNNING"}

This is 5 minutes after I manually killed the YARN application. I expect the status to be something other than RUNNING, perhaps 'FAILED' or 'KILLED'.
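For anyone trying to reproduce this, here is a minimal Python sketch that parses the run record shown above and reports how long it has claimed to be RUNNING. It assumes `start` is in epoch seconds, which is a guess based on the magnitude of the value. The helper name `running_for` and the one-hour "now" are hypothetical, for illustration only.

```python
import json

# Run record copied verbatim from the response above.
RESPONSE = (
    '{"runid":"2ba7f996-0690-4a14-bed1-808c2f8f671d",'
    '"start":1421541780,"status":"RUNNING"}'
)


def running_for(record, now):
    """Return how many seconds a record has been RUNNING, or None otherwise.

    Assumes record["start"] and `now` are both epoch seconds.
    """
    if record.get("status") != "RUNNING":
        return None
    return now - record["start"]


record = json.loads(RESPONSE)
# Hypothetical "now": one hour after the recorded start time.
elapsed = running_for(record, record["start"] + 3600)
print(record["runid"], elapsed)
```

A record that still reports a growing elapsed time long after the YARN application was killed is the symptom described in this ticket.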

A related issue is that a workflow remains 'RUNNING' even if the MapReduce job completes on its own (without the MR job having to be killed).

Release Notes

None

Activity

Henry Saputra July 10, 2015 at 12:46 AM

Let's keep it closed, since it was closed as a duplicate. If it needs to be reopened, then we should work on it in https://cdap.atlassian.net/browse/CDAP-1900#icft=CDAP-1900

Henry Saputra July 10, 2015 at 12:44 AM

But is it the same runnable, though? If it is in STARTING, that means a new run has been made; the cleaner only deals with rogue RUNNING attempts from before.

Chris Gianelloni July 10, 2015 at 12:40 AM

The programs in the RUNNING state remained running. The ones in the STARTING state went to KILLED in YARN.

This is CDAP 3.0.1

Henry Saputra July 9, 2015 at 11:53 PM
Edited

Which version of CDAP is this? Yes, that is correct: in 3.0.1 there is a cleanup daemon that cleans up invalid RUNNING run records after CDAP master is restarted.

Due to the nature of distributed systems, there is some delay between the
That is the phenomenon that you saw.

Ali Anwar July 9, 2015 at 11:39 PM

Regarding the timing, there's a daemon process that goes around checking whether RunRecords with status==RUNNING really are running. So it probably went to FAILED once that daemon process got around to checking.
knows more about it.
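
To make the check described above concrete, here is a rough Python sketch of that kind of reconciliation pass. This is not CDAP's actual implementation. The function name `reconcile`, the dict-shaped records, and the `live_run_ids` set (standing in for whatever the runtime reports as actually alive) are all assumptions made for illustration. FAILED is used as the corrected status only because that is the state mentioned in this thread.

```python
def reconcile(run_records, live_run_ids):
    """Correct RUNNING records whose run is no longer alive.

    run_records:  list of dicts with "runid" and "status" keys (hypothetical shape).
    live_run_ids: set of run ids that are actually running (hypothetical input).
    Returns the run ids whose status was changed to FAILED.
    """
    corrected = []
    for record in run_records:
        if record["status"] == "RUNNING" and record["runid"] not in live_run_ids:
            record["status"] = "FAILED"
            corrected.append(record["runid"])
    return corrected


records = [
    {"runid": "run-a", "status": "RUNNING"},    # still alive
    {"runid": "run-b", "status": "RUNNING"},    # killed outside the platform
    {"runid": "run-c", "status": "COMPLETED"},  # untouched: not RUNNING
]
changed = reconcile(records, live_run_ids={"run-a"})
print(changed)
```

Because a pass like this runs periodically rather than on every state change, a stale RUNNING status can remain visible until the next pass.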

My question is: the programs stopped running once cdap master was restarted? Is this expected behavior?

Duplicate

Created January 18, 2015 at 1:44 AM
Updated July 10, 2015 at 8:19 PM
Resolved July 10, 2015 at 8:19 PM