Spark programs cannot be stopped after master failover/restart

Description

To reproduce:

1. Start a Spark program (a realtime Hydrator pipeline works)
2. Restart cdap-master
3. Stop the Spark program

The CDAP program status will be stopped, but the YARN containers will still be running. I believe the cause is that DistributedProgramRuntimeService does not create a controller for Spark programs discovered from the TwillRunner.
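
One way to confirm the leftover containers after step 3 is to list the YARN applications; a minimal sketch using the YARN client API (cluster configuration details omitted):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListYarnApps {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();
    try {
      // With this bug, the Spark run's YARN application still shows up
      // here (RUNNING) even after CDAP reports the program as stopped.
      for (ApplicationReport report : yarnClient.getApplications()) {
        System.out.println(report.getApplicationId() + "\t"
            + report.getName() + "\t" + report.getYarnApplicationState());
      }
    } finally {
      yarnClient.stop();
    }
  }
}
{code}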
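
As for the suspected cause, here is a rough sketch of the recovery path under that hypothesis. This is illustrative only, not CDAP's actual code; the program-type enum, the controller map, and the name-parsing helper are hypothetical stand-ins for CDAP internals. Only the TwillRunner/TwillController calls are the real Apache Twill API.

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.twill.api.RunId;
import org.apache.twill.api.TwillController;
import org.apache.twill.api.TwillRunner;

public class LiveRunRecoverySketch {
  enum ProgramType { FLOW, MAPREDUCE, SPARK }

  private final Map<RunId, TwillController> controllers = new HashMap<>();

  void recover(TwillRunner twillRunner) {
    // lookupLive() yields the Twill applications still running in YARN;
    // this is how runs are rediscovered after a master restart.
    for (TwillRunner.LiveInfo liveInfo : twillRunner.lookupLive()) {
      ProgramType type = typeFromAppName(liveInfo.getApplicationName());
      for (TwillController controller : liveInfo.getControllers()) {
        switch (type) {
          case FLOW:
          case MAPREDUCE:
            // A controller is kept, so a later stop can reach the
            // YARN application through controller.terminate().
            controllers.put(controller.getRunId(), controller);
            break;
          default:
            // The suspected gap: no SPARK case, so a rediscovered Spark
            // run gets no controller, and stopping it only updates
            // CDAP's run record while the containers keep running.
            break;
        }
      }
    }
  }

  // Hypothetical naming convention, for the sketch only.
  private ProgramType typeFromAppName(String appName) {
    if (appName.startsWith("spark.")) {
      return ProgramType.SPARK;
    }
    return appName.startsWith("flow.") ? ProgramType.FLOW : ProgramType.MAPREDUCE;
  }
}
{code}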

Release Notes

Fixes cases where Spark programs cannot be stopped after a master failover or restart.

Activity

Terence Yim
November 9, 2016, 10:10 PM

Seems like it was never there, even after we introduced the first Spark integration.

Albert Shau
November 9, 2016, 10:39 PM
Terence Yim
November 10, 2016, 7:26 PM

While fixing this, I also discovered that permgen usage grows linearly with the number of active Spark program runs, which is undesirable.
Including this fix in 3.5 and 3.6, since it will be critical for anyone who tries to run Spark in production.
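
For context on the permgen observation: one common way such linear growth arises is a per-run classloader that stays strongly referenced after the run, so its class metadata can never be collected. Whether that is the mechanism here is an assumption on my part; the sketch below is generic Java, not CDAP code, and only illustrates the pattern:

{code:java}
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderLeakSketch {
  // Keeping a strong reference to each run's classloader means the
  // classes it loaded can never be unloaded, so permgen/metaspace
  // usage grows with every run that is started.
  private static final List<ClassLoader> retained = new ArrayList<>();

  static void startRun(URL[] programJars) {
    URLClassLoader runClassLoader = new URLClassLoader(programJars);
    // ... execute the program through runClassLoader ...
    retained.add(runClassLoader); // leak: reference outlives the run
    // The remedy is to drop the reference (and close() the loader)
    // once the run finishes.
  }
}
{code}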

Terence Yim
November 10, 2016, 8:37 PM
Terence Yim
November 11, 2016, 5:35 AM
Fixed

Assignee

Terence Yim

Reporter

Albert Shau

Labels

None

Docs Impact

None

UX Impact

None

Components

Fix versions

Priority

Critical