Master backoff restart needs to stop after some runs
Description
When the master.service Twill app fails, the master backs off and tries to start the Twill app again. The backoff retries should stop after a bounded number of attempts (for example, 10), and the master process should then exit.
Today the retries continue indefinitely, and the master never exits.
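To make the request concrete, here is a minimal sketch of a bounded backoff retry loop. It is not CDAP's or Twill's actual code: the class name, method names, and delay values are all hypothetical, and the `startApp` callback only stands in for launching the Twill app.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; none of these names exist in CDAP or Twill.
final class BoundedBackoffRestart {

    // Delay before the retry that follows the given 1-based attempt:
    // starts at baseDelayMs, doubles per attempt, and is capped at maxDelayMs.
    static long delayMs(int attempt, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs;
        for (int i = 1; i < attempt && delay < maxDelayMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }

    // Calls startApp until it reports success or maxAttempts is reached.
    // Returns the number of attempts used on success, or -1 if every attempt
    // failed or the thread was interrupted while waiting.
    static int startWithRetries(BooleanSupplier startApp, int maxAttempts,
                                long baseDelayMs, long maxDelayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (startApp.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs(attempt, baseDelayMs, maxDelayMs));
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

A caller could treat a return value of -1 as the signal to log the failure and exit with a non-zero status, for example `startWithRetries(launcher, 10, 1_000, 60_000)`, where `launcher` and the numbers are placeholders rather than proposed defaults.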
Release Notes
None
Activity
Terence Yim October 29, 2015 at 11:34 PM
The restart is intended for the HA design, hence the master process should never stop itself. We provide a Nagios plugin to check the health of CDAP services, and that should be used instead to detect whether all system services are functioning.
Derek Wood September 22, 2015 at 8:37 PM
Perhaps this can be a configurable option.
Derek Wood September 22, 2015 at 8:35 PM
This would be very beneficial in a Cloudera Manager environment, so that the issue is surfaced to CM as a red flag for the CDAP Master instance.
Won't Fix