Master backoff restart needs to stop after some runs

Description

When the master.service Twill app fails, the master backs off and then tries to start the Twill app again. The backoff retries should stop after a bounded number of attempts (10, for example), and the master process should then exit.

Today the retry happens indefinitely, and the master never exits.
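
To make the request concrete, here is a minimal sketch of a bounded backoff loop. It is not CDAP's actual master code: the class `BoundedBackoffRetry`, the exception `RetriesExhaustedException`, and the parameters `maxAttempts`, `initialDelayMs`, and `maxDelayMs` are all hypothetical names chosen for illustration, and the doubling schedule is an assumption.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of a bounded backoff retry; not CDAP's actual master code. */
final class BoundedBackoffRetry {

  /** Thrown once every attempt has failed, so the caller can decide to exit. */
  static final class RetriesExhaustedException extends Exception {
    RetriesExhaustedException(int attempts, Throwable lastFailure) {
      super("Giving up after " + attempts + " attempts", lastFailure);
    }
  }

  /**
   * Delay to wait after the given number of failed attempts (1-based).
   * Starts at initialDelayMs, doubles per failure, and is capped at maxDelayMs.
   */
  static long delayMillis(int failedAttempts, long initialDelayMs, long maxDelayMs) {
    long delay = initialDelayMs;
    for (int i = 1; i < failedAttempts && delay < maxDelayMs; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxDelayMs);
  }

  /** Runs the task up to maxAttempts times, sleeping between failed attempts. */
  static <T> T run(Callable<T> task, int maxAttempts, long initialDelayMs, long maxDelayMs)
      throws RetriesExhaustedException, InterruptedException {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (InterruptedException e) {
        throw e; // do not treat interruption as a retryable failure
      } catch (Exception e) {
        last = e;
      }
      if (attempt < maxAttempts) {
        Thread.sleep(delayMillis(attempt, initialDelayMs, maxDelayMs));
      }
    }
    throw new RetriesExhaustedException(maxAttempts, last);
  }
}
```

A caller could catch `RetriesExhaustedException` and call `System.exit(1)`, which is what would let a process supervisor see the master as failed. Passing `maxAttempts` in from configuration would cover the "configurable option" suggestion, and a very large value would approximate today's retry-forever behavior.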

Release Notes

None

Activity

Terence Yim October 29, 2015 at 11:34 PM

The restart is intentional and part of the HA design, so the master process should never stop itself. We provide a Nagios plugin that checks the health of CDAP services; use that instead to detect whether all system services are functioning.

Derek Wood September 22, 2015 at 8:37 PM

Perhaps this could be a configurable option.

Derek Wood September 22, 2015 at 8:35 PM

This would be very beneficial in a Cloudera Manager environment, so that the issue is surfaced to CM as a red flag for the CDAP Master instance.

Won't Fix

Details

Created February 19, 2015 at 12:58 AM
Updated October 29, 2015 at 11:35 PM
Resolved October 29, 2015 at 11:35 PM