Master backoff restart needs to stop after some runs

Description

When the master.service Twill app fails, the master backs off and tries to start the Twill app again. The backoff retries should stop after a bounded number of attempts (for example, 10), after which the master process should exit.

Today the retry happens indefinitely, and the master never exits.
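
To make the request concrete, here is a minimal sketch of a bounded backoff loop. It is not CDAP or Twill code: run_with_backoff, start_app, and the delay values are hypothetical names and numbers chosen for illustration, and the default limit of 10 is only the figure suggested above. Passing max_attempts=None retries indefinitely, which corresponds to today's behavior.

```python
import time
from typing import Callable, Optional


def run_with_backoff(
    start_app: Callable[[], bool],
    max_attempts: Optional[int] = 10,   # assumed limit; None means retry forever
    initial_delay: float = 1.0,         # assumed first delay, in seconds
    max_delay: float = 60.0,            # assumed cap on the delay, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call start_app until it succeeds or the attempt limit is reached.

    Returns True on success and False once max_attempts is exhausted.
    """
    delay = initial_delay
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        if start_app():
            return True
        # Do not sleep after the final failed attempt.
        if max_attempts is not None and attempt >= max_attempts:
            break
        sleep(delay)
        # Double the delay each time, up to the cap.
        delay = min(delay * 2, max_delay)
    return False
```

With something like this, the caller could exit with a non-zero status when the function returns False, so that an external supervisor or monitor notices the failure.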

Release Notes

None

Activity

Terence Yim October 29, 2015 at 11:34 PM

The restart is intended for the HA design, hence the master process should never stop itself. We provide a Nagios plugin to check the health of CDAP services, and that should be used instead to detect whether all system services are functioning.